taylor is an R package for accessing and exploring data related to Taylor Swift’s discography. It provides built-in data sets containing information on the audio characteristics and lyrics of Taylor’s songs. Additionally, taylor offers some helper functions for creating Taylor Swift-themed data visualizations.
This document introduces you to taylor’s functionality and shows you how to use it to learn about Taylor Swift’s music.
I trace the evidence
The main data set is taylor_all_songs. This data set contains audio
features from Spotify and lyrics from Genius for each of Taylor Swift’s
songs.
taylor_all_songs
#> # A tibble: 356 × 29
#> album_name ep album_release track_number track_name artist featuring
#> <chr> <lgl> <date> <int> <chr> <chr> <chr>
#> 1 Taylor Swift FALSE 2006-10-24 1 Tim McGraw Taylo… NA
#> 2 Taylor Swift FALSE 2006-10-24 2 Picture To Bu… Taylo… NA
#> 3 Taylor Swift FALSE 2006-10-24 3 Teardrops On … Taylo… NA
#> 4 Taylor Swift FALSE 2006-10-24 4 A Place In Th… Taylo… NA
#> 5 Taylor Swift FALSE 2006-10-24 5 Cold As You Taylo… NA
#> 6 Taylor Swift FALSE 2006-10-24 6 The Outside Taylo… NA
#> 7 Taylor Swift FALSE 2006-10-24 7 Tied Together… Taylo… NA
#> 8 Taylor Swift FALSE 2006-10-24 8 Stay Beautiful Taylo… NA
#> 9 Taylor Swift FALSE 2006-10-24 9 Should've Sai… Taylo… NA
#> 10 Taylor Swift FALSE 2006-10-24 10 Mary's Song (… Taylo… NA
#> # ℹ 346 more rows
#> # ℹ 22 more variables: bonus_track <lgl>, promotional_release <date>,
#> # single_release <date>, track_release <date>, danceability <dbl>,
#> # energy <dbl>, key <int>, loudness <dbl>, mode <int>, speechiness <dbl>,
#> # acousticness <dbl>, instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
#> # tempo <dbl>, time_signature <int>, duration_ms <int>, explicit <lgl>,
#> # key_name <chr>, mode_name <chr>, key_mode <chr>, lyrics <list>
The audio features include the danceability, energy, and valence of
each track, which are described in the documentation for the Spotify
API. The data set also includes metadata for each track, such as the
key, tempo, time signature, and duration. Finally, the lyrics for each
track are included in a nested list column. The lyrics can be accessed
by using tidyr::unnest(), or by using purrr::map() to apply a function
to each set of lyrics. For a detailed description of accessing lyrics,
see vignette("lyrics").
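As a minimal sketch of the two approaches just described, the code below unnests the lyrics column into one row per lyric line and, separately, uses purrr to count the lines in each track. It assumes that each element of lyrics is a data frame with one row per line; the object names lyric_lines and line_counts are illustrative only. See vignette("lyrics") for the authoritative description.

```r
library(taylor)
library(dplyr)
library(tidyr)
library(purrr)

# Approach 1: expand the nested list column so each lyric line is a row
lyric_lines <- taylor_album_songs %>%
  select(album_name, track_name, lyrics) %>%
  unnest(lyrics)

# Approach 2: apply a function to each set of lyrics.
# NROW() returns the number of rows when the element is a data frame
# (an assumption about the column's structure).
line_counts <- taylor_album_songs %>%
  mutate(n_lines = map_int(lyrics, NROW)) %>%
  select(album_name, track_name, n_lines)

lyric_lines
line_counts
```

Unnesting is convenient for line-level analysis, while mapping keeps one row per track and suits per-song summaries.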
A related data set is taylor_album_songs. This data set contains all of
the same information as taylor_all_songs, but is filtered to only
include tracks that are on official studio albums. This means that
standalone singles (e.g., “Only The Young”) and features (e.g., Big Red
Machine’s “Renegade”) are not included. We also exclude albums Taylor
doesn’t own, but for which a Taylor’s Version has been released. For
example, 1989 is excluded in favor of 1989 (Taylor’s Version), but
Taylor Swift (debut) is included because a Taylor’s Version of that
album has not been released.
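One way to see the effect of this filtering is to compare the two data sets directly. The sketch below lists the album names that appear in taylor_all_songs but not in taylor_album_songs; an NA in the result would correspond to tracks that have no album, on the assumption that standalone singles and features are stored that way. The name dropped_albums is illustrative.

```r
library(taylor)

# Album names present in the full data set but absent from the
# album-only data set
dropped_albums <- setdiff(unique(taylor_all_songs$album_name),
                          unique(taylor_album_songs$album_name))
dropped_albums

# Number of tracks removed by the filtering
nrow(taylor_all_songs) - nrow(taylor_album_songs)
```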
taylor also includes a small data set called taylor_albums. This data
set includes the release date for each album, as well as critic and
user ratings from Metacritic.
taylor_albums
#> # A tibble: 17 × 5
#> album_name ep album_release metacritic_score user_score
#> <chr> <lgl> <date> <int> <dbl>
#> 1 Taylor Swift FALSE 2006-10-24 67 8.4
#> 2 The Taylor Swift Holiday Col… TRUE 2007-10-14 NA NA
#> 3 Beautiful Eyes TRUE 2008-07-15 NA NA
#> 4 Fearless FALSE 2008-11-11 73 8.4
#> 5 Speak Now FALSE 2010-10-25 77 8.6
#> 6 Red FALSE 2012-10-22 77 8.6
#> 7 1989 FALSE 2014-10-27 76 8.3
#> 8 reputation FALSE 2017-11-10 71 8.3
#> 9 Lover FALSE 2019-08-23 79 8.4
#> 10 folklore FALSE 2020-07-24 88 9
#> 11 evermore FALSE 2020-12-11 85 8.9
#> 12 Fearless (Taylor's Version) FALSE 2021-04-09 82 8.9
#> 13 Red (Taylor's Version) FALSE 2021-11-12 91 8.9
#> 14 Midnights FALSE 2022-10-21 85 8.3
#> 15 Speak Now (Taylor's Version) FALSE 2023-07-07 81 9.2
#> 16 1989 (Taylor's Version) FALSE 2023-10-27 90 NA
#> 17 THE TORTURED POETS DEPARTMENT FALSE 2024-04-19 76 NA
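As a small illustration of working with this data set, the following sketch drops the EPs and any album without a critic score, then orders the remaining albums from highest to lowest Metacritic score. The name ranked_albums is illustrative.

```r
library(taylor)
library(dplyr)

# Studio albums with a Metacritic critic score, highest score first
ranked_albums <- taylor_albums %>%
  filter(!ep, !is.na(metacritic_score)) %>%
  arrange(desc(metacritic_score)) %>%
  select(album_name, album_release, metacritic_score, user_score)

ranked_albums
```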
Finally, there is a data set dedicated to The Eras Tour, specifically
the surprise songs that Taylor plays at each show. The data set,
eras_tour_surprise, contains the date and location of each show, the
color of the dress Taylor wore during the acoustic set, and the song
that was performed on each instrument (piano and guitar). The data set
also includes information on any additional songs that were performed as
mashups and guests that Taylor brought out for a performance.
eras_tour_surprise
#> # A tibble: 166 × 9
#> leg date city night dress instrument song mashup guest
#> <chr> <date> <chr> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 North America (Le… 2023-03-17 Glen… 1 red guitar mirr… NA NA
#> 2 North America (Le… 2023-03-17 Glen… 1 red piano Tim … NA NA
#> 3 North America (Le… 2023-03-18 Glen… 2 green guitar this… NA NA
#> 4 North America (Le… 2023-03-18 Glen… 2 green piano Stat… NA NA
#> 5 North America (Le… 2023-03-24 Las … 1 red guitar Our … NA NA
#> 6 North America (Le… 2023-03-24 Las … 1 red piano Snow… NA NA
#> 7 North America (Le… 2023-03-25 Las … 2 green guitar cowb… NA Marc…
#> 8 North America (Le… 2023-03-25 Las … 2 green piano Whit… NA NA
#> 9 North America (Le… 2023-03-31 Arli… 1 green guitar Sad … NA NA
#> 10 North America (Le… 2023-03-31 Arli… 1 green piano Ours… NA NA
#> # ℹ 156 more rows
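Because the data set has one row per show and instrument, simple counts already answer common questions. The sketch below tallies how often each song appears in the song column and how many performances are recorded for each instrument. Songs that appear only in the mashup column are not counted here; the object names are illustrative.

```r
library(taylor)
library(dplyr)

# How many times each song appears as the main surprise song
song_counts <- eras_tour_surprise %>%
  count(song, sort = TRUE)

# Number of recorded performances per instrument
instrument_counts <- eras_tour_surprise %>%
  count(instrument)

song_counts
instrument_counts
```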
Just another picture to burn
Often as we explore data, we want to create data visualizations. Naturally, if we’re exploring data for Taylor Swift, we need Taylor Swift-themed visualizations. taylor includes several color palettes and helper functions for ggplot2 to facilitate these visualizations.
First, there are color palettes inspired by each album stored in
album_palettes. For example, we can look at a color palette based on
the cover art for the Lover album.
album_palettes$lover
#> <color_palette[5]>
#> #76BAE0
#> #8C4F66
#> #B8396B
#> #EBBED3
#> #FFF5CC
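To show how an album palette might be applied to a plot, here is a hedged sketch that colors the tracks of Lover by valence using a continuous fill scale. It assumes that taylor exports scale_fill_taylor_c() and that it accepts an album argument naming the palette; check vignette("palettes") for the exact interface before relying on it. The object names lover_tracks and p are illustrative.

```r
library(taylor)
library(dplyr)
library(ggplot2)

# Tracks from Lover that have a valence value
lover_tracks <- taylor_album_songs %>%
  filter(album_name == "Lover", !is.na(valence))

# Horizontal bars ordered by valence, filled with the Lover palette.
# scale_fill_taylor_c(album = ...) is assumed here; see vignette("palettes").
p <- ggplot(lover_tracks,
            aes(x = valence, y = reorder(track_name, valence))) +
  geom_col(aes(fill = valence), show.legend = FALSE) +
  scale_fill_taylor_c(album = "Lover") +
  labs(x = "Valence", y = NULL) +
  theme_minimal()

p
```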
There is also a color palette that contains one color for each album,
which is useful when comparing albums to each other. For a complete
description of color palette functionality in taylor, see
vignette("palettes").
album_compare
#> <color_palette[15]>
#> taylor_swift
#> fearless
#> fearless_tv
#> speak_now
#> speak_now_tv
#> red
#> red_tv
#> 1989
#> 1989_tv
#> reputation
#> lover
#> folklore
#> evermore
#> midnights
#> tortured_poets
taylor also includes several functions for using the built-in palettes
for color and fill scales with ggplot2. As an example, we can use
scale_color_albums() to map the album_compare palette to geometries
that have color mapped to the album name. In the following plot, we
display the cumulative number of surprise songs played from each album
and use scale_color_albums() to highlight each album within its
respective facet.
Plot code
library(taylor)
library(dplyr)
library(tidyr)
library(ggplot2)

# Count surprise songs per show and album, then accumulate across shows
surprise_song_count <- eras_tour_surprise %>%
  nest(dat = -c(leg, date, city, night)) %>%
  arrange(date) %>%
  mutate(leg = factor(leg, levels = unique(eras_tour_surprise$leg),
                      labels = c("North America\n(Leg 1)",
                                 "South\nAmerica",
                                 "Asia-Pacific"))) %>%
  mutate(show_number = seq_len(n()), .after = night) %>%
  unnest(dat) %>%
  # Look up the album for each surprise song
  left_join(distinct(taylor_album_songs, track_name, album_name),
            join_by(song == track_name),
            relationship = "many-to-one") %>%
  count(leg, date, city, night, show_number, album_name) %>%
  # Add zero-count rows for albums not played at a given show
  complete(nesting(leg, date, city, night, show_number), album_name) %>%
  mutate(n = replace_na(n, 0)) %>%
  arrange(album_name, date, night) %>%
  mutate(surprise_count = cumsum(n), .by = album_name) %>%
  # Songs without a matching album are grouped as "Other"
  mutate(album_name = replace_na(album_name, "Other"),
         album_name = factor(album_name, c(album_levels, "Other")),
         album_group = album_name)

ggplot(surprise_song_count) +
  facet_wrap(~ album_name, ncol = 3) +
  # Grey background lines: every album, repeated in each facet
  geom_line(data = ~select(.x, -album_name),
            aes(x = show_number, y = surprise_count, group = album_group),
            color = "grey80", na.rm = TRUE) +
  # Highlighted line: the album belonging to the facet
  geom_line(aes(x = show_number, y = surprise_count, color = album_name),
            show.legend = FALSE, linewidth = 2, na.rm = TRUE) +
  scale_color_albums(na.value = "grey80") +
  labs(x = "Show", y = "Songs Played") +
  theme_minimal() +
  theme(strip.text.x = element_text(hjust = 0, size = 10),
        axis.title = element_text(size = 9))
Or we can take a closer look at 1989 (Taylor’s Version). In this figure
we can see that from early June to August, Taylor took a long break
between playing songs from this album. The break ended when Taylor
resumed playing songs leading up to the announcement of 1989 (Taylor’s
Version) in Los Angeles at the end of the first U.S. leg of the tour.
For more details on ggplot2 scales provided by taylor, see
vignette("plotting").
Plot code
# Placeholder dates with no songs; their leg is filled in from the next show
missing_firsts <- tibble(date = as.Date(c("2023-11-01",
                                          "2024-02-01")))

# For each leg and album, a row dated one day before the leg's first show
day_ones <- surprise_song_count %>%
  slice_min(date, by = c(leg, album_name)) %>%
  select(leg, date, album_name) %>%
  mutate(date = date - 1)

surprise_song_count %>%
  bind_rows(missing_firsts) %>%
  arrange(date) %>%
  fill(leg, .direction = "up") %>%
  bind_rows(day_ones) %>%
  arrange(album_name, date) %>%
  group_by(album_name) %>%
  # Carry the running total forward into the added rows
  fill(surprise_count, .direction = "down") %>%
  ggplot() +
  facet_grid(cols = vars(leg), scales = "free_x", space = "free_x") +
  # Grey background lines for every album
  geom_line(aes(x = date, y = surprise_count, group = album_name),
            color = "grey80", na.rm = TRUE) +
  # Highlight 1989 (Taylor's Version); linewidth replaces the deprecated
  # size argument for lines in ggplot2 3.4.0 and later
  geom_line(data = ~filter(.x, album_name == "1989 (Taylor's Version)"),
            aes(x = date, y = surprise_count, color = album_name),
            show.legend = FALSE, linewidth = 2, na.rm = TRUE) +
  scale_color_albums() +
  scale_x_date(breaks = "month", date_labels = "%b %Y", expand = c(.02, .02)) +
  labs(x = NULL, y = "Songs Played") +
  theme_minimal() +
  theme(strip.text.x = element_text(hjust = 0, size = 10),
        axis.title = element_text(size = 9))
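The album scales are not limited to line colors. As a smaller sketch of the fill counterpart, scale_fill_albums(), the code below draws one bar per album showing its Metacritic score, with each bar filled in that album's color from album_compare. It relies on album_levels, the vector of album names used in the plot code above, to order the bars; the names scored_albums and p are illustrative.

```r
library(taylor)
library(dplyr)
library(ggplot2)

# Albums with a Metacritic score, ordered using the package's album order
scored_albums <- taylor_albums %>%
  filter(!is.na(metacritic_score)) %>%
  mutate(album_name = factor(album_name, levels = album_levels))

# One bar per album, filled with the matching album_compare color
p <- ggplot(scored_albums, aes(x = metacritic_score, y = album_name)) +
  geom_col(aes(fill = album_name), show.legend = FALSE) +
  scale_fill_albums() +
  labs(x = "Metacritic score", y = NULL) +
  theme_minimal()

p
```

Mapping fill to the album name, rather than setting colors by hand, keeps each album's color consistent across plots.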