Whats Covered
- Mutating joins
- Filtering joins and set operations
- Assembling data
- Advanced joining
- Case Study
Aditional Resources
Advanced joining
What can go wrong?
- This chapter is about becoming a master at joining data
- The main problems with joins come from missing or duplicate key values or columns
- NA in the key column?
- Probably just need to remove these rows
- Missing key column?
- it might be in the row name. use
rownames_to_column
intibble
package
- it might be in the row name. use
- duplicate keys?
- Might just need to include another column in the key criteria
- If you really have duplicates be careful as you will get every combination of key value matches across datasets.
Spot the key
R’s data frames can store important information in the row.names
attribute. This is not a tidy way to store data, but it does happen quite commonly. If the primary key of your dataset is stored in row.names
, you will have trouble joining it to other datasets.
For example, stage_songs
contains information about songs that appear in musicals. However, it stores the primary key (song name) in the row.names
attribute. As a result, you cannot access the key with a join function.
One way to remedy this problem is to use the function rownames_to_column()
from the tibble
package. rownames_to_column()
returns a copy of a dataset with the row names added to the data as a column.
# Whoops, the primary key is in the row names for stage_songs
# We need to fix that before joining to stage_writers
stage_songs
## musical year
## Children Will Listen Into the Woods 1986
## Maria West Side Story 1957
## Memory Cats 1981
## The Music of the Night Phantom of the Opera 1986
stage_writers
## song composer
## 1 Children Will Listen Stephen Sondheim
## 2 Maria Leonard Bernstein
## 3 Memory Andrew Lloyd Webber
## 4 The Music of the Night Andrew Lloyd Webber
# Load the tibble package
library(tibble)
stage_songs %>%
# Add row names as a column named song
rownames_to_column("song") %>%
# Left join stage_writers to stage_songs
left_join(stage_writers, by = c("song")) %>%
# Create pretty table
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
song | musical | year | composer |
---|---|---|---|
Children Will Listen | Into the Woods | 1986 | Stephen Sondheim |
Maria | West Side Story | 1957 | Leonard Bernstein |
Memory | Cats | 1981 | Andrew Lloyd Webber |
The Music of the Night | Phantom of the Opera | 1986 | Andrew Lloyd Webber |
The tibble package contains many functions that began in dplyr and have since been moved to tibble
Non-unique keys
shows
and composers
provide some more information about songs that appear in musicals.
You can join the two datasets by the musical column, which appears in both the datasets. However, you will run into a glitch: the values of musical do not uniquely identify the rows of composers. For example, two rows have the value “The Sound of Music” for musical and two other rows have the value “The King and I”.
How many entries (rows) will dplyr create for The Sound of Music if you left join composers
to shows
by musical?
- Non unique keys can be okay if you want the info for each entry and don’t mind it being duplicated.
- like years will get duplicated here because there are non unique keys in the musical column of composers dataset
shows
## # A tibble: 6 x 2
## musical year
## <chr> <int>
## 1 The Sound of Music 1959
## 2 Into the Woods 1986
## 3 The King and I 1951
## 4 West Side Story 1957
## 5 Cats 1981
## 6 Phantom of the Opera 1986
# There are more than one composer for some musicals, so the musical column has duplicate key values.
composers
## # A tibble: 8 x 2
## musical composer
## <chr> <chr>
## 1 Cats Andrew Lloyd Webber
## 2 Into the Woods Stephen Sondheim
## 3 Phantom of the Opera Andrew Lloyd Webber
## 4 The King and I Richard Rogers
## 5 The King and I Oscar Hammerstein II
## 6 The Sound of Music Richard Rogers
## 7 The Sound of Music Oscar Hammerstein II
## 8 West Side Story Leonard Bernstein
# join by musical
shows %>% left_join(composers, by = "musical") %>%
# Create pretty table
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
musical | year | composer |
---|---|---|
The Sound of Music | 1959 | Richard Rogers |
The Sound of Music | 1959 | Oscar Hammerstein II |
Into the Woods | 1986 | Stephen Sondheim |
The King and I | 1951 | Richard Rogers |
The King and I | 1951 | Oscar Hammerstein II |
West Side Story | 1957 | Leonard Bernstein |
Cats | 1981 | Andrew Lloyd Webber |
Phantom of the Opera | 1986 | Andrew Lloyd Webber |
If a row in the primary dataset contains multiple matches in the secondary dataset, left_join() will duplicate the row once for every match.
Two non-unique keys
You saw in the last exercise that if a row in the primary dataset contains multiple matches in the secondary dataset, left_join()
will duplicate the row once for every match. This is true for all of dplyr’s join functions.
Now, let’s see how this rule would apply when the primary dataset contains duplicate key values.
show_songs
contains songs that appear in the musicals written by the composers. You can join the two by the musical
column, but like composers
, show_songs
has two rows where musical == "The Sound of Music"
.
How many entries (rows) will exist for The Sound of Music if you left join composers
to show_songs
by musical
?
show_songs
## # A tibble: 8 x 3
## song musical year
## <chr> <chr> <int>
## 1 A Few of My Favorite Things The Sound of Music 1959
## 2 Children Will Listen Into the Woods 1986
## 3 Edelweiss The Sound of Music 1959
## 4 Getting to Know You The King and I 1951
## 5 Maria West Side Story 1957
## 6 Memory Cats 1981
## 7 Shall We Dance? The King and I 1951
## 8 The Music of the Night Phantom of the Opera 1986
show_songs %>%
left_join(composers, by = "musical") %>%
arrange(musical) %>%
head(12) %>%
# Create pretty table
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
song | musical | year | composer |
---|---|---|---|
Memory | Cats | 1981 | Andrew Lloyd Webber |
Children Will Listen | Into the Woods | 1986 | Stephen Sondheim |
The Music of the Night | Phantom of the Opera | 1986 | Andrew Lloyd Webber |
Getting to Know You | The King and I | 1951 | Richard Rogers |
Getting to Know You | The King and I | 1951 | Oscar Hammerstein II |
Shall We Dance? | The King and I | 1951 | Richard Rogers |
Shall We Dance? | The King and I | 1951 | Oscar Hammerstein II |
A Few of My Favorite Things | The Sound of Music | 1959 | Richard Rogers |
A Few of My Favorite Things | The Sound of Music | 1959 | Oscar Hammerstein II |
Edelweiss | The Sound of Music | 1959 | Richard Rogers |
Edelweiss | The Sound of Music | 1959 | Oscar Hammerstein II |
Maria | West Side Story | 1957 | Leonard Bernstein |
This time left_join()
duplicates each row in show_songs
that contains the Sound of Music twice, once for each row of composers
that contains the Sound of Music. The result is four rows that contain the Sound of Music.
Missing keys
Julie Andrews is a famous singer who performed in The Sound of Music, which is documented in the singers
data frame. The two_songs
data frame records a couple of songs that she sang during her career.
Unfortunately, when you join the datasets together, something odd happens. According to the result, who sings A Spoonful of Sugar? Why does this happen?
You can avoid this outcome by removing rows that contain NAs
in the keys before joining.
- Missing keys can mistakenly join two rows that do not belong toegether
singers
## # A tibble: 2 x 2
## movie singer
## <chr> <chr>
## 1 <NA> Arnold Schwarzenegger
## 2 The Sound of Music Julie Andrews
two_songs
## # A tibble: 2 x 2
## song movie
## <chr> <chr>
## 1 Do-Re-Mi The Sound of Music
## 2 A Spoonful of Sugar <NA>
# Examine the result of joining singers to two_songs
two_songs %>% inner_join(singers, by = "movie") %>%
# Create pretty table
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
song | movie | singer |
---|---|---|
Do-Re-Mi | The Sound of Music | Julie Andrews |
A Spoonful of Sugar | NA | Arnold Schwarzenegger |
# Remove NA's from key before joining
two_songs %>%
filter(!is.na(movie)) %>%
inner_join(singers, by = "movie") %>%
# Create pretty table
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
song | movie | singer |
---|---|---|
Do-Re-Mi | The Sound of Music | Julie Andrews |
This is the type of detail that dplyr is likely to handle itself in the future.
Defining the keys
- If you do not set a
by
argument dplyr will join on all columns that match across the two datasets - You can use named vetors to match column with different names
by = c("name1"" = "name2")
- If there are non-key columns with the same names they wil show up as
name.x
andname.y
- You can change this by adding the
suffix
argument.suffix = c("name1","name2")
- You can change this by adding the
Which keys?
Frank Sinatra and Bing Crosby each crossed the boundary from music to acting, singing famous songs in well-known movies. movie_years
and movie_songs
describe some of their work.
Examine movie_years
and movie_songs
to determine which keys dplyr would use to join the datasets if you did not set the by
argument. In this case, it would not matter which type of join you use, or in which order you call the datasets.
movie_years
## # A tibble: 10 x 3
## movie name year
## <chr> <chr> <int>
## 1 The Road to Morocco Bing Crosby 1942
## 2 Going My Way Bing Crosby 1944
## 3 Anchors Aweigh Frank Sinatra 1945
## 4 Till the Clouds Roll By Frank Sinatra 1946
## 5 White Christmas Bing Crosby 1954
## 6 The Tender Trap Frank Sinatra 1955
## 7 High Society Bing Crosby 1956
## 8 The Joker is Wild Frank Sinatra 1957
## 9 Pal Joey Frank Sinatra 1957
## 10 Can-Can Frank Sinatra 1960
movie_songs
## # A tibble: 11 x 3
## song movie name
## <chr> <chr> <chr>
## 1 Moonlight Becomes You The Road to Morocco Bing Crosby
## 2 Swinging on a Star Going My Way Bing Crosby
## 3 Too-Ra-Loo-Ra-Loo-Ral Going My Way Bing Crosby
## 4 I Fall in Love Too Easily Anchors Aweigh Frank Sinatra
## 5 Ol' Man River Till the Clouds Roll By Frank Sinatra
## 6 White Christmas White Christmas Bing Crosby
## 7 (Love Is) The Tender Trap The Tender Trap Frank Sinatra
## 8 Well, Did you Evah? High Society Bing Crosby
## 9 All the Way The Joker is Wild Frank Sinatra
## 10 The Lady is a Tramp Pal Joey Frank Sinatra
## 11 I Love Paris Can-Can Frank Sinatra
movie_years %>%
# Inner join movie_years to movie_songs
inner_join(movie_songs) %>%
# Create pretty table
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
## Joining, by = c("movie", "name")
movie | name | year | song |
---|---|---|---|
The Road to Morocco | Bing Crosby | 1942 | Moonlight Becomes You |
Going My Way | Bing Crosby | 1944 | Swinging on a Star |
Going My Way | Bing Crosby | 1944 | Too-Ra-Loo-Ra-Loo-Ral |
Anchors Aweigh | Frank Sinatra | 1945 | I Fall in Love Too Easily |
Till the Clouds Roll By | Frank Sinatra | 1946 | Ol’ Man River |
White Christmas | Bing Crosby | 1954 | White Christmas |
The Tender Trap | Frank Sinatra | 1955 | (Love Is) The Tender Trap |
High Society | Bing Crosby | 1956 | Well, Did you Evah? |
The Joker is Wild | Frank Sinatra | 1957 | All the Way |
Pal Joey | Frank Sinatra | 1957 | The Lady is a Tramp |
Can-Can | Frank Sinatra | 1960 | I Love Paris |
By default, dplyr will join by the combination of all column names that appear in both datasets.
A subset of keys
Often the same column name will be used by two datasets to refer to different things. For example, the data frame movie_studios
uses name to refer to the name of a movie studio. movie_years
uses name to refer to the name of an actor.
You could join these datasets (they describe the same movies), but you wouldn’t want to use the name column to do so!
dplyr
will ignore duplicate column names if you set the by
argument and do not include the duplicated name in the argument. When you do this, dplyr will treat the columns in the normal fashion, but it will add .x
and .y
to the duplicated names to help you tell the columns apart.
movie_years %>%
# Left join movie_studios to movie_years
left_join(movie_studios, by = c("movie")) %>%
# Rename the columns: artist and studio
rename(artist = name.x, studio = name.y) %>%
# Create pretty table
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
movie | artist | year | studio |
---|---|---|---|
The Road to Morocco | Bing Crosby | 1942 | Paramount Pictures |
Going My Way | Bing Crosby | 1944 | Paramount Pictures |
Anchors Aweigh | Frank Sinatra | 1945 | Metro-Goldwyn-Mayer |
Till the Clouds Roll By | Frank Sinatra | 1946 | Metro-Goldwyn-Mayer |
White Christmas | Bing Crosby | 1954 | Paramount Pictures |
The Tender Trap | Frank Sinatra | 1955 | Metro-Goldwyn-Mayer |
High Society | Bing Crosby | 1956 | Metro-Goldwyn-Mayer |
The Joker is Wild | Frank Sinatra | 1957 | Paramount Pictures |
Pal Joey | Frank Sinatra | 1957 | Columbia Pictures |
Can-Can | Frank Sinatra | 1960 | Twentieth-Century Fox |
Mis-matched key names
Just as the same name can refer to different things in different datasets, different names can refer to the same thing. For example, elvis_movies
and elvis_songs
both describe movies starring Elvis Presley, but each uses a different column name to describe the name of the movie.
This type of inconsistency can be frustrating when you wish to join data based on the inconsistently named variable.
To make the join, set by
to a named vector. The names of the vector will refer to column names in the primary dataset (x). The values of the vector will correspond to the column names in the secondary dataset (y), e.g.
x %>% left_join(y, by = c(“x.name1” = “y.name2”))
dplyr will make the join and retain the names in the primary dataset.
# Identify the key column
elvis_songs
## # A tibble: 5 x 2
## name movie
## <chr> <chr>
## 1 (You're So Square) Baby I Don't Care Jailhouse Rock
## 2 I Can't Help Falling in Love Blue Hawaii
## 3 Jailhouse Rock Jailhouse Rock
## 4 Viva Las Vegas Viva Las Vegas
## 5 You Don't Know Me Clambake
elvis_movies
## # A tibble: 4 x 2
## name year
## <chr> <int>
## 1 Jailhouse Rock 1957
## 2 Blue Hawaii 1961
## 3 Viva Las Vegas 1963
## 4 Clambake 1967
elvis_movies %>%
# Left join elvis_songs to elvis_movies by this column
left_join(elvis_songs, by = c("name" = "movie")) %>%
# Rename columns
rename(movie = name, song = name.y) %>%
# Create pretty table
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
movie | year | song |
---|---|---|
Jailhouse Rock | 1957 | (You’re So Square) Baby I Don’t Care |
Jailhouse Rock | 1957 | Jailhouse Rock |
Blue Hawaii | 1961 | I Can’t Help Falling in Love |
Viva Las Vegas | 1963 | Viva Las Vegas |
Clambake | 1967 | You Don’t Know Me |
More mismatched names
movie_years
and movie_directors
both describe movies that feature Frank Sinatra and Bing Crosby. However, each data frame uses different column names to refer to the key variables that connect the datasets.
Can you join the data?
# Identify the key columns
movie_directors
## # A tibble: 10 x 3
## name director studio
## <chr> <chr> <chr>
## 1 Anchors Aweigh George Sidney Metro-Goldwyn-Mayer
## 2 Can-Can Walter Lang Twentieth-Century Fox
## 3 Going My Way Leo McCarey Paramount Pictures
## 4 High Society Charles Walters Metro-Goldwyn-Mayer
## 5 Pal Joey George Sidney Columbia Pictures
## 6 The Joker is Wild Charles Vidor Paramount Pictures
## 7 The Road to Morocco David Butler Paramount Pictures
## 8 The Tender Trap Charles Walters Metro-Goldwyn-Mayer
## 9 Till the Clouds Roll By Richard Whorf Metro-Goldwyn-Mayer
## 10 White Christmas Michael Curtiz Paramount Pictures
movie_years
## # A tibble: 10 x 3
## movie name year
## <chr> <chr> <int>
## 1 The Road to Morocco Bing Crosby 1942
## 2 Going My Way Bing Crosby 1944
## 3 Anchors Aweigh Frank Sinatra 1945
## 4 Till the Clouds Roll By Frank Sinatra 1946
## 5 White Christmas Bing Crosby 1954
## 6 The Tender Trap Frank Sinatra 1955
## 7 High Society Bing Crosby 1956
## 8 The Joker is Wild Frank Sinatra 1957
## 9 Pal Joey Frank Sinatra 1957
## 10 Can-Can Frank Sinatra 1960
movie_years %>%
# Left join movie_directors to movie_years
left_join(movie_directors, by = c("movie" = "name")) %>%
# Arrange the columns using select()
select(year, movie, artist = name, director, studio) %>%
# Create pretty table
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
year | movie | artist | director | studio |
---|---|---|---|---|
1942 | The Road to Morocco | Bing Crosby | David Butler | Paramount Pictures |
1944 | Going My Way | Bing Crosby | Leo McCarey | Paramount Pictures |
1945 | Anchors Aweigh | Frank Sinatra | George Sidney | Metro-Goldwyn-Mayer |
1946 | Till the Clouds Roll By | Frank Sinatra | Richard Whorf | Metro-Goldwyn-Mayer |
1954 | White Christmas | Bing Crosby | Michael Curtiz | Paramount Pictures |
1955 | The Tender Trap | Frank Sinatra | Charles Walters | Metro-Goldwyn-Mayer |
1956 | High Society | Bing Crosby | Charles Walters | Metro-Goldwyn-Mayer |
1957 | The Joker is Wild | Frank Sinatra | Charles Vidor | Paramount Pictures |
1957 | Pal Joey | Frank Sinatra | George Sidney | Columbia Pictures |
1960 | Can-Can | Frank Sinatra | Walter Lang | Twentieth-Century Fox |
Joining multiple tables
- To join multiple tables we could do a bunch of
left_join
s in a row - The
purrr
package has a functionreduce()
that lets us apply a function recursively to a list of datasets purrr
is also written by hadley wickham so it will work well withdplyr
functions- the purpose of
purrr
is to help us apply R functions to data in efficient ways
Join multiple tables
- Using reduce to do a left_join quick of all tables
purrr
’s reduce()
function is very useful for joining together multiple datasets. Let’s practice using it with three new datasets.
supergroups lists the members of several “supergroup” bands, which are bands made from the members of other famous bands. more_bands lists the original band membership of many musicians, including those in supergroups. And more_artists lists the instruments that various artists played.
Can you join these datasets into a single, coherent dataset?
# Load the purrr library
library(purrr)
# Place supergroups, more_bands, and more_artists into a list
list(supergroups, more_bands, more_artists) %>%
# Use reduce to join together the contents of the list
reduce(left_join, by = c("first", "last")) %>%
# Create pretty table
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
supergroup | first | last | band | instrument |
---|---|---|---|---|
Cream | Ginger | Baker | Graham Bond Organisation | drums |
Cream | Jack | Bruce | Graham Bond Organisation | bass |
Cream | Jack | Bruce | Graham Bond Organisation | bass |
Cream | Jack | Bruce | Manfred Mann | bass |
Cream | Jack | Bruce | Manfred Mann | bass |
Cream | Eric | Clapton | The Yardbirds | guitar |
Crosby, Stills, Nash, and Young | David | Crosby | The Byrds | guitar |
Crosby, Stills, Nash, and Young | Graham | Nash | The Hollies | guitar |
Crosby, Stills, Nash, and Young | Stephen | Stills | Buffalo Springfield | guitar |
Crosby, Stills, Nash, and Young | Neil | Young | Buffalo Springfield | guitar |
The Highwaymen | Johnny | Cash | NA | guitar |
The Highwaymen | Waylon | Jennings | NA | guitar |
The Highwaymen | Kris | Kristofferson | NA | guitar |
The Highwaymen | Willie | Nelson | NA | guitar |
The Traveling Wilburys | Bob | Dylan | NA | guitar |
The Traveling Wilburys | George | Harrison | The Beatles | guitar |
The Traveling Wilburys | Jeff | Lynne | Electric Light Orchestra | vocals |
The Traveling Wilburys | Roy | Orbison | NA | guitar |
The Traveling Wilburys | Tom | Petty | Tom Petty and the Heartbreakers | vocals |
Filter multiple tables
Nice work! The job of reduce()
is to apply a function in an iterative fashion to many datasets. As a result, reduce()
works well with all of the dplyr join functions.
For example, you can use reduce()
to filter observations with a filtering join.
list(more_artists, more_bands, supergroups) %>%
# Return rows of more_artists in all three datasets
reduce(semi_join, by = c("first", "last")) %>%
# Create pretty table
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
first | last | instrument |
---|---|---|
David | Crosby | guitar |
Eric | Clapton | guitar |
George | Harrison | guitar |
Ginger | Baker | drums |
Graham | Nash | guitar |
Jack | Bruce | bass |
Jack | Bruce | bass |
Jeff | Lynne | vocals |
Neil | Young | guitar |
Stephen | Stills | guitar |
Tom | Petty | vocals |
The reduce idiom is a powerful tool and well worth knowing. In fact, it is a major component of ‘MapReduce’ in the Hadoop world.
Other implementations
- We can do the same joins as we have been using here wih the
merge
function, but the syntax is more confusing- I will always just use
dplyr
joins
- I will always just use
- The
dplyr
joins translate to sql joins as show- These are good to know as often I am exploring a sql database before pulling data into R. It happens.
- These are good to know as often I am exploring a sql database before pulling data into R. It happens.
- These are the DBMSs (Database management systems) you can easily connect to from R
- Once you have connected you can write
dplyr
code and it will be translated to sql and run in the DBMS - This is incredible useful. Its worthy of its own corse.
- checkout
vignette("dbplyr")
to see more aboutdplyr
’s database capabilities - We can use
collect
to pull the data into R when we are ready ordata.frame
will also pull it in.
- Once you have connected you can write
SQL
- Each
dplyr
join has an SQL equivalent, and when you apply a dplyr join to a SQL data source (instead of a data frame),dplyr
automatically converts your join to its SQL equivalent and runs it in the database.
Base R
dplyr joins
are also similar to the merge()
function in base R.
For example, the code in the editor uses merge()
and arrange()
to perform a left_join()
on bands
and artists
.
# Alter the code to perform the join with a dplyr function
merge(bands, artists, by = c("first", "last"), all.x = TRUE) %>%
arrange(band)
## first last band instrument
## 1 Jimmy Page Led Zeppelin Guitar
## 2 John Bonham Led Zeppelin <NA>
## 3 John Paul Jones Led Zeppelin <NA>
## 4 Robert Plant Led Zeppelin <NA>
## 5 George Harrison The Beatles Guitar
## 6 John Lennon The Beatles Guitar
## 7 Paul McCartney The Beatles Bass
## 8 Ringo Starr The Beatles Drums
## 9 Jimmy Buffett The Coral Reefers Guitar
## 10 Charlie Watts The Rolling Stones <NA>
## 11 Keith Richards The Rolling Stones Guitar
## 12 Mick Jagger The Rolling Stones Vocals
## 13 Ronnie Wood The Rolling Stones <NA>
bands %>% left_join(artists, by = c("first", "last"))
## # A tibble: 13 x 4
## first last band instrument
## <chr> <chr> <chr> <chr>
## 1 John Bonham Led Zeppelin <NA>
## 2 John Paul Jones Led Zeppelin <NA>
## 3 Jimmy Page Led Zeppelin Guitar
## 4 Robert Plant Led Zeppelin <NA>
## 5 George Harrison The Beatles Guitar
## 6 John Lennon The Beatles Guitar
## 7 Paul McCartney The Beatles Bass
## 8 Ringo Starr The Beatles Drums
## 9 Jimmy Buffett The Coral Reefers Guitar
## 10 Mick Jagger The Rolling Stones Vocals
## 11 Keith Richards The Rolling Stones Guitar
## 12 Charlie Watts The Rolling Stones <NA>
## 13 Ronnie Wood The Rolling Stones <NA>
Session info
sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 16299)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
## [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
## [5] LC_TIME=German_Switzerland.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] purrr_0.3.0 tibble_2.0.1 ggplot2_3.1.0 dplyr_0.8.0.1
## [5] gapminder_0.3.0 kableExtra_1.0.1 knitr_1.21
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.0 highr_0.7 plyr_1.8.4
## [4] pillar_1.3.1 compiler_3.5.2 prettydoc_0.2.1
## [7] tools_3.5.2 digest_0.6.18 gtable_0.2.0
## [10] evaluate_0.12 viridisLite_0.3.0 pkgconfig_2.0.2
## [13] rlang_0.3.1 cli_1.0.1 rstudioapi_0.9.0
## [16] yaml_2.2.0 xfun_0.4 withr_2.1.2
## [19] httr_1.4.0 stringr_1.4.0 xml2_1.2.0
## [22] hms_0.4.2 webshot_0.5.1 grid_3.5.2
## [25] tidyselect_0.2.5 glue_1.3.0 R6_2.4.0
## [28] fansi_0.4.0 rmarkdown_1.11 readr_1.3.1
## [31] magrittr_1.5 codetools_0.2-15 scales_1.0.0
## [34] htmltools_0.3.6 assertthat_0.2.0 rvest_0.3.2
## [37] colorspace_1.4-0 utf8_1.1.4 stringi_1.3.1
## [40] lazyeval_0.2.1 munsell_0.5.0 crayon_1.3.4