General principle

Connect to a workbook

When working with XLConnect, the first step will be to load a workbook in your R session with loadWorkbook(); this function will build a "bridge" between your Excel file and your R session.

# Load the XLConnect package
library(XLConnect)

# Build connection to urbanpop.xlsx: my_book
my_book <- loadWorkbook("../xDatasets/urbanpop.xlsx")

# Print out the class of my_book
class(my_book)
## [1] "workbook"
## attr(,"package")
## [1] "XLConnect"

List and read Excel sheets

Just as readxl and gdata, you can use XLConnect to import data from Excel file into R.

To list the sheets in an Excel file, use getSheets(). To actually import data from a sheet, you can use readWorksheet(). Both functions require an XLConnect workbook object as the first argument.

# Build connection to urbanpop.xlsx
my_book <- loadWorkbook("../xDatasets/urbanpop.xlsx")

# List the sheets in my_book
getSheets(my_book)
## [1] "1960-1966" "1967-1974" "1975-2011"
# Import the second sheet in my_book
head(readWorksheet(my_book, sheet = 2))
##          country      X1967      X1968      X1969      X1970      X1971
## 1    Afghanistan 1119067.20 1182159.06 1248900.79 1319848.78 1409001.09
## 2        Albania  621179.85  639964.46  658853.12  677839.12  698932.25
## 3        Algeria 4826104.22 5017298.60 5219331.87 5429743.08 5619041.53
## 4 American Samoa   17348.66   17995.51   18618.68   19206.39   19752.02
## 5        Andorra   15439.62   16726.99   18088.32   19528.96   20928.73
## 6         Angola  757496.32  798459.26  841261.96  886401.63  955010.09
##        X1972      X1973      X1974
## 1 1502401.79 1598835.45 1696444.83
## 2  720206.57  741681.04  763385.45
## 3 5815734.49 6020647.35 6235114.38
## 4   20262.67   20741.97   21194.38
## 5   22405.84   23937.05   25481.98
## 6 1027397.35 1103829.78 1184486.23

Customize readWorksheet

To get a clear overview about urbanpop.xlsx without having to open up the Excel file, you can execute the following code:

my_book <- loadWorkbook("../xDatasets/urbanpop.xlsx")
sheets <- getSheets(my_book)
all <- lapply(sheets, readWorksheet, object = my_book)
str(all)
## List of 3
##  $ :'data.frame':    209 obs. of  8 variables:
##   ..$ country: chr [1:209] "Afghanistan" "Albania" "Algeria" "American Samoa" ...
##   ..$ X1960  : num [1:209] 769308 494443 3293999 NA NA ...
##   ..$ X1961  : num [1:209] 814923 511803 3515148 13660 8724 ...
##   ..$ X1962  : num [1:209] 858522 529439 3739963 14166 9700 ...
##   ..$ X1963  : num [1:209] 903914 547377 3973289 14759 10748 ...
##   ..$ X1964  : num [1:209] 951226 565572 4220987 15396 11866 ...
##   ..$ X1965  : num [1:209] 1000582 583983 4488176 16045 13053 ...
##   ..$ X1966  : num [1:209] 1058743 602512 4649105 16693 14217 ...
##  $ :'data.frame':    209 obs. of  9 variables:
##   ..$ country: chr [1:209] "Afghanistan" "Albania" "Algeria" "American Samoa" ...
##   ..$ X1967  : num [1:209] 1119067 621180 4826104 17349 15440 ...
##   ..$ X1968  : num [1:209] 1182159 639964 5017299 17996 16727 ...
##   ..$ X1969  : num [1:209] 1248901 658853 5219332 18619 18088 ...
##   ..$ X1970  : num [1:209] 1319849 677839 5429743 19206 19529 ...
##   ..$ X1971  : num [1:209] 1409001 698932 5619042 19752 20929 ...
##   ..$ X1972  : num [1:209] 1502402 720207 5815734 20263 22406 ...
##   ..$ X1973  : num [1:209] 1598835 741681 6020647 20742 23937 ...
##   ..$ X1974  : num [1:209] 1696445 763385 6235114 21194 25482 ...
##  $ :'data.frame':    209 obs. of  38 variables:
##   ..$ country: chr [1:209] "Afghanistan" "Albania" "Algeria" "American Samoa" ...
##   ..$ X1975  : num [1:209] 1793266 785350 6460138 21632 27019 ...
##   ..$ X1976  : num [1:209] 1905033 807990 6774099 22047 28366 ...
##   ..$ X1977  : num [1:209] 2021308 830959 7102902 22452 29677 ...
##   ..$ X1978  : num [1:209] 2142248 854262 7447728 22899 31037 ...
##   ..$ X1979  : num [1:209] 2268015 877898 7810073 23457 32572 ...
##   ..$ X1980  : num [1:209] 2398775 901884 8190772 24177 34366 ...
##   ..$ X1981  : num [1:209] 2493265 927224 8637724 25173 36356 ...
##   ..$ X1982  : num [1:209] 2590846 952447 9105820 26342 38618 ...
##   ..$ X1983  : num [1:209] 2691612 978476 9591900 27655 40983 ...
##   ..$ X1984  : num [1:209] 2795656 1006613 10091289 29062 43207 ...
##   ..$ X1985  : num [1:209] 2903078 1037541 10600112 30524 45119 ...
##   ..$ X1986  : num [1:209] 3006983 1072365 11101757 32014 46254 ...
##   ..$ X1987  : num [1:209] 3113957 1109954 11609104 33548 47019 ...
##   ..$ X1988  : num [1:209] 3224082 1146633 12122941 35095 47669 ...
##   ..$ X1989  : num [1:209] 3337444 1177286 12645263 36618 48577 ...
##   ..$ X1990  : num [1:209] 3454129 1198293 13177079 38088 49982 ...
##   ..$ X1991  : num [1:209] 3617842 1215445 13708813 39600 51972 ...
##   ..$ X1992  : num [1:209] 3788685 1222544 14248297 41049 54469 ...
##   ..$ X1993  : num [1:209] 3966956 1222812 14789176 42443 57079 ...
##   ..$ X1994  : num [1:209] 4152960 1221364 15322651 43798 59243 ...
##   ..$ X1995  : num [1:209] 4347018 1222234 15842442 45129 60598 ...
##   ..$ X1996  : num [1:209] 4531285 1228760 16395553 46343 60927 ...
##   ..$ X1997  : num [1:209] 4722603 1238090 16935451 47527 60462 ...
##   ..$ X1998  : num [1:209] 4921227 1250366 17469200 48705 59685 ...
##   ..$ X1999  : num [1:209] 5127421 1265195 18007937 49906 59281 ...
##   ..$ X2000  : num [1:209] 5341456 1282223 18560597 51151 59719 ...
##   ..$ X2001  : num [1:209] 5564492 1315690 19198872 52341 61062 ...
##   ..$ X2002  : num [1:209] 5795940 1352278 19854835 53583 63212 ...
##   ..$ X2003  : num [1:209] 6036100 1391143 20529356 54864 65802 ...
##   ..$ X2004  : num [1:209] 6285281 1430918 21222198 56166 68301 ...
##   ..$ X2005  : num [1:209] 6543804 1470488 21932978 57474 70329 ...
##   ..$ X2006  : num [1:209] 6812538 1512255 22625052 58679 71726 ...
##   ..$ X2007  : num [1:209] 7091245 1553491 23335543 59894 72684 ...
##   ..$ X2008  : num [1:209] 7380272 1594351 24061749 61118 73335 ...
##   ..$ X2009  : num [1:209] 7679982 1635262 24799591 62357 73897 ...
##   ..$ X2010  : num [1:209] 7990746 1676545 25545622 63616 74525 ...
##   ..$ X2011  : num [1:209] 8316976 1716842 26216968 64817 75207 ...

Suppose we’re only interested in urban population data of the years 1968, 1969 and 1970. The data for these years is in the columns 3, 4, and 5 of the second sheet. Only selecting these columns will leave us in the dark about the actual countries the figures belong to.

# Build connection to urbanpop.xlsx
my_book <- loadWorkbook("../xDatasets/urbanpop.xlsx")

# Import columns 3, 4, and 5 from second sheet in my_book: urbanpop_sel
urbanpop_sel <- readWorksheet(my_book, sheet = 2, startCol = 3, endCol = 5)

# Import first column from second sheet in my_book: countries
countries <- readWorksheet(my_book, sheet = 2, startCol = 1, endCol = 1)

# cbind() urbanpop_sel and countries together: selection
selection <- cbind(countries, urbanpop_sel)

selection %>% 
  tail() %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", , font_size = 11) %>%
  row_spec(0, bold = T, color = "white", background = "#3f7689")
country X1968 X1969 X1970
204 Vietnam 7169607.15 7487420.72 7819407.00
205 Virgin Islands (U.S.) 36618.47 40041.03 43842.96
206 Yemen 736943.57 776968.13 817283.94
207 Zambia 1069556.61 1160043.76 1256178.42
208 Zimbabwe 792772.84 846773.92 903905.48
209 South Sudan 321097.03 326810.10 333013.31

Adapting sheets

Add worksheet

Where readxl and gdata were only able to import Excel data, XLConnect’s approach of providing an actual interface to an Excel file makes it able to edit your Excel files from inside R. In this exercise, you’ll create a new sheet. In the next exercise, you’ll populate the sheet with data, and save the results in a new Excel file.

# Build connection to urbanpop.xlsx
my_book <- loadWorkbook("../xDatasets/urbanpop.xlsx")

# Add a worksheet to my_book, named "data_summary"
createSheet(my_book, "data_summary")

# Use getSheets() on my_book
getSheets(my_book)
## [1] "1960-1966"    "1967-1974"    "1975-2011"    "data_summary"

Populate worksheet

The first step of creating a sheet is done; let’s populate it with some data now! summ, a data frame with some summary statistics on the two Excel sheets is already coded so you can take it from there.

# Build connection to urbanpop.xlsx
my_book <- loadWorkbook("../xDatasets/urbanpop.xlsx")

# Add a worksheet to my_book, named "data_summary"
createSheet(my_book, "data_summary")

# Create data frame: summ
sheets <- getSheets(my_book)[1:3]
dims <- sapply(sheets, function(x) dim(readWorksheet(my_book, sheet = x)), USE.NAMES = FALSE)
summ <- data.frame(sheets = sheets,
                   nrows = dims[1, ],
                   ncols = dims[2, ])

# Add data in summ to "data_summary" sheet
writeWorksheet(my_book, summ, sheet = "data_summary")

# Save workbook as summary.xlsx
saveWorkbook(my_book, file = "summary.xlsx")

See the created workook here

Renaming sheets

Come to think of it, "data_summary" is not an ideal name. As the summary of these excel sheets is always data-related, you simply want to name the sheet "summary".

# Rename "data_summary" sheet to "summary"
renameSheet(my_book, "data_summary", "summary")

# Print out sheets of my_book
getSheets(my_book)
## [1] "1960-1966" "1967-1974" "1975-2011" "summary"
# Save workbook to "renamed.xlsx"
saveWorkbook(my_book, file = "renamed.xlsx")

See the updated sheet renamed here

Removing sheets

After presenting the new Excel sheet to your peers, it appears not everybody is a big fan. Why summarize sheets and store the info in Excel if all the information is implicitly available? To hell with it, just remove the entire fourth sheet!

# Load the XLConnect package
library(XLConnect)

# Build connection to renamed.xlsx: my_book
my_book <- loadWorkbook("renamed.xlsx")

# Remove the fourth sheet
removeSheet(my_book, sheet = "summary")

# Save workbook to "clean.xlsx"
saveWorkbook(my_book, file = "clean.xlsx")