Writing R packages, fast

2020 June 10


R packages are great. R users have a rich ecosystem of extensions to help us do various things. We have our own integrated package management system, CRAN, and we also have Metacran, which gives us an easy way to work out how popular specific packages are. R also makes it ‘easy’ for you to write your own packages! Sort of.

How can I create an R package?

The R FAQ on CRAN suggests that making an R package is really simple.

5.5 How can I create an R package? A package consists of a subdirectory containing a file DESCRIPTION and the subdirectories R, data, demo, exec, inst, man, po, src, and tests (some of which can be missing). The package subdirectory may also contain files INDEX, NAMESPACE, configure, cleanup, LICENSE, LICENCE, COPYING and NEWS. See section “Creating R packages” in Writing R Extensions, for details. This manual is included in the R distribution, see What documentation exists for R?, and gives information on package structure, the configure and cleanup mechanisms, and on automated package checking and building. R version 1.3.0 has added the function package.skeleton() which will set up directories, save data and code, and create skeleton help files for a set of R functions and datasets.

So I just run the function package.skeleton(), and then I just fill in some R code, right? Well, you could. But also, please don’t. I did it that way myself the first few times, and just because you can doesn’t mean you should. Luckily R is a programming language, and people use programming languages to automate things. So obviously people have built packages the slow way that will help you build yours more easily, better, and faster. We should use those instead!

Starting a project

You should be using RStudio. It’s an Open Source IDE for R specifically, which is free to download. The fastest way to start your project is to open RStudio, and go to File > New Project > New Directory > R Package. Then give it a name, and make sure to create a git repository with it.

You now have a Hello World package. It contains a function called hello in the R directory, with a few comments and notes about short cuts. You will be able to Build the package, and you can run a Check, which will give you a warning, and run the Test which will give you an error.

You’ll also have a few other files and folders. man will hold the manual, or documentation. The DESCRIPTION gives you a bunch of metadata, and the NAMESPACE will only have the line exportPattern("^[[:alpha:]]+"). Don’t worry about that for now, we’ll fix it in a moment. .Rbuildignore is a bit like the .gitignore. It just contains a list of patterns (regular expressions) for files that R should leave out when running Build.
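To make those two files a little less mysterious, here is a minimal base R sketch of what the patterns do. The object names are made up for illustration, and the .Rbuildignore pattern is the one RStudio typically writes for its project file, which is an assumption about your setup.

```r
# exportPattern("^[[:alpha:]]+") exports every object whose name starts with a letter.
# These object names are hypothetical.
object_names <- c("hello", "make_plot", ".internal_helper")
is_exported <- grepl("^[[:alpha:]]+", object_names)
print(setNames(is_exported, object_names))

# .Rbuildignore holds one regular expression per line; files that match are
# left out of the built package. Assumed example pattern for the .Rproj file:
ignore_pattern <- "^.*\\.Rproj$"
project_files <- c("making.packages.Rproj", "DESCRIPTION", "R/hello.R")
is_ignored <- grepl(ignore_pattern, project_files, perl = TRUE)
print(setNames(is_ignored, project_files))
```

So the default NAMESPACE exports nearly everything, which is convenient for a Hello World but rarely what you want in a real package.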

Setting up a package and managing package structure with `usethis`

I talked about usethis in my last post, but didn’t do it nearly the justice it deserves. The best place to start is Usage and then have a look at the Reference.

In my projects I tend to use these functions to set up each one:

use_pipe() to allow the package to use pipes from magrittr
use_testthat() to set up tests for the package
use_package() to include a dependency, such as ggplot2 or readr
use_vignette() to include long form documentation
use_readme_rmd()/use_readme_md() to create Readmes
use_badge() to fill in those Readmes with badges
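
A rough sketch of what that set-up can look like from the console. To keep it safe to experiment with, it builds a throwaway package in a temporary directory; the path and the choice of readr are assumptions for illustration, and usethis expects any package you depend on (readr, magrittr, testthat) to be installed already.

```r
library(usethis)

# Throwaway package in a temp directory (hypothetical location)
pkg_dir <- file.path(tempdir(), "making.packages")
create_package(pkg_dir, open = FALSE)  # open = FALSE: don't launch a new session
proj_set(pkg_dir)                      # make it the active usethis project

use_testthat()               # creates tests/testthat/ and tests/testthat.R
use_package("readr")         # adds readr to Imports in DESCRIPTION
use_pipe()                   # creates R/utils-pipe.R, adds magrittr to Imports
use_readme_md(open = FALSE)  # creates README.md
```

In a real project you would skip the first block and run the use_*() calls directly from the console of your package's RStudio project.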

And then, while writing the package itself, I tend to use these functions:

use_r() to create a new R file to source
use_test() to create a new test for a specific open R file
use_data() to include data sets in my packages
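
As a hedged illustration of those three helpers, here they are run against another throwaway package. The package name, the file name hello and the greetings data set are all invented for the example.

```r
library(usethis)

# Throwaway package in a temp directory (hypothetical name and location)
pkg_dir <- file.path(tempdir(), "writing.demo")
create_package(pkg_dir, open = FALSE)
proj_set(pkg_dir)

use_r("hello", open = FALSE)     # creates R/hello.R
use_testthat()                   # test infrastructure (needs testthat installed)
use_test("hello", open = FALSE)  # creates tests/testthat/test-hello.R

# A made-up data set to ship with the package
greetings <- data.frame(lang = c("en", "fr"), text = c("Hello", "Bonjour"))
use_data(greetings)              # saves data/greetings.rda
```

Interactively you would normally leave open at its default, so each new file opens in the editor straight away.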

So you can do all these things manually, and maybe the first time you should. Maybe you should learn that to use a package you need to modify the Imports field in the DESCRIPTION, and maybe you need to learn that to make testthat work in your package you need to make a specific test folder, and then include a testthat.R file to make it work. However, I also don’t need to implement my own class every time I want to make an object that holds a data matrix either, and I also don’t need to know how the interpreter actually works when I run my hello function. There’s a reason why we don’t write in 1 and 0, and it’s the same reason I don’t see the value in writing my own NAMESPACE files any more. I can do it, but it’s a simple boring task that I’ve done a bunch of times before, and that at this stage I’m actually likely to get more wrong than the computer. Talking of which…
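
For comparison, the manual route described above might look something like this in base R. It is only a sketch: the package name manualpkg and the DESCRIPTION fields are placeholders, and a real DESCRIPTION needs more fields than this.

```r
pkg_dir <- file.path(tempdir(), "manualpkg")  # hypothetical package
dir.create(file.path(pkg_dir, "tests", "testthat"), recursive = TRUE)

# A minimal DESCRIPTION, written by hand
desc_path <- file.path(pkg_dir, "DESCRIPTION")
write.dcf(
  data.frame(Package = "manualpkg", Version = "0.0.1", Title = "Placeholder"),
  file = desc_path
)

# "Using a package" means adding it to the Imports field
desc <- read.dcf(desc_path)
desc <- cbind(desc, Imports = "magrittr", Suggests = "testthat")
write.dcf(desc, file = desc_path)

# testthat needs a tests/testthat.R file that kicks off the tests
writeLines(
  c("library(testthat)", "library(manualpkg)", "", "test_check(\"manualpkg\")"),
  file.path(pkg_dir, "tests", "testthat.R")
)
```

That is roughly what use_package() and use_testthat() take off your hands, minus the typos.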

`NAMESPACE`, self-documenting code and self-coding documentation

The NAMESPACE helps your package understand what functions it sources from what other packages, and what functions it allows to be used. The R packages book has a chapter on it that goes into detail, but the key thing to really understand is how to use it.
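
One way to get a feel for this is to poke at the namespace of a package you already have. A small sketch using the stats package that ships with R:

```r
# What the package lets users call
exports <- getNamespaceExports("stats")
"median" %in% exports  # TRUE: anyone can call stats::median()

# Everything defined inside the package, exported or not
all_objects <- ls(asNamespace("stats"), all.names = TRUE)
c(exported = length(exports), defined = length(all_objects))

# The packages that stats itself imports from
names(getNamespaceImports("stats"))
```

The gap between what is defined and what is exported is exactly what the NAMESPACE file controls.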

As an example, let’s run usethis::use_pipe(). This will set up our package to allow us to use the %>% operator to chain functions. Our console tells us:

● Run `devtools::document()`

We also have a new file ./R/utils-pipe.R:

#' Pipe operator
#'
#' See \code{magrittr::\link[magrittr]{\%>\%}} for details.
#'
#' @name %>%
#' @rdname pipe
#' @keywords internal
#' @export
#' @importFrom magrittr %>%
#' @usage lhs \%>\% rhs
NULL

So this is a file that’s all comments? Not quite. This is a Roxygen comment block, marked with #', and it has special powers. It won’t be executed when the file is sourced, but when we document our package, the process reads these comment blocks and their @ tags, and populates the documentation in ./man. As the prompt said, though, we need to run the documentation process manually, so let’s do that with devtools::document().

Updating making.packages documentation
First time using roxygen2. Upgrading automatically...
Writing NAMESPACE
Loading making.packages
Writing NAMESPACE
Writing pipe.Rd

Now we have some changes. Our NAMESPACE looks like this:

# Generated by roxygen2: do not edit by hand

export("%>%")
importFrom(magrittr,"%>%")

and we have a new ./man/pipe.Rd file:

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utils-pipe.R
\name{\%>\%}
\alias{\%>\%}
\title{Pipe operator}
\usage{
lhs \%>\% rhs
}
\description{
See \code{magrittr::\link[magrittr]{\%>\%}} for details.
}
\keyword{internal}

What’s happened? A few things:

Roxygen was associated with our package build to parse and process the comments
Roxygen re-wrote the NAMESPACE
Roxygen read the Roxygen comments in the ./R/utils-pipe.R
Roxygen used those special comments to put an import and an export line into the NAMESPACE
- The export will allow users to use the %>% when they load our package
- The importFrom will allow us to use the %>% in our own code in the package itself
Roxygen populated the manual page for the function so we can see it in the RStudio help.

Compare the two comment blocks now. What are the differences?

Roxygen comments

#' Pipe operator
#'
#' See \code{magrittr::\link[magrittr]{\%>\%}} for details.
#'
#' @name %>%
#' @rdname pipe
#' @keywords internal
#' @export
#' @importFrom magrittr %>%
#' @usage lhs \%>\% rhs
NULL

Roxygenised `.Rd`

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utils-pipe.R
\name{\%>\%}
\alias{\%>\%}
\title{Pipe operator}
\usage{
lhs \%>\% rhs
}
\description{
See \code{magrittr::\link[magrittr]{\%>\%}} for details.
}
\keyword{internal}

So the first big one is formatting. The first one looks (mostly) human readable, while the second one has a lot more cruft. This cruft is basically LaTeX. The order has been changed around and a few of the labels are a little different, but there is another big change: @export and @importFrom aren’t in the documentation! Those were only included to help Roxygen populate the NAMESPACE file. So those two tags are actually amongst the most important things to include when documenting other functions. If you check the DESCRIPTION file, we now also have a new Imports field, with magrittr as its only entry (that one was added by usethis::use_pipe() rather than by Roxygen). Did you notice that there was no actual R code in this file? It explicitly contained only NULL. Everything else was generated from the documentation, and yet we now have a new piece of fundamental functionality in our package. Weird huh?
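
For one of your own functions, the same idea might look like the sketch below. The name argument and the wording of the tags are invented for illustration; the Hello World template's own hello() takes no arguments.

```r
#' Say hello
#'
#' Prints a greeting to the console.
#'
#' @param name Who to greet (hypothetical argument for this example).
#' @return The greeting, invisibly.
#' @examples
#' hello("R")
#' @export
hello <- function(name = "world") {
  greeting <- paste0("Hello, ", name, "!")
  print(greeting)
  invisible(greeting)
}

hello("R")
```

With that block in place, documenting the package should give you an export(hello) line in the NAMESPACE and a hello.Rd file in ./man, and any @importFrom tags would sit alongside @export in just the same way.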

The package skeleton’s connected to the comment bone. The comment bone’s connected to the sinew imports.

So now you’re all prepared to write your own package. You know how to start a new package project, how to set it up with usethis and how to document functions with Roxygen. Hold up though. That thing about not having to do basically the same fiddly, boring job multiple times in each package? I really meant it. The cool R kids don’t even write their own Roxygen any more.

Go back to your ./R/hello.R file, and ditch those boring vanilla # comments. Now source the function into your global environment so you can run it interactively. Now install.packages("sinew"), and run sinew::makeOxygen(hello).

#' @description FUNCTION_DESCRIPTION

#' @return OUTPUT_DESCRIPTION
#' @details DETAILS
#' @examples 
#' \dontrun{
#' if(interactive()){
#'  #EXAMPLE1
#'  }
#' }
#' @rdname hello
#' @export
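
If you prefer a script to clicking around, the steps that produced the template above can be sketched like this. The hello() definition is assumed to match the RStudio Hello World template, and the call is guarded so nothing breaks if sinew isn't installed yet.

```r
# Stand-in for sourcing ./R/hello.R into the global environment
hello <- function() {
  print("Hello, world!")
}

# Only call sinew if it is available; otherwise say how to get it
if (requireNamespace("sinew", quietly = TRUE)) {
  sinew::makeOxygen(hello)
} else {
  message("Run install.packages(\"sinew\") first")
}
```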

So sinew has built us a template for Roxygen, and put the function name in? Cute, but I don’t need that to write one word for me. OK, how about something a bit more real world. I’ve been making a wrapper for the dev.to API (using all these methods). If I go get my post_new_article function and run sinew::makeOxygen(post_new_article):

#' @title FUNCTION_TITLE
#' @description FUNCTION_DESCRIPTION
#' @param file PARAM_DESCRIPTION
#' @param key PARAM_DESCRIPTION, Default: NA
#' @return OUTPUT_DESCRIPTION
#' @details DETAILS
#' @examples 
#' \dontrun{
#' if(interactive()){
#'  #EXAMPLE1
#'  }
#' }
#' @seealso 
#'  \code{\link[rmarkdown]{yaml_front_matter}},\code{\link[rmarkdown]{render}}
#'  \code{\link[readr]{read_file}}
#'  \code{\link[stringr]{str_remove}}
#'  \code{\link[glue]{glue}}
#'  \code{\link[httr]{POST}},\code{\link[httr]{add_headers}}
#' @rdname post_new_article
#' @export 
#' @importFrom rmarkdown yaml_front_matter render
#' @importFrom readr read_file
#' @importFrom stringr str_remove
#' @importFrom glue glue
#' @importFrom httr POST add_headers

This is my output. Sinew inspects my function, works out which functions I need to import, and adds them to the Roxygen comment block. It also links those functions in the @seealso section, which will show up in the docs. Remember that the @importFrom tags get written to the NAMESPACE automatically each time I document my package; the packages themselves still need to be listed under Imports in the DESCRIPTION, which usethis::use_package() takes care of. Sinew has also worked out which arguments my function takes, and filled in their defaults where present. Now, why would you bother writing this by hand any more?
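
To demystify the trick a little, here is a base R sketch of the general idea: parse a function's source, collect every pkg::fun() call, and turn the result into @importFrom tags. This is not sinew's actual implementation, and the function, URL and header name below are placeholders.

```r
# Hypothetical function source; nothing here is executed against a real API
src <- '
post_article <- function(file, key = NA) {
  body <- readr::read_file(file)
  httr::POST("https://example.com/api", httr::add_headers(key = key), body = body)
}'

exprs <- parse(text = src, keep.source = TRUE)
tokens <- utils::getParseData(exprs)

# In pkg::fun(...) the parser labels pkg and fun with these token types
pkgs <- tokens$text[tokens$token == "SYMBOL_PACKAGE"]
funs <- tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]

# Group the functions by package and format one tag per package
imports <- split(funs, pkgs)
tags <- sprintf(
  "#' @importFrom %s %s",
  names(imports),
  vapply(imports, paste, character(1), collapse = " ")
)
writeLines(tags)

# Arguments and their defaults come from the function's formals
eval(exprs)
arg_names <- names(formals(post_article))
print(arg_names)
```

This toy version only spots calls written with an explicit pkg:: prefix, so treat it as an illustration of why a tool can do this job for you, not as a replacement for one.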

TL;DR

Start your package in the RStudio IDE as a new project
Set up your package with usethis
Populate your Roxygen comments with sinew
Profit!
