This vignette describes how to get from a computer with no R or software development tools to a system on which you can do data analysis the Schola way. It is written primarily with Windows systems in mind.
You can install this package on a machine with none of the development-related stuff like Rtools or git. Just run
options(repos = c(getOption("repos"), "scholaempirica" = "https://scholaempirica.github.io/drat"))
install.packages("reschola")
This will install the binary version of the package (i.e. no compilation needed, hence no need for the stuff in Rtools) and install all the packages you need. You will be able to run code in Schola Empirica projects, but you will not be able to follow some components of the workflow, which relies on git and Github for version control.
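If you put the repository setup into a script or your .Rprofile, it may be run many times. The following is a minimal sketch of a way to keep it from adding the same entry twice. `add_repo()` is a hypothetical helper written for this illustration, not part of reschola, and the URL assumes the drat repository is served over https.

```r
# Hypothetical helper: add a named repository to a repos vector only if it
# is not there yet, so the setup can be re-run without creating duplicates.
add_repo <- function(repos, name, url) {
  if (!name %in% names(repos)) {
    repos[name] <- url
  }
  repos
}

repos <- add_repo(
  getOption("repos"),
  "scholaempirica",
  "https://scholaempirica.github.io/drat" # assumed https URL of the drat repo
)
options(repos = repos)

# Then install as usual (commented out here because it needs network access):
# install.packages("reschola")
```

Running the block a second time leaves `getOption("repos")` unchanged.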
To get going, you will need:
You can simply install the package - it will automatically install all R packages needed to run standard Schola Empirica projects. See above for the commands.
Install the latest ‘release’ version from https://cran.r-project.org/.
Install the latest version from rstudio.com. There are preview releases, which are stable, and dailies, which are not and are not meant for normal use.
Once installed, set it up for git if you already have it (see below for setting up RStudio for git and for installing git.)
Also, go to Global Options and, under Workspace, uncheck "Restore .RData into workspace at startup" and set "Save workspace to .RData on exit" to Never.
What this does: it makes sure that whatever objects you have in the workspace were created in this session, presumably from code or interactively. It also makes sure that when you restart R, you get a clean slate so no leftovers interfere with your work.
Why this is a good thing: it puts in practice the principle that code is real. It also forces you to work in scripts rather than typing things into the console (though you can retrieve code from history and store it in R files.)
What to watch out for: remember that once you close RStudio, whatever data is not saved, or whatever data you do not have code for recreating, will be lost.
If you need someone to run something in R but they don’t have anything installed, you can point them to RStudio Cloud.
It requires free registration.
This is a well-functioning installation on a remote server which you can run in the browser.
It is also great for teaching.
First, run install.packages("devtools"). You may or may not be prompted to install Rtools. If you are, go along. If not, run devtools::dev_sitrep(). This might prompt you to install Rtools - if it does, again, go along. If you run devtools::dev_sitrep() again, you should see that Rtools is installed, along with the path to it.
On a Mac, instead of installing Rtools, you install the Xcode development tools (xcode-select --install in the Terminal).
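If you want a quick yes/no answer on whether a working compiler toolchain is available, something like the sketch below may help. It uses pkgbuild::has_build_tools(), which comes from the pkgbuild package (normally installed together with devtools); the guard and the `build_tools_ok` name are just for this illustration.

```r
# pkgbuild is normally installed alongside devtools; the guard keeps the
# snippet runnable on a machine where it is not there yet.
if (requireNamespace("pkgbuild", quietly = TRUE)) {
  # Tries to compile a tiny test file and reports whether that worked.
  build_tools_ok <- pkgbuild::has_build_tools(debug = FALSE)
} else {
  build_tools_ok <- NA # cannot tell without pkgbuild
}
build_tools_ok
```

If this returns FALSE, re-running it with debug = TRUE prints more detail about what was looked for.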
Generally, the configuration that will affect how R behaves goes into .Rprofile. This lives somewhere in your user directory and can be edited in RStudio using usethis::edit_r_profile().
Environment variables, used for things like passwords that you should not put in your code, can be put into .Renviron (usethis::edit_r_environ()). You then use the variable in code using Sys.getenv("VARIABLE_NAME"). Note that .Renviron is not a standard R file, so values are not put into quotes.
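As an illustration, here is a minimal sketch of reading such a variable defensively. The variable name SCHOLA_DB_PASSWORD and the helper `get_secret()` are made up for this example; Sys.setenv() is used only to simulate what a line in .Renviron would do at startup.

```r
# In .Renviron the (hypothetical) line would read, without quotes:
#   SCHOLA_DB_PASSWORD=s3cret
# Here we simulate that so the example runs on its own.
Sys.setenv(SCHOLA_DB_PASSWORD = "s3cret")

# Hypothetical helper: fail with a clear message if the variable is missing,
# instead of silently passing an empty string around.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA_character_)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set; ",
         "add it to .Renviron and restart R.")
  }
  value
}

password <- get_secret("SCHOLA_DB_PASSWORD")
```

Remember that changes to .Renviron only take effect after R is restarted.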
See the section in Colin Gillespie’s Efficient R Programming on R startup files.
All the packages you will normally need will be installed when you install reschola.
To streamline downloading packages, R likes to use a geographically close CRAN mirror server. RStudio should set this for you to something sensible - see Tools > Global Options > Packages.
If this fails or you want to set one yourself, you can put this somewhere close to the beginning of your .Rprofile:
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cran.rstudio.com" # change to the CRAN mirror URL you like
  options(repos = r)
})
CRAN contains packages that are vetted for correctness, good documentation etc. You install these using install.packages().
CRAN typically holds binary packages. This means they do not need compilation, i.e. you don’t need devtools and the other tools described above.
For some recently released or updated packages, a binary may not be available just yet. R will ask you to build from source; go ahead if you have the tools above - but the build may fail for some complicated packages. Or you can wait, usually a matter of days.
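If you would rather not be asked and simply take the latest available binary, one option is to request binaries explicitly. The sketch below assumes a Windows or macOS machine for the binary case; "somepackage" is a placeholder name, not a real recommendation.

```r
# CRAN offers binaries for Windows and macOS only; on Linux the platform
# package type is "source", and asking for "binary" there is an error.
install_type <- if (identical(.Platform$pkgType, "source")) "source" else "binary"
install_type

# Placeholder package name; commented out because it needs network access.
# install.packages("somepackage", type = install_type)
```

With type = "binary" you may get a slightly older version than the newest source release, which is usually fine for a few days.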
Github contains packages without much quality control. Proxies of quality include how well documented the package is externally, how often/recently it has been updated, whether the author responds to issues etc.
You need to build Github packages so your machine may need to have the build tools.
Often a CRAN package will have a more recent, but less well tested, version on Github. You can install it if you need a newer version but beware. Often, package authors also accept issues (bug reports) on Github and Github is also where you would contribute to a package.
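A common way to install a package from Github is remotes::install_github() (devtools::install_github() calls the same code). The wrapper below is only a sketch: `install_dev_version()` is a hypothetical name, and the "scholaempirica/reschola" repo path in the commented call is an assumption based on the organisation and package names used in this guide.

```r
# Hypothetical convenience wrapper around remotes::install_github().
install_dev_version <- function(repo) {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  # Builds the package from source, so build tools may be needed.
  remotes::install_github(repo)
}

# Assumed repo path; commented out because it needs network access.
# install_dev_version("scholaempirica/reschola")
```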
For installing and setting up git and finding your way around Git and Github, the best you can do is follow Jenny Bryan’s Happy Git With R step by step. This also has some useful troubleshooting tips for the usual hell around authentication to Github etc.
This includes the setup of RStudio for git: there are a few options you need to check or change.
Then run usethis::git_vaccinate() to get git to always ignore files which you never want committed. You only need to do this once per computer.
Go to Tools > Global Options > Git/SVN and tick "Use version control...". You should see a git executable in the field below. If you don’t, see Jenny Bryan’s troubleshooting guide.
If you set a project to use git, you should see a Git pane in the top right. (This is done either in the Project Options menu or by usethis::use_git().)
In brief:
To create a Github repo from R, run usethis::use_github() in an existing repo - you can add organisation = "scholaempirica" to make a repo owned by the scholaempirica Github org. Or (recommended) create a project from the reschola project template (File > New Project > New Directory > Standard Reschola Project) to guide you through this.
To connect an existing directory from the command line, run git init and git remote add origin {repo-url}, then commit and run git push --set-upstream origin master (assuming you are on the master branch).
To check your setup, run usethis::git_sitrep().
In particular, take a look at the Connect chapter of Happy Git With R for steps to streamline your connection with Github, including (much recommended) caching credentials and using a personal access token and (optionally) SSH setup.
The terminology of git can be daunting, so here are my attempts at common-sense explanations.
On your computer, this is a folder (directory). Git knows it is a git repository because there is a hidden .git directory inside it which holds all the “metadata” on versions, commits etc. You can type git status in your git-bash to see if the current directory is a repository. In RStudio, you will see a Git tab on the top right if your RStudio is correctly set up and if the current project or working directory is a git repo.
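The "hidden .git directory" idea can be checked from R without git itself. The sketch below is deliberately simple: `is_git_repo()` is a hypothetical helper that only looks at the top level of the given directory, so it will say FALSE for a subfolder of a repository, and it does not cover setups where .git is a file rather than a directory.

```r
# Hypothetical helper: a directory counts as a repository here if it
# contains a .git directory at its top level.
is_git_repo <- function(path = ".") {
  dir.exists(file.path(path, ".git"))
}

demo_dir <- tempfile("repo-demo") # throwaway directory for the demonstration
dir.create(demo_dir)
is_git_repo(demo_dir) # FALSE: just a plain directory

dir.create(file.path(demo_dir, ".git")) # stand-in for what git init creates
is_git_repo(demo_dir) # TRUE: the .git directory is now there
```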
Note that this is different from a repository in the context of R, where it means “place (server) from which you can install packages”.
You rename a repository on your machine just by renaming the directory - no other action needed. You can also move the directory at will.
Any repository lives in a local directory. In the context of working with Github, this is your local copy. Copies elsewhere with which you may want to “sync” (see push and pull below) are called remotes. (You can see the remotes for your repo by running git remote -v.) Each remote has a name; Github repos are customarily called origin - origin tends to be the default remote.
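This is the most crucial distinction, and one that is not often described: as you work with a git repo, you are working with three different sets of objects: your working copy (the files as they currently are on disk), the staging area (the changes you have marked for the next commit), and what has been committed (the latest commit on your current branch is called HEAD).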
This distinction means that if you have committed work on a file and then worked on it further, you can always easily revert your working copy to the last committed status.
It also means that the staged files are not just a window in an application, but a particular snapshot stored in git’s database. If you stage in one client (say RStudio), you will see the same staging status in another (say, Fork). It also means that if you stage a file (which really means staging all lines changed between the last commit and now), and then make changes to it, those changes are not staged and hence will not be in the next commit. If you want them there, you need to stage those new changes.
See this guide which I think is the best explanation of moving a file between these sets.
Diff (git diff): The comparison, line by line, between two states of a file, typically between what is in your working copy and HEAD. Sometimes, e.g. when staging (see below) bits of a file that is already in the staging area, you will see a diff between the working copy and the staging area.
Stage (git add): You stage changes to be committed - either whole files or individual lines.
Commit (git commit): Committing means the currently staged changes are recorded as a new snapshot in the repository's history.
Push (git push): Sending all that has been committed to a remote.
Pull (git pull): Retrieving the current state of the remote and updating your repository (committed stuff/HEAD) with that.
For different workflows of how to use these commands together and in sequence, see Happy Git With R.
If any of this fails, there are a couple of components that may be at fault: your git setup, the usethis package, the git2r package, or the gh package. For git, try reinstalling it from the official installer. For the packages, try updating or installing the latest Github version.
Git GUIs are tricky in that they sometimes do under the hood something different from what the UI shows. From what I can tell, Fork and Gittower largely avoid this, and I really like Fork. Gittower is now an annual subscription, Fork will soon cost $50.
RStudio has a basic GUI in the Git tab. It is OK for making simple commits and for pushing and pulling. Beyond that (even things like patch commits, i.e. committing only some changed lines in a file) I would suggest using something else. (Annoyingly, it also lacks an option to force push and the ability to create a new branch on the remote when pushing a locally created new branch - both of which can leave you baffled in certain situations.)
All I can suggest for now is to use tinytex for installing and troubleshooting your LaTeX distribution.
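A minimal sketch of what that can look like in practice, using functions from the tinytex package. The guard and the `using_tinytex` name are only there to keep the example self-contained; the install and repair calls are commented out because they download from the internet.

```r
if (requireNamespace("tinytex", quietly = TRUE)) {
  # TRUE if the LaTeX distribution R finds is TinyTeX.
  using_tinytex <- tinytex::is_tinytex()
} else {
  using_tinytex <- NA # run install.packages("tinytex") first
}
using_tinytex

# One-off installation of the TinyTeX distribution:
# tinytex::install_tinytex()

# If LaTeX packages get out of sync, these are common first things to try:
# tinytex::tlmgr_update()
# tinytex::reinstall_tinytex()
```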
On Windows, TextMate is supposed to be fine, as is SublimeText. For writing a lot of RMarkdown, you can look for a Markdown editor or install a Markdown plugin into these text editors.
To be able to produce charts in the default reschola fonts - Roboto and Roboto Condensed - you need to have these fonts on your machine and registered with R. The easiest way to do this is to run reschola::install_reschola_fonts() and then reschola::register_reschola_fonts() to install and register the fonts with your system.
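To see whether the fonts are visible on your machine at all, something like the following may help. It is a sketch only: it uses the systemfonts package, which is an assumption of this example and not necessarily what reschola relies on, and it checks fonts known to the operating system, not whether they are registered with R.

```r
wanted <- c("Roboto", "Roboto Condensed")

if (requireNamespace("systemfonts", quietly = TRUE)) {
  # Font families the operating system knows about.
  installed_families <- unique(systemfonts::system_fonts()$family)
  fonts_found <- wanted %in% installed_families
} else {
  fonts_found <- rep(NA, length(wanted)) # cannot tell without systemfonts
}
names(fonts_found) <- wanted
fonts_found
```

If either entry is FALSE, the two reschola functions above are the place to start.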