3  Database from dataset

You might be in the situation, that you have a dataset in a file, that you want to move to REDCap. Here I will try to walk through the steps of creating a database from a given dataset.

3.0.1 Create a data dictionary

Import the dataset

require(dplyr)
Loading required package: dplyr

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
data(iris)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
# Using rownames as index and reordering to put record_id first (for REDCap)
ds <- iris %>% mutate(record_id=rownames(.)) %>% select(record_id,everything(.))

3.0.2 Specify variable names etc.

Have a look at the demonstration file below for inspiration.

(dd_demo <- read.csv("https://raw.githubusercontent.com/agdamsbo/redcap-r-handbook/main/redcap/REDCapDataDictonaryDemo.csv")) |> head() |> gt::gt()
Variable...Field.Name Form.Name Section.Header Field.Type Field.Label Choices..Calculations..OR.Slider.Labels Field.Note Text.Validation.Type.OR.Show.Slider.Number Text.Validation.Min Text.Validation.Max Identifier. Branching.Logic..Show.field.only.if.... Required.Field. Custom.Alignment Question.Number..surveys.only. Matrix.Group.Name
study_id demographics text Study ID NA NA NA NA
date_enrolled demographics Demographic Characteristics text Date subject signed consent YYYY-MM-DD date NA NA NA NA
first_name demographics text First Name NA NA y NA NA
last_name demographics text Last Name NA NA y NA NA
address demographics Contact Information notes Street, City, State, ZIP NA NA y NA NA
telephone_1 demographics text Phone number Include Area Code phone NA NA y NA NA

3.0.3 Format choices

Note the different Field.Type’s used and Choices... , called labels. Each label will have a raw value. This is performed with a small script.

For radio buttons, I have written the different options to choose from. For variables with a score to choose from, I have written the scores, and these variables and only these variables are named something with “_score”. Variables with binary or other types of outcome, I have written the label.

# I enjoy thoughtful variable naming
dd <- data.frame(matrix(nrow = ncol(ds),ncol=ncol(dd_demo)))
colnames(dd) <- colnames(dd_demo)

dd$Variable...Field.Name <- colnames(ds)

str(ds)
'data.frame':   150 obs. of  6 variables:
 $ record_id   : chr  "1" "2" "3" "4" ...
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
dd$Field.Type <- case_when(sapply(ds,"class") == "factor" ~ "radio",
          TRUE ~ "text")

for (i in seq_len(nrow(dd))){
if (dd$Field.Type[i] == "radio") {
  dd$Choices..Calculations..OR.Slider.Labels[i] <- paste0(paste0(paste0(seq_along(levels(ds[,i])),", ",levels(ds[,i])),collapse = " | "))
  }
}

dd$Form.Name <- "base"

Have a look at the results for now.

dd |> gt::gt()
Variable...Field.Name Form.Name Section.Header Field.Type Field.Label Choices..Calculations..OR.Slider.Labels Field.Note Text.Validation.Type.OR.Show.Slider.Number Text.Validation.Min Text.Validation.Max Identifier. Branching.Logic..Show.field.only.if.... Required.Field. Custom.Alignment Question.Number..surveys.only. Matrix.Group.Name
record_id base NA text NA NA NA NA NA NA NA NA NA NA NA NA
Sepal.Length base NA text NA NA NA NA NA NA NA NA NA NA NA NA
Sepal.Width base NA text NA NA NA NA NA NA NA NA NA NA NA NA
Petal.Length base NA text NA NA NA NA NA NA NA NA NA NA NA NA
Petal.Width base NA text NA NA NA NA NA NA NA NA NA NA NA NA
Species base NA radio NA 1, setosa | 2, versicolor | 3, virginica NA NA NA NA NA NA NA NA NA NA

3.0.4 Upload the DataDictionary

The DataDictionary can be uploaded using the API or manually via the REDCap webpage. Here I’ll show the first option:

REDCapR::redcap_metadata_write(dd_upload,
                               redcap_uri = keyring::key_get("DB_URI"),
                               token = keyring::key_get("example_api"))

3.1 Upload data to the project

REDCapR::redcap_write(ds_to_write = dd,
                      redcap_uri = keyring::key_get("DB_URI"),
                      token = keyring::key_get("example_api"))