4  Database from scratch

Below will follow steps necessary to get a data base set up based on a data set provided.

4.0.1 Create a data dictionary

Copy and run the following to create a DataDictionary draft.

Change the var_name vector to the variable names of your own project and the same for instrument. Then export the DataDictionary draft and open the .ods file in a visual editor as LibreOffice.

require(readODS)
Loading required package: readODS
# Naming the different variables ("record_id" is the default name for the record number)
var_name <- c("record_id","base_date","hypertension", "diabetes", "cohabitation","mrs_assessed","mrs_date","mrs_score","event_date","event_type") 
# Instrument/form name corresponding to the variables
instrument = c("base","base","base","base","base","mrs","mrs","mrs","event","event")

# Combining the vectors with default data in data frame
(dd_draft <- data.frame(Variable...Field.Name=var_name,
           Form.Name=instrument,
           Field.Type = "text",
           Choices..Calculations..OR.Slider.Labels = "",
           Field.Label = "")) |>  gt::gt()
Variable...Field.Name Form.Name Field.Type Choices..Calculations..OR.Slider.Labels Field.Label
record_id base text
base_date base text
hypertension base text
diabetes base text
cohabitation base text
mrs_assessed mrs text
mrs_date mrs text
mrs_score mrs text
event_date event text
event_type event text
# Exporting for manual manipulation
write_ods(dd_draft,
          path = here::here("data/dd_draft.ods"))

4.0.2 Specify variable names etc.

Have a look at the demonstration file below for inspiration.

(dd_demo <- read.csv("https://raw.githubusercontent.com/agdamsbo/redcap-r-handbook/main/redcap/REDCapDataDictonaryDemo.csv")) |> head() |> gt::gt()
Variable...Field.Name Form.Name Section.Header Field.Type Field.Label Choices..Calculations..OR.Slider.Labels Field.Note Text.Validation.Type.OR.Show.Slider.Number Text.Validation.Min Text.Validation.Max Identifier. Branching.Logic..Show.field.only.if.... Required.Field. Custom.Alignment Question.Number..surveys.only. Matrix.Group.Name
study_id demographics text Study ID NA NA NA NA
date_enrolled demographics Demographic Characteristics text Date subject signed consent YYYY-MM-DD date NA NA NA NA
first_name demographics text First Name NA NA y NA NA
last_name demographics text Last Name NA NA y NA NA
address demographics Contact Information notes Street, City, State, ZIP NA NA y NA NA
telephone_1 demographics text Phone number Include Area Code phone NA NA y NA NA

Open the exported file “dd_draft.ods”.

# Opens file with standard program
system2("open", "data/dd_draft.ods")

Type or paste the needed information manually, save the file with a new name (I did “data/dd_draft_adds.ods”) and load it again.

(dd <- readODS::read_ods(here::here("data/dd_draft_adds.ods"), na = ""))|> gt::gt()
Variable...Field.Name Form.Name Field.Type Choices..Calculations..OR.Slider.Labels Field.Label
record_id base text NA ID
base_date base text NA Inclusion date
hypertension base radio Yes, No Hypertension
diabetes base radio Yes, No Diabetes
cohabitation base radio Yes, No Cohabitation
mrs_assessed mrs radio Yes, No Assesed
mrs_date mrs text NA Assessment date
mrs_score mrs radio 0, 1, 2, 3, 4, 5, 6 mRS score
event_date event text NA NA
event_type event radio TIA, AIS, ICH, SAH NA

4.0.3 Format choices

Note the different Field.Type’s used and Choices... , called labels. Each label will have a raw value. And for each of these I will also ad an “Unknown” with a artificial value of 99. This is performed with a small script.

For radio buttons, I have written the different options to choose from. For variables with a score to choose from, I have written the scores, and these variables and only these variables are named something with “_score”. Variables with binary or other types of outcome, I have written the label.

# I enjoy thoughtful variable naming
var_bin <- dd$Field.Type=="radio" & !grepl("_score", dd$Variable...Field.Name)

choices_bin <- unlist(lapply(strsplit(dd$Choices..Calculations..OR.Slider.Labels[var_bin], split = ", "),function(i){
  paste(paste0(c(seq_len(length(i)),"99"), ","),c(i,"Unknown"),collapse = " | ")
}))

dd$Choices..Calculations..OR.Slider.Labels[var_bin] <- choices_bin
var_score <- dd$Field.Type=="radio" & grepl("_score", dd$Variable...Field.Name)

choices_score <- unlist(lapply(strsplit(dd$Choices..Calculations..OR.Slider.Labels[var_score], split = ", "),function(i){
  paste(paste0(i, ","),i,collapse = " | ")
}))

dd$Choices..Calculations..OR.Slider.Labels[var_score] <- choices_score

Have a look at the results for now.

dd |> gt::gt()
Variable...Field.Name Form.Name Field.Type Choices..Calculations..OR.Slider.Labels Field.Label
record_id base text NA ID
base_date base text NA Inclusion date
hypertension base radio 1, Yes | 2, No | 99, Unknown Hypertension
diabetes base radio 1, Yes | 2, No | 99, Unknown Diabetes
cohabitation base radio 1, Yes | 2, No | 99, Unknown Cohabitation
mrs_assessed mrs radio 1, Yes | 2, No | 99, Unknown Assesed
mrs_date mrs text NA Assessment date
mrs_score mrs radio 0, 0 | 1, 1 | 2, 2 | 3, 3 | 4, 4 | 5, 5 | 6, 6 mRS score
event_date event text NA NA
event_type event radio 1, TIA | 2, AIS | 3, ICH | 4, SAH | 99, Unknown NA

4.0.4 Merging and data validation

Merging with all needed columns from the DataDictionary demo.

demo_names <- colnames(dd_demo)

col_miss <- demo_names[!demo_names %in% colnames(dd)]

df_miss <- data.frame(matrix(nrow = nrow(dd), ncol=length(col_miss)))
colnames(df_miss) <- col_miss

dd_upload <- cbind(dd,df_miss)[demo_names]

dd_upload$Text.Validation.Type.OR.Show.Slider.Number[grepl("_date",dd_upload$Variable...Field.Name)] <- "date_dmy"

dd_upload |> gt::gt()
Variable...Field.Name Form.Name Section.Header Field.Type Field.Label Choices..Calculations..OR.Slider.Labels Field.Note Text.Validation.Type.OR.Show.Slider.Number Text.Validation.Min Text.Validation.Max Identifier. Branching.Logic..Show.field.only.if.... Required.Field. Custom.Alignment Question.Number..surveys.only. Matrix.Group.Name
record_id base NA text ID NA NA NA NA NA NA NA NA NA NA NA
base_date base NA text Inclusion date NA NA date_dmy NA NA NA NA NA NA NA NA
hypertension base NA radio Hypertension 1, Yes | 2, No | 99, Unknown NA NA NA NA NA NA NA NA NA NA
diabetes base NA radio Diabetes 1, Yes | 2, No | 99, Unknown NA NA NA NA NA NA NA NA NA NA
cohabitation base NA radio Cohabitation 1, Yes | 2, No | 99, Unknown NA NA NA NA NA NA NA NA NA NA
mrs_assessed mrs NA radio Assesed 1, Yes | 2, No | 99, Unknown NA NA NA NA NA NA NA NA NA NA
mrs_date mrs NA text Assessment date NA NA date_dmy NA NA NA NA NA NA NA NA
mrs_score mrs NA radio mRS score 0, 0 | 1, 1 | 2, 2 | 3, 3 | 4, 4 | 5, 5 | 6, 6 NA NA NA NA NA NA NA NA NA NA
event_date event NA text NA NA NA date_dmy NA NA NA NA NA NA NA NA
event_type event NA radio NA 1, TIA | 2, AIS | 3, ICH | 4, SAH | 99, Unknown NA NA NA NA NA NA NA NA NA NA

4.0.5 Upload the DataDictionary

The DataDictionary can be uploaded using the API or manually via the REDCap webpage. Here I’ll show the first option:

REDCapR::redcap_metadata_write(dd_upload,
                               redcap_uri = keyring::key_get("DB_URI"),
                               token = keyring::key_get("example_api"))

4.1 Upload data to the project

REDCapR::redcap_read_oneshot(redcap_uri = keyring::key_get("DB_URI"),
                             token = keyring::key_get("example_api"))
REDCapR::redcap_write(ds_to_write = df,
                      redcap_uri = keyring::key_get("DB_URI"),
                      token = keyring::key_get("example_api"))

4.2 Coming up

  • Rewrite of chapter to a more general use case
  • Tips for branching and calculations
  • more…