Title: | Helper Files for the CTN-0094 Relational Database |
---|---|
Description: | Engineered features and "helper" functions ancillary to the 'public.ctn0094data' package, extending this package for ease of use (see <https://CRAN.R-project.org/package=public.ctn0094data>). This 'public.ctn0094data' package contains harmonized datasets from some of the National Institute of Drug Abuse's Clinical Trials Network (NIDA's CTN) projects. Specifically, the CTN-0094 project is to harmonize and de-identify clinical trials data from the CTN-0027, CTN-0030, and CTN-51 studies for opioid use disorder. This current version is built from 'public.ctn0094data' v. 1.0.6. |
Authors: | Gabriel Odom [aut, cre] , Raymond Balise [aut] |
Maintainer: | Gabriel Odom <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.4 |
Built: | 2024-11-17 03:24:36 UTC |
Source: | https://github.com/ctn-0094/public.ctn0094extra |
Create a Subject History Table by Protocol
CreateCTN30ProtocolHistory( randCTN30_df, start_int = -30, phase1Len_int = 98, phase2Len_int = 168 )
CreateCTN30ProtocolHistory( randCTN30_df, start_int = -30, phase1Len_int = 98, phase2Len_int = 168 )
randCTN30_df |
A data frame of subject randomization days for CTN-0030.
This data will have columns for |
start_int |
When should the protocol timeline start? Defaults to 30 days before consent (allowing for self-reported drug use via Timeline Follow- Back data). |
phase1Len_int |
When should Phase I end per protocol? Defaults to 98 days (14 weeks). |
phase2Len_int |
When should Phase II end per protocol? Defaults to 168 days (24 weeks). |
We may want to perform SQL-like operations on a set of tables. This
data table will form the "backbone" for future join operations. It creates
one record per person in a study for each day in that study, and it is
designed to work with subject-specific two-arm protocols (where Phase I
and Phase II treatments could start on different days for each subject).
For subjects who consented to treatment but were never randomized, we use
an "intent to treat" philosophy and assign them an empty protocol timeline
starting on the day of consent (day 0) until phase1Len_int
.
NOTE: We expect this function to be used specifically to create a
potential visit backbone for CTN-0030. For studies with fixed protocol
lengths, please use CreateSubjectHistory()
instead.
A tibble with columns who
and when
. Each subject will
have one row for each day in the study range.
# Subject A started Phase I on day 2 and was switched from treatment # Phase I to Phase II on day 45. Subject B started Phase I on day 6 and # was never switched to Phase II (either because they dropped out of the # study or because the treatment given to them in Phase I worked). rand_df <- data.frame( who = c("A", "A", "B", "B"), which = c( 1, 2, 1, 2), when = c( 2, 45, 6, NA) ) # Based on this example data above, we expect to see potential contact # days per the protocol for subject A to range from -30 to 45 + 168. # For subject B, we expect this range to be from -30 to 6 + 98 CreateCTN30ProtocolHistory(rand_df)
# Subject A started Phase I on day 2 and was switched from treatment # Phase I to Phase II on day 45. Subject B started Phase I on day 6 and # was never switched to Phase II (either because they dropped out of the # study or because the treatment given to them in Phase I worked). rand_df <- data.frame( who = c("A", "A", "B", "B"), which = c( 1, 2, 1, 2), when = c( 2, 45, 6, NA) ) # Based on this example data above, we expect to see potential contact # days per the protocol for subject A to range from -30 to 45 + 168. # For subject B, we expect this range to be from -30 to 6 + 98 CreateCTN30ProtocolHistory(rand_df)
Create a Subject History Table
CreateHistory( rawData_ls, personsTable = "everybody", firstDay_int = NULL, lastDay_int = NULL )
CreateHistory( rawData_ls, personsTable = "everybody", firstDay_int = NULL, lastDay_int = NULL )
rawData_ls |
a list of tibbles returned by |
personsTable |
What is the name of the data table that contains all the
subject IDs? Defaults to |
firstDay_int |
(OPTIONAL) What should be the first "day" counter for all subjects? This may be beneficial to set if you wish to have the History ignore all days before a certain point. |
lastDay_int |
(OPTIONAL) What should be the last "day" counter for all subjects? |
We may want to perform SQL-like operations on a set of tables. This data table will form the "backbone" for future join operations. It creates one record per person for each day in the study. The default behavior is to set the date range to the smallest observed value for "when" across all data tables to the largest observed value for "when" across all tables, inclusive. Necessarily, this is very sensitive to outliers or coding errors (for instance, if one subject has a "first day" 200 days before Study Day 0, then this History table will include days -200 to -1 for ALL subjects, regardless of any recorded contact in this period).
If this behavior is not desirable, you must specify a study range. For
example, you may have a short recruitment period followed by a 6-month
clinical trial. In this instance, you may want to ignore any data more
than 30 days prior to Study Day 0 for each subject or more than a few
weeks after the end of the trial. Thus, you would set
firstDay_int = -30L
and lastDay_int = 6 * 30 + 14
.
A tibble with columns who
and when
. Each subject will
have one row for each day in the study range.
data_ls <- loadRawData(c("tlfb", "all_drugs", "everybody")) CreateHistory(rawData_ls = data_ls, firstDay_int = -30)
data_ls <- loadRawData(c("tlfb", "all_drugs", "everybody")) CreateHistory(rawData_ls = data_ls, firstDay_int = -30)
Create a Subject History Table by Protocol
CreateProtocolHistory(start_vec, end_vec, persons_df = "everybody")
CreateProtocolHistory(start_vec, end_vec, persons_df = "everybody")
start_vec |
a named integer vector with the number of days before subject consent when the subject history should start, per protocol |
end_vec |
a named integer vector with the length of the study phase of interest, per protocol |
persons_df |
Either the name of the data frame that contains all the
subject IDs and their clinical trials (which defaults to
|
We may want to perform SQL-like operations on a set of tables. This
data table will form the "backbone" for future join operations. It creates
one record per person in each study for each day in those studies (when
persons_df
is set to "everybody"
), or it creates one record
per person contained in the table persons_df
for each day in those
studies. The default behavior is to use the supplied "everybody"
table for all consenting subjects in the CTN-0027, CTN-0030, and CTN-0051
clinical trials. However, users may only care about a smaller subset of
these patients, so a subset of the "everybody"
data frame can be
supplied to the persons_df
argument if desired.
NOTE: this function is only appropriate for trial with fixed start and
end days (such as CTN-0027 or CTN-0051). For studies with variable-length
(i.e., subject-specific) protocol lengths, please use
CreateSubjectProtocolHistory()
instead.
A tibble with columns who
, project
, and when
.
Each subject will have one row for each day in the study range.
start_int <- c(`27` = -30L, `51` = -30L) end_int <- c(`27` = 168L, `51` = 168L) CreateProtocolHistory( start_vec = start_int, end_vec = end_int )
start_int <- c(`27` = -30L, `51` = -30L) end_int <- c(`27` = 168L, `51` = 168L) CreateProtocolHistory( start_vec = start_int, end_vec = end_int )
This data set measures the number of days from a participant's randomization until they received their first dose of study drug.
data(derived_inductDelay)
data(derived_inductDelay)
A tibble with 2,492 rows and 3 variables:
Patient ID
What treatment is prescribed: "Inpatient BUP", "Inpatient NR-NTX", "Methadone", "Outpatient BUP", "Outpatient BUP + EMM", "Outpatient BUP + SMM"
How many days after being assigned to a treatment arm did the participant receive their first dose of study drug? Missing values indicate that the subject never received their first dose.
This data set is a derived data set. The inputs are the treatment
and randomization data sets. The code to calculate the induction delay is
given in scripts/create_inductDelay_20210729.R
. The treatment arm is
also included in this data set because the induction delay depends on the
type of treatment. For example, inpatient treatment arms may have
different protocols than outpatient treatment arms for determining how
many days the subject must wait after randomization before treatment.
Summarize the patients' self-reported race and ethnicity into four groups: "Non-Hispanic White", "Non-Hispanic Black", "Hispanic", and "Other".
data(derived_raceEthnicity)
data(derived_raceEthnicity)
A tibble with 3,560 rows and columns:
Patient ID
Self-reported race. Options are "White", "Black", "Other", and "Refused/missing".
Self-reported Hispanic/Latino ethnicity.
Derived composite marker of race and ethnicity.
This data set contains a summary of self-reported race and ethnicity
in four levels. Of note, the "Other" category includes 2 participants who
marked "no" to the question of Hispanic/Latino ethnicity but refused to
answer their race. Also, the "Other" category includes 33 participants for
whom all information about race and ethnicity is missing. This data set is
a derived data set; the script used to create it is
"scripts/create_raceEthnicity_20220816.R"
.
Given a series of weekly clinic visits described per protocol, this data marks subjects as present or missing.
data(derived_visitImputed)
data(derived_visitImputed)
A tibble with 87,891 rows and columns:
Patient ID
Study day
Marked as "Present"
if the subject visited the clinic on that day,
or "Missing"
if the subject did not visit the clinic on a day we
would have expected them to (based on regular weekly visits).
This contains planned visits. Not all appointments were kept. We
indicate if an appointment was kept on a certain day by marking the
subject as "Present"
on that day. If the subject goes more than 7 days
without a clinic visit, we mark the subject as "Missing"
on days that
are multiples of 7 from the randomization day. For subjects without a
randomization day, weekly visits after day of consent are marked as
"Missing"
instead. This data set is a derived data set; the script used
to create it is "scripts/create_visitImputed_20210909.R"
.
NOTE: because our window is a strict weekly window, a subject who shows up for their clinic visit one or more days late will still be marked as missing on the day they were supposed to appear. This means that some subjects will be marked as having missed their weekly clinic visit on one day, but be present in the clinic the next.
Show the pattern of positive, negative, and missing urine drug screen (UDS) results for opioids by patient over the study protocol. Study "Week 1" starts the day after randomization (for patients who were randomized) or the day after signed consent (for patients who were not randomized).
data(derived_weeklyOpioidPattern)
data(derived_weeklyOpioidPattern)
A tibble with 3,560 rows and columns:
Patient ID
The start of the "word" is how many weeks before randomization? This should be -4 for most people, but can be as high as -8. Note that week 0 is included, so a value of -4 represents data in the 5th week before randomization; that is, 29-35 days prior to randomization. Most subjects have timeline follow-back data 30 days before consent, and delays between consent and randomization are common.
Week of first randomization (1, if randomized; NA if not)
Week of second randomization (only for CTN-0030)
The end of the "word" is how many weeks after randomization? This depends on the study protocol, but should be close to 16 or 24 weeks for most subjects.
A character string of symbols from startWeek
to
the last week before randWeek1
. Symbols are as defined in
Phase_1
.
A character string of symbols from randWeek1
to endWeek
(for subjects from CTN-0027 and CTN-0051) or the
last week before randWeek2
(for CTN-0030). These symbols are:
"+"
= the subject's UDS showed presence of an opioid in that
week; "-"
= the subject's UDS showed absence of an opioid in
that week; "*"
= two or more UDS in the same week, where >= 1
UDS was positive and >= 1 UDS was negative; "o"
if the
subject was supposed to visit a clinic per the study protocol but did
not visit or did not complete a UDS screen; and "_"
to
represent weeks wherein the subject was not scheduled to provide UDS.
A character string of symbols from randWeek2
to endWeek
for subjects from CTN-0030 only. Symbols are as
defined in Phase_1
.
This data set contains a "word" describing weekly non-study opioid
use patterns as measured by UDS. Based on the substances screened in this
data set, our list of substances classified as an opioid is: Oxymorphone,
Opium, Fentanyl, Hydromorphone, Codeine, Suboxone, Tramadol, Morphine,
Buprenorphine, Hydrocodone, Opioid, Methadone, Oxycodone, and Heroin. UDS
results indicating the presence of one or more of these substances will be
marked with "+"
for that week. UDS results negative for these
substances will be marked with "-"
. If, by study protocol, the
subject was supposed to visit a clinic to complete a UDS but they did not,
then the visit for that week will be marked with "o"
. If, by study
protocol, the subject was NOT supposed to visit a clinic to complete a UDS
for a given week, then visit for that week will be marked with "_"
.
This data set is a derived data set; the script used to create it is
"scripts/create_weeklyOpioidPattern_20211123.R"
.
NOTE: because our window is a strict weekly window, a subject could have
both positive and negative UDS within the same 7-day period. In this case,
the week is marked as "*"
. Depending on the definition of treatment
failure or treatment success desired, these dual-status indicators can be
re-coded to "+"
or "-"
as appropriate. Also, some studies
include a baseline UDS in the week of consent (before the subject was
randomized to a treatment arm). Some subjects were randomized in the same
week as the week wherein consent was signed, while other subjects were
randomized weeks later. We represent the weeks before consent and the
variable number of weeks between consent and randomization with "_"
if there were no baseline UDS visits in those weeks (this represents the
pre-study period). For subjects who were never randomized, the weeks
before consent are also marked as pre-study ("_"
), and all
subsequent protocol weeks are marked as missing.
Show the pattern of positive and negative patient self-report (timeline follow-back, TLFB) results for opioids by patient over the study protocol. Study "Week 1" starts the day after randomization (for patients who were randomized) or the day after signed consent (for patients who were not randomized).
data(derived_weeklyTLFBPattern)
data(derived_weeklyTLFBPattern)
A tibble with 3,560 rows and columns:
Patient ID
The start of the "word" is how many weeks before randomization? This should be -4 for most people, but can be as high as -8. Note that week 0 is included, so a value of -4 represents data in the 5th week before randomization; that is, 29-35 days prior to randomization. Most subjects have timeline follow-back data 30 days before consent, and delays between consent and randomization are common.
Week of first randomization (1, if randomized; NA if not)
Week of second randomization (only for CTN-0030)
The end of the "word" is how many weeks after randomization? This depends on the study protocol, but should be close to 16 or 24 weeks for most subjects.
A character string of symbols from startWeek
to
the last week before randWeek1
. Symbols are as defined in
Phase_1
.
A character string of symbols from randWeek1
to endWeek
(for subjects from CTN-0027 and CTN-0051) or the
last week before randWeek2
(for CTN-0030). These symbols are:
"+"
= the subject reported the use of an opioid for two or more
days in that week; "-"
= the subject did not report use of an
opioid in that week; "*"
= the subject reported one day of
opioid use in that week; "o"
if the subject was supposed to
report TLFB but did not; and "_"
to represent weeks wherein the
subject was not scheduled to provide TLFB (for example, more than 30
days before randomization).
A character string of symbols from randWeek2
to endWeek
for subjects from CTN-0030 only. Symbols are as
defined in Phase_1
.
This data set contains a "word" describing weekly non-study opioid
use patterns as reported by the subject in TLFB. Based on the substances
reported by the subjects, our list of substances classified as an opioid
is: non-study Buprenorphine, non-study Methadone, heroin, and "opioids"
(which includes Oxymorphone, Opium, Fentanyl, Hydromorphone, Codeine,
Suboxone, Tramadol, Morphine, Hydrocodone, and Oxycodone). TLFB reporting
indicating the presence of two or more use days of these substances will
be marked with "+"
for that week. TLFB results positive for these
substances for 0 days will be marked with "-"
. TLFB results
positive for these substances for 1 day in the week will be marked with
"*"
for that week. This data set is a derived data set; the script
used to create it is
"scripts/create_weeklyTLFBPattern_20220511.R"
.
NOTE: some studies collected more TLFB data than others. Also, all times
are marked starting with the week of randomization. We represent the weeks
before randomization with "_"
if no TLFB data was collected. For
subjects who were never randomized, all subsequent protocol weeks are
marked as missing ("o"
).
Load Data Sets into a List
loadRawData(dataNames_char)
loadRawData(dataNames_char)
dataNames_char |
Names of data sets to load |
We may want to perform SQL-like operations on a set of tables without loading each table into R's Global Environment separately. This function loads these data sets into a self-destructing environment and then returns a named list of these data sets.
Loads data sets specified into the current function environment for further evaluation (unused) and then returns these data sets as a named list
loadRawData(c("tlfb", "all_drugs"))
loadRawData(c("tlfb", "all_drugs"))
Given a complete timeline of potential subject visits per study protocol, mark certain visits as "Missing"
MarkMissing(timeline_df, windowWidth = 7, daysGrace = 0)
MarkMissing(timeline_df, windowWidth = 7, daysGrace = 0)
timeline_df |
A data frame with columns |
windowWidth |
How many days are expected between clinic visits? Defaults to 7, representing weekly clinic visits. |
daysGrace |
How many days late are subjects allowed to be for their weekly visit. Defaults to 0. Under this default behavior with weekly visits, a subject who visits the clinic on days 8 and 14 instead of days 7 and 14 will have a missing visit imputed for day 7. |
Most definitions of opioid use disorder treatment success or failure partially depend on a tally of the number of missed clinic visits. For example, a definition of early treatment failure could be "3 or more UDS positive for non-study opioids or missing visits within the first 28 days of randomization". Given a table of subject visits by day over the entire protocol timeline, this function will estimate when each subject missed a clinic visit (unfortunately, missed visits can often be improperly recorded in the patient logs; if such information is complete, using this function is unnecessary).
This estimation is conducted as follows: (1) first, for each subject, a
regular grid of days is spread from the randomization day to the end of
treatment by windowWidth
; (2) next, we iterate over each day in
this regular grid, and at each step we check the next windowWidth
plus daysGrace
days for a visit in that range, and we mark the day
at the end of the window as "missing" if there are no visits in that
range; (3) and finally, we combine these subject-specific data tables.
A copy of timeline_df
with the column visitYM
added.
This column is a copy of the visit
column with additional cells
marking if a subject should have attended the clinic but did not.
# TO DO
# TO DO
Mark Use Day by Subject
MarkUse( targetDrugs_char, drugs_df = NULL, reportSource = c("TFB", "UDSAB", "UDS"), retainEmptyRows = FALSE )
MarkUse( targetDrugs_char, drugs_df = NULL, reportSource = c("TFB", "UDSAB", "UDS"), retainEmptyRows = FALSE )
targetDrugs_char |
A character vector including which drugs should be counted against the subject |
drugs_df |
A data frame with columns |
reportSource |
A character vector matching the source of the reported
drug use. The options must be from Timeline Followback ( |
retainEmptyRows |
A logical flag to force rows for participants who did
not have UDS positive for the substances listed in |
This function is basically just a fancy wrapper around some dplyr code. We just don't want the user to have to 1) know dplyr, or 2) write the code themselves.
A modification of the drugs_df
data set: the columns are
"who"
, "when"
, and "source"
; each row corresponds
to one use day per subject per use source (if, for instance, there is drug
use for a particular day recorded in both TFB and UDS, then that day will
have two rows in the resulting data set).
MarkUse(c("Crack", "Pcp", "Opioid"))
MarkUse(c("Crack", "Pcp", "Opioid"))