This function converts a dataset in wide format (one subject per line, with multiple columns giving the time and status for each state) into a dataset in long format (one line for each transition for which a subject is at risk). Selected covariates are replicated across each subject's lines.
```r
msprep(time, status, data, trans, start, id, keep)
```
Argument | Description
---|---
`time` | Either 1) a matrix or data frame of dimension n x S (n being the number of individuals and S the number of states in the multi-state model), containing the times at which the states are visited or the last follow-up time, or 2) a character vector of length S containing the column names of these times. In both cases, entries for starting states may be `NA` (see the remarks on starting states below).
`status` | Either 1) a matrix or data frame of dimension n x S containing, for each state, event indicators taking the value 1 if the state is visited or 0 if it is not (censored), or 2) a character vector of length S containing the column names of these status variables. In both cases, entries for starting states may be `NA` (see the remarks on starting states below).
`data` | Data frame (not a tibble) in wide format in which to interpret `time`, `status`, `id` or `keep` when these are given as column names.
`trans` | Transition matrix describing the states and transitions in the multi-state model. If S is the number of states in the multi-state model, `trans` should be an S x S matrix whose (i, j) entry is the number of the transition from state i to state j, or `NA` if that transition is not possible.
`start` | List with elements `state` and `time`, giving the starting state and starting time of each subject. If not specified, all subjects start in state 1 at time 0.
`id` | Either 1) a vector of length n containing the subject identifications, or 2) a character string giving the name of the column containing these subject ids. If not provided, `id` is set to 1, ..., n.
`keep` | Either 1) a data frame or matrix with n rows, or a numeric or factor vector of length n, containing covariate(s) to be retained in the output dataset, or 2) a character vector containing the column names of these covariates in `data`.
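As a sketch of the `trans` argument, the illness-death transition matrix can be written out by hand in base R. It is assumed to match what `trans.illdeath()` (used in the examples) returns; the state names are illustrative.

```r
# Illness-death model, built by hand (state names are illustrative):
#   healthy -> illness is transition 1
#   healthy -> death   is transition 2
#   illness -> death   is transition 3
# Every other entry, including the diagonal, is NA.
states <- c("healthy", "illness", "death")
tmat <- matrix(NA_integer_, nrow = 3, ncol = 3,
               dimnames = list(from = states, to = states))
tmat["healthy", "illness"] <- 1L
tmat["healthy", "death"] <- 2L
tmat["illness", "death"] <- 3L
print(tmat)
```

Numbering the transitions consecutively from 1 lets the `trans` column of the long-format output index the rows of this matrix directly.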
An object of class "msdata", which is a data frame in long (counting process) format containing the subject id, the covariates (replicated per subject), and the following columns:

- `from`: the starting state
- `to`: the receiving state
- `trans`: the transition number
- `Tstart`: the starting time of the transition
- `Tstop`: the stopping time of the transition
- `time`: the length of the at-risk interval, `Tstop` - `Tstart`
- `status`: status variable, with 1 indicating an event (transition) and 0 a censoring
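The structure of the long format can be made concrete with a small hypothetical checker (not part of mstate). It tests two properties visible in the example output, not documented guarantees: the printed `time` column equals `Tstop - Tstart`, and the rows of one subject leaving one state share the same interval and contain at most one event.

```r
# Hypothetical checker (not part of mstate) for two properties seen in the
# example output of msprep.
check_long_sketch <- function(d) {
  # time is the length of the at-risk interval
  ok_time <- isTRUE(all.equal(d$time, d$Tstop - d$Tstart))
  # rows of one subject leaving one state form one at-risk block
  blocks <- split(d, list(d$id, d$from), drop = TRUE)
  ok_blocks <- all(vapply(blocks, function(b) {
    length(unique(b$Tstart)) == 1 &&
      length(unique(b$Tstop)) == 1 &&
      sum(b$status) <= 1
  }, logical(1)))
  ok_time && ok_blocks
}

# The three rows of subject 1 in the first example, typed in by hand
long <- data.frame(id = c(1, 1, 1), from = c(1, 1, 2), to = c(2, 3, 3),
                   trans = c(1, 2, 3), Tstart = c(0, 0, 1),
                   Tstop = c(1, 1, 5), time = c(1, 1, 4),
                   status = c(1, 0, 1))
check_long_sketch(long)   # TRUE
```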
For `msprep`, the transition matrix should correspond to an irreversible acyclic Markov chain. In particular, only `NA`s are allowed on the diagonal.
If the transition matrix is irreversible and acyclic, it will have starting states, i.e. states into which no transitions are possible. For these starting states, `NA`s are allowed in the `time` and `status` arguments, either as columns (when specified as a matrix or data frame) or as elements of the character vector (when specified as a character vector).
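Both notions can be checked directly on a transition matrix. The base-R helpers below are hypothetical (not part of mstate): a starting state is a column with no non-`NA` entries, and repeatedly deleting the starting states empties the matrix only if the transitions contain no cycle. A non-`NA` diagonal entry counts as a transition into the state itself, so such a matrix is reported as cyclic.

```r
# Hypothetical helper (not part of mstate): states with no incoming
# transitions, i.e. columns of the transition matrix that are all NA.
starting_states <- function(trans) {
  which(colSums(!is.na(trans)) == 0)
}

# Hypothetical helper: repeatedly delete the starting states. If states remain
# but none of them is a starting state, the transitions contain a cycle.
is_acyclic <- function(trans) {
  remaining <- seq_len(nrow(trans))
  while (length(remaining) > 0) {
    sub <- trans[remaining, remaining, drop = FALSE]
    starts <- which(colSums(!is.na(sub)) == 0)
    if (length(starts) == 0) return(FALSE)
    remaining <- remaining[-starts]
  }
  TRUE
}

# Illness-death model: 1 -> 2 (transition 1), 1 -> 3 (2), 2 -> 3 (3)
tmat <- matrix(NA, 3, 3)
tmat[1, 2] <- 1; tmat[1, 3] <- 2; tmat[2, 3] <- 3
starting_states(tmat)   # state 1
is_acyclic(tmat)        # TRUE

cyc <- tmat
cyc[2, 1] <- 4          # hypothetical reverse transition illness -> healthy
is_acyclic(cyc)         # FALSE
```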
The function `msprep` works recursively, through calls to the function `msprepEngine`. First, all starting states of the current transition matrix are detected (states into which there are no transitions). For each of these starting states, all subjects starting from that state are selected, and for each subject the next visited state is found by looking at all transitions out of that starting state and determining the smallest transition time with `status` = 1. `msprepEngine` is then called again with the starting states deleted from the transition matrix, and with the subjects deleted that either reached an absorbing state or were censored. For the remaining subjects, the starting states and times are updated in the next call. The datasets returned from the `msprepEngine` calls are appended to the current dataset in long format, which is finally sorted.
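A simplified sketch of the same conversion is shown below. It is a hypothetical base-R function, not mstate code: instead of recursing over starting states as `msprepEngine` does, it walks each subject forward from its starting state. The names `wide_to_long_sketch`, `start_state` and `start_time` are made up for illustration, and the censoring time of a subject with no event out of its current state is assumed to be the smallest of the recorded times.

```r
# Hypothetical sketch (not the mstate implementation): convert wide-format
# times/status (n x S matrices) into long format by walking each subject
# forward from its starting state until it is absorbed or censored.
wide_to_long_sketch <- function(time, status, trans,
                                start_state = rep(1, nrow(time)),
                                start_time = rep(0, nrow(time))) {
  rows <- list()
  for (i in seq_len(nrow(time))) {
    s <- start_state[i]
    t0 <- start_time[i]
    repeat {
      to <- unname(which(!is.na(trans[s, ])))  # states reachable from s
      if (length(to) == 0) break               # absorbing state reached
      tt <- unname(time[i, to])
      st <- unname(status[i, to])
      events <- which(st == 1)
      if (length(events) > 0) {
        # smallest event time; which.min() breaks ties by lowest state number
        k <- events[which.min(tt[events])]
        tstop <- tt[k]
      } else {
        k <- NA                                # censored while in state s
        tstop <- min(tt)                       # ASSUMPTION: censoring time
      }
      rows[[length(rows) + 1]] <- data.frame(
        id = i, from = s, to = to, trans = unname(trans[s, to]),
        Tstart = t0, Tstop = tstop, time = tstop - t0,
        status = as.numeric(seq_along(to) %in% k))
      if (is.na(k)) break
      s <- to[k]
      t0 <- tstop
    }
  }
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Illness-death transition matrix and the wide-format data of the examples
tmat <- matrix(NA, 3, 3)
tmat[1, 2] <- 1; tmat[1, 3] <- 2; tmat[2, 3] <- 3
tt <- cbind(NA, c(1, 1, 6, 6, 8, 9), c(5, 1, 9, 7, 8, 12))
st <- cbind(NA, c(1, 0, 1, 1, 0, 1), c(1, 1, 1, 1, 1, 1))
out <- wide_to_long_sketch(tt, st, tmat)
out
```

On this data the sketch should give the same `from`, `to`, `trans`, `Tstart`, `Tstop`, `time` and `status` values as the first `msprep` call in the examples; it does not sort, carry covariates, or issue warnings.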
A warning is issued for a subject if multiple transitions exist with the same smallest transition time (and `status` = 0). In such cases the next transition cannot be determined unambiguously, and the state with the smallest number is chosen. In our experience, the shortest transition time occasionally has `status` = 0 while a larger transition time has `status` = 1. In that case the larger transition time and the corresponding transition are selected. No warning is issued for these data inconsistencies.
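The selection rule can be isolated in a small hypothetical helper (not part of mstate). It takes the times and status indicators of the transitions out of the current state, keeps only those with status 1, and returns the position of the smallest of their times; because `which.min()` returns the first minimum, ties go to the lowest-numbered state. It does not reproduce the warning described above.

```r
# Hypothetical helper (not part of mstate): index of the next transition out
# of the current state, or NA when the subject is censored in that state.
pick_next <- function(times, status) {
  events <- which(status == 1)
  if (length(events) == 0) return(NA_integer_)
  # which.min() returns the first minimum, so ties go to the lowest state
  events[which.min(times[events])]
}

pick_next(c(5, 5), c(1, 1))   # returns 1: tie, lowest-numbered state
pick_next(c(3, 7), c(0, 1))   # returns 2: censored at 3, event at 7 wins
pick_next(c(4, 4), c(0, 0))   # returns NA: no event, censored
```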
Putter H, Fiocco M, Geskus RB (2007). Tutorial in biostatistics: Competing risks and multi-state models. Statistics in Medicine 26, 2389–2430.
Hein Putter <H.Putter@lumc.nl> and Marta Fiocco
```r
library(mstate)

# transition matrix for illness-death model
tmat <- trans.illdeath()

# some data in wide format
tg <- data.frame(stt = rep(0, 6), sts = rep(0, 6),
                 illt = c(1, 1, 6, 6, 8, 9), ills = c(1, 0, 1, 1, 0, 1),
                 dt = c(5, 1, 9, 7, 8, 12), ds = c(1, 1, 1, 1, 1, 1),
                 x1 = c(1, 1, 1, 2, 2, 2), x2 = c(6:1))
tg$x1 <- factor(tg$x1, labels = c("male", "female"))
tg$patid <- factor(2:7, levels = 1:8, labels = as.character(1:8))

# define time, status and covariates also as matrices
tt <- matrix(c(rep(NA, 6), tg$illt, tg$dt), 6, 3)
st <- matrix(c(rep(NA, 6), tg$ills, tg$ds), 6, 3)
keepmat <- data.frame(gender = tg$x1, age = tg$x2)

# data in long format using msprep
msprep(time = tt, status = st, trans = tmat, keep = as.matrix(keepmat))
#> An object of class 'msdata'
#>
#> Data:
#>    id from to trans Tstart Tstop time status  keep1 keep2
#> 1   1    1  2     1      0     1    1      1   male     6
#> 2   1    1  3     2      0     1    1      0   male     6
#> 3   1    2  3     3      1     5    4      1   male     6
#> 4   2    1  2     1      0     1    1      0   male     5
#> 5   2    1  3     2      0     1    1      1   male     5
#> 6   3    1  2     1      0     6    6      1   male     4
#> 7   3    1  3     2      0     6    6      0   male     4
#> 8   3    2  3     3      6     9    3      1   male     4
#> 9   4    1  2     1      0     6    6      1 female     3
#> 10  4    1  3     2      0     6    6      0 female     3
#> 11  4    2  3     3      6     7    1      1 female     3
#> 12  5    1  2     1      0     8    8      0 female     2
#> 13  5    1  3     2      0     8    8      1 female     2
#> 14  6    1  2     1      0     9    9      1 female     1
#> 15  6    1  3     2      0     9    9      0 female     1
#> 16  6    2  3     3      9    12    3      1 female     1

msprep(time = c(NA, "illt", "dt"), status = c(NA, "ills", "ds"), data = tg,
       id = "patid", keep = c("x1", "x2"), trans = tmat)
#> An object of class 'msdata'
#>
#> Data:
#>    patid from to trans Tstart Tstop time status     x1 x2
#> 1      2    1  2     1      0     1    1      1   male  6
#> 2      2    1  3     2      0     1    1      0   male  6
#> 3      2    2  3     3      1     5    4      1   male  6
#> 4      3    1  2     1      0     1    1      0   male  5
#> 5      3    1  3     2      0     1    1      1   male  5
#> 6      4    1  2     1      0     6    6      1   male  4
#> 7      4    1  3     2      0     6    6      0   male  4
#> 8      4    2  3     3      6     9    3      1   male  4
#> 9      5    1  2     1      0     6    6      1 female  3
#> 10     5    1  3     2      0     6    6      0 female  3
#> 11     5    2  3     3      6     7    1      1 female  3
#> 12     6    1  2     1      0     8    8      0 female  2
#> 13     6    1  3     2      0     8    8      1 female  2
#> 14     7    1  2     1      0     9    9      1 female  1
#> 15     7    1  3     2      0     9    9      0 female  1
#> 16     7    2  3     3      9    12    3      1 female  1

# Patients 5 and 6 now start in state 2 at times t = 4 and t = 10
msprep(time = tt, status = st, trans = tmat, keep = keepmat,
       start = list(state = c(1, 1, 1, 1, 2, 2), time = c(0, 0, 0, 0, 4, 10)))
#> An object of class 'msdata'
#>
#> Data:
#>    id from to trans Tstart Tstop time status gender age
#> 1   1    1  2     1      0     1    1      1   male   6
#> 2   1    1  3     2      0     1    1      0   male   6
#> 3   1    2  3     3      1     5    4      1   male   6
#> 4   2    1  2     1      0     1    1      0   male   5
#> 5   2    1  3     2      0     1    1      1   male   5
#> 6   3    1  2     1      0     6    6      1   male   4
#> 7   3    1  3     2      0     6    6      0   male   4
#> 8   3    2  3     3      6     9    3      1   male   4
#> 9   4    1  2     1      0     6    6      1 female   3
#> 10  4    1  3     2      0     6    6      0 female   3
#> 11  4    2  3     3      6     7    1      1 female   3
#> 12  5    2  3     3      4     8    4      1 female   2
#> 13  6    2  3     3     10    12    2      1 female   1
```