Title: | Nonparametric Predictive Model for Sparse and Irregular Longitudinal Data |
---|---|
Description: | The proposed method predicts the longitudinal mean response trajectory with a kernel-based estimator. The kernel estimator weights observations by subject-wise similarity between predictor trajectories, measured in the L2 metric space, and by time proximity. Users can also perform variable selection to identify functional predictors with predictive significance, using the proposed multiplicative model with multivariate Gaussian kernels. |
Authors: | Shixuan Wang [aut, cre], Seonjin Kim [aut], Hyunkeun Cho [aut], Won Chang [aut] |
Maintainer: | Shixuan Wang <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-11-16 05:36:28 UTC |
Source: | https://github.com/cran/longke |
This package contains functions for fitting a nonparametric predictive model for sparse and irregular longitudinal data.
Wang S, Kim S, Cho H, Chang W. Nonparametric predictive model for sparse and irregular longitudinal data. (2023+)
Function used to simulate sample sparse and irregular longitudinal data
datagen(ntotal,ntest,t_all,t_split,seed)
ntotal |
Total number of longitudinal subjects |
ntest |
Number of longitudinal subjects in the testing set |
t_all |
Vector of discrete measurement times (e.g., 1, 2, 3, 4, ...) |
t_split |
The measurement time after which the longitudinal response is to be predicted |
seed |
Seed for generating reproducible data |
A list containing two elements
train |
A long-format data matrix containing one functional response (yy) and two functional predictors (xx, zz) for the (ntotal - ntest) training subjects |
test |
A long-format data matrix containing one functional response (yy) and two functional predictors (xx, zz) for the ntest testing subjects |
data = datagen(ntotal=350,ntest=50,t_all=1:50,t_split=25,seed=1)
data$test
data$train
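To make the "long format" layout concrete, the base R sketch below builds a small sparse, irregular data set by hand. It does not call datagen(): the helper name make_long_toy, the column names time and id, the generating curves, and the number of observations per subject are all assumptions chosen for illustration. Only the column order (time, subject ID, response, predictors) and the names yy, xx, zz follow the documentation.

```r
# Hypothetical helper (not part of longke): simulate a sparse, irregular
# long-format data set with columns time, id, yy, xx, zz.
make_long_toy <- function(n_sub, t_all, n_obs = 5, seed = 1) {
  set.seed(seed)
  rows <- lapply(seq_len(n_sub), function(i) {
    # Each subject is observed at its own random subset of the time grid.
    t_i <- sort(sample(t_all, n_obs))
    a_i <- rnorm(1)  # subject-specific random effect (assumed)
    xx <- a_i + sin(2 * pi * t_i / max(t_all)) + rnorm(n_obs, sd = 0.1)
    zz <- cos(2 * pi * t_i / max(t_all)) + rnorm(n_obs, sd = 0.1)
    yy <- 1 + 0.5 * xx + rnorm(n_obs, sd = 0.1)
    cbind(time = t_i, id = i, yy = yy, xx = xx, zz = zz)
  })
  do.call(rbind, rows)
}

toy <- make_long_toy(n_sub = 6, t_all = 1:50)

# Split by subject, so that no subject appears in both sets.
test_ids <- 5:6
toy_train <- toy[!(toy[, "id"] %in% test_ids), ]
toy_test <- toy[toy[, "id"] %in% test_ids, ]
head(toy_train)
```

Each row is one measurement of one subject, so subjects may contribute different numbers of rows at different times.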
Function used to perform functional principal component analysis (FPCA) for a single functional variable
FPCA_trajectory(data,...)
data |
A long-format data matrix with 3 columns ordered as time, subject ID, variable; the measurement times of the longitudinal data should be discretized |
... |
Arguments to be passed to fdapace::FPCA |
A list containing two elements
fpca_target |
A FPCA object |
target_fit |
A num.t x num.sub matrix containing the imputed longitudinal trajectories, where num.t is the total number of discrete measurement times and num.sub is the total number of subjects |
Carroll, C., Gajardo, A., Chen, Y., Dai, X., Fan, J., Hadjipantelis, P. Z., ... & Wang, J. L. (2020). fdapace: Functional data analysis and empirical dynamics. R package version 0.5, 4.
Yao, F., Müller, H. G., & Wang, J. L. (2005). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association, 100(470), 577-590.
t_all = 1:50
data = datagen(ntotal=350,ntest=50,t_all=t_all,t_split=25,seed=1)
data.sample = data$test[,c(1,2,3)]
# In this case, num.t=50 and num.sub=50 since we only used 50 subjects in the testing data
data.FPCA = FPCA_trajectory(data.sample,list(dataType='Sparse', error=FALSE, kernel='gauss', verbose=FALSE, nRegGrid=length(t_all)))
data.FPCA$target_fit
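The shape of target_fit can be illustrated without running FPCA. The base R sketch below only places the observed values on a num.t x num.sub grid and leaves unobserved cells as NA; filling those cells is the job of the FPCA step. The helper name long_to_grid and the toy values are assumptions, not package code.

```r
# Hypothetical helper (not part of longke): arrange a 3-column long-format
# matrix (time, subject ID, value) on a num.t x num.sub grid.
long_to_grid <- function(long, t_all) {
  ids <- sort(unique(long[, 2]))
  grid <- matrix(NA_real_, nrow = length(t_all), ncol = length(ids),
                 dimnames = list(as.character(t_all), as.character(ids)))
  # Matrix indexing: one (row, column) pair per observation.
  idx <- cbind(match(long[, 1], t_all), match(long[, 2], ids))
  grid[idx] <- long[, 3]
  grid
}

long <- cbind(time = c(1, 3, 2, 4, 5),
              id = c(1, 1, 2, 2, 2),
              value = c(0.2, 0.5, 1.1, 0.9, 1.4))
grid <- long_to_grid(long, t_all = 1:5)
dim(grid)         # time points by subjects
sum(is.na(grid))  # cells that an imputation step would need to fill
```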
Function used to perform leave-one-subject-out cross validation to select optimal time bandwidth (b_s) and trajectory bandwidth (b_w)
KE_bwselection(data,bw_time,bw_subj,T1,T2)
data |
A long-format data matrix with columns ordered as time, subject ID, response, predictor1, predictor2, ...; the measurement times of the longitudinal data should be discretized |
bw_time |
A numeric vector that contains the candidate time bandwidths |
bw_subj |
A numeric vector that contains the candidate trajectory bandwidths |
T1 |
The time domain over which the functional predictors are measured (e.g., c(1,25)) |
T2 |
The time domain over which the functional response is to be predicted (e.g., c(26,50)) |
A list containing 3 elements
BWSelecStep |
Total SSE for each bandwidth combination |
optimalBW |
A vector containing the optimal time/trajectory bandwidth |
RunningTime |
Running time of the bandwidth selection |
Wang S, Kim S, Cho H, Chang W. Nonparametric predictive model for sparse and irregular longitudinal data. (2023+)
t_all = 1:50
T1=c(1,25);T2=c(26,50)
data = datagen(ntotal=350,ntest=50,t_all=t_all,t_split=25,seed=1)
data.sample = data$train
bwsele.toy = KE_bwselection(data=data.sample, bw_time=c(1,2),bw_subj=c(0.1,0.5),T1=T1,T2=T2)
bwsele.toy$optimalBW
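The idea of leave-one-subject-out cross validation over a grid of bandwidth pairs can be sketched in base R. This is a simplified illustration and not the internals of KE_bwselection: it assumes trajectories that are fully observed on regular grids, a Gaussian kernel for both weights, and a plain kernel-weighted average as the predictor. The helper names gauss and loso_sse and the toy data are hypothetical.

```r
# Gaussian kernel used for both time and trajectory weights (an assumption).
gauss <- function(u) exp(-0.5 * u^2)

# Hypothetical helper: leave-one-subject-out SSE for one bandwidth pair.
# X: n x p predictor trajectories on T1; Y: n x q responses on the grid t2.
loso_sse <- function(X, Y, t2, b_s, b_w) {
  D <- as.matrix(dist(X)) / sqrt(ncol(X))  # scaled L2 distances between subjects
  Kt <- gauss(outer(t2, t2, "-") / b_s)    # q x q time-proximity weights
  sse <- 0
  for (i in seq_len(nrow(X))) {
    w <- gauss(D[i, -i] / b_w)             # similarity to all other subjects
    num <- Kt %*% colSums(w * Y[-i, , drop = FALSE])
    den <- sum(w) * rowSums(Kt)
    sse <- sse + sum((Y[i, ] - num / den)^2)
  }
  sse
}

set.seed(1)
n <- 30; t1 <- 1:10; t2 <- 11:20
a <- rnorm(n)  # subject-level signal shared by predictor and response
X <- outer(a, sin(t1 / 3)) + matrix(rnorm(n * 10, sd = 0.1), n)
Y <- outer(a, t2 / 20) + matrix(rnorm(n * 10, sd = 0.1), n)

# Evaluate every bandwidth combination and keep the one with the smallest SSE.
grid <- expand.grid(bw_time = c(1, 2), bw_subj = c(0.2, 0.5, 1))
grid$sse <- mapply(function(bs, bw) loso_sse(X, Y, t2, bs, bw),
                   grid$bw_time, grid$bw_subj)
grid[which.min(grid$sse), ]
```

The grid of SSE values plays the role of BWSelecStep, and the minimizing row plays the role of optimalBW. With very small trajectory bandwidths all weights can underflow to zero, so the candidate values should be chosen on the scale of the distances.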
Function used to predict response trajectory by nonparametric kernel estimator
KE_fit(train,test,T1,T2,bw_time,bw_subj,alpha=0.05,seed=1,coefCI=FALSE)
train |
A long-format data matrix with columns ordered as time, subject ID, response, predictor1, predictor2, ...; the measurement times of the longitudinal data should be discretized within T1 |
test |
A long-format data matrix with columns ordered as time, subject ID, response, predictor1, predictor2, ...; the measurement times of the longitudinal data should be discretized within T2 |
T1 |
The time domain over which the functional predictors are measured (e.g., c(1,25)) |
T2 |
The time domain over which the functional response is to be predicted (e.g., c(26,50)) |
bw_time |
(Optimal) time bandwidth |
bw_subj |
(Optimal) trajectory/subject bandwidth |
alpha |
Level for the bootstrap confidence interval (CI) of alpha_0, alpha_1, ...; default is 0.05 |
seed |
A random seed for producing reproducible bootstrap CIs of alpha_0, alpha_1, ... |
coefCI |
Logical: TRUE to derive the bootstrap CI of alpha_0, alpha_1, ...; default is FALSE |
A list containing 6 elements
testTraj |
A num.test x num.T2 matrix containing num.test subjects' trajectories where num.T2 is the total number of the discrete measurement time over T2 |
proxycoeff |
Coefficient estimates from the non-negative least squares regression. From left to right they are alpha_0, alpha_1, ... |
fpca.fit |
A list containing FPCA fit for the functional predictors and the functional response |
w.hat |
A list containing num.test elements, where the i-th element contains the proxy distance/similarity between the i-th testing subject and the training subjects |
bootCI.mean |
Bootstrap confidence interval of alpha_0, alpha_1, ... |
input.list |
A list containing the input arguments |
Wang S, Kim S, Cho H, Chang W. Nonparametric predictive model for sparse and irregular longitudinal data. (2023+)
t_all = 1:50
T1=c(1,25);T2=c(26,50)
data = datagen(ntotal=350,ntest=50,t_all=t_all,t_split=25,seed=1)
train = data$train
test = data$test
ke.fit = KE_fit(train=train,test=test,T1=T1,T2=T2,bw_time=1,bw_subj = 0.2)
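How trajectory similarity and time proximity combine into one weight can be sketched for a single new subject. This is a minimal illustration under stated assumptions, not the code behind KE_fit: predictor trajectories are taken as already imputed on a common grid over T1, the subject IDs are assumed to be the row indices of the training predictor matrix, both kernels are Gaussian, and the helper names gauss and ke_predict_one are hypothetical.

```r
gauss <- function(u) exp(-0.5 * u^2)  # Gaussian kernel (an assumption)

# Hypothetical helper (not the package's KE_fit): predict a response trajectory
# on the grid t_pred for one new subject.
# x_new:   predictor trajectory of the new subject on T1 (length p)
# X_train: n x p matrix of training predictor trajectories on T1
# resp:    long-format training responses with columns time, id, yy (times in T2)
ke_predict_one <- function(x_new, X_train, resp, t_pred, b_s, b_w) {
  # Scaled L2 distance between the new trajectory and each training trajectory
  d <- sqrt(rowMeans(sweep(X_train, 2, x_new)^2))
  w_subj <- gauss(d / b_w)[resp[, "id"]]  # one subject weight per observation
  sapply(t_pred, function(t) {
    w <- w_subj * gauss((resp[, "time"] - t) / b_s)  # multiply by time proximity
    sum(w * resp[, "yy"]) / sum(w)
  })
}

set.seed(2)
n <- 40; t1 <- 1:10; t2 <- 11:20
a <- rnorm(n)
X_train <- outer(a, sin(t1 / 3)) + matrix(rnorm(n * 10, sd = 0.1), n)
# Sparse, irregular responses: each subject is observed at 3 random times in T2.
resp <- do.call(rbind, lapply(seq_len(n), function(i) {
  s <- sort(sample(t2, 3))
  cbind(time = s, id = i, yy = a[i] * s / 20 + rnorm(3, sd = 0.1))
}))

x_new <- 1.5 * sin(t1 / 3)  # new subject whose latent level is 1.5
pred <- ke_predict_one(x_new, X_train, resp, t_pred = t2, b_s = 2, b_w = 0.3)
round(pred, 2)
```

Because the responses are pooled across subjects before weighting, a prediction is available at every time in t_pred even though no single training subject is observed at all of them.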
Function used to derive simultaneous confidence band (SCB) for the predicted response trajectory
KE_trajSCB(KE.fit.object,nboot=500,alpha=0.05)
KE.fit.object |
An object of class KE, as returned by KE_fit() (e.g., ke = KE_fit(...)) |
nboot |
Number of bootstrap samples used to construct the SCB |
alpha |
Level for the bootstrap SCB of the predicted response trajectory; default is 0.05 |
A list containing num.test elements, where num.test is the number of testing subjects. Each element is itself a list containing 3 elements:
se |
A vector containing the standard errors at each discrete measurement time |
traj.upper |
A vector containing the upper bound of the band for the testing subject at each measurement time |
traj.lower |
A vector containing the lower bound of the band for the testing subject at each measurement time |
Wang S, Kim S, Cho H, Chang W. Nonparametric predictive model for sparse and irregular longitudinal data. (2023+)
Kim, S., Ryan Cho, H., & Kim, M. O. (2021). Predictive generalized varying‐coefficient longitudinal model. Statistics in Medicine, 40(28), 6243-6259.
t_all = 1:50
T1=c(1,25);T2=c(26,50)
data = datagen(ntotal=350,ntest=50,t_all=t_all,t_split=25,seed=1)
train = data$train
test = data$test
ke.fit = KE_fit(train=train,test=test,T1=T1,T2=T2,bw_time=1,bw_subj = 0.2)
ketraj.toy = KE_trajSCB(KE.fit.object = ke.fit, nboot=10,alpha=0.05)
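A generic bootstrap simultaneous band can be sketched in base R to show what the se, traj.upper and traj.lower elements represent. The construction used by KE_trajSCB may differ; this sketch assumes a subject-level resampling scheme and a sup-t critical value, and it treats alpha as a significance level (so 0.05 gives a 95% band). The helper name boot_scb and the toy data are hypothetical, and a simple mean curve stands in for the kernel estimator.

```r
# Hypothetical helper (not the package's KE_trajSCB): bootstrap simultaneous
# band for a curve estimate. `estimator` maps a subject-by-time matrix to a curve.
boot_scb <- function(Y, estimator, nboot = 200, alpha = 0.05, seed = 1) {
  set.seed(seed)
  est <- estimator(Y)
  boot <- t(replicate(nboot, {
    idx <- sample(nrow(Y), replace = TRUE)  # resample whole subjects
    estimator(Y[idx, , drop = FALSE])
  }))
  se <- apply(boot, 2, sd)                  # pointwise bootstrap standard error
  # Sup-t critical value: quantile of the maximal standardized deviation,
  # so that the band is calibrated for the whole curve at once.
  max_dev <- apply(abs(sweep(boot, 2, est)) / rep(se, each = nboot), 1, max)
  crit <- quantile(max_dev, 1 - alpha, names = FALSE)
  list(se = se, traj.upper = est + crit * se, traj.lower = est - crit * se)
}

set.seed(3)
Y <- outer(rnorm(50, mean = 1), (11:20) / 20) + matrix(rnorm(500, sd = 0.1), 50)
band <- boot_scb(Y, estimator = colMeans, nboot = 200, alpha = 0.05)
str(band)
```

A small nboot, such as the value 10 used in the example above, keeps the run time short but gives a noisy critical value; larger values are preferable for reported bands.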