Title: | Semiparametric Estimating Functions |
---|---|
Description: | Functions for fitting semiparametric regression models for panel count survival data. An overview of the package can be found in Wang and Yan (2011) <doi:10.1016/j.cmpb.2010.10.005> and Chiou et al. (2018) <doi:10.1111/insr.12271>. |
Authors: | Sy Han (Steven) Chiou [aut, cre], Xiaojing Wang [aut], Jun Yan [aut] |
Maintainer: | Sy Han (Steven) Chiou <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.9 |
Built: | 2025-01-24 05:12:19 UTC |
Source: | https://github.com/stc04003/spef |
spef
is an R package that consists of a collection of functions for fitting
semiparametric regression models for panel count survival data.
Estimating procedures include: Wang-Yan’s augmented estimating equations (AEE, AEEX),
Huang-Wang-Zhang’s method (HWZ), Zhang’s maximum pseudolikelihood (MPL),
Maximum pseudolikelihood with I-Splines (MPLs), Maximum likelihood with I-Splines (MLs),
Sun-Wei’s method (‘EE.SWa', 'EE.SWb', 'EE.SWc'), Hu-Sun-Wei’s method ('EE.HSWc', 'EE.HSWm'),
and accelerated mean model ('AMM').
Maintainer: Sy Han (Steven) Chiou [email protected]
Authors:
Xiaojing Wang
Jun Yan
Chiou, S., Xu, G., Yan, J., and Huang, C.-Y. (2017). Semiparametric estimation of the accelerated mean model with panel count data under informative examination times. Biometrics, to appear. <doi: 10.1111/biom.12840>.
Huang, C.-Y., Wang, M., and Zhang, Y. (2006). Analysing panel count data with informative observation times. Biometrika, 93(4), 763–776.
Hu, X. J., Sun, J. and Wei, L. J. (2003). Regression parameter estimation from panel counts. Scandinavian Journal of Statistics, 30, 25–43.
Lu, M., Zhang, Y., and Huang, J. (2007). Estimation of the mean function with panel count data using monotone polynomial splines. Biometrika, 94(3), 705–718.
Sun, J. and Wei, L. J. (2000). Regression analysis of panel count data with covariates-dependent observation and censoring times. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 62(2), 293–302.
Wang, X. and Yan, J. (2011). Fitting semiparametric regressions for panel count survival data with an R package spef. Computer methods and programs in biomedicine 104(2), 278–285.
Wang, X. and Yan, J. (2013). Augmented estimating equations for semiparametric panel count regression with informative observation times and censoring time. Statistica Sinica, 23(1), 359–381.
Zhang, Y. (2002). A Semiparametric pseudolikelihood estimation method for panel count data. Biometrika, 89(1), 39–48.
Useful links:
A data frame contains data on recurrences of bladder cancer, used by many people to demonstrate methodology for recurrent event modeling. The data was obtained by courtesy of Ying Zhang, containing records of 118 patients from three treatment arms: 48 are from the placebo arm, 37 are from the thiotepa arm, and the rest 33 are from the pyridoxine arm.
data(bladTumor)
data(bladTumor)
A data.frame
contains the following columns:
subject
patient id
time
observation time
count
cumulative number of tumors
count2
number of new tumors since last observation time
number
initial number of tumors (8=8 or more)
size
size (cm) of largest initial tumor
pyridoxine
dummy variable for pyridoxine arm
thiotepa
dummy variable for thiotepa arm
To our surprise, the two-treatment (placebo and thiotepa) subset of
the full version bladTumor
do not match the two-treatment
version blaTum
.
Byar, D. P. (1980). The Veterans administration study of chemoprophylaxis for recurrent stage I bladder tumors: Comparisons of placebo, pyridoxine, and topical thiotepa. Bladder Tumors and Other Topics in Urological Oncology, pp. 363–370. New York: Plenum.
Wellner, J. A. and Zhang, Y. (2007) Two likelihood-based semiparametric estimation methods for panel count data with covariates. Annals of Statistics, 35(5), 2106–2142.
Lu, M., Zhang, Y. and Huang, J. (2009) Semiparametric estimation methods for panel count data using monotone B-Splines. Journal of the American Statistical Association 104(487), 1060–1070.
data(bladTumor) ## Plot bladder tumor data p <- plot(with(bladTumor, PanelSurv(subject, time, count2))) print(p)
data(bladTumor) ## Plot bladder tumor data p <- plot(with(bladTumor, PanelSurv(subject, time, count2))) print(p)
A data frame contains data on recurrences of bladder cancer,
used by many people to demonstrate methodology for recurrent event modelling.
This data set organized differently from bladTumor
.
The data contains records of 85 patients from two treatment arms:
48 are from the placebo arm, and the rest 37 are from the thiotepa arm.
data(blaTum)
data(blaTum)
A data.frame
contains the following columns:
id
patient id
treatment
placebo = 0, thiotepa = 1
size
size (cm) of largest initial tumor
num
initial number of tumors (8 = 8 or more)
time
observation time
count
number of new tumors since last observation time
To our surprise, the two-treatment (placebo and thiotepa) subset of
the full version bladTumor
do not match the two-treatment version blaTum
.
Byar, D. P. (1980). The Veterans administration study of chemoprophylaxis for recurrent stage I bladder tumors: comparisons of placebo, pyridoxine, and topical thiotepa. Bladder Tumors and Other Topics in Urological Oncology, pp. 363–370. New York: Plenum.
Sun, J. and Wei, L. J. (2000) Regression analysis of panel count data with covariate dependent observation and censoring times. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 62(2), 293–302.
Huang, C. Y., Wang, M. C. and Zhang, Y. (2006). Analyzing panel count data with informative observation times. Biometrika, 93(4): 763–776.
data(blaTum) library(ggplot2) ggplot(blaTum, aes(time, id)) + geom_tile(aes(fill=count)) + facet_grid(treatment ~ ., scales="free_y", )
data(blaTum) library(ggplot2) ggplot(blaTum, aes(time, id)) + geom_tile(aes(fill=count)) + facet_grid(treatment ~ ., scales="free_y", )
Fits an proportional means model:
where is a
vector of covariate
coefficient and
is a completely unspecified
baseline mean function.
Estimating procedures include: Wang-Yan's augmented estimating
equations ("AEE"
, "AEEX"
),
Huang-Wang-Zhang's method ("HWZ"
),
Zhang's maximum pseudo-likelihood ("MPL"
),
Maximum pseudolikelihood with I-Splines ("MPLs"
),
Maximum likelihood with I-Splines ("MLs"
),
Sun-Wei's method ("EE.SWa"
, "EE.SWb"
, "EE.SWc"
),
and Hu-Sun-Wei's method ("EE.HSWc"
, "EE.HSWm"
).
The function can also fits an accelerated mean model ("AMM"
):
panelReg(formula, data, method = c("AEE", "AEEX", "HWZ", "MPL", "MPLs", "MLs", "AMM", "EE.SWa", "EE.SWb", "EE.SWc", "EE.HSWc", "EE.HSWm"), se = c("NULL", "smBootstrap", "Bootstrap", "Impute", "Sandwich"), control = list())
panelReg(formula, data, method = c("AEE", "AEEX", "HWZ", "MPL", "MPLs", "MLs", "AMM", "EE.SWa", "EE.SWb", "EE.SWc", "EE.HSWc", "EE.HSWm"), se = c("NULL", "smBootstrap", "Bootstrap", "Impute", "Sandwich"), control = list())
formula |
A formula object, with the response on the left of a
"~" operator, and the terms on the right. The response must be a
panel count survival object as returned by function |
data |
A data.frame in which to interpret the variables named in
the |
method |
Fitting method. See ‘Details’. |
se |
Standard error estimation method. See ‘Details’. |
control |
A list of control parameters. See ‘Details’. |
Some assumptions details about the observation times and censoring time need clarification. Three possible scenarios of observation times are considered: 1) independent observation times – the observation times are independent of the underlying recurrent event process; 2) conditional independent observation times – the observation times are independent of the event process given observed covariates; 3) informative observation times – after conditioning on observed covariates, the observation times and the event process are still dependent through an unobserved multiplicative frailty. Similarly, the three scenarios apply to the censoring time.
"AEE"
and "AEEX"
are the augmented estimating
equation methods under conditional independent censoring and
informative censoring respectively. Both allow informative observation times.
"HWZ"
is Huang-Wang-Zhang's method. It allows both
information observation times and informative censoring time. It does
not need to specify the dependence structure or model the frailty.
"MPL"
and "MPLs"
are maximum pseudolikelihood
methods, with nonparametric and monotone spline estimates of the baseline
mean function respectively. They assume conditional independent
observation times and censoring time. The underlying event process
is assumed to be Poisson, and the within subjects dependence is ignored.
"MLs"
is maximum likelihood method with monotone splines
estimates of the baseline mean function. It assumes conditional
independent observation times and censoring time, and a Poisson
underlying event process.
"EE.SWa"
, "EE.SWb"
and "EE.SWc"
are estimating
equation approaches based on Sun-Wei's methods.
The first assumes independent observation times and censoring
time. The second assumes conditional independent observation times but
independent censoring time. The third assumes conditional independent
observation times and censoring time. All three variations work on
centered covariates and avoid estimating the baseline mean.
"EE.HSWc"
and "EE.HSWm"
are estimating
equation approaches based on Hu-Sun-Wei's methods.
The first assumes independent observation times and censoring
time. The second assumes conditional independent observation times but
independent censoring time. Both variations work on centered covariates
and avoid estimating the baseline mean.
"AMM"
is the accelerated mean model.
The observation time process is allowed to be correlated with
the underlying recurrent event process through a frailty variable,
which does not need to be specified. The model also allows marginal
interpretations for the regression parameters and connects naturally
with AFT model. See Chiou et al. (2017) for more details.
For standard errors estimation method:
"NULL"
means do not calculate standard errors;
"smBootstrap"
is the smoothed bootstrap estimation method
that works with "AMM"
.
"Bootstrap"
works with all fitting methods; "Impute"
is
the multiple imputation method that works with "AEE"
and
"AEEX"
; "Sandwich"
is the robust sandwich estimation
method that works with "AEE"
and "AEEX"
.
The control
argument is a list that can supply any of the
following components:
betaInit
:Object of class "numeric"
,
initial value for covariate coefficient, default 0
.
interval
:Object of class "numeric"
,
initial search interval for solving beta
, default
(-5, 5)
.
maxIter
:Object of class "numeric"
, maximum
iteration allowed, default 500
for "AEEX"
and
"HWZ"
, 150
for others.
absTol
:Object of class "numeric"
, absolute
tolerance, default 1e-6
.
relTol
:Object of class "numeric"
, relative
tolerance, default 1e-6
.
a
:Object of class "numeric"
, a tune parameter,
default 0.1
. In the case of gamma frailty, "a"
corresponds to the value of both shape and rate parameters.
R
:Object of class "numeric"
, number of
bootstrap or imputation, default 30.
An object of S3 class "panelReg"
representing the fit.
See panelReg.object
for details.
Chiou, S., Xu, G., Yan, J., and Huang, C.-Y. (2017). Semiparametric estimation of the accelerated mean model with panel count data under informative examination times. Biometrics, to appear. <doi: 10.1111/biom.12840>.
Huang, C.-Y., Wang, M., and Zhang, Y. (2006). Analysing panel count data with informative observation times. Biometrika, 93(4), 763–776.
Hu, X. J., Sun, J. and Wei, L. J. (2003). Regression parameter estimation from panel counts. Scandinavian Journal of Statistics, 30, 25–43.
Lu, M., Zhang, Y., and Huang, J. (2007). Estimation of the mean function with panel count data using monotone polynomial splines. Biometrika, 94(3), 705–718.
Sun, J. and Wei, L. J. (2000). Regression analysis of panel count data with covariates-dependent observation and censoring times. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 62(2), 293–302.
Wang, X. and Yan, J. (2011). Fitting semiparametric regressions for panel count survival data with an R package spef. Computer Methods and Programs in Biomedicine, 104(2), 278–285.
Wang, X. and Yan, J. (2013). Augmented estimating equations for semiparametric panel count regression with informative observation times and censoring time. Statistica Sinica, 359–381.
Zhang, Y. (2002). A Semiparametric pseudolikelihood estimation method for panel count data. Biometrika, 89(1), 39–48.
## Not run: data(blaTum) ## Fit the bladder tumor data set formula <- PanelSurv(id, time, count) ~ num + size + treatment panelReg(formula, data = blaTum, method = "AEE", se = "Sandwich") panelReg(formula, data = blaTum, method = "AEEX", se = "Impute", control = list(a = 0.1, R = 30)) panelReg(formula, data = blaTum, method = "HWZ", se = "Bootstrap", control = list(R = 30)) panelReg(formula, data = blaTum, method = "MLs", se = "NULL") panelReg(formula, data = blaTum, method = "EE.SWa", se = "Bootstrap", control = list(R = 30)) panelReg(formula, data = blaTum, method = "EE.HSWc", se = "Bootstrap", control = list(R = 30)) ## End(Not run)
## Not run: data(blaTum) ## Fit the bladder tumor data set formula <- PanelSurv(id, time, count) ~ num + size + treatment panelReg(formula, data = blaTum, method = "AEE", se = "Sandwich") panelReg(formula, data = blaTum, method = "AEEX", se = "Impute", control = list(a = 0.1, R = 30)) panelReg(formula, data = blaTum, method = "HWZ", se = "Bootstrap", control = list(R = 30)) panelReg(formula, data = blaTum, method = "MLs", se = "NULL") panelReg(formula, data = blaTum, method = "EE.SWa", se = "Bootstrap", control = list(R = 30)) panelReg(formula, data = blaTum, method = "EE.HSWc", se = "Bootstrap", control = list(R = 30)) ## End(Not run)
This S3 class of objects is returned by the panelReg
class of
functions to represent a fitted semiparametric panel count regression
model. Objects of this class have methods for the functions
print
and plot
.
a vector, estimated coefficients of the linear predictor.
a step function or spline function, estimated cumulative baseline mean.
a vector, ordered unique set of observation times.
a vector, estimated baseline mean for each interval.
an integer code, 0
indicates successful
convergence, error code 1
indicates that the iteration limit
maxIter
had been reached.
an integer value, number of interactions when stopped.
a vector, estimated standard errors of beta.
a matrix, estimated covariance matrix of beta.
a vector, estimated standard error of cumulative baseline mean.
Create a panel count survival object, usually used as a response variable in a model formula.
PanelSurv(ID, time, count) is.PanelSurv(x)
PanelSurv(ID, time, count) is.PanelSurv(x)
ID |
Observation subject's ID. |
time |
Observation time. |
count |
Observation subject's ID. |
x |
An |
An object of S3 class "PanelSurv"
.
a data frame, part of original input data frame with variable "ID", "time" and "count".
ordered distinct observation times in the set of all observation times.
a matrix representation of panel count data, one
row per subject, one column per time point in "timeGrid"
.
In the case of is.PanelSurv
, a logical value TRUE
if
x
inherits from class "PanelSurv"
, otherwise an FALSE
.
In the case of plot.PanelSurv
, a tile plot of
panelMatrix
produced by package ggplot2
with color
indicating number of counts since last observation time.
data(blaTum) response <- with(blaTum, PanelSurv(id, time, count)) is.PanelSurv(response) plot(response)
data(blaTum) response <- with(blaTum, PanelSurv(id, time, count)) is.PanelSurv(response) plot(response)
panelReg
Object.Plot the estimated baseline mean function. If "se"
option of
panelReg
is not "NULL"
, 95
interval is also plotted.
## S3 method for class 'panelReg' plot(x, ...)
## S3 method for class 'panelReg' plot(x, ...)
x |
The result of a call to the |
... |
Other graphical parameters such as line type, color, or axis labels. |
data(blaTum) ## Plot the fit of bladder tumor data set fm <- PanelSurv(id, time, count) ~ num + size + treatment fit1 <- panelReg(fm, data=blaTum, method = "AEE", se = "Sandwich") plot(fit1) fit2 <- panelReg(fm, data=blaTum, method = "MLs", se = "NULL") plot(fit2)
data(blaTum) ## Plot the fit of bladder tumor data set fm <- PanelSurv(id, time, count) ~ num + size + treatment fit1 <- panelReg(fm, data=blaTum, method = "AEE", se = "Sandwich") plot(fit1) fit2 <- panelReg(fm, data=blaTum, method = "MLs", se = "NULL") plot(fit2)
Plot the tile plot from a PanelSurv
object.
## S3 method for class 'PanelSurv' plot(x, heat = FALSE, order = TRUE, ...)
## S3 method for class 'PanelSurv' plot(x, heat = FALSE, order = TRUE, ...)
x |
an object of class |
heat |
an optional logical value indicating whether a swimmer-plot-like tile plot will be produced. |
order |
an optional logical value indicating whether the tile plot will be sorted by the largest observationt time. |
... |
future extension |
A ggplot
object
A data frame contains data on the recurrence of skin tumor. The original data is available in Table A.3. of Sun and Zhao (2013). This dataset contains records of 290 patients.
data(skinTumor)
data(skinTumor)
This data frame contains the following columns:
id
: patient id (repeated for each recurrence).
time
: observation time.
age
: patient's age at enrollment.
male
: gender; male = 1, female = 0.
dfmo
: treatment (DFMO) group = 1; placebo = 0.
priorTumor
: number of prior tumor from diagnosis to randomization.
countBC
: number of newly developed basal cell carcinomas tumors since last observation time.
countSC
: number of newly developed squamous cell carcinomas tumors since last observation time.
count
: number of newly developed non-melanoma tumors
since last observation time; this is equal to countBC + countSC
.
Chiou, S., Xu, G., Yan, J., and Huang, C.-Y. (2017). Semiparametric estimation of the accelerated mean model with panel count data under informative examination times. Biometrics, to appear. <doi: 10.1111/biom.12840>.
Sun, J. and Zhao, X. (2013). Statistical Analysis of Panel Count Data. New York: Springer.
skiTum
data(skinTumor) library(ggplot2) ggplot(skinTumor, aes(time, id, width = 25, height = 2)) + geom_tile(aes(fill = count)) + theme_bw() + facet_grid(dfmo ~ ., scales = "free_y", as.table = FALSE, labeller = labeller(dfmo = function(x) paste("DFMO =", x))) + scale_fill_gradient(low = "grey", high = "black") + scale_x_continuous(breaks = seq(0, 2000, 200)) + labs(fill = "Count") + xlab("Time in days")
data(skinTumor) library(ggplot2) ggplot(skinTumor, aes(time, id, width = 25, height = 2)) + geom_tile(aes(fill = count)) + theme_bw() + facet_grid(dfmo ~ ., scales = "free_y", as.table = FALSE, labeller = labeller(dfmo = function(x) paste("DFMO =", x))) + scale_fill_gradient(low = "grey", high = "black") + scale_x_continuous(breaks = seq(0, 2000, 200)) + labs(fill = "Count") + xlab("Time in days")
A data frame contains data on the recurrence of skin tumor. This simulated data is created for illustrative purpose and it mimic the Skin Cancer Chemoprevention Trial used in Chiou et al. (2017). This data set contains records of 100 simulated patients.
data(skiTum)
data(skiTum)
This data frame contains the following columns:
id
: patient id (repeated for each recurrence).
time
: observation time.
age
: patient's age at enrollment; age = 1 if greater than 65, age = 0 otherwise.
male
: gender; male = 1, female = 0.
dfmo
: treatment (DFMO) group = 1; placebo = 0.
priorTumor
: number of prior tumor from diagnosis to randomization.
count
: number of new tumors since last observation time.
skinTumor
data(skiTum) library(ggplot2) skiTum$dfmo <- factor(skiTum$dfmo, levels = c(1, 0), labels = c("placebo", "DFMO")) ggplot(skiTum, aes(time, id, height = 2, width = 25)) + geom_tile(aes(fill = count)) + theme_bw() + facet_grid(dfmo ~ ., scales = "free_y") + scale_fill_gradient(low = "grey", high = "black") + labs(fill="Count") + scale_x_continuous(breaks = seq(0, 1800, 100)) + xlab("Time in days")
data(skiTum) library(ggplot2) skiTum$dfmo <- factor(skiTum$dfmo, levels = c(1, 0), labels = c("placebo", "DFMO")) ggplot(skiTum, aes(time, id, height = 2, width = 25)) + geom_tile(aes(fill = count)) + theme_bw() + facet_grid(dfmo ~ ., scales = "free_y") + scale_fill_gradient(low = "grey", high = "black") + labs(fill="Count") + scale_x_continuous(breaks = seq(0, 1800, 100)) + xlab("Time in days")