Package 'spef' reference manual

Title:	Semiparametric Estimating Functions
Description:	Functions for fitting semiparametric regression models for panel count survival data. An overview of the package can be found in Wang and Yan (2011) <doi:10.1016/j.cmpb.2010.10.005> and Chiou et al. (2018) <doi:10.1111/insr.12271>.
Authors:	Sy Han (Steven) Chiou [aut, cre], Xiaojing Wang [aut], Jun Yan [aut]
Maintainer:	Sy Han (Steven) Chiou <[email protected]>
License:	GPL (>= 3)
Version:	1.0.9
Built:	2025-02-23 05:02:06 UTC
Source:	https://github.com/stc04003/spef

spef: Semiparametric Estimating Functions

Description

spef is an R package that consists of a collection of functions for fitting semiparametric regression models for panel count survival data. Estimating procedures include: Wang-Yan’s augmented estimating equations (AEE, AEEX), Huang-Wang-Zhang’s method (HWZ), Zhang’s maximum pseudolikelihood (MPL), Maximum pseudolikelihood with I-Splines (MPLs), Maximum likelihood with I-Splines (MLs), Sun-Wei’s method (‘EE.SWa', 'EE.SWb', 'EE.SWc'), Hu-Sun-Wei’s method ('EE.HSWc', 'EE.HSWm'), and accelerated mean model ('AMM').

Author(s)

Maintainer: Sy Han (Steven) Chiou [email protected]

Authors:

Xiaojing Wang
Jun Yan

References

Chiou, S., Xu, G., Yan, J., and Huang, C.-Y. (2017). Semiparametric estimation of the accelerated mean model with panel count data under informative examination times. Biometrics, to appear. <doi: 10.1111/biom.12840>.

Huang, C.-Y., Wang, M., and Zhang, Y. (2006). Analysing panel count data with informative observation times. Biometrika, 93(4), 763–776.

Hu, X. J., Sun, J. and Wei, L. J. (2003). Regression parameter estimation from panel counts. Scandinavian Journal of Statistics, 30, 25–43.

Lu, M., Zhang, Y., and Huang, J. (2007). Estimation of the mean function with panel count data using monotone polynomial splines. Biometrika, 94(3), 705–718.

Sun, J. and Wei, L. J. (2000). Regression analysis of panel count data with covariates-dependent observation and censoring times. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 62(2), 293–302.

Wang, X. and Yan, J. (2011). Fitting semiparametric regressions for panel count survival data with an R package spef. Computer methods and programs in biomedicine 104(2), 278–285.

Wang, X. and Yan, J. (2013). Augmented estimating equations for semiparametric panel count regression with informative observation times and censoring time. Statistica Sinica, 23(1), 359–381.

Zhang, Y. (2002). A Semiparametric pseudolikelihood estimation method for panel count data. Biometrika, 89(1), 39–48.

Bladder Tumors Cancer Recurrences

Description

A data frame contains data on recurrences of bladder cancer, used by many people to demonstrate methodology for recurrent event modeling. The data was obtained by courtesy of Ying Zhang, containing records of 118 patients from three treatment arms: 48 are from the placebo arm, 37 are from the thiotepa arm, and the rest 33 are from the pyridoxine arm.

Usage

data(bladTumor)
data(bladTumor)

Format

A data.frame contains the following columns:

subject: patient id
time: observation time
count: cumulative number of tumors
count2: number of new tumors since last observation time
number: initial number of tumors (8=8 or more)
size: size (cm) of largest initial tumor
pyridoxine: dummy variable for pyridoxine arm
thiotepa: dummy variable for thiotepa arm

Note

To our surprise, the two-treatment (placebo and thiotepa) subset of the full version bladTumor do not match the two-treatment version blaTum.

References

Byar, D. P. (1980). The Veterans administration study of chemoprophylaxis for recurrent stage I bladder tumors: Comparisons of placebo, pyridoxine, and topical thiotepa. Bladder Tumors and Other Topics in Urological Oncology, pp. 363–370. New York: Plenum.

Wellner, J. A. and Zhang, Y. (2007) Two likelihood-based semiparametric estimation methods for panel count data with covariates. Annals of Statistics, 35(5), 2106–2142.

Lu, M., Zhang, Y. and Huang, J. (2009) Semiparametric estimation methods for panel count data using monotone B-Splines. Journal of the American Statistical Association 104(487), 1060–1070.

Examples

data(bladTumor)
## Plot bladder tumor data
p <- plot(with(bladTumor, PanelSurv(subject, time, count2)))
print(p)
data(bladTumor)
## Plot bladder tumor data
p <- plot(with(bladTumor, PanelSurv(subject, time, count2)))
print(p)

Bladder Tumors Cancer Recurrences

Description

A data frame contains data on recurrences of bladder cancer, used by many people to demonstrate methodology for recurrent event modelling. This data set organized differently from bladTumor. The data contains records of 85 patients from two treatment arms: 48 are from the placebo arm, and the rest 37 are from the thiotepa arm.

Usage

data(blaTum)
data(blaTum)

Format

A data.frame contains the following columns:

id: patient id
treatment: placebo = 0, thiotepa = 1
size: size (cm) of largest initial tumor
num: initial number of tumors (8 = 8 or more)
time: observation time
count: number of new tumors since last observation time

Note

To our surprise, the two-treatment (placebo and thiotepa) subset of the full version bladTumor do not match the two-treatment version blaTum.

References

Byar, D. P. (1980). The Veterans administration study of chemoprophylaxis for recurrent stage I bladder tumors: comparisons of placebo, pyridoxine, and topical thiotepa. Bladder Tumors and Other Topics in Urological Oncology, pp. 363–370. New York: Plenum.

Sun, J. and Wei, L. J. (2000) Regression analysis of panel count data with covariate dependent observation and censoring times. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 62(2), 293–302.

Huang, C. Y., Wang, M. C. and Zhang, Y. (2006). Analyzing panel count data with informative observation times. Biometrika, 93(4): 763–776.

Examples

data(blaTum)
library(ggplot2)
ggplot(blaTum, aes(time, id)) + geom_tile(aes(fill=count)) +
    facet_grid(treatment ~ ., scales="free_y", )
data(blaTum)
library(ggplot2)
ggplot(blaTum, aes(time, id)) + geom_tile(aes(fill=count)) +
    facet_grid(treatment ~ ., scales="free_y", )

Fits Semiparametric Regression Models for Panel Count Survival Data

Description

Fits an proportional means model:

$\Lambda (t; X_i) = E[N_i(t)|X_i] = \Lambda(t)e^{X_i ' \beta},$

where $\beta$ is a $p \times 1$ vector of covariate coefficient and $\Lambda(\cdot)$ is a completely unspecified baseline mean function.

Estimating procedures include: Wang-Yan's augmented estimating equations ("AEE", "AEEX"), Huang-Wang-Zhang's method ("HWZ"), Zhang's maximum pseudo-likelihood ("MPL"), Maximum pseudolikelihood with I-Splines ("MPLs"), Maximum likelihood with I-Splines ("MLs"), Sun-Wei's method ("EE.SWa", "EE.SWb", "EE.SWc"), and Hu-Sun-Wei's method ("EE.HSWc", "EE.HSWm").

The function can also fits an accelerated mean model ("AMM"):

$\Lambda (t; X_i) = E[N_i(t)|X_i] = \Lambda(te^{X_i ' \beta})e^{X_i ' \beta}.$

Usage

panelReg(formula, data, method = c("AEE", "AEEX", "HWZ", "MPL", "MPLs",
  "MLs", "AMM", "EE.SWa", "EE.SWb", "EE.SWc", "EE.HSWc", "EE.HSWm"),
  se = c("NULL", "smBootstrap", "Bootstrap", "Impute", "Sandwich"),
  control = list())
panelReg(formula, data, method = c("AEE", "AEEX", "HWZ", "MPL", "MPLs",
  "MLs", "AMM", "EE.SWa", "EE.SWb", "EE.SWc", "EE.HSWc", "EE.HSWm"),
  se = c("NULL", "smBootstrap", "Bootstrap", "Impute", "Sandwich"),
  control = list())

Arguments

`formula`	A formula object, with the response on the left of a "~" operator, and the terms on the right. The response must be a panel count survival object as returned by function `PanelSurv`.
`data`	A data.frame in which to interpret the variables named in the `formula`. Three variables including subjects' id, observation times, and number of new events since last observation time are required to feed into function `PanelSurv` as response. At least one covariate variable is required.
`method`	Fitting method. See ‘Details’.
`se`	Standard error estimation method. See ‘Details’.
`control`	A list of control parameters. See ‘Details’.

Details

Some assumptions details about the observation times and censoring time need clarification. Three possible scenarios of observation times are considered: 1) independent observation times – the observation times are independent of the underlying recurrent event process; 2) conditional independent observation times – the observation times are independent of the event process given observed covariates; 3) informative observation times – after conditioning on observed covariates, the observation times and the event process are still dependent through an unobserved multiplicative frailty. Similarly, the three scenarios apply to the censoring time.

"AEE" and "AEEX" are the augmented estimating equation methods under conditional independent censoring and informative censoring respectively. Both allow informative observation times.

"HWZ" is Huang-Wang-Zhang's method. It allows both information observation times and informative censoring time. It does not need to specify the dependence structure or model the frailty.

"MPL" and "MPLs" are maximum pseudolikelihood methods, with nonparametric and monotone spline estimates of the baseline mean function respectively. They assume conditional independent observation times and censoring time. The underlying event process is assumed to be Poisson, and the within subjects dependence is ignored.

"MLs" is maximum likelihood method with monotone splines estimates of the baseline mean function. It assumes conditional independent observation times and censoring time, and a Poisson underlying event process.

"EE.SWa", "EE.SWb" and "EE.SWc" are estimating equation approaches based on Sun-Wei's methods. The first assumes independent observation times and censoring time. The second assumes conditional independent observation times but independent censoring time. The third assumes conditional independent observation times and censoring time. All three variations work on centered covariates and avoid estimating the baseline mean.

"EE.HSWc" and "EE.HSWm" are estimating equation approaches based on Hu-Sun-Wei's methods. The first assumes independent observation times and censoring time. The second assumes conditional independent observation times but independent censoring time. Both variations work on centered covariates and avoid estimating the baseline mean.

"AMM" is the accelerated mean model. The observation time process is allowed to be correlated with the underlying recurrent event process through a frailty variable, which does not need to be specified. The model also allows marginal interpretations for the regression parameters and connects naturally with AFT model. See Chiou et al. (2017) for more details.

For standard errors estimation method:

"NULL" means do not calculate standard errors; "smBootstrap" is the smoothed bootstrap estimation method that works with "AMM". "Bootstrap" works with all fitting methods; "Impute" is the multiple imputation method that works with "AEE" and "AEEX"; "Sandwich" is the robust sandwich estimation method that works with "AEE" and "AEEX".

The control argument is a list that can supply any of the following components:

betaInit:: Object of class "numeric", initial value for covariate coefficient, default 0.
interval:: Object of class "numeric", initial search interval for solving beta, default (-5, 5).
maxIter:: Object of class "numeric", maximum iteration allowed, default 500 for "AEEX" and "HWZ", 150 for others.
absTol:: Object of class "numeric", absolute tolerance, default 1e-6.
relTol:: Object of class "numeric", relative tolerance, default 1e-6.
a:: Object of class "numeric", a tune parameter, default 0.1. In the case of gamma frailty, "a" corresponds to the value of both shape and rate parameters.
R:: Object of class "numeric", number of bootstrap or imputation, default 30.

Value

An object of S3 class "panelReg" representing the fit. See panelReg.object for details.

References

Huang, C.-Y., Wang, M., and Zhang, Y. (2006). Analysing panel count data with informative observation times. Biometrika, 93(4), 763–776.

Hu, X. J., Sun, J. and Wei, L. J. (2003). Regression parameter estimation from panel counts. Scandinavian Journal of Statistics, 30, 25–43.

Lu, M., Zhang, Y., and Huang, J. (2007). Estimation of the mean function with panel count data using monotone polynomial splines. Biometrika, 94(3), 705–718.

Wang, X. and Yan, J. (2011). Fitting semiparametric regressions for panel count survival data with an R package spef. Computer Methods and Programs in Biomedicine, 104(2), 278–285.

Wang, X. and Yan, J. (2013). Augmented estimating equations for semiparametric panel count regression with informative observation times and censoring time. Statistica Sinica, 359–381.

Zhang, Y. (2002). A Semiparametric pseudolikelihood estimation method for panel count data. Biometrika, 89(1), 39–48.

Examples

## Not run: 
data(blaTum)
## Fit the bladder tumor data set
formula <- PanelSurv(id, time, count) ~ num + size + treatment

panelReg(formula, data = blaTum, method = "AEE", se = "Sandwich")
panelReg(formula, data = blaTum, method = "AEEX", se = "Impute",
         control = list(a = 0.1, R = 30))
panelReg(formula, data = blaTum, method = "HWZ", se = "Bootstrap",
         control = list(R = 30))
panelReg(formula, data = blaTum, method = "MLs", se = "NULL")
panelReg(formula, data = blaTum, method = "EE.SWa", se = "Bootstrap",
         control = list(R = 30))
panelReg(formula, data = blaTum, method = "EE.HSWc", se = "Bootstrap",
         control = list(R = 30))

## End(Not run)

## Not run: 
data(blaTum)
## Fit the bladder tumor data set
formula <- PanelSurv(id, time, count) ~ num + size + treatment

panelReg(formula, data = blaTum, method = "AEE", se = "Sandwich")
panelReg(formula, data = blaTum, method = "AEEX", se = "Impute",
         control = list(a = 0.1, R = 30))
panelReg(formula, data = blaTum, method = "HWZ", se = "Bootstrap",
         control = list(R = 30))
panelReg(formula, data = blaTum, method = "MLs", se = "NULL")
panelReg(formula, data = blaTum, method = "EE.SWa", se = "Bootstrap",
         control = list(R = 30))
panelReg(formula, data = blaTum, method = "EE.HSWc", se = "Bootstrap",
         control = list(R = 30))

## End(Not run)

Semiparametric Panel Count Regression Object

Description

This S3 class of objects is returned by the panelReg class of functions to represent a fitted semiparametric panel count regression model. Objects of this class have methods for the functions print and plot.

Value

beta: a vector, estimated coefficients of the linear predictor.
baseline: a step function or spline function, estimated cumulative baseline mean.
timeGrid: a vector, ordered unique set of observation times.
lambda: a vector, estimated baseline mean for each interval.
convergence: an integer code, 0 indicates successful convergence, error code 1 indicates that the iteration limit maxIter had been reached.
iter: an integer value, number of interactions when stopped.
betaSE: a vector, estimated standard errors of beta.
betaVar: a matrix, estimated covariance matrix of beta.
baselineSE: a vector, estimated standard error of cumulative baseline mean.

Create a PanelSurv Object

Description

Create a panel count survival object, usually used as a response variable in a model formula.

Usage

PanelSurv(ID, time, count)

is.PanelSurv(x)
PanelSurv(ID, time, count)

is.PanelSurv(x)

Arguments

`ID`	Observation subject's ID.
`time`	Observation time.
`count`	Observation subject's ID.
`x`	An `PanelSurv` object.

Value

An object of S3 class "PanelSurv".

psDF: a data frame, part of original input data frame with variable "ID", "time" and "count".
timeGrid: ordered distinct observation times in the set of all observation times.
panelMatrix: a matrix representation of panel count data, one row per subject, one column per time point in "timeGrid".

In the case of is.PanelSurv, a logical value TRUE if x inherits from class "PanelSurv", otherwise an FALSE.

In the case of plot.PanelSurv, a tile plot of panelMatrix produced by package ggplot2 with color indicating number of counts since last observation time.

Examples

data(blaTum)
response <- with(blaTum, PanelSurv(id, time, count))
is.PanelSurv(response)
plot(response)
data(blaTum)
response <- with(blaTum, PanelSurv(id, time, count))
is.PanelSurv(response)
plot(response)

Plot a `panelReg` Object.

Description

Plot the estimated baseline mean function. If "se" option of panelReg is not "NULL", 95 interval is also plotted.

Usage

## S3 method for class 'panelReg'
plot(x, ...)
## S3 method for class 'panelReg'
plot(x, ...)

Arguments

`x`	The result of a call to the `panelReg` function.
`...`	Other graphical parameters such as line type, color, or axis labels.

Examples

data(blaTum)
## Plot the fit of bladder tumor data set
fm <- PanelSurv(id, time, count) ~ num + size + treatment
fit1 <- panelReg(fm, data=blaTum, method = "AEE", se = "Sandwich")
plot(fit1)

fit2 <- panelReg(fm, data=blaTum, method = "MLs", se = "NULL")
plot(fit2)
data(blaTum)
## Plot the fit of bladder tumor data set
fm <- PanelSurv(id, time, count) ~ num + size + treatment
fit1 <- panelReg(fm, data=blaTum, method = "AEE", se = "Sandwich")
plot(fit1)

fit2 <- panelReg(fm, data=blaTum, method = "MLs", se = "NULL")
plot(fit2)

Produce Tile Plot

Description

Plot the tile plot from a PanelSurv object.

Usage

## S3 method for class 'PanelSurv'
plot(x, heat = FALSE, order = TRUE, ...)
## S3 method for class 'PanelSurv'
plot(x, heat = FALSE, order = TRUE, ...)

Arguments

`x`	an object of class `PanelSurv`.
`heat`	an optional logical value indicating whether a swimmer-plot-like tile plot will be produced.
`order`	an optional logical value indicating whether the tile plot will be sorted by the largest observationt time.
`...`	future extension

Value

A ggplot object

Skin cancer chemoprevention trial

Description

A data frame contains data on the recurrence of skin tumor. The original data is available in Table A.3. of Sun and Zhao (2013). This dataset contains records of 290 patients.

Usage

data(skinTumor)
data(skinTumor)

Format

This data frame contains the following columns:

id: : patient id (repeated for each recurrence).
time: : observation time.
age: : patient's age at enrollment.
male: : gender; male = 1, female = 0.
dfmo: : treatment (DFMO) group = 1; placebo = 0.
priorTumor: : number of prior tumor from diagnosis to randomization.
countBC: : number of newly developed basal cell carcinomas tumors since last observation time.
countSC: : number of newly developed squamous cell carcinomas tumors since last observation time.
count: : number of newly developed non-melanoma tumors since last observation time; this is equal to countBC + countSC.

References

Sun, J. and Zhao, X. (2013). Statistical Analysis of Panel Count Data. New York: Springer.

Examples

data(skinTumor)
library(ggplot2)
ggplot(skinTumor, aes(time, id, width = 25, height = 2)) +
    geom_tile(aes(fill = count)) + theme_bw() +
    facet_grid(dfmo ~ ., scales = "free_y", as.table = FALSE,
               labeller = labeller(dfmo = function(x) paste("DFMO =", x))) +
    scale_fill_gradient(low = "grey", high = "black") +
    scale_x_continuous(breaks = seq(0, 2000, 200)) +
    labs(fill = "Count") + xlab("Time in days")
data(skinTumor)
library(ggplot2)
ggplot(skinTumor, aes(time, id, width = 25, height = 2)) +
    geom_tile(aes(fill = count)) + theme_bw() +
    facet_grid(dfmo ~ ., scales = "free_y", as.table = FALSE,
               labeller = labeller(dfmo = function(x) paste("DFMO =", x))) +
    scale_fill_gradient(low = "grey", high = "black") +
    scale_x_continuous(breaks = seq(0, 2000, 200)) +
    labs(fill = "Count") + xlab("Time in days")

A Simulated Data Mimicking a Skin Cancer Chemoprevention Trial

Description

A data frame contains data on the recurrence of skin tumor. This simulated data is created for illustrative purpose and it mimic the Skin Cancer Chemoprevention Trial used in Chiou et al. (2017). This data set contains records of 100 simulated patients.

Usage

data(skiTum)
data(skiTum)

Format

This data frame contains the following columns:

id: : patient id (repeated for each recurrence).
time: : observation time.
age: : patient's age at enrollment; age = 1 if greater than 65, age = 0 otherwise.
male: : gender; male = 1, female = 0.
dfmo: : treatment (DFMO) group = 1; placebo = 0.
priorTumor: : number of prior tumor from diagnosis to randomization.
count: : number of new tumors since last observation time.

Examples

data(skiTum)
library(ggplot2)
skiTum$dfmo <- factor(skiTum$dfmo, levels = c(1, 0), labels = c("placebo", "DFMO"))
ggplot(skiTum, aes(time, id, height = 2, width = 25)) +
    geom_tile(aes(fill = count)) + theme_bw() +
    facet_grid(dfmo ~ ., scales = "free_y") +
    scale_fill_gradient(low = "grey", high = "black") + labs(fill="Count") +
    scale_x_continuous(breaks = seq(0, 1800, 100)) + xlab("Time in days")
data(skiTum)
library(ggplot2)
skiTum$dfmo <- factor(skiTum$dfmo, levels = c(1, 0), labels = c("placebo", "DFMO"))
ggplot(skiTum, aes(time, id, height = 2, width = 25)) +
    geom_tile(aes(fill = count)) + theme_bw() +
    facet_grid(dfmo ~ ., scales = "free_y") +
    scale_fill_gradient(low = "grey", high = "black") + labs(fill="Count") +
    scale_x_continuous(breaks = seq(0, 1800, 100)) + xlab("Time in days")

Package 'spef'

Help Index

spef: Semiparametric Estimating Functions

Description

Author(s)

References

See Also

Bladder Tumors Cancer Recurrences

Description

Usage

Format

Note

References

See Also

Examples

Bladder Tumors Cancer Recurrences

Description

Usage

Format

Note

References

See Also

Examples

Fits Semiparametric Regression Models for Panel Count Survival Data

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Semiparametric Panel Count Regression Object

Description

Value

See Also

Create a PanelSurv Object

Description

Usage

Arguments

Value

See Also

Examples

Plot a panelReg Object.

Description

Usage

Arguments

See Also

Examples

Produce Tile Plot

Description

Usage

Arguments

Value

Skin cancer chemoprevention trial

Description

Usage

Format

References

See Also

Examples

A Simulated Data Mimicking a Skin Cancer Chemoprevention Trial

Description

Usage

Format

See Also

Examples

Plot a `panelReg` Object.