open access publication

Article, 2018

Analysis of generalized semiparametric regression models for cumulative incidence functions with missing covariates

Computational Statistics & Data Analysis, ISSN 1872-7352, 0167-9473, Volume 122, Pages 59-79, DOI 10.1016/j.csda.2018.01.003

Contributors

  • Lee, Unkyung [1]
  • Sun, Yanqing (Corresponding author) [2]
  • Scheike, Thomas Harder 0000-0002-2148-4740 [3]
  • Gilbert, Peter 0000-0002-2662-9427 [4] [5]

Affiliations

  [1] Texas A&M University
      [NORA names: United States; America, North; OECD]
  [2] University of North Carolina at Charlotte
      [NORA names: United States; America, North; OECD]
  [3] University of Copenhagen
      [NORA names: KU University of Copenhagen; University; Denmark; Europe, EU; Nordic; OECD]
  [4] Fred Hutch Cancer Center
      [NORA names: United States; America, North; OECD]
  [5] University of Washington
      [NORA names: United States; America, North; OECD]

Abstract

The cumulative incidence function quantifies the probability of failure over time due to a specific cause for competing risks data. The generalized semiparametric regression models for the cumulative incidence functions with missing covariates are investigated. The effects of some covariates are modeled as non-parametric functions of time while others are modeled as parametric functions of time. Different link functions can be selected to add flexibility in modeling the cumulative incidence functions. The estimation procedures based on the direct binomial regression and the inverse probability weighting of complete cases are developed. This approach modifies the full data weighted least squares equations by weighting the contributions of observed members through the inverses of estimated sampling probabilities which depend on the censoring status and the event types among other subject characteristics. The asymptotic properties of the proposed estimators are established. The finite-sample performances of the proposed estimators and their relative efficiencies under different two-phase sampling designs are examined in simulations. The methods are applied to analyze data from the RV144 vaccine efficacy trial to investigate the associations of immune response biomarkers with the cumulative incidence of HIV-1 infection.
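
The inverse probability weighting idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the estimator developed in the article: it assumes a single fixed time point, no censoring, an identity link, and one binary covariate, and every name in it (`ipw_wls`, `pi_hat`, the simulated sampling design) is hypothetical. It only shows how weighting complete cases by the inverse of estimated sampling probabilities, which here depend on the event type, recovers the full-data weighted least squares solution, while an unweighted complete-case fit does not.

```python
import numpy as np


def ipw_wls(y, X, observed, pi_hat):
    """Solve sum_i w_i X_i (y_i - X_i^T b) = 0 with w_i = observed_i / pi_hat_i.

    Rows with a missing covariate get weight 0, so only complete cases
    contribute, each inflated by the inverse of its sampling probability.
    """
    w = observed / pi_hat
    # Replace missing covariate rows by zeros so NaN values never enter the sums.
    Xc = np.where(observed[:, None] == 1, X, 0.0)
    XtW = (Xc * w[:, None]).T
    return np.linalg.solve(XtW @ Xc, XtW @ y)


rng = np.random.default_rng(0)
n = 20000

# Phase-two covariate z (think of a biomarker), measured only on a subsample.
z = rng.binomial(1, 0.5, size=n).astype(float)

# y = 1 if a cause-1 failure has occurred by the fixed time point (assumed
# uncensored); the assumed true cumulative incidence is 0.1 + 0.2 * z.
y = rng.binomial(1, 0.1 + 0.2 * z).astype(float)

# Two-phase sampling that depends on the event type: every case is sampled,
# non-cases are sampled with probability 0.3.
true_pi = np.where(y == 1, 1.0, 0.3)
observed = rng.binomial(1, true_pi).astype(float)
z_obs = np.where(observed == 1, z, np.nan)

# Estimated sampling probabilities: observed fraction within each event stratum.
pi_hat = np.where(y == 1, observed[y == 1].mean(), observed[y == 0].mean())

X = np.column_stack([np.ones(n), z_obs])
beta_ipw = ipw_wls(y, X, observed, pi_hat)

# Unweighted complete-case fit for comparison; biased under this design.
cc = observed == 1
beta_cc = np.linalg.lstsq(X[cc], y[cc], rcond=None)[0]

print("IPW estimate (intercept, slope):", beta_ipw)
print("Complete-case estimate (intercept, slope):", beta_cc)
```

In this toy design the weighted fit should land near the assumed coefficients (0.1, 0.2), whereas the unweighted complete-case intercept is inflated because cases are oversampled. Handling censoring, time-varying coefficients, and general link functions, as the article does, requires additional machinery not shown here.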

Keywords

HIV-1 infection, RV144, RV144 vaccine efficacy trial, analysis, analyzed data, association, asymptotic properties, binomial regression, biomarkers, cases, cause, censoring status, characteristics, contribution, covariates, cumulative incidence, cumulative incidence function, data, design, effect, efficacy trials, efficiency, equations, estimation, estimation procedure, event types, events, failure, finite-sample performance, flexibility, function, immune response biomarkers, incidence function, incidence of HIV-1 infection, infection, inverse probability weighting, inversion, least-squares equations, members, method, model, non-parametric function, parametric functions, performance, probability, probability of failure, probability weighting, procedure, properties, regression, regression models, response biomarkers, risk, risk data, sampling design, sampling probability, semiparametric regression model, simulation, square equation, status, subject characteristics, trials, two-phase sampling design, type, vaccine efficacy trials, weight

Funders

  • National Cancer Institute
  • National Institute of Allergy and Infectious Diseases
  • Directorate for Mathematical & Physical Sciences
  • United States Army

Data Provider: Digital Science