open access publication

Article, 2017

Generalized partially linear regression with misclassified data and an application to labour market transitions

Computational Statistics & Data Analysis, ISSN 1872-7352, 0167-9473, Volume 110, Pages 145-159, DOI 10.1016/j.csda.2017.01.003

Contributors

Dlugosz, Stephan [1]; Mammen, Enno [2]; Wilke, Ralf Andreas 0000-0002-6105-6345 (Corresponding author) [1] [3] [4]

Affiliations

  1. [1] Centre for European Economic Research
     [NORA names: Germany; Europe, EU; OECD];
  2. [2] Heidelberg University
     [NORA names: Germany; Europe, EU; OECD];
  3. [3] Copenhagen Business School
     [NORA names: CBS Copenhagen Business School; University; Denmark; Europe, EU; Nordic; OECD];
  4. [4] University of Strasbourg
     [NORA names: France; Europe, EU; OECD]

Abstract

Large data sets that originate from administrative or operational activity are increasingly used for statistical analysis, as they often contain very precise information and a large number of observations. However, there is evidence that some variables can be subject to severe misclassification or contain missing values. Given the size of the data, a flexible semiparametric misclassification model would be a good choice, but such models are rarely used in practice. To close this gap, a semiparametric model for the probability of observing labour market transitions is estimated using a sample of 20 million observations from Germany. It is shown that the estimated marginal effects of a number of covariates are sizeably affected by misclassification and missing values in the analysis data. The proposed generalized partially linear regression extends existing models by allowing a misclassified discrete covariate to be interacted with a nonparametric function of a continuous covariate.

Keywords

Germany, activity, analysis, analysis data, applications, continuous covariates, covariates, data, discrete covariates, effect, estimated marginal effects, evidence, function, information, labor, labour market transitions, linear regression, marginal effect, market transition, misclassification, misclassification model, misclassified data, missing values, model, nonparametric function, observations, operational activities, partial linear regression, practise, probability, regression, samples, semiparametric model, severe misclassifications, size, statistical analysis, transition, values, variables

Funders

  • Deutsche Forschungsgemeinschaft

Data Provider: Digital Science