
Journal Article

Citation

Zhao Y, Long Q. Wiley Interdiscip. Rev. Comput. Stat. 2017; 9(5): e1402.

Affiliation

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania.

Copyright

(Copyright © 2017, John Wiley and Sons)

DOI

10.1002/wics.1402

PMID

29085552

PMCID

PMC5659333

Abstract

Variable selection plays an essential role in regression analysis as it identifies important variables that are associated with outcomes and is known to improve the predictive accuracy of the resulting models. Variable selection methods have been widely investigated for fully observed data. However, in the presence of missing data, methods for variable selection need to be carefully designed to account for missing data mechanisms and the statistical techniques used for handling missing data. Since imputation is arguably the most popular method for handling missing data due to its ease of use, statistical methods for variable selection that are combined with imputation are of particular interest. These methods, which are valid under the assumptions of missing at random (MAR) and missing completely at random (MCAR), largely fall into three general strategies. The first strategy applies existing variable selection methods to each imputed dataset and then combines the variable selection results across all imputed datasets. The second strategy applies existing variable selection methods to stacked imputed datasets. The third strategy combines resampling techniques such as the bootstrap with imputation. Despite recent advances, this area remains under-developed and offers fertile ground for further research.
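The first two strategies can be illustrated with a minimal sketch. It assumes that imputation and per-dataset variable selection have already been done elsewhere. The function names `combine_selections` and `stack_imputed`, the selection-frequency threshold, and the equal 1/M row weights are illustrative assumptions; the abstract does not specify a particular combination rule or weighting scheme.

```python
from collections import Counter


def combine_selections(selected_per_dataset, threshold=0.5):
    """Strategy 1 (one possible rule, assumed here): keep a variable if it was
    selected in at least `threshold` of the M imputed datasets."""
    m = len(selected_per_dataset)
    if m == 0:
        raise ValueError("need at least one imputed dataset")
    # Count in how many imputed datasets each variable was selected.
    counts = Counter(v for s in selected_per_dataset for v in set(s))
    return sorted(v for v, c in counts.items() if c / m >= threshold)


def stack_imputed(datasets):
    """Strategy 2: stack M imputed datasets row-wise into one long dataset.
    Each row gets weight 1/M (an assumed weighting) so that the stacked data
    do not overstate the original sample size."""
    m = len(datasets)
    if m == 0:
        raise ValueError("need at least one imputed dataset")
    rows = [row for d in datasets for row in d]
    weights = [1.0 / m] * len(rows)
    return rows, weights
```

For example, `combine_selections([{"age", "bmi"}, {"age"}, {"age", "smoker"}])` returns `["age"]`, since only `age` is selected in at least half of the three imputed datasets. The stacked rows and weights from `stack_imputed` would then be passed to a single weighted variable selection procedure.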


Language: en

Keywords

MAR; MCAR; MNAR; bootstrap; imputation; missing data; resampling; variable selection
