Estimating spatial variation in disease risk from locations coarsened by incomplete geocoding

Article ID	Journal	Published Year	Pages	File Type
1151198	Statistical Methodology	2012	12 Pages	PDF

Abstract

Inference for spatial variation in relative risk of disease is an important problem in spatial epidemiologic studies. A standard component of data assimilation in these studies is the assignment of a geocode, i.e. point-level spatial coordinates, to the address of each subject in the study population. Unfortunately, when geocoding is performed by the standard procedure of street-segment matching to a georeferenced road file and subsequent interpolation, it is rarely completely successful. Typically, 10-30% of the addresses in the study population fail to geocode, which can adversely affect relative risk estimation, especially if one of the disease groups (e.g. cases) has a different geocoding success rate than another (e.g. controls). It may be possible, however, to ameliorate this effect by incorporating geographic information coarser than a point (e.g. a ZIP code) that is measured for the observations that fail to geocode. This article develops coarsened-data methods for relative risk estimation from incompletely geocoded data. Nonparametric (kernel smoothing) estimation procedures are featured; parametric (likelihood-based) procedures are described as well, but their applicability is much more limited. We demonstrate, via simulation and a real example of childhood asthma cases in an Iowa county, that substantial improvements in the quality of relative risk estimates are possible using the proposed nonparametric coarsened-data methods.
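As a point of reference for the kernel smoothing approach mentioned above, the following is a minimal sketch of a log relative risk estimate formed as the log ratio of two Gaussian kernel density estimates, one for cases and one for controls. It assumes every location is fully geocoded, so it illustrates only the standard complete-data estimator and not the coarsened-data methods the article develops. The function names, the common bandwidth, and the simulated point patterns are all assumptions made for illustration.

```python
import numpy as np


def kernel_density(grid, points, bandwidth):
    """Bivariate Gaussian kernel density estimate evaluated at each grid location."""
    # Pairwise squared distances between grid locations and observed points.
    diff = grid[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # Isotropic Gaussian kernel with a single (assumed) bandwidth.
    weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    return weights.sum(axis=1) / (len(points) * 2.0 * np.pi * bandwidth ** 2)


def log_relative_risk(grid, cases, controls, bandwidth):
    """Log ratio of the case density to the control density at each grid location."""
    f_cases = kernel_density(grid, cases, bandwidth)
    f_controls = kernel_density(grid, controls, bandwidth)
    return np.log(f_cases) - np.log(f_controls)


# Simulated (hypothetical) data on the unit square: controls are uniform,
# while cases include an extra cluster centred at (0.7, 0.7).
rng = np.random.default_rng(0)
controls = rng.uniform(0.0, 1.0, size=(500, 2))
cases = np.vstack([
    rng.uniform(0.0, 1.0, size=(150, 2)),
    rng.normal(loc=[0.7, 0.7], scale=0.08, size=(100, 2)),
])

# Evaluate inside the cluster and at a location away from it.
grid = np.array([[0.7, 0.7], [0.2, 0.2]])
rho = log_relative_risk(grid, cases, controls, bandwidth=0.1)
print(rho)
```

With this simulated pattern, the estimated log relative risk should be higher at the cluster centre than at the location away from it. If some addresses were dropped because they failed to geocode, and the failure rate differed between cases and controls in some areas, an estimator of this complete-data form would be applied to a distorted sample, which is the problem the abstract describes.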

Keywords

Geocoding; Spatial epidemiology; Relative risk