Article ID: 524737
Journal: Transportation Research Part C: Emerging Technologies
Published Year: 2016
Pages: 11
File Type: PDF
Abstract

• Mathematical programming is used to match time-stamped records.
• This extracts relevant information from systems with recording errors and omissions.
• It enables systematic exploration of a range of possible interpretations of the data.
• The technique is robust for automatic pre-processing of data from new technologies.

Time-stamped data for transportation and logistics are essential for estimating times on transportation legs and times between successive stages in logistic processes. Often these data are subject to recording errors and omissions. Matches must then be inferred from the time stamps alone because identifying keys are unavailable, suppressed to preserve confidentiality, or ambiguous because of missing observations. We present an integer programming (IP) model developed for matching successive events in such situations and illustrate its application in three problem settings involving (a) airline operations at an airport, (b) taxi service between an airport and a train station, and (c) taxi services from an airport. With data from the third setting (where a matching key was available), we illustrate the robustness of estimates for median and mean times between events under different random rates for “failure to record”, different screening criteria for outliers, and different target times used in the IP objective. The IP model proves to be a tractable and informative tool for data matching and data cleaning, with a wide range of potential applications.
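
As a rough illustration of the kind of matching problem described above, the following Python sketch pairs "start" and "end" time stamps when no identifying key is available. It is not the authors' formulation, which the abstract does not spell out. The function name `match_events`, the cost (absolute deviation of elapsed time from a target time), and the `skip_penalty` charged for each unmatched record are all assumptions made for this example. The sketch also replaces an integer programming solver with exhaustive enumeration, which is only workable for a handful of records.

```python
from itertools import permutations


def match_events(starts, ends, target, skip_penalty):
    """Find a minimum-cost matching of start to end time stamps by enumeration.

    Hypothetical cost model (an assumption, not taken from the paper):
      - a pair (start, end) is allowed only if end > start;
      - a pair costs |(end - start) - target|;
      - every unmatched start or end record costs skip_penalty.
    Returns (best_cost, best_pairs), with pairs listed in start order.
    """
    n, m = len(starts), len(ends)
    # Pad the end indices with None so any start may be left unmatched.
    slots = list(range(m)) + [None] * n
    best_cost, best_pairs = float("inf"), []

    # Each arrangement assigns one slot (an end index or None) to each start.
    for assignment in set(permutations(slots, n)):
        cost, pairs, feasible = 0, [], True
        for i, j in enumerate(assignment):
            if j is None:
                cost += skip_penalty  # start record with no matching end
            elif ends[j] <= starts[i]:
                feasible = False  # an end cannot precede its start
                break
            else:
                cost += abs((ends[j] - starts[i]) - target)
                pairs.append((starts[i], ends[j]))
        if not feasible:
            continue
        cost += skip_penalty * (m - len(pairs))  # end records left unmatched
        if cost < best_cost:
            best_cost, best_pairs = cost, pairs
    return best_cost, best_pairs


# Three start records but only two end records: one end was never recorded.
cost, pairs = match_events([0, 5, 20], [11, 31], target=10, skip_penalty=8)
print(cost, pairs)  # 10 [(0, 11), (20, 31)]
```

In this toy instance the start at time 5 is left unmatched, because pairing it with either end would deviate from the target more than the penalty saves. Varying `target` and `skip_penalty` is one simple way to explore alternative interpretations of the same records, in the spirit of the sensitivity analysis the abstract describes.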

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science Applications