Article ID: 474644
Journal: Computers & Operations Research
Published Year: 2014
Pages: 10
File Type: PDF
Abstract

We present a fast method for determining the tightest possible bounds, as well as all feasible values, for the underlying cell counts in a two-way contingency table based on knowledge of the corresponding unrounded conditional probabilities, the sample size, and (optionally) bounds on cells and certain sums of cells. This information can be used in statistical inference procedures and also has potential uses in statistical disclosure control, which deals with protecting privacy and confidentiality when data summaries are released to the public. The problem formally consists of a large number of integer linear knapsack optimizations (two per cell). Here we identify special common structure that allows for efficient reuse among cells of intermediate results within a dynamic programming framework. The method runs very quickly on practical examples, thereby enabling a real-time interactive exploration of disclosure risk for two-way rearrangements of large multi-way tables.
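To make the knapsack structure concrete, the following is a minimal illustrative sketch, not the method of the paper. It assumes the conditional probabilities are row-conditional and given as exact fractions, that every row total is positive, and that no optional bounds on cells or sums are supplied. Under those assumptions each row total must be a positive multiple of the least common denominator of that row's probabilities, and the multiples must sum to the sample size. The function names (`row_units`, `reachable`, `feasible_row_totals`, `cell_bounds`) are hypothetical. Unlike the approach described above, this sketch recomputes the reachability table for every row rather than reusing intermediate results.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def row_units(cond_probs):
    """Smallest positive row total consistent with each row's exact probabilities."""
    return [lcm(*(Fraction(p).denominator for p in row)) for row in cond_probs]


def reachable(units, total):
    """reach[s] is True if s is a non-negative integer combination of `units`."""
    reach = [False] * (total + 1)
    reach[0] = True
    for s in range(1, total + 1):
        reach[s] = any(u <= s and reach[s - u] for u in units)
    return reach


def feasible_row_totals(cond_probs, n):
    """All feasible totals for each row, given sample size n (assumes totals > 0)."""
    units = row_units(cond_probs)
    result = []
    for i, d in enumerate(units):
        others = units[:i] + units[i + 1:]
        slack = n - sum(others)  # each other row needs at least one unit
        if slack < d:
            result.append([])
            continue
        reach = reachable(others, slack)
        result.append(
            [k * d for k in range(1, slack // d + 1) if reach[slack - k * d]]
        )
    return result


def cell_bounds(cond_probs, n):
    """Tightest (min, max) per cell, or None when no table is feasible."""
    totals = feasible_row_totals(cond_probs, n)
    bounds = []
    for row, row_totals in zip(cond_probs, totals):
        if not row_totals:
            bounds.append([None] * len(row))
            continue
        bounds.append(
            [(int(min(row_totals) * Fraction(p)), int(max(row_totals) * Fraction(p)))
             for p in row]
        )
    return bounds


# Hypothetical 2x2 example: rows (1/3, 2/3) and (1/2, 1/2), sample size 20.
probs = [[Fraction(1, 3), Fraction(2, 3)], [Fraction(1, 2), Fraction(1, 2)]]
totals = feasible_row_totals(probs, 20)
bounds = cell_bounds(probs, 20)
```

In this example the first row total must be a multiple of 3 and the second a multiple of 2, so the only feasible pairs of row totals summing to 20 are (6, 14), (12, 8), and (18, 2); the cell bounds follow by scaling each extreme row total by the cell's conditional probability.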
