A quick observation on the provenance of randomness in privacy

Checking out Kifer and Machanavajjhala’s TODS paper on their Pufferfish privacy framework, I read the following (note that \(M_1\) and \(M_2\) are randomized query response algorithms):

AXIOM 5.2. (Convexity [Kifer and Lin 2012]). If \(M_1\) and \(M_2\) satisfy a privacy definition, and \(p \in [0, 1]\), then the algorithm \(M_p\) which runs \(M_1\) with probability \(p\) and \(M_2\) with probability \(1 - p\) should also satisfy the privacy definition.

As written, the above has a problem: it does not state that \(p\) must be chosen independently of the data (Kifer and Lin do state this in the citation above, albeit only in a parenthesis). To see why stating the source of randomness is truly central to the application of this axiom, consider the following scenario.

First, let \(M\) be a randomized query response algorithm returning an integer, satisfying privacy definition \(P\). Now assume that post-processing the output of \(M\) by a non-secret algorithm \(f\) satisfies \(P\) as well. This is not hard to agree to: since \(f\) is not secret, any recipient of \(M\)’s output can apply \(f\) themselves. Now, let \(f\) encode its input as a character string, \(M_1 = M\), and \(M_2 = f \circ M\). We now have that both \(M_1\) and \(M_2\) satisfy \(P\).

Bob works for Charlie, the data curator. Now Alice comes along and queries Bob for data containing Daniel’s HIV status. Bob is in collusion with Alice to provide her with Daniel’s status, but has to prove to Charlie that any responses he gives Alice satisfy \(P\). If Charlie accepts the axiom above as written, Bob can choose \(p = 1\) if Daniel is positive, and \(p = 0\) otherwise, and still prove to Charlie that his response to Alice satisfies \(P\). This means that Alice receives an integer response if Daniel is positive and a string response otherwise, bypassing privacy protection completely.
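The scenario can be made concrete with a small Python sketch. Everything in it is a hypothetical stand-in rather than anything taken from the Pufferfish paper: `M` is a placeholder noisy count that is merely assumed to satisfy \(P\), the database is a dictionary of names to booleans, and the function names are made up for illustration. The only point is that the type of the response reveals which branch ran.

```python
import random

def M(db, rng):
    # Placeholder mechanism (assumption): a noisy count of positive records.
    # It stands in for any integer-valued algorithm assumed to satisfy P.
    return sum(db.values()) + rng.randint(-5, 5)

def f(x):
    # Public post-processing: encode the integer as a character string.
    return str(x)

def M1(db, rng):
    return M(db, rng)

def M2(db, rng):
    return f(M(db, rng))

def mix(db, p, rng):
    # Runs M1 with probability p and M2 with probability 1 - p.
    return M1(db, rng) if rng.random() < p else M2(db, rng)

def honest_response(db, rng, p=0.5):
    # p is fixed in advance, independently of the data.
    return mix(db, p, rng)

def colluding_response(db, rng):
    # Bob's trick: p is chosen by looking at Daniel's record.
    p = 1.0 if db["Daniel"] else 0.0
    return mix(db, p, rng)

def alice_guess(response):
    # Alice ignores the value and only inspects the type.
    return isinstance(response, int)
```

With `colluding_response`, `alice_guess` recovers Daniel's status on every query, whatever noise `M` adds. With `honest_response`, the type of the output is decided by a coin flip that never touches the data, so it tells Alice nothing about Daniel.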

Therefore, whenever privacy (or any other thing, for that matter) depends on randomization, one should think about, and explicitly state, the source of randomness.