The level of privacy protection for individuals in the dataset is governed by two privacy loss parameters, ε and δ. You can think of δ as the probability of a blatant privacy violation, so δ should be kept very small. ε should be set to a small constant: a larger ε means less privacy and more accurate statistics. We do not recommend setting ε above 1. Listed below are Harvard's data sensitivity levels, with reasonable privacy loss parameters for each level. These recommendations are only a guideline. See Harvard's secure data classifications for more information.
Public information: It is not necessary to use differential privacy for public information.
Information the disclosure of which would not cause material harm, but which the University has chosen to keep confidential: (ε=1, δ=10^-5=0.00001)
Information that could cause risk of material harm to individuals or the University if disclosed: (ε=0.25, δ=10^-6=0.000001)
Information that would likely cause serious harm to individuals or the University if disclosed: (ε=0.05, δ=10^-7=0.0000001)
Information that would cause severe harm to individuals or the University if disclosed: It is not recommended that the PSI tool be used with such severely sensitive data.
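The guideline values above can be written down as a small lookup table. The following is only a sketch: the level names ("public", "confidential", and so on) and the function recommended_params are hypothetical labels chosen for illustration, not identifiers used by the PSI tool. Only the ε and δ values come from the list above.

```python
from typing import NamedTuple, Optional


class PrivacyParams(NamedTuple):
    epsilon: float
    delta: float


# Hypothetical level names; the (epsilon, delta) values are the
# guideline values from the list above.
RECOMMENDED = {
    "public": None,  # differential privacy is not necessary
    "confidential": PrivacyParams(epsilon=1.0, delta=1e-5),
    "material_harm": PrivacyParams(epsilon=0.25, delta=1e-6),
    "serious_harm": PrivacyParams(epsilon=0.05, delta=1e-7),
}


def recommended_params(level: str) -> Optional[PrivacyParams]:
    """Return the guideline parameters for a sensitivity level.

    Returns None for public data and raises ValueError for the most
    sensitive level, for which use of the tool is not recommended.
    """
    if level == "severe_harm":
        raise ValueError("tool not recommended for severely sensitive data")
    return RECOMMENDED[level]


if __name__ == "__main__":
    print(recommended_params("material_harm"))
```

Because these values are a guideline rather than a rule, a table like this is best treated as a starting point that a data depositor may tighten.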
Secrecy of the Sample
If the data is a random and secret sample from a larger population of known size, then the accuracy of the released statistics can be boosted without changing the privacy guarantee. Here, secret means that the choice of the people in the sample will not be revealed.
This boost requires an estimate of the size of the larger population. It is important to be conservative in your estimate. In other words, it is okay to underestimate the population size, but overestimating it could violate privacy.
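To see why an underestimate is the safe direction, the sketch below uses the standard privacy amplification by subsampling bound. This is an assumption for illustration: the text above does not state which formula the PSI tool applies. Under that bound, running an (ε', δ') mechanism on a secret random sample drawn at rate q = sample size / population size gives the population an (ln(1 + q(e^ε' - 1)), qδ') guarantee. Inverting it yields the larger parameters that may be spent on the sample while keeping a target (ε, δ). The function name boosted_params and its arguments are hypothetical.

```python
import math


def boosted_params(epsilon, delta, sample_size, population_estimate):
    """Parameters usable on the sample for a target (epsilon, delta).

    Assumes the standard amplification-by-subsampling bound; the actual
    computation in the tool may differ.
    """
    if not 0 < sample_size <= population_estimate:
        raise ValueError("need 0 < sample_size <= population_estimate")
    q = sample_size / population_estimate  # sampling rate
    # Invert epsilon = ln(1 + q * (exp(eps_sample) - 1)).
    eps_sample = math.log(1 + (math.exp(epsilon) - 1) / q)
    # Invert delta = q * delta_sample, capped at 1.
    delta_sample = min(1.0, delta / q)
    return eps_sample, delta_sample


if __name__ == "__main__":
    # A sample of 1,000 people with two different population estimates.
    for population in (5_000, 10_000):
        print(population, boosted_params(1.0, 1e-5, 1_000, population))
```

A smaller population estimate gives a larger sampling rate q and therefore a smaller boost, which costs some accuracy but never weakens the guarantee. A population estimate that is too large claims more amplification than the sampling actually provides, so the real privacy loss would exceed the stated ε and δ.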
Set the privacy loss parameters for your dataset below. If secrecy of the sample applies to your data, also estimate the size of the population from which it was drawn: