UCGIS I-GUIDE Community Champion Project

Geomasking Sensitive Individual-level Data to Protect Location Privacy

Yue Lin, The Ohio State University

Location privacy is an individual's right not to be identified from his or her geographic location. As geospatial technologies advance, so do concerns about location privacy. Over the past decades, a number of methods have been developed to preserve it. Geomasking methods, which alter the recorded geographic location of an individual, are often applied before individual-level data are released in order to protect confidentiality. In this notebook, we demonstrate with hands-on examples and code how geomasking techniques can be used to protect privacy.

Related Topics in GIS&T Body of Knowledge: Location Privacy, GIS and Critical Ethics, Professional and Practical Ethics of GIS&T

Experimental Data

We use a synthetic individual-level population data set from Guernsey County, Ohio [1] to illustrate the use of geomasking methods. This data set contains 40,087 individual records across over 2,000 census blocks. We begin by importing and mapping this test data set as shown below. The blue dots on the map represent hypothetical original individual locations.

In [2]:
import pandas as pd
import geopandas as gpd

# Read boundary data
filename_poly = 'data/tl_2020_39059_tabblock10.shp'
poly = gpd.read_file(filename_poly)
poly = poly.to_crs('EPSG:3395')

# Read individual point locations
filename_df = 'data/guernsey_data.csv'
df = pd.read_csv(filename_df)
df['Block'] = df['Block'].astype(str)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(x=df.lon, y=df.lat)).set_crs('EPSG:4326')
gdf = gdf.to_crs('EPSG:3395')

# Plot
base = poly.plot(color='white', edgecolor='black')
ax = gdf.plot(ax=base, markersize=0.1) 
ax.set_axis_off()

We can zoom in and focus on Block 390599773001005.

In [3]:
# Obtain block ID
blockid = poly.loc[[1610],'GEOID10'].to_list()[0]

# Plot
base = poly.loc[[1610],'geometry'].plot(color='white', edgecolor='black')
ax = gdf[gdf['Block'] == blockid].plot(ax=base, markersize=5) 
ax.set_axis_off()

Geomasking Methods

Various geographic masking techniques have been developed over the years. Most of these include some level of randomization to reduce the probability of identifying an individual based on his or her geographic location. Here we demonstrate the use of five commonly used geomasking techniques: affine transformations, random perturbation, Gaussian perturbation, donut masking, and location swapping.

Affine Transformation

Affine transformation is a set of methods for deterministically moving individual locations to a new set of locations using translation (moving each point by a fixed offset from its original location), change of scale (multiplying the coordinates of each point by a scaling constant), and rotation (rotating each point by a fixed angle about a pivot point) [2].

In [11]:
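from shapely.geometry import Point
from shapely import affinity

# Set user-defined transformation parameters
# Translation offset (in map units; meters in EPSG:3395)
offset = 50
# Scaling factor
factor = 1.1
# Angle of rotation (degrees, counter-clockwise)
angle = 90
# Shared pivot for scaling and rotation (here, the mean of all point coordinates).
# With the default origin, shapely scales and rotates each geometry about its own
# center, which leaves a single point unchanged.
pivot = Point(gdf.geometry.x.mean(), gdf.geometry.y.mean())

# Move each point using affine transformation
gdf_at = gdf.copy()
for index, row in gdf_at.iterrows():
    # Translation
    pt = affinity.translate(row['geometry'], xoff=offset, yoff=offset)
    # Change of scale about the shared pivot
    new_pt = affinity.scale(pt, xfact=factor, yfact=factor, origin=pivot)
    # Rotation about the shared pivot
    new_pt = affinity.rotate(new_pt, angle, origin=pivot)
    gdf_at.at[index, 'geometry'] = new_pt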

# Plot
base = poly.plot(color='white', edgecolor='black')
ax = gdf_at.plot(ax=base, markersize=0.1, color='red') 
ax.set_axis_off()

Zooming in to Block 390599773001005:

In [12]:
# Plot
base = poly.loc[[1610],'geometry'].plot(color='white', edgecolor='black')
ax = gdf_at[gdf_at['Block'] == blockid].plot(ax=base, markersize=5, color='red') 
ax.set_axis_off()
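
The three operations described above can also be written out with plain arithmetic, which makes the deterministic nature of the mask explicit. The sketch below is illustrative only: the function names `affine_mask` and `affine_unmask` and the pivot at (0, 0) are assumptions, not part of the notebook's workflow. It applies the same translate, scale, rotate sequence to a single coordinate pair and then inverts it, suggesting that anyone who knows the parameters can recover the original location.

```python
import math

def affine_mask(x, y, offset=50.0, factor=1.1, angle_deg=90.0, pivot=(0.0, 0.0)):
    """Translate, scale, then rotate one (x, y) pair about a shared pivot (hypothetical helper)."""
    px, py = pivot
    # Translation: shift both coordinates by a fixed offset
    x, y = x + offset, y + offset
    # Change of scale: stretch the vector from the pivot by a constant factor
    x, y = px + factor * (x - px), py + factor * (y - py)
    # Rotation: turn the vector from the pivot counter-clockwise by a fixed angle
    theta = math.radians(angle_deg)
    dx, dy = x - px, y - py
    return (px + dx * math.cos(theta) - dy * math.sin(theta),
            py + dx * math.sin(theta) + dy * math.cos(theta))

def affine_unmask(x, y, offset=50.0, factor=1.1, angle_deg=90.0, pivot=(0.0, 0.0)):
    """Undo affine_mask by applying the inverse steps in reverse order (hypothetical helper)."""
    px, py = pivot
    # Inverse rotation
    theta = math.radians(-angle_deg)
    dx, dy = x - px, y - py
    x = px + dx * math.cos(theta) - dy * math.sin(theta)
    y = py + dx * math.sin(theta) + dy * math.cos(theta)
    # Inverse scaling
    x, y = px + (x - px) / factor, py + (y - py) / factor
    # Inverse translation
    return (x - offset, y - offset)

masked = affine_mask(100.0, 200.0)
recovered = affine_unmask(*masked)
print(masked)
print(recovered)
```

With these default parameters, (100, 200) maps to approximately (-275, 165), and the inverse returns (100, 200) up to floating-point error.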

Random Perturbation

Adding randomized noise to the original coordinates is a common way to protect the location privacy of individual-level data [2, 3]. Each point can be placed at random within a circle centered on the original point with a user-defined radius, or within any other polygon defined relative to the original point. Here we illustrate the use of a perturbation polygon for privacy protection.

In [3]:
import random
from shapely.geometry import Point

# Set user-defined radius
radius = 50     

def get_random_point_in_polygon(buf):
    """Obtain a random point within the polygon (buffer).

    buf: A polygon with a center at the original point and a user-defined radius

    Returns:
    pt: A shapely Point drawn uniformly at random inside the polygon
    """
    minx, miny, maxx, maxy = buf.bounds
    while True:
        pt = Point(random.uniform(minx, maxx), random.uniform(miny, maxy))
        if buf.contains(pt):
            return pt

# Obtain perturbation polygons
gdf_rp = gdf.copy()
buf = gdf_rp.buffer(radius)

# Move each point at random within the polygon
# (Series.iteritems() was removed in pandas 2.0; items() is the equivalent)
for index, polygon in buf.items():
    new_pt = get_random_point_in_polygon(polygon)
    gdf_rp.at[index, 'geometry'] = new_pt

# Plot
base = poly.plot(color='white', edgecolor='black')
ax = gdf_rp.plot(ax=base, markersize=0.1, color='red') 
ax.set_axis_off()

Zooming in to Block 390599773001005:

In [4]:
# Plot
base = poly.loc[[1610],'geometry'].plot(color='white', edgecolor='black')
ax = gdf_rp[gdf_rp['Block'] == blockid].plot(ax=base, markersize=5, color='red') 
ax.set_axis_off()
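
When the perturbation polygon is a circular buffer, the rejection loop in `get_random_point_in_polygon` accepts roughly π/4 (about 79%) of its candidate points, since that is the ratio of a circle's area to that of its bounding box. A possible alternative, sketched below under the assumption of a circular perturbation area, draws the point directly in polar coordinates. The helper name `random_point_in_circle` is hypothetical. Taking the square root of the uniform draw keeps the points uniform over the area rather than clustered near the center.

```python
import math
import random

def random_point_in_circle(x, y, radius, rng=random):
    """Return a point drawn uniformly from the disc of the given radius around (x, y)."""
    # sqrt() compensates for the area of a thin ring growing linearly with its radius
    r = radius * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (x + r * math.cos(theta), y + r * math.sin(theta))

rng = random.Random(42)  # fixed seed so the sketch is reproducible
origin = (1000.0, 2000.0)  # made-up projected coordinates
masked = [random_point_in_circle(*origin, radius=50.0, rng=rng) for _ in range(10000)]
distances = [math.hypot(mx - origin[0], my - origin[1]) for mx, my in masked]

# No masked point lies farther than the radius from its original location
print(max(distances) <= 50.0 + 1e-9)
# Under a uniform draw, about a quarter of the points fall within half the radius
share_within_half = sum(d <= 25.0 for d in distances) / len(distances)
print(share_within_half)
```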

Gaussian Perturbation

Another way to displace points is to let the displacement follow a chosen probability distribution, such as a Gaussian or uniform distribution [4]. The code below moves each point by adding independent Gaussian noise with standard deviation sigma to its x and y coordinates.

In [17]:
import numpy as np
from shapely.geometry import Point

# Set standard deviation
sigma = 50

# Displace each point following a Gaussian distribution
gdf_gp = gdf.copy()
for index, row in gdf_gp.iterrows():
    pt = row['geometry']
    # Draw scalar offsets (a size argument of 1 would return length-1 arrays instead of floats)
    new_pt = Point(pt.x + np.random.normal(0, sigma), pt.y + np.random.normal(0, sigma))
    gdf_gp.at[index, 'geometry'] = new_pt

# Plot
base = poly.plot(color='white', edgecolor='black')
ax = gdf_gp.plot(ax=base, markersize=0.1, color='red') 
ax.set_axis_off()

Zooming in to Block 390599773001005:

In [18]:
# Plot
base = poly.loc[[1610],'geometry'].plot(color='white', edgecolor='black')
ax = gdf_gp[gdf_gp['Block'] == blockid].plot(ax=base, markersize=5, color='red') 
ax.set_axis_off()
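
Because the x and y offsets are drawn independently from the same Gaussian, the displacement distance follows a Rayleigh distribution with mean sigma * sqrt(pi / 2), and it has neither a lower nor an upper bound. The sketch below uses only the standard library and a hypothetical helper, `gaussian_offset`, to simulate how far points typically move and how often a point moves less than 10 map units. The sample size and seed are arbitrary choices.

```python
import math
import random

def gaussian_offset(sigma, rng):
    """Draw independent Gaussian offsets for x and y (hypothetical helper)."""
    return rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)

sigma = 50.0
rng = random.Random(0)  # fixed seed so the sketch is reproducible
distances = [math.hypot(*gaussian_offset(sigma, rng)) for _ in range(20000)]

# Compare the simulated mean displacement with the Rayleigh mean (about 62.7 for sigma = 50)
mean_distance = sum(distances) / len(distances)
expected_mean = sigma * math.sqrt(math.pi / 2)

# Compare the share of points moved less than 10 units with the Rayleigh CDF (about 0.02)
share_under_10 = sum(d < 10.0 for d in distances) / len(distances)
expected_share = 1 - math.exp(-10.0**2 / (2 * sigma**2))

print(round(mean_distance, 1), round(expected_mean, 1))
print(round(share_under_10, 3), round(expected_share, 3))
```

A small share of points therefore stays very close to the original location, while a few move much farther than sigma.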

Donut Masking

Random perturbation sets only a maximum distance by which an individual location can be shifted, so a masked point may end up very close to its original location. Donut masking extends this method by also enforcing a minimum displacement distance, which ensures a user-defined minimum level of privacy protection [5].

In [19]:
import random
from shapely.geometry import Point

# Set user-defined radius
radius_in = 10
radius_out = 50

def get_random_point_in_donut(buf_in, buf_out):
    """Obtain a random point within the donut (buffer).

    buf_in: The inner polygon with a radius of the minimum displacement distance
    buf_out: The outer polygon with a radius of the maximum displacement distance

    Returns:
    pt: A shapely Point drawn uniformly at random inside the donut
    """
    minx, miny, maxx, maxy = buf_out.bounds
    while True:
        pt = Point(random.uniform(minx, maxx), random.uniform(miny, maxy))
        if buf_out.contains(pt) and not buf_in.contains(pt):
            return pt

# Obtain perturbation donuts
gdf_dm = gdf.copy()
buf_in = gdf_dm.buffer(radius_in)
buf_out = gdf_dm.buffer(radius_out)

# Move each point at random within the donut
for index, row in gdf_dm.iterrows():
    # Look up buffers by index label, which is what iterrows() yields
    poly_in = buf_in.loc[index]
    poly_out = buf_out.loc[index]
    new_pt = get_random_point_in_donut(poly_in, poly_out)
    gdf_dm.at[index, 'geometry'] = new_pt

# Plot
base = poly.plot(color='white', edgecolor='black')
ax = gdf_dm.plot(ax=base, markersize=0.1, color='red') 
ax.set_axis_off()

Zooming in to Block 390599773001005:

In [20]:
# Plot
base = poly.loc[[1610],'geometry'].plot(color='white', edgecolor='black')
ax = gdf_dm[gdf_dm['Block'] == blockid].plot(ax=base, markersize=5, color='red') 
ax.set_axis_off()
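
As with random perturbation, a point in the donut can also be drawn directly instead of by rejection. The sketch below is one possible approach, assuming circular inner and outer boundaries. The helper name `random_point_in_donut` is hypothetical. It interpolates between the squared radii so that the draw is uniform over the ring's area, then checks that every displacement falls between the minimum and maximum distances.

```python
import math
import random

def random_point_in_donut(x, y, radius_in, radius_out, rng=random):
    """Return a point drawn uniformly from the ring between the two radii around (x, y)."""
    # Interpolating between squared radii makes the draw uniform over the ring's area
    u = rng.random()
    r = math.sqrt(radius_in**2 + u * (radius_out**2 - radius_in**2))
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (x + r * math.cos(theta), y + r * math.sin(theta))

rng = random.Random(7)  # fixed seed so the sketch is reproducible
origin = (1000.0, 2000.0)  # made-up projected coordinates
masked = [random_point_in_donut(*origin, 10.0, 50.0, rng=rng) for _ in range(10000)]
distances = [math.hypot(mx - origin[0], my - origin[1]) for mx, my in masked]

# Every displacement lies between the inner and outer radius (up to rounding)
print(min(distances), max(distances))
```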

References

[1] Lin, Y., & Xiao, N. (2022). Developing synthetic individual-level population datasets: The case of contextualizing maps of privacy-preserving census data. arXiv preprint arXiv:2206.04766.

[2] Armstrong, M. P., Rushton, G., & Zimmerman, D. L. (1999). Geographically masking health data to preserve confidentiality. Statistics in Medicine, 18(5), 497-525.

[3] Kwan, M. P., Casas, I., & Schmitz, B. (2004). Protection of geoprivacy and accuracy of spatial information: How effective are geographical masks? Cartographica: The International Journal for Geographic Information and Geovisualization, 39(2), 15-28.

[4] Cassa, C. A., Wieland, S. C., & Mandl, K. D. (2008). Re-identification of home addresses from spatial locations anonymized by Gaussian skew. International Journal of Health Geographics, 7(1), 1-9.

[5] Hampton, K. H., Fitch, M. K., Allshouse, W. B., Doherty, I. A., Gesink, D. C., Leone, P. A., ... & Miller, W. C. (2010). Mapping health data: improved privacy protection with donut method geomasking. American Journal of Epidemiology, 172(9), 1062-1069.