Yue Lin, The Ohio State University
Location privacy is an individual right that protects a person from being identified based on his or her geographic location. As geospatial technologies advance, concerns about location privacy grow. Many methods for preserving location privacy have been developed over the past decades. Geomasking methods, which alter the geographic location of an individual, are often applied before individual-level data are released to protect confidentiality. In this notebook, we demonstrate how geomasking techniques can be used to protect privacy through hands-on examples and code.
Related Topics in GIS&T Body of Knowledge: Location Privacy, GIS and Critical Ethics, Professional and Practical Ethics of GIS&T
We use a synthetic individual-level population data set for Guernsey County, Ohio [1] to illustrate geomasking methods. This data set contains 40,087 individual records in more than 2,000 census blocks. We begin by importing and mapping the data set as shown below; the blue dots on the map represent the hypothetical original individual locations.
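import pandas as pd
import geopandas as gpd
# Read census block boundaries for Guernsey County, Ohio (FIPS 39059) from the TIGER/Line shapefile
filename_poly = 'data/tl_2020_39059_tabblock10.shp'
poly = gpd.read_file(filename_poly)
# Project to World Mercator (EPSG:3395), whose map units are meters
poly = poly.to_crs('EPSG:3395')
# Read individual point locations (longitude/latitude in WGS 84)
filename_df = 'data/guernsey_data.csv'
df = pd.read_csv(filename_df)
# Store block IDs as strings so they can be compared with GEOID10 in the boundary layer
df['Block'] = df['Block'].astype(str)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(x=df.lon, y=df.lat), crs='EPSG:4326')
gdf = gdf.to_crs('EPSG:3395')
# Plot the blocks and the original individual locations
base = poly.plot(color='white', edgecolor='black')
ax = gdf.plot(ax=base, markersize=0.1)
ax.set_axis_off()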
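The masking parameters used below (offsets, radii, standard deviations) are expressed in the units of the projected CRS. EPSG:3395 (World Mercator) measures in meters, but it stretches distances by a latitude-dependent scale factor, so a displacement of 50 map units is shorter on the ground. The sketch below estimates that factor with the standard ellipsoidal Mercator point-scale formula. The latitude of roughly 40°N for Guernsey County and the function names are assumptions made for illustration.

```python
import math

# WGS 84 first eccentricity squared (the ellipsoid used by EPSG:3395)
WGS84_E2 = 0.00669437999014

def mercator_scale_factor(lat_deg):
    """Point scale factor of ellipsoidal Mercator at latitude lat_deg (degrees)."""
    phi = math.radians(lat_deg)
    return math.sqrt(1.0 - WGS84_E2 * math.sin(phi) ** 2) / math.cos(phi)

def ground_distance(map_distance, lat_deg):
    """Approximate ground distance (m) for a short distance measured in Mercator map units."""
    return map_distance / mercator_scale_factor(lat_deg)

# Hypothetical check at about 40 degrees N, roughly the latitude of Guernsey County
k = mercator_scale_factor(40.0)
print(f"scale factor ~ {k:.3f}; 50 map units ~ {ground_distance(50, 40.0):.1f} m")
```

At this latitude the factor is about 1.30, so a 50-unit radius corresponds to roughly 38 m on the ground. If true ground distances matter, one option is to project to a local system before masking, such as UTM zone 17N (EPSG:32617), which covers the county.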
We can zoom in and focus on Block 390599773001005.
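# Obtain the ID of the block in row 1610 of the boundary layer (Block 390599773001005)
blockid = poly.loc[1610, 'GEOID10']
# Plot the block outline and the original points that fall in it
base = poly.loc[[1610], 'geometry'].plot(color='white', edgecolor='black')
ax = gdf[gdf['Block'] == blockid].plot(ax=base, markersize=5)
ax.set_axis_off()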
Various geographic masking techniques have been developed over the years. Most of them introduce some degree of randomization to reduce the probability of identifying an individual from his or her geographic location; affine transformation, which is deterministic, is the exception. Here we demonstrate five commonly used geomasking techniques: affine transformation, random perturbation, Gaussian perturbation, donut masking, and location swapping.
Affine transformation is a set of methods for deterministically moving individual locations to a new set of locations using translation (moving each point by a fixed offset from its original location), change of scale (multiplying each point's coordinates, measured from a reference point, by a scaling constant), and rotation (rotating each point by a fixed angle about a pivot point) [2].
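from shapely.geometry import Point
from shapely import affinity
# Set user-defined transformation parameters
# Translation offset (map units)
offset = 50
# Scaling factor
factor = 1.1
# Angle of rotation (degrees, counterclockwise)
angle = 90
# Pivot point for scaling and rotation: the center of the county's bounding box.
# shapely's default origin='center' is the bounding-box center of each geometry,
# which for a single Point is the point itself, so scaling and rotation would do nothing.
minx, miny, maxx, maxy = poly.total_bounds
pivot = Point((minx + maxx) / 2, (miny + maxy) / 2)
# Move each point using affine transformation
gdf_at = gdf.copy()
for index, row in gdf_at.iterrows():
    # Translation
    pt = affinity.translate(row['geometry'], xoff=offset, yoff=offset)
    # Change of scale about the pivot
    new_pt = affinity.scale(pt, xfact=factor, yfact=factor, origin=pivot)
    # Rotation about the pivot
    new_pt = affinity.rotate(new_pt, angle, origin=pivot)
    gdf_at.at[index, 'geometry'] = new_pt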
# Plot
base = poly.plot(color='white', edgecolor='black')
ax = gdf_at.plot(ax=base, markersize=0.1, color='red')
ax.set_axis_off()
Zooming in to Block 390599773001005:
# Plot
base = poly.loc[[1610],'geometry'].plot(color='white', edgecolor='black')
ax = gdf_at[gdf_at['Block'] == blockid].plot(ax=base, markersize=5, color='red')
ax.set_axis_off()
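The three steps of an affine transformation can be written out with plain matrix arithmetic. This makes two properties of the method easy to check. First, it is deterministic. Second, it preserves the relative arrangement of the points, because every pairwise distance is multiplied by the same scaling factor. As a result, the mask can be undone if a few masked points can be matched to their true locations. The NumPy sketch below assumes scaling and counterclockwise rotation about a user-chosen pivot; `affine_mask` is a hypothetical name, and the two input points are made up.

```python
import numpy as np

def affine_mask(xy, offset, factor, angle_deg, pivot):
    """Translate points by (offset, offset), then scale and rotate them about pivot.

    xy: (n, 2) array of projected coordinates
    Returns an (n, 2) array of masked coordinates.
    """
    xy = np.asarray(xy, dtype=float) + offset          # translation
    p = np.asarray(pivot, dtype=float)
    xy = p + factor * (xy - p)                         # change of scale about the pivot
    theta = np.radians(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])  # counterclockwise rotation matrix
    return p + (xy - p) @ rot.T                        # rotation about the pivot

# Two hypothetical points, with the pivot at the origin
pts = np.array([[100.0, 200.0], [400.0, -100.0]])
masked = affine_mask(pts, offset=50, factor=1.1, angle_deg=90, pivot=(0.0, 0.0))
print(masked)
```

With the data above, `xy` could be built from `gdf.geometry.x` and `gdf.geometry.y`, and the pivot could be set, for example, to the center of `poly.total_bounds`.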
Adding random noise to the original coordinates is a common way to protect the location privacy of individual-level data [2, 3]. Each point can be relocated at random within a circle centered on its original location with a user-defined radius, or within any other polygon defined relative to the original point. Here we illustrate random perturbation within a circular perturbation polygon, approximated by a buffer around each point.
import random
from shapely.geometry import Point
# Set user-defined radius (map units)
radius = 50
def get_random_point_in_polygon(buf):
    """Obtain a random point within the polygon (buffer) by rejection sampling.
    buf: A polygon with a center at the original point and a user-defined radius
    Returns:
    pt: A shapely Point drawn uniformly at random from inside buf
    """
    minx, miny, maxx, maxy = buf.bounds
    while True:
        # Draw from the bounding box and keep the first point that falls inside the polygon
        pt = Point(random.uniform(minx, maxx), random.uniform(miny, maxy))
        if buf.contains(pt):
            return pt
# Obtain perturbation polygons: circular buffers around the original points
gdf_rp = gdf.copy()
buf = gdf_rp.buffer(radius)
# Move each point at random within its polygon (Series.items() replaces the removed iteritems())
for index, row in buf.items():
    new_pt = get_random_point_in_polygon(row)
    gdf_rp.at[index, 'geometry'] = new_pt
# Plot
base = poly.plot(color='white', edgecolor='black')
ax = gdf_rp.plot(ax=base, markersize=0.1, color='red')
ax.set_axis_off()
Zooming in to Block 390599773001005:
# Plot
base = poly.loc[[1610],'geometry'].plot(color='white', edgecolor='black')
ax = gdf_rp[gdf_rp['Block'] == blockid].plot(ax=base, markersize=5, color='red')
ax.set_axis_off()
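The rejection sampler in `get_random_point_in_polygon` draws candidates from each buffer's bounding box and discards those that miss, and a Python-level loop over 40,087 points is slow. For the circular case, a vectorized alternative draws a distance and an angle directly. Taking the square root of a uniform variate makes the points uniform over the disk's area instead of clustering them near the center. This NumPy sketch is an alternative to the code above, not a reproduction of it. `perturb_in_circle` is a hypothetical name, and it samples a true circle rather than the polygonal buffer.

```python
import numpy as np

def perturb_in_circle(xy, radius, rng=None):
    """Move each point to a uniformly random location within `radius` of its original position."""
    rng = np.random.default_rng() if rng is None else rng
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    # sqrt(U) gives a distance whose density grows with r, so points are uniform over the disk's area
    r = radius * np.sqrt(rng.uniform(0.0, 1.0, n))
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    return xy + np.column_stack((r * np.cos(theta), r * np.sin(theta)))

# Hypothetical usage with a fixed seed for reproducibility
rng = np.random.default_rng(42)
xy = np.zeros((100_000, 2))
moved = perturb_in_circle(xy, radius=50, rng=rng)
dist = np.linalg.norm(moved - xy, axis=1)
print(dist.max(), dist.mean())  # the mean is expected to be close to 2/3 of the radius
```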
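Gaussian perturbation displaces each point by random offsets in the x and y directions, each drawn independently from a normal distribution with a mean of zero and a user-defined standard deviation [4].
import numpy as np
from shapely.geometry import Point
# Set standard deviation (map units)
sigma = 50
# Displace each point following a Gaussian distribution
gdf_gp = gdf.copy()
for index, row in gdf_gp.iterrows():
    pt = row['geometry']
    # Draw independent x and y offsets with mean 0 and standard deviation sigma
    dx, dy = np.random.normal(0, sigma, 2)
    new_pt = Point(pt.x + dx, pt.y + dy)
    gdf_gp.at[index, 'geometry'] = new_pt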
# Plot
base = poly.plot(color='white', edgecolor='black')
ax = gdf_gp.plot(ax=base, markersize=0.1, color='red')
ax.set_axis_off()
Zooming in to Block 390599773001005:
# Plot
base = poly.loc[[1610],'geometry'].plot(color='white', edgecolor='black')
ax = gdf_gp[gdf_gp['Block'] == blockid].plot(ax=base, markersize=5, color='red')
ax.set_axis_off()
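Because the x and y offsets in the code above are independent normal draws, the displacement distance follows a Rayleigh distribution. Its mean is σ√(π/2), about 1.25σ. About 39% of points move less than σ, and roughly 1% move farther than 3σ, because the distance has no upper bound. The vectorized sketch below draws all offsets at once and checks those figures empirically with a fixed seed. `gaussian_perturb` is a hypothetical name.

```python
import numpy as np

def gaussian_perturb(xy, sigma, rng=None):
    """Add independent normal offsets (mean 0, standard deviation sigma) to each point's x and y."""
    rng = np.random.default_rng() if rng is None else rng
    xy = np.asarray(xy, dtype=float)
    return xy + rng.normal(0.0, sigma, size=xy.shape)

rng = np.random.default_rng(0)
sigma = 50
xy = np.zeros((200_000, 2))
dist = np.linalg.norm(gaussian_perturb(xy, sigma, rng) - xy, axis=1)
print(dist.mean() / sigma)        # compare with sqrt(pi / 2), about 1.2533
print(np.mean(dist < sigma))      # compare with 1 - exp(-0.5), about 0.3935
print(np.mean(dist > 3 * sigma))  # compare with exp(-4.5), about 0.011
```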
Donut masking [5] also moves each point in a random direction, but it keeps the displacement distance between a user-defined minimum and maximum: each point is relocated at random within a ring (the "donut") centered on its original location. The minimum distance prevents a masked point from falling at or very near its true location.
import random
from shapely.geometry import Point
# Set user-defined radii (map units): minimum and maximum displacement distances
radius_in = 10
radius_out = 50
def get_random_point_in_donut(buf_in, buf_out):
    """Obtain a random point within the donut (the ring between two buffers) by rejection sampling.
    buf_in: The inner polygon with a radius of the minimum displacement distance
    buf_out: The outer polygon with a radius of the maximum displacement distance
    Returns:
    pt: A shapely Point drawn uniformly at random from inside buf_out but outside buf_in
    """
    minx, miny, maxx, maxy = buf_out.bounds
    while True:
        pt = Point(random.uniform(minx, maxx), random.uniform(miny, maxy))
        if buf_out.contains(pt) and not buf_in.contains(pt):
            return pt
# Obtain perturbation donuts
gdf_dm = gdf.copy()
buf_in = gdf_dm.buffer(radius_in)
buf_out = gdf_dm.buffer(radius_out)
# Move each point at random within its donut (look up buffers by index label)
for index in gdf_dm.index:
    new_pt = get_random_point_in_donut(buf_in.loc[index], buf_out.loc[index])
    gdf_dm.at[index, 'geometry'] = new_pt
# Plot
base = poly.plot(color='white', edgecolor='black')
ax = gdf_dm.plot(ax=base, markersize=0.1, color='red')
ax.set_axis_off()
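For an exact circular ring, the donut can also be sampled without rejection. Let r_min and r_max be the inner and outer radii. Draw U uniformly on [0, 1) and set r = √(r_min² + U(r_max² − r_min²)); this gives a distance that is uniform over the ring's area. Pairing it with a uniform angle yields uniform sampling within the donut. The sketch below vectorizes this in NumPy. `donut_perturb` is a hypothetical name, and the rings here are true circles rather than the polygonal buffers used above.

```python
import numpy as np

def donut_perturb(xy, r_min, r_max, rng=None):
    """Move each point to a uniformly random location in the ring r_min <= d <= r_max around it."""
    rng = np.random.default_rng() if rng is None else rng
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    u = rng.uniform(0.0, 1.0, n)
    # Inverse of the area-uniform CDF F(r) = (r^2 - r_min^2) / (r_max^2 - r_min^2)
    r = np.sqrt(r_min**2 + u * (r_max**2 - r_min**2))
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    return xy + np.column_stack((r * np.cos(theta), r * np.sin(theta)))

rng = np.random.default_rng(1)
xy = np.zeros((50_000, 2))
dist = np.linalg.norm(donut_perturb(xy, 10, 50, rng) - xy, axis=1)
print(dist.min(), dist.max())  # displacements stay within [10, 50]
```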
Zooming in to Block 390599773001005:
# Plot
base = poly.loc[[1610],'geometry'].plot(color='white', edgecolor='black')
ax = gdf_dm[gdf_dm['Block'] == blockid].plot(ax=base, markersize=5, color='red')
ax.set_axis_off()
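Location swapping, the last technique in the list of five, exchanges locations among individuals instead of generating new coordinates. Formulations differ in how swap partners are chosen. The sketch below assumes a simple variant in which locations are randomly permuted among individuals in the same census block. Each masked point is therefore the original location of some individual in that block, and block-level counts are unchanged. `swap_within_groups` is a hypothetical name, and the block IDs in the example are made up.

```python
import numpy as np

def swap_within_groups(xy, groups, rng=None):
    """Randomly permute locations among records that share a group label (e.g., a block ID)."""
    rng = np.random.default_rng() if rng is None else rng
    xy = np.asarray(xy, dtype=float)
    groups = np.asarray(groups)
    n = len(xy)
    # Both orderings sort by group first; ties are broken by original position or by a random key
    order_orig = np.lexsort((np.arange(n), groups))
    order_rand = np.lexsort((rng.random(n), groups))
    out = np.empty_like(xy)
    # Position i in both orderings belongs to the same group, so locations only move within a group
    out[order_orig] = xy[order_rand]
    return out

xy = np.array([[0, 0], [1, 1], [2, 2], [10, 10], [11, 11], [99, 99]], dtype=float)
blocks = np.array(["A", "A", "A", "B", "B", "C"])  # hypothetical block IDs
swapped = swap_within_groups(xy, blocks, np.random.default_rng(3))
print(swapped)
```

A random permutation can leave some individuals at their own location. A block with a single record, like block C in the example, is not masked at all. A practical implementation would need to handle such cases, for example by pooling small blocks.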
[1] Lin, Y., & Xiao, N. (2022). Developing synthetic individual-level population datasets: The case of contextualizing maps of privacy-preserving census data. arXiv preprint arXiv:2206.04766.
[2] Armstrong, M. P., Rushton, G., & Zimmerman, D. L. (1999). Geographically masking health data to preserve confidentiality. Statistics in Medicine, 18(5), 497-525.
[3] Kwan, M. P., Casas, I., & Schmitz, B. (2004). Protection of geoprivacy and accuracy of spatial information: How effective are geographical masks? Cartographica: The International Journal for Geographic Information and Geovisualization, 39(2), 15-28.
[4] Cassa, C. A., Wieland, S. C., & Mandl, K. D. (2008). Re-identification of home addresses from spatial locations anonymized by Gaussian skew. International Journal of Health Geographics, 7(1), 1-9.
[5] Hampton, K. H., Fitch, M. K., Allshouse, W. B., Doherty, I. A., Gesink, D. C., Leone, P. A., ... & Miller, W. C. (2010). Mapping health data: Improved privacy protection with donut method geomasking. American Journal of Epidemiology, 172(9), 1062-1069.