Differentially Private Anomaly Detection. DPAD for short.
It’s a mouthful, yes, but DPAD is the name of a new Rutgers University research initiative that could help solve one of the thorniest problems in today’s world of ubiquitous online communications: how to protect the privacy of individuals when searching for anomalies in big sets of digital data for security or other reasons.
Every time we use our smartphones, send a tweet, make a purchase, deposit a check, visit the doctor, search the Internet, or walk down a street lined with security cameras, our personal data is collected and turned into digital information that puts our privacy at risk.
The Rutgers initiative, funded with a federal grant from the Department of Homeland Security, will test a theoretical model of differential privacy first proposed about 10 years ago to better protect the privacy of ordinary citizens when this data is collected and analyzed.
The project is sponsored by the Identity and Access Management Program at the Department of Homeland Security, and is one of a collection of related projects at CCICADA, the DHS University Center of Excellence based at Rutgers University.
“The objective (of the research project) is to evaluate how well and in what contexts differentially private algorithms can reliably detect anomalies while preserving the privacy of non-anomalous data/individuals,” said Dr. Rebecca Wright and Dr. Anand D. Sarwate, Rutgers University faculty.
Leading the research effort are two Rutgers University faculty, both of whom are experts in differential privacy: Dr. Rebecca Wright and Dr. Anand Sarwate.
Wright is a professor of computer science at Rutgers and director of its Center for Discrete Mathematics and Theoretical Computer Science (DIMACS). Sarwate is assistant professor of electrical and computer engineering. Assisting them are two graduate students, Mohsen Ghassemi and Daniel Bittner, and postdoctoral researcher Morteza Monemizadeh.
Sarwate said the theoretical model they are testing is analogous to “using a flow monitor to control how much information is being leaked as opposed to just letting private information spill out unmonitored into our (digital) waterways.”
“We’re trying to understand when it is, or might be, possible to protect the public’s right to privacy while detecting a few bad individuals,” said Wright.
“This is a balancing of the public and private good,” added Sarwate, noting this balancing is not simply a technical problem, but a policy problem as well. Government must weigh in on privacy issues, deciding what is an acceptable tradeoff between gathering valuable information and protecting the privacy of individuals.
Wright said the theory of differential privacy behind the DPAD model has been studied a great deal in recent years and is even starting to find some practical applications. However, the application to anomaly detection is new. “This is a very promising technology that we are exploring,” she said.
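To make the idea of differential privacy concrete (this is a standard textbook illustration, not an algorithm from the Rutgers project): a common way to release a count privately is to add calibrated Laplace noise, so that any one person's presence or absence barely changes the published answer.

```python
import random

def private_count(records, predicate, epsilon):
    """Release a count with epsilon-differential privacy.

    Adding or removing one record changes the true count by at most 1,
    so Laplace noise with scale 1/epsilon masks any individual's
    contribution. (Laplace noise is sampled here as the difference of
    two independent exponentials.)
    """
    true_count = sum(1 for r in records if predicate(r))
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Hypothetical example: how many transactions exceed $10,000?
transactions = [120, 15_000, 800, 22_000, 90]
noisy_answer = private_count(transactions, lambda t: t > 10_000, epsilon=1.0)
```

Smaller epsilon means more noise and stronger privacy: the analyst sees an answer near the truth but cannot pin down any single record from it.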
What exactly is anomaly detection?
In the context of the Rutgers research project, it’s using mathematical algorithms to analyze otherwise “normal” or predictable sets of data for unusual or aberrant patterns that could be evidence of an online intrusion (such as a hack) or other threat.
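In its simplest (non-private) form, anomaly detection is just a statistical screen over a stream of numbers. A minimal sketch, using hypothetical daily login counts and a robust median-based score rather than anything from the project itself:

```python
import statistics

def flag_anomalies(values, threshold=3.5):
    """Flag values whose modified z-score exceeds `threshold`.

    The score uses the median and the median absolute deviation (MAD),
    which, unlike the mean and standard deviation, are not dragged
    around by the very outliers we are trying to find.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical daily login counts; the spike could signal an intrusion.
logins = [102, 98, 110, 95, 104, 99, 980, 101]
suspicious = flag_anomalies(logins)  # flags the 980 spike
```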
Anomaly detection is already used by law enforcement agencies to detect threats from computer hackers, terrorists, sex traffickers and other bad actors, to spot suspicious individuals in social networks, and to sniff out financial and healthcare fraud. But these detection efforts put the personal information of innocent citizens at risk.
The DPAD model Rutgers is testing theoretically would allow investigators to detect anomalous threats and preserve privacy at the same time. In the world of big data, one could say it’s the equivalent of having your cake and eating it, too.
Citing two examples, Sarwate said government agencies could use DPAD to investigate fraudulent Medicare billing operations or opioid “pill mills” run by unscrupulous doctors and their co-conspirators without exposing the information of good actors.
He noted that Apple is exploring how to use the general concept of differential privacy to mine and analyze ever greater quantities of personal data while still protecting the privacy of its customers. Other technology giants are conducting similar research, including Google with its RAPPOR project, and Microsoft, where the original model of differential privacy was developed.
The Rutgers University researchers hope to provide answers that will be useful to both business and government.
Wright and Sarwate have posed two underlying questions to guide their research:
1) How can we identify unusual or interesting objects when searching through private data?
2) How can we quantify how intrusive a search is while looking for these anomalies?
In essence, they are trying to understand the basic tradeoffs between privacy and efficiency. When and how is it possible to quickly and efficiently search for anomalies while still guaranteeing individual privacy? (The “efficiency” part of the equation is important because, for example, the FBI and other law enforcement agencies have limited budgets.)
The Rutgers researchers offer this online summation of their project: “The goal is to develop algorithms for screening and anomaly detection in private data using a combination of techniques from group testing, active learning, and sequential hypothesis testing. The objective is to evaluate how well and in what contexts differentially private algorithms can reliably detect anomalies while preserving the privacy of non-anomalous data/individuals.”
Wright and Sarwate are employing a mix of classical statistical tools and leading-edge data analytics algorithms to test theoretical privacy walls. They are developing mathematical models for understanding the process of searching in private data, designing algorithms for detecting anomalous data, and quantifying the privacy risk for non-anomalous data.
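One standard building block that combines a sequential search with a differential-privacy guarantee is the “above threshold” (sparse vector) test: scan noisy suspicion scores and stop at the first one that appears anomalous. The sketch below is a generic illustration of that technique, not the project's own algorithm, and assumes each score changes by at most 1 if one individual's data changes; the scores and threshold are hypothetical.

```python
import random

def laplace(scale):
    # Laplace(0, scale) noise as the difference of two exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def above_threshold(scores, threshold, epsilon):
    """Sparse-vector ('AboveThreshold') screen.

    Returns the index of the first score that appears to exceed the
    threshold, or None. Because the scan halts at the first hit, the
    whole sequence of comparisons costs only epsilon in privacy, no
    matter how many non-anomalous scores were inspected along the way.
    """
    noisy_t = threshold + laplace(2.0 / epsilon)
    for i, score in enumerate(scores):
        if score + laplace(4.0 / epsilon) > noisy_t:
            return i
    return None

# Hypothetical suspicion scores for four billing accounts.
hit = above_threshold([1, 2, 100, 3], threshold=50, epsilon=1.0)
```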
They are exploring how DPAD might be used by business or government in one of two situations: 1) when they know what kinds of anomalous information they are looking for and simply need to find it, and 2) when they aren’t sure what they are looking for but need a way to identify patterns of activity they suspect are present.
“There’s always this distinction between learning what you’re looking for and already knowing what you are looking for,” Sarwate explained.
Wright pointed out that privacy concerns and protections are nothing new. The US Census has long been conducted in a way that protects the privacy of individuals. Another example is the HIPAA Privacy Rule, which establishes national standards to protect individuals’ medical records and other personal health information.
But HIPAA prescribes standards rather than technical solutions, and the now-common practice in business and government of collecting and analyzing huge amounts of complex online information remains largely unregulated. These so-called “big data” problems require new approaches.
Consequently, there is growing concern among policy makers, security professionals and ordinary citizens about how to protect the privacy of individuals when so much of their information is already floating in cyberspace—waiting to be captured and misused by profiteers or stolen by computer hackers.
Sarwate and Wright acknowledge they face considerable challenges in their research. “It’s very hard to find a useful place between private and public information,” said Sarwate.
One thing is clear, however. The Rutgers research team is determined to identify that useful place and to develop tools that can serve a wide variety of applications.