
When the Cat is Missing Its Tail: Bringing Together Privacy Researchers to Solve Synthetic Data

By: Erica Lieberman
October 15, 2023
Est. Reading Time: 5 minutes
Collaborative Research Cycle hosted by the National Institute of Standards and Technology
Our "Stories from the field" series shares the experiences of engineers and scientists who are making a difference in their communities.

Christine Task is a senior research scientist at Knexus Research Corporation and is the contracted technical lead of the Collaborative Research Cycle (CRC) hosted by the National Institute of Standards and Technology’s Privacy Engineering Program. She spoke with ESAL about mobilizing the privacy research community to solve pressing problems facing the federal government.

EL: Before you started working on privacy and security research, you were a mathematician. What sparked your interest in this type of applied research and its application to public policy?

Task: Living in Washington, DC quickly connected me to problems that affect policy. I have always been drawn to proofs and theoretical problems that are fun, aesthetic, and challenging, and I wanted to focus on the subset of those problems that affect real people’s lives. When I moved to DC, I went to an amazing talk series at the National Science Foundation, hosted by its Secure and Trustworthy Cyberspace program. The series featured speakers focused on problems like voting machine fraud, infrastructure security, and bias in data. I would sit in the audience in that little dark room asking lots of questions, and later I started attending other seminars on privacy and security. At Knexus, we began collaborating with the federal government, which presented us with a set of challenges it called “impossible problems.” Differentially private synthetic data was one of them, and we are tackling others now.


EL: What is the Collaborative Research Cycle and what issues is it trying to solve?

Task: Essentially, the CRC was founded because we need a stronger formal understanding of human data. It grew out of two “classic” prize challenges in differential privacy and synthetic data. To understand these subjects, imagine human data living in a high-dimensional space: we are trying to create new data that looks like the old data but is actually different data. By using these “synthetic people,” we can preserve the “shape” of the data without introducing privacy problems for real individuals.
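One simple, textbook way to make “synthetic people” with a differential privacy guarantee is to count how many real records fall into each combination of attribute values, add Laplace noise to each count, and then sample new records from the noisy counts. The sketch below is only an illustration under those assumptions; it is not the method used in the NIST challenges or the CRC, and the helper names (`dp_synthetic`, `laplace_noise`) and the toy attributes are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_synthetic(records, domain, epsilon, n_out, seed=0):
    """Hypothetical sketch: noisy histogram over `domain`, then sample from it."""
    rng = random.Random(seed)
    counts = Counter(records)
    # Adding or removing one person changes one cell count by 1 (sensitivity 1),
    # so Laplace noise with scale 1/epsilon makes the histogram epsilon-DP.
    noisy = [max(0.0, counts[cell] + laplace_noise(1 / epsilon, rng)) for cell in domain]
    if sum(noisy) == 0:
        noisy = [1.0] * len(domain)  # fall back to uniform if noise wiped out every cell
    # Sampling from the released noisy histogram is post-processing,
    # so it does not consume additional privacy budget.
    return rng.choices(domain, weights=noisy, k=n_out)


# Toy "people" with two hypothetical attributes: age band and region.
domain = list(product(["18-34", "35-64", "65+"], ["urban", "rural"]))
real = [("18-34", "urban")] * 40 + [("35-64", "rural")] * 35 + [("65+", "rural")] * 25
synthetic = dp_synthetic(real, domain, epsilon=1.0, n_out=100, seed=7)
print(Counter(synthetic).most_common(3))
```

A larger `epsilon` means less noise and a closer match to the real counts, at the cost of a weaker privacy guarantee. Real human data has far too many attribute combinations for this direct approach, which is part of what makes the problem hard.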

The first two challenges came about because the Census Bureau had pain points in evaluating how similar synthetic data was to real data, and we decided to open up the hardest, most ambitious problems out there to the research community. When we started the first challenge, half of the competitors weren’t sure that what we were asking was even possible. Everyone was willing to try, but we were prepared for no one to do well and for all of us to simply learn together where the limitations were. But participants actually succeeded!

Christine Task (image from LinkedIn).

When we did exit interviews with the teams, we learned that they had been relying on massive trial and error: each team had run many experiments, but they hadn’t taken the time to go back and figure out why what they did worked or didn’t, so they couldn’t derive a formal understanding. Research cycles aim to incentivize people to go back, work together to understand the solutions more deeply, and iterate on them as a larger team.

EL: How do you expect the outcome of the CRC will bring about improved public policies?

Task: I like using the metaphor of a train to explain this. The train was invented before we understood thermodynamics. People knew about wheels, steam, and fire, and they built trains. If you worked around a lot of trains, you could guess how they’d behave, but you didn’t have the math to say why they worked or when they would malfunction. We think it’s important to understand the mathematical rules underlying developments of civic interest. Once you have the math to understand what is going on, you can apply it to even more things and enable society to make huge advances. Otherwise, we are stuck with oversimplifications of data, such as pretending that everyone is on a normal curve, when we already know that’s not the case. These oversimplifications would compound until they erased the representation of diverse groups from the data, which would cause all sorts of problems. We are trying to have the “thermodynamics” moment of human data.
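To see how pretending everyone is on a single normal curve can push a smaller group out of the picture, consider a made-up attribute where 90% of people cluster around 30 and 10% cluster around 70. The sketch below fits one normal curve to everyone and compares how much of the population each description places above 60, where the smaller group lives. The group sizes, centers, and threshold are illustrative assumptions, not CRC data.

```python
import random
from statistics import NormalDist


def share_above(values, threshold):
    # Fraction of values strictly above the threshold.
    return sum(v > threshold for v in values) / len(values)


rng = random.Random(42)
# Hypothetical attribute: 900 people in a majority group, 100 in a minority group.
majority = [rng.gauss(30, 5) for _ in range(900)]
minority = [rng.gauss(70, 5) for _ in range(100)]
real = majority + minority

# Oversimplified model: pretend everyone lies on one normal curve.
fitted = NormalDist.from_samples(real)

threshold = 60  # roughly where the minority group lives
real_share = share_above(real, threshold)
model_share = 1 - fitted.cdf(threshold)
print(f"real share above {threshold}: {real_share:.3f}")
print(f"normal-model share above {threshold}: {model_share:.3f}")
```

With these settings the real data puts roughly 10% of people above 60, while the single normal curve puts only a few percent there, so most of the smaller group would vanish from any data generated from that model.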

EL: What do you think are the key aspects of the CRC that allow it to succeed?

Task: The privacy research community is really eager to see its work have real-world impact. In general, any time the research comes down to computers and math, as it does in many of the physical sciences, you can design a sort of “research accelerator”: everything you’re not good at, someone else can do for you. As hosts, we put a lot of thought into designing the challenges to be welcoming and accessible, because when you are asking people to engage in something they’re not being paid to do, their experience should be as effortless and friendly as possible.

EL: What skills do you think are important for someone to develop if they’re interested in a role similar to yours?

Task: Any technical lead needs to be good at understanding how people work. Developing standards for technology means coordinating outreach to many stakeholder groups in a technical area.

When I was growing up in Dayton, OH, I attended a seminar every Thursday night where we met different researchers working at the cutting edge of their fields. I was fortunate to be exposed to many different ways in which people contribute productively. I learned that the main thing to do is listen, and to understand that two groups can hold totally different sets of values and that both of their voices are important.

Finally, it’s important to keep in mind that some people naturally have a wider focus and others a narrower one. To make sure everyone can contribute to a project, a technical lead needs to know how to construct work packages for people who focus on different scales.

EL: Besides participating in the CRC, what are other opportunities for people to get involved in data privacy issues?

Task: There’s no good reason that grade school prepares you for calculus but not for data science; it’s just a different way of thinking and a different set of problems to solve. Algorithms that use human data have an increasing impact on human life. We should be teaching about them in grade school so that people are aware of their environment, can be informed consumers of algorithms, and can advocate for changes when necessary.

We are also working on preparing coursework on this subject. For now, there are tools online: you can download PDFs and see for yourself how well the algorithms developed during the CRC model the data. These materials are accessible to people of any background. If you have ever looked at a scatter plot, you can help out. One of my friends, who was an English major, observed that the plot of the original data looked like a cat, but the synthetic data was missing its “tail.” That observation helped us realize that our model had failed to reproduce part of the data and needed to be improved.
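The “missing tail” check can also be approximated in code: divide the scatter plot into a grid, count real and synthetic points in each cell, and flag cells where real people appear but the synthetic data has far fewer points than expected. This is a hypothetical sketch, not a CRC tool; the function names, grid size, thresholds, and the toy “body plus tail” data are all assumptions.

```python
import random
from collections import Counter


def grid_counts(points, cell):
    # Map each (x, y) point to the integer grid cell that contains it.
    return Counter((int(x // cell), int(y // cell)) for x, y in points)


def missing_regions(real, synth, cell=1.0, min_real=5, ratio=0.2):
    """Hypothetical check: cells where synthetic data badly under-covers real data."""
    real_counts = grid_counts(real, cell)
    synth_counts = grid_counts(synth, cell)
    scale = len(synth) / len(real)  # adjust for different sample sizes
    flagged = []
    for cell_id, n_real in real_counts.items():
        expected = n_real * scale
        # Only judge cells with enough real points to be meaningful.
        if n_real >= min_real and synth_counts[cell_id] < ratio * expected:
            flagged.append(cell_id)
    return sorted(flagged)


rng = random.Random(1)
# Toy real data: a round "body" plus a diagonal "tail" of 100 points.
body = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
tail = []
for _ in range(100):
    x = rng.uniform(3, 6)
    tail.append((x, x + rng.gauss(0, 0.2)))
real = body + tail
# Toy synthetic data from a model that only learned the body.
synth = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]

print(missing_regions(real, synth))
```

A flagged cell is only a hint: with small counts, random variation alone can flag a cell, so a check like this complements looking at the plot rather than replacing it.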

Do you have a story to tell about your own local engagement or that of someone you know? Please submit your idea here, and we will help you develop and share your story for our series.

Engineers & Scientists Acting Locally (ESAL) is a non-advocacy, non-political organization. The information in this post is for general informational purposes and does not imply an endorsement by ESAL for any political candidates, businesses, or organizations mentioned herein.
Published: 10/15/23
Updated: 06/26/24