The International Organization for Migration (IOM) has released a new synthetic dataset on human trafficking, made possible by technology developed in partnership with Microsoft Research.

The dataset represents the largest collection of primary human trafficking case data ever made available to the public, while enabling strong privacy guarantees that preserve the anonymity and safety of victims and survivors.

The downloadable Global Synthetic Dataset has been released through the Counter Trafficking Data Collaborative (CTDC) – the first global data portal on human trafficking – and represents data from over 156 000 victims and survivors of trafficking across 189 countries and territories (where victims were first identified and supported).

It provides first-hand, critical information on the socio-demographic profile of victims, types of exploitation, and the trafficking process, including means of control used on victims – all of which is vital information needed to better assist survivors and prosecute perpetrators.

The new technology has enabled CTDC to share more data and allow more effective research to be conducted while protecting privacy and civil liberties. Access to additional attributes of victim case records will enable stakeholders to develop a more comprehensive understanding of this crime and the needs of survivors.

“Making data on human trafficking widely available to stakeholders in a safe manner is crucial to develop evidence-based responses,” says Harry Cook, programme co-ordinator in IOM’s migration protection and assistance division.

“Administrative data on identified cases of human trafficking represent one of the main sources of data available but such information is highly sensitive. IOM has been delighted to work with Microsoft Research over the past two years to make progress on the critical challenge of sharing such data for analysis while protecting the safety and privacy of victims.”

Microsoft Research has worked with IOM to develop a new algorithm to derive “synthetic data” from CTDC’s sensitive victim case data. Rather than systematically redacting cases, which results in a substantial amount of data being suppressed, the algorithm generates a synthetic dataset that accurately preserves the statistical properties and relationships in the original data.

However, the records of the synthetic dataset no longer correspond to actual individuals and each is constructed entirely from common attribute combinations.

This means that none of the attribute combinations in the synthetic dataset can be linked to distinctive individuals (or even small groups of distinctive individuals) in the sensitive dataset, or in the world at large.
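To make the idea of building records "entirely from common attribute combinations" concrete, the following is a minimal illustrative sketch in Python. It is not the algorithm released by Microsoft Research: the function names, the threshold `k`, the attribute names and the toy records are all assumptions invented for this example. The sketch builds each synthetic record one attribute at a time and only keeps a value if the combination chosen so far still matches at least `k` records in the sensitive data.

```python
import random
from collections import Counter

# Hypothetical attribute names and toy records, invented for illustration only.
ATTRIBUTES = ["gender", "age_group", "exploitation"]
toy_records = (
    [{"gender": "F", "age_group": "18-29", "exploitation": "labour"} for _ in range(6)]
    + [{"gender": "M", "age_group": "30-39", "exploitation": "labour"} for _ in range(5)]
    + [{"gender": "F", "age_group": "9-17", "exploitation": "other"}]  # rare combination
)


def count_matches(records, partial):
    """Count the records that agree with every attribute value in `partial`."""
    return sum(all(r.get(a) == v for a, v in partial.items()) for r in records)


def synthesize_record(records, attributes, k, rng):
    """Build one synthetic record whose combination matches >= k sensitive records."""
    partial = {}
    matching = list(records)
    for attr in attributes:
        counts = Counter(r[attr] for r in matching)
        # Keep only values that stay "common" given the values chosen so far.
        common = {value: n for value, n in counts.items() if n >= k}
        if not common:
            break  # leave the remaining attributes blank rather than reveal a rare case
        values = list(common)
        chosen = rng.choices(values, weights=[common[v] for v in values])[0]
        partial[attr] = chosen
        matching = [r for r in matching if r[attr] == chosen]
    return partial


def synthesize(records, attributes, k, n, seed=0):
    """Generate `n` synthetic records using a seeded random number generator."""
    rng = random.Random(seed)
    return [synthesize_record(records, attributes, k, rng) for _ in range(n)]


synthetic = synthesize(toy_records, ATTRIBUTES, k=5, n=50, seed=1)
```

Because every synthetic record as a whole matches at least `k` sensitive records, any smaller combination of its attributes does too. In the toy data the single record with age group "9-17" falls below the threshold and therefore never appears in the synthetic output.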

Representative data on all of CTDC’s victim-of-trafficking cases are now available as a downloadable data file thanks to the new algorithm.

“Creating a simple process for privacy-preserving data sharing has the potential to coordinate and amplify the efforts of anti-trafficking organisations around the world,” says Darren Edge, director of societal resilience at Microsoft Research and project lead.

“We are grateful to IOM for our deep partnership in developing a new approach to data sharing that is grounded in the needs of the anti-trafficking community. By protecting the privacy and safety of victims with synthetic data, and empowering policymakers to view, explore, and make sense of data through rich interactive dashboards, we are showing one of the many ways in which research and technology can support the global fight against human trafficking.”

IOM and Microsoft Research began working together in July 2019 as part of the accelerator programme of the Tech Against Trafficking coalition.

The new privacy-preserving synthetic data solution, developed at Microsoft Research in the Python programming language, is also being made freely available via GitHub.

IOM aims to share the new technique with counter-trafficking organisations worldwide as part of a wider programme to improve the production of data and evidence on human trafficking.

This includes establishing new international standards and guidance to support governments in producing high-quality administrative data, in partnership with the UN Office on Drugs and Crime, and a package of data standards and information management tools for frontline counter-trafficking agencies.

By making this information openly and safely available, IOM and Microsoft hope to ensure the voices of victims and survivors are heard and protected while empowering governments and other stakeholders to take progressive action to end this crime.

CTDC is the first global data portal on human trafficking, combining victim case datasets from multiple counter-trafficking organisations.