
Even with anonymised datasets, AI can identify humans

By Science Gazette, 25 January 2022

Weekly social contacts form distinctive signatures that set people apart.

At least in the eyes of artificial intelligence, how you engage with a crowd may help you stand out.

Researchers report in Nature Communications on January 25 that when given information about a target individual’s mobile phone interactions as well as the interactions of their contacts, AI can correctly pick the target out of more than 40,000 anonymous mobile phone service subscribers more than half of the time. The results imply that people interact in ways that may be exploited to identify individuals in apparently anonymised databases.

“It’s no surprise that individuals prefer to stay within established social circles and that these frequent contacts develop a stable pattern over time,” says Jaideep Srivastava, a computer scientist at the University of Minnesota in Minneapolis who was not involved in the work. “However, it’s astonishing that you can utilize that pattern to identify the person.”

Under the European Union’s General Data Protection Regulation and the California Consumer Privacy Act, companies that gather information about people’s everyday activities may share or sell this data without users’ knowledge. The only catch is that the information must be anonymised. Some businesses may believe they can meet this criterion by assigning users pseudonyms, says Yves-Alexandre de Montjoye, a computational privacy expert at Imperial College London. “Our findings suggest that this is not the case.”

De Montjoye and his colleagues reasoned that people’s social behavior could be used to identify individuals in datasets containing information on anonymous users’ interactions. To test this idea, the researchers trained an artificial neural network, an AI loosely modeled on the neural architecture of a biological brain, to spot patterns in users’ weekly social contacts.

For one test, the researchers trained the neural network on data from an undisclosed mobile phone provider that recorded the interactions of 43,606 users over a 14-week period. Each record included the date, time, length and type (call or text) of the interaction, the pseudonyms of the people involved and who initiated it.

Each user’s interaction data was arranged into a web-shaped data structure, in which nodes represented the user and their contacts and the links between nodes carried the interaction data. The AI was shown a known person’s interaction web and then given free rein to search the anonymised data for the web that most resembled it.
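
The idea of matching interaction webs can be illustrated with a small Python sketch. It is not the researchers' neural network: the record layout `(initiator, recipient, kind, duration)`, the hand-picked features in `fingerprint` and the nearest-neighbour search in `closest_match` are all assumptions made for illustration, and every name and number in the toy data is invented.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical record layout: (initiator, recipient, kind, duration_seconds).
# The hand-made features below stand in for the learned model in the study.

def build_webs(records):
    """Group records into one interaction web per user: contact -> list of events."""
    webs = defaultdict(lambda: defaultdict(list))
    for initiator, recipient, kind, duration in records:
        webs[initiator][recipient].append((kind, duration, True))   # user started it
        webs[recipient][initiator].append((kind, duration, False))  # user received it
    return webs

def fingerprint(web, top_k=3):
    """Summarise a web with features that do not depend on anyone's pseudonym."""
    per_contact = sorted((len(events) for events in web.values()), reverse=True)
    total = max(sum(per_contact), 1)  # guard against an empty web
    events = [event for contact_events in web.values() for event in contact_events]
    calls = sum(1 for kind, _, _ in events if kind == "call")
    started = sum(1 for _, _, initiated in events if initiated)
    top = (per_contact + [0] * top_k)[:top_k]  # pad so every vector has equal length
    return [len(per_contact), calls / total, started / total] + [c / total for c in top]

def closest_match(target_web, anonymous_webs):
    """Return the pseudonym whose web looks most like the target's web."""
    target = fingerprint(target_web)

    def distance(pseudonym):
        other = fingerprint(anonymous_webs[pseudonym])
        return sqrt(sum((a - b) ** 2 for a, b in zip(target, other)))

    return min(anonymous_webs, key=distance)

# Toy "anonymised" dataset: pseudonyms only.
anonymous = build_webs([
    ("p1", "p2", "call", 120), ("p1", "p2", "text", 0),
    ("p1", "p3", "call", 60), ("p1", "p2", "call", 30),
    ("p4", "p5", "text", 0), ("p4", "p5", "text", 0),
    ("p4", "p6", "text", 0), ("p4", "p7", "text", 0),
])

# Later records for a known person, with no pseudonyms in common.
known = build_webs([
    ("alice", "bob", "call", 90), ("alice", "bob", "call", 45),
    ("alice", "carol", "text", 0), ("alice", "bob", "text", 0),
    ("alice", "carol", "call", 20),
])

match = closest_match(known["alice"], anonymous)
print(match)  # p1 for this toy data
```

Even in this crude version, replacing names with pseudonyms leaves the shape of each web intact: how many contacts a person has, how their interactions are split between those contacts and how often they start the conversation. A realistic attack would need far richer features and a learned similarity measure to work across tens of thousands of users.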

When shown interaction webs containing only a target’s own phone interactions from one week after the last records in the anonymous dataset, the neural network linked just 14.7 percent of people to their anonymised selves. When also given the interactions of the target’s contacts, it identified 52.4 percent of individuals. When the researchers fed the AI interaction data for the target and contacts from 20 weeks after the anonymous dataset, it still identified users correctly 24.3 percent of the time, suggesting that social behavior remains identifiable for long periods.

To explore whether the AI could profile social behavior in other settings, the researchers tested it on a dataset of four weeks of close-proximity data from the mobile phones of 587 anonymous university students, collected by researchers in Copenhagen. The data included students’ pseudonyms, encounter times and the strength of the received signal, which indicates proximity to other students. COVID-19 contact tracing apps often gather these measurements. Given interaction data for a target and their contacts, the AI correctly identified students in the dataset 26.4 percent of the time.

The results, according to the researchers, are unlikely to apply to Google’s and Apple’s contact tracing techniques, which safeguard users’ privacy by encrypting all Bluetooth information and prohibiting the acquisition of location data.

De Montjoye expects that the findings will aid policymakers in developing better ways to secure consumers’ identities. According to him, data protection regulations allow for the exchange of anonymised data in order to encourage beneficial research. “However, in order for this to function, we must ensure that anonymization really protects people’s privacy.”
