Meta recently introduced its chatbot app, Meta AI – and it has become the new data king.

According to a study by cybersecurity company Surfshark, Meta AI collects more user data than any other analysed chatbot app: 32 out of 35 possible data types, more than twice the average.

Those 32 data types amount to over 90% of the possible total. Meta AI also stands out as the only chatbot app in the study that collects data across categories such as financial information, health and fitness, and even sensitive information, which covers racial or ethnic data, sexual orientation, pregnancy or childbirth information, disability, religious or philosophical beliefs, trade union membership, political opinion, genetic information, and biometric data.

Additionally, only Meta AI and Copilot collect data linked to user identity for purposes such as displaying third-party ads in the app or sharing data with third parties that display them. While Copilot lists just two data types (Device ID and Advertising Data) for this purpose, Meta AI may use up to 24 different data types.

“Meta is an ecosystem that collects user data across platforms like Facebook, Instagram, and Audience Network for displaying third-party ads, and now it’s doing the same through Meta AI,” says Karolis Kaciulis, leading system engineer at Surfshark. “This chatbot learns from public posts, photos, and texts, as well as new data shared by users, which is an example of gross misconduct and mishandling of user data. Generative AI should not be trained on user data, and this highlights why regulations for AI are an urgent necessity.”

On average, the analysed AI chatbot apps collect 13 of the 35 possible data types. Some 45% of the apps collect users' locations, and nearly 30% track user data. Tracking means linking user or device data collected from the app with third-party data for targeted advertising or advertising measurement, or sharing it with a data broker.

AI chatbots learn from diverse sources of information, and Meta AI additionally learns from Facebook and Instagram posts and images. Because they ingest massive amounts of data, including public posts and user-provided content, their answers can vary and are often incorrect when the underlying training data is inaccurate. A recent example, in which X's Grok responded to unrelated prompts by discussing nationalist themes with X users, highlights the shortcomings of current generative AI standards.

“People should keep in mind that even though these chatbots may provide you with a quick answer, the results they get are mediocre,” Kaciulis comments. “Why is that? AI chatbots are being fed with all kinds of information and the majority of it can be inaccurate. Every person is responsible for the results they provide at their job, but generative AI is not; it is unaccountable and is not legally subject to the same scrutiny as a human.”

 

Be careful when sharing information with chatbots

Google Gemini collects 22 unique data types. This includes precise location data, which only Gemini, Meta AI, Copilot, and Perplexity collect. Gemini also collects a significant amount of data across various other categories, such as contact info (name, email address, phone number, etc.), user content, contacts (such as a list of contacts in the user’s phone), search history, browsing history, and several other types of data.

ChatGPT collects 10 data types, including contact information, user content, identifiers, usage data, and diagnostics, and it does not collect tracking data or use third-party advertising within the app. While ChatGPT stores chat history, users can opt for temporary chats, which auto-delete all data after 30 days, or request the removal of personal data from training sets.

Copilot, Poe, and Jasper are the three apps that collect data used to track you. This data could be sold to data brokers or used to display targeted advertisements in your app. While Copilot and Poe only collect device IDs, Jasper collects device IDs, product interaction data, advertising data, and other usage data, which refers to “any other data about user activity in the app”.

According to Kaciulis, chatbot users pay not only with subscription money but also with their personal data. “As a human being, especially in Europe, where GDPR protects user rights, personal data belongs to you, not to corporations or AI systems. Sharing it with generative AI can lead to it being stored, analyzed, and used without your full control, risking targeted manipulation, identity theft, or misuse.

“Also, people should be aware that things AI learns from your personal data cannot be unlearned. It’s important to protect your privacy and online integrity in an age where personal data is increasingly treated as a commodity.”

 

Methodology

For the study, Surfshark identified the 10 most popular AI chatbots, with Meta AI added as an additional app on 20 May 2025, and analysed their privacy details on the Apple App Store.

The comparison was based on how many types of data each app collects, whether it collects any data linked to you, and whether the app includes third-party advertisers.

The study also examined the privacy policies of DeepSeek and ChatGPT to better understand what kind of data is kept on servers and for how long.