
Privacy Risks posed by Persistent Memory in LLM applications

  • drajendr
  • Aug 22
  • 7 min read

Many of you might be aware of the (not so) new ChatGPT feature, announced on 10 April 2025, that lets ChatGPT reference your old chat history to make your experience more personalized. It made me question a few things: How does ChatGPT do that? What has changed? What new privacy threats does this implementation introduce?

As an AI Security enthusiast, I put on my white hat and began researching this topic as soon as I got the opportunity. As someone who aspires to break into the field of AI Security, I learned a ton about the different facets of how a commercial tool like ChatGPT works while conducting this independent research.

This article details what I found during that research, from the evolution of memory (context windows to persistent memory) to the privacy risks that persistent memory poses.


 

What are context windows?


During your conversation with a chatbot, the entire conversation is passed to the LLM for processing. This happens every single time you send a query – until you exceed the context window size. After that, the conversation being forwarded is cut off to fit the allocated context window.

I found Reddit user porespellar’s [1] explanation of context windows particularly memorable: imagine you are scrolling through a very long webpage, and imagine the monitor as the context window. Also assume that your mouse cannot scroll back up as you go down. As you scroll, the top of the page disappears from view. Similarly, this is how a chatbot forgets your earlier messages once they go “out of scope” as you keep conversing with it. It cannot remember what came before and only sees what fits within the defined context window.

Currently, GPT-5 offers a massive context window of 400,000 tokens, which roughly translates to around 600–1,200 pages of text (including your questions and ChatGPT’s responses). Claude Opus 4.1 comes with a context window of 200,000 tokens.
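To make the trimming concrete, here is a minimal sketch of how an application might keep only the newest messages that fit in the window. The word-count "tokenizer" and the message format are simplifying assumptions for illustration, not how any particular provider actually counts tokens.

```python
# Minimal sketch of context-window truncation (illustrative only).
# Token counts are approximated by word counts; a real chatbot would use
# the model's own tokenizer.

def count_tokens(text: str) -> int:
    return len(text.split())  # rough stand-in for a real tokenizer

def fit_to_context_window(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the newest messages that fit in the window; drop the oldest."""
    kept, used = [], 0
    for message in reversed(messages):           # walk from newest to oldest
        cost = count_tokens(message["content"])
        if used + cost > max_tokens:
            break                                # older messages are "forgotten"
        kept.append(message)
        used += cost
    return list(reversed(kept))                  # restore chronological order

history = [
    {"role": "user", "content": "My dog's name is Pixel."},
    {"role": "assistant", "content": "Nice! How old is Pixel?"},
    {"role": "user", "content": "What should I feed a two-year-old beagle?"},
]
# With a 15-"token" budget, the oldest message falls out of the window.
print(fit_to_context_window(history, max_tokens=15))
```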


 

What does persistent memory mean?


Persistent memory comes into play when the application remembers certain user-specific information outside the context window. These details are saved earlier and reloaded when a new session starts. This happens in the application layer of the chatbot and is external to the LLM.

An application maintains persistent memory by summarizing or selectively storing important information outside the context window in a vector database, which can then be accessed easily with each query.
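As a rough illustration of that write path, the sketch below distills a detail from a chat, embeds it, and stores it outside the context window. The `MemoryStore` class and the `embed` function are hypothetical stand-ins I use throughout this post, not OpenAI components, and the toy embedding is purely for demonstration.

```python
# Sketch of the memory "write path" (assumed design, not ChatGPT internals).
from dataclasses import dataclass, field

def embed(text: str) -> list[float]:
    # Stand-in embedding: a real system would call an embedding model and
    # get back a high-dimensional vector capturing the text's meaning.
    return [float(ord(c)) for c in text.lower()[:16]]

@dataclass
class MemoryStore:                 # hypothetical vector store
    records: list[dict] = field(default_factory=list)

    def save(self, memory_text: str, metadata: dict) -> None:
        self.records.append({
            "text": memory_text,
            "vector": embed(memory_text),
            "meta": metadata,
        })

store = MemoryStore()
# The application layer, not the LLM, decides this detail is worth keeping.
store.save("User is vegetarian", {"type": "preference", "saved": "2025-04-12"})
store.save("User is planning a trip to Japan in October", {"type": "plan", "saved": "2025-04-15"})
```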

 


So how does that work, exactly?


To make this work, chatbot applications use what is commonly called RAG – Retrieval-Augmented Generation – to store and recall long-term chat history. A RAG setup relies on a vector database that stores relevant and important user information as embedded chunks. The system first checks this database to fetch additional information relevant to the user’s query. With this additional context, the model is better equipped to answer the user, making the response more accurate and optimized.
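Here is the matching read path, under the same assumptions: the query is embedded, compared against stored memories by cosine similarity, and the best matches are stitched into the prompt before it reaches the model. It reuses the hypothetical `embed` and `MemoryStore` from the previous sketch.

```python
# Sketch of the RAG "read path" (assumed design). Reuses the hypothetical
# embed() and MemoryStore defined above.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(store: MemoryStore, query: str, top_k: int = 2) -> list[str]:
    """Return the stored memories most similar to the user's query."""
    query_vec = embed(query)
    ranked = sorted(store.records,
                    key=lambda r: cosine_similarity(query_vec, r["vector"]),
                    reverse=True)
    return [r["text"] for r in ranked[:top_k]]

def build_prompt(store: MemoryStore, user_query: str) -> str:
    memories = "\n".join(f"- {m}" for m in retrieve(store, user_query))
    # The LLM never "remembers" anything itself -- the application injects
    # the retrieved memories into the prompt on every query.
    return f"Known facts about the user:\n{memories}\n\nUser: {user_query}"

print(build_prompt(store, "Suggest a dinner recipe for me"))
```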

 


What makes information “relevant” enough to save it as a memory?


User phyde1001 wrote a post in the OpenAI community forum [2] stating that ChatGPT determines relevance based on five main factors (a rough scoring sketch follows below):

1.     Semantic Marking (explained below)

2.     Recency bias: your recent chats are given more importance and priority than your historical chats

3.     Frequency: the more you talk about a topic, the more relevance it gains in memory

4.     Intent: based on the intention of the question asked

5.     Confidence & fit: how useful a detail is for improving future responses
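OpenAI has not published how these factors are weighted, but as a purely hypothetical illustration of how such criteria could combine into a single save-or-discard decision, a heuristic might look like the following. The factor scores, weights, and threshold are my own assumptions, not anything from OpenAI.

```python
# Hypothetical relevance-scoring heuristic (illustrative assumption only --
# OpenAI has not published how ChatGPT weighs these factors).
from dataclasses import dataclass

@dataclass
class CandidateMemory:
    semantic_score: float   # 0-1: how clearly the detail maps to a known topic
    recency: float          # 0-1: newer chats score higher (recency bias)
    frequency: float        # 0-1: how often the user returns to this topic
    intent_match: float     # 0-1: how well it matches the question's intent
    confidence_fit: float   # 0-1: how useful it is for improving future answers

def relevance(m: CandidateMemory) -> float:
    # Assumed weights; a real system would tune or learn these.
    return (0.25 * m.semantic_score + 0.20 * m.recency + 0.20 * m.frequency
            + 0.15 * m.intent_match + 0.20 * m.confidence_fit)

candidate = CandidateMemory(0.9, 0.8, 0.3, 0.7, 0.6)
SAVE_THRESHOLD = 0.5
print("save to memory" if relevance(candidate) > SAVE_THRESHOLD else "discard")
```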

 


How are relevant conversations saved?


When a conversation is saved, it is not saved as raw text. Instead, the information is stored as embeddings – numerical representations of meaning – which capture the essence of your information rather than the literal wording. This is called Semantic Marking, and it gives structure to an otherwise unstructured conversation. With efficient chunking, embedding, and retrieval strategies, a RAG setup can amplify the accuracy and efficiency of an LLM.
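To illustrate the chunking step that happens before embedding, here is a simple sketch that splits a long conversation into overlapping word-based chunks; each chunk would then be embedded and stored. The chunk size and overlap are arbitrary values for the example, and production systems often chunk by tokens or sentences instead.

```python
# Minimal sketch of chunking before embedding (sizes are arbitrary examples).
# Overlap preserves context that would otherwise be cut at chunk boundaries.

def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        start += chunk_size - overlap
    return chunks

conversation = "User mentioned planning a trip to Japan in October. " * 20
for i, chunk in enumerate(chunk_text(conversation)):
    print(f"chunk {i}: {chunk[:40]}...")  # each chunk would then be embedded
```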

This helps address the problem of LLMs being trained on a fixed dataset, which leads to outdated answers. Additional data can be added to the vector database as needed, keeping it up to date. It also mitigates a large language model’s inability to provide an accurate source for the information it gives.


 

How is this executed by ChatGPT?


Prasad Thammineni [3] explains in his blog that ChatGPT categorizes and stores user details into 3 distinct memory types –

1.     Short-term Context Memory: This contains all the immediate context window information that is required to keep a good flow of conversation and information.

2.     User Profile Memory: This is where all the user information and preferences are stored. Over time, this builds up, allowing GPT to give better responses by referencing this past history.

3.     Episodic Long-Term Memory: This is where unstructured conversation data is stored as structured information (embedded chunks in vector databases) with metadata and timestamps.

When the user inputs a prompt, the memory retrieval process is triggered. The system searches the conversation history for information relevant to the query and simultaneously retrieves the user profile. This information is formatted into user-presentable context with the help of the short-term context memory. The new conversation is then logged into episodic memory, and any change in user preferences is recorded in the user profile memory.
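Pulling the pieces together, a per-query flow along the lines Thammineni describes might look like the sketch below. It reuses the hypothetical `MemoryStore` and `retrieve` from the earlier sketches; the structure is my reconstruction, not ChatGPT’s actual code.

```python
# Rough per-query memory flow (my reconstruction of the design described
# above). Reuses the hypothetical MemoryStore and retrieve() from earlier.

short_term_context: list[dict] = []            # 1. short-term context memory
user_profile: dict = {"diet": "vegetarian"}    # 2. user profile memory
episodic_store = MemoryStore()                 # 3. episodic long-term memory

def call_llm(prompt: str) -> str:
    return "(model response)"                  # placeholder for a real API call

def handle_query(user_query: str) -> str:
    # Retrieve relevant episodic memories and the user profile.
    memories = retrieve(episodic_store, user_query)
    profile = ", ".join(f"{k}: {v}" for k, v in user_profile.items())

    # Format everything, plus recent turns, into the prompt the model sees.
    prompt = (f"User profile: {profile}\n"
              f"Relevant memories: {memories}\n"
              f"Recent turns: {short_term_context[-4:]}\n"
              f"User: {user_query}")
    answer = call_llm(prompt)

    # Log the new exchange into episodic memory and the short-term context.
    episodic_store.save(f"Q: {user_query} | A: {answer}", {"type": "episodic"})
    short_term_context.append({"role": "user", "content": user_query})
    short_term_context.append({"role": "assistant", "content": answer})
    return answer

print(handle_query("What should I cook tonight?"))
```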

Thus, ChatGPT leverages a context window, saved user preferences, and retrieval-augmented generation (RAG) over a database of chat history to maintain persistent memory and optimize its responses.

 


How does this affect users’ Privacy?


Anything that remembers your private information for longer than needed can pose privacy issues. OpenAI encourages users not to share sensitive information with ChatGPT unless necessary. It also encourages users to ask ChatGPT to erase such memories so they are not stored in the database for future use.

“We’re taking steps to reduce bias and have trained ChatGPT not to proactively remember sensitive information, like health details, unless you explicitly ask it to. We are continuing to improve how the model handles this type of information. You’re in control: you can review and delete saved memories, ask what ChatGPT remembers about you, delete specific conversations, and provide feedback so we can improve. If Memory is enabled, please avoid entering information you wouldn’t want remembered.” [4]

If you’re wondering “Well, I don’t really mind GPT keeping my information for long term” then the rest of this post is for you.

 


What are the safety risks associated with persistent memory?


Privacy Leakage

Privacy leakage occurs when a large language model memorizes sensitive training data that then leaks into its outputs or is maliciously extracted by adversaries through prompt injection. This is harmful on its own; now imagine what could happen when your personal chat history is used to train the model.

As per OpenAI’s terms and conditions for using ChatGPT, persistent memory and the option to use your data for improving the model are turned on by default. This means your personal history now acts as training material for the model, customizing and optimizing the way ChatGPT answers your queries. This feature is switched off for Team, Enterprise, and Edu customers.

“If you have the ‘Improve the model for everyone’ setting turned on, we may use content you’ve shared with ChatGPT, including past chats, saved memories, and memories from those chats, to help improve our models.” [4]

Yeah, that’s going to be costly if an attack ever happens.


Memory Poisoning

If a malicious actor gains access to the vector database that stores long-term memories, they can corrupt the data and skew the model’s outputs. This can cause the model’s responses to become biased or even harmful.


Cross-User / Cross-session drift

ChatGPT accounts shared by multiple people face major privacy risks if the model leaks sensitive content belonging to one user in a response to another, since it does not distinguish between the people behind the account. Such leakage can also occur between users who do not share an account.


Historical Exposure

An insecure memory store could reveal a user's entire decision history, behavioral patterns, and personal interactions, creating a massive privacy risk.

 


So, how can you protect your data?


To stay safe while using AI chatbot assistants like ChatGPT, avoid sharing sensitive information, review privacy policies, and manage your chat history and memory settings. The simplest step is to turn “Reference chat history” off in your settings. Making sure no sensitive information is stored in your saved custom instructions adds another layer of protection. Keeping up with the latest updates and trends in AI also helps you stay vigilant.

 


What does this mean to you as a Security Professional?


In the evolving landscape of AI, it is important that security professionals stay aware of and aligned with the security and privacy risks that new advancements bring. The more you know, the better you can keep things safe and secure. Through this research, I discovered, and regained control over, what my ChatGPT account had stored about me without my knowledge. This brought my sensitive data one step closer to safety.

 


What should it mean to you as a user?


The convenience of having a personal assistant that remembers your past conversations introduces a new, persistent form of surveillance that many users are not aware of. I am pretty sure we would be surprised by the percentage of people who actually know that their history is being used to train the model.

At the end of the day, users must make sure they are not sharing Personally Identifiable Information or other sensitive data unnecessarily. Regulations and responsible governance can only take you so far. Users must stay aware and take active steps to safeguard their information. Stay cautious and protect your privacy!

 


 

References


[1] Porespellar, 2024, “How does an LLM retain conversation memory”,  https://www.reddit.com/r/ollama/comments/1edan5c/comment/lf6odc3/ 

[2] phyde1001, 11 Apr 2025, “ChatGPT Memory and Chat History Usage Practicalities” https://community.openai.com/t/chatgpt-Smemory-and-chat-history-usage-practicalities/1229848#p-1648702-h-5-use-memory-intentionally-5 

[3] Prasad Thammineni, 22 April 2025, “Reverse Engineering Latest ChatGPT Memory Feature (And Building Your Own)”  https://agentman.ai/blog/reverse-ngineering-latest-ChatGPT-memory-feature-and-building-your-own 

 

 
 
 


