Add memory to an LLM

LLMs do not hold an internal state, and many applications require tracking previous interactions with the LLM as part of the interface (e.g. chatbots). To this end, you can add memory to an LLM node under the Stack AI tool.

To add memory to an LLM: 1) open your LLM settings (⚙️ Gear Icon) and 2) toggle the switch next to "Memory".

Some quick facts:

All the LLM memory is encrypted end-to-end in the Stack AI database.
- This data can be self-hosted under the Stack AI enterprise plan.
The LLM memory is user-dependent and an instance of the LLM memory.
- Once the deployed as an API, you can specify the user_id for the LLM memory for each user (see How to deploy).

We offer four types of memory modalities:

Sliding Window
- Stores all LLM prompts and completions.
  - The modality may consume many tokens as the LLM prompts can often occupy thousands of tokens.
- Loads a window of the previous prompts and completions as part of the LLM conversation memory, up to the number of messages in the window.
  - In non-chat models (e.g. Davinci), the memory is added as part of the prompt as a list of messages at the end of the prompt.
Sliding Window Input
- Stores all LLM completions and one LLM input parameter (e.g. in-0).
  - The modality is more token efficient and aligned with many applications (e.g. when you only need to store the user message from in-0)
- Loads a window of the previous inputs and completions as part of the LLM conversation memory, up-to the number of messages in the window.
  - In non-chat models (e.g. davinci-003-text), the memory is added as part of the prompt as a list of messages at the end.
Full History
- Stores all LLM prompts and completions.
  - The modality may consume many tokens as the LLM prompts can occupy thousands of tickets.
- Loads all the previous prompts and completions as part of the LLM conversation memory.
  - In non-chat models (e.g. Davinci) the memory is added as part of the prompt as a list of messages at the end of the prompt.
Full History Input
- Stores all LLM completions and one LLM input parameter (e.g. in-0).
  - The modality is more token efficient and aligned with many applications (e.g. when you only need to store the user message from in-0)
- Loads all the previous values of the input and LLM completions.
  - In non-chat models (e.g. davinci-003-text) the memory is added as part of the prompt as a list of messages at the end of the prompt

PreviousDescription of LLMs available NextHow to improve LLM Performance

Last updated 7 months ago