Offline Data Loaders

LLMs are limited by the fact that their prompts can only hold a limited amount of text, yet most interesting data sources contain large bodies of text that often span hundreds of pages.

To solve this problem, Stack AI offers pre-built components that help you load your data and retrieve the pieces most relevant to a query.

If you want to chat with a document or answer questions about it, you will want to retrieve the most relevant pieces of the document and ask an LLM to find the answer among them.
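
For illustration, here is a minimal sketch of that retrieve-then-ask pattern, assuming the OpenAI chat API; the model name is a placeholder, and `retrieved_segments` stands in for the output of any of the three search nodes described below.

```python
# Minimal sketch of the retrieve-then-ask pattern. Assumes the OpenAI chat
# API; `retrieved_segments` would come from one of the search nodes below.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, retrieved_segments: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Answer using only the context below.\n\n" + retrieved_segments},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```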

We offer three nodes that solve this task:

  • Doc + Search: Splits a list of documents into chunks, uploads the embeddings to a Pinecone index, runs a query, and returns the most relevant segments of text from the documents (see the first sketch after this list).

    • Inputs:

      • Input Query (coming from user input or another LLM).

    • Outputs:

      • The most relevant pieces of the documents, returned as a single string.

    • Quick facts:

      • Once the embeddings are stored in Pinecone, the file's state switches to "LEARNED".

      • Deleting a file also deletes its embeddings from Pinecone.

  • URL + Search: Scrapes a list of URLs, splits their text into chunks, uploads the embeddings to a Pinecone index, runs a query, and returns the most relevant pieces of the scraped pages (second sketch after this list).

    • Inputs:

      • Input Query (coming from user input or another LLM).

    • Outputs:

      • The most relevant pieces of the scraped pages, returned as a single string.

    • Quick facts:

      • Once the embeddings are stored in Pinecone, the URL's state switches to "LEARNED".

      • Deleting a URL also deletes its embeddings from Pinecone.

  • Text + Search: Splits a very long user input into chunks, stores the embeddings in memory, runs a query, and returns the most relevant pieces of the text (third sketch after this list).

    • Inputs:

      • Input Query (coming from user input or another LLM).

    • Outputs:

      • The most relevant pieces of the text, returned as a single string.

    • Quick facts:

      • The text in this node is exposed in the API as an input.
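
Under the hood, all three nodes follow the same retrieval pattern. Below is a minimal sketch of the Doc + Search flow, assuming the Pinecone Python client (v3+) and the OpenAI embeddings API; the index name, embedding model, and chunk size are illustrative placeholders, not Stack AI internals.

```python
# Minimal sketch of the Doc + Search flow: split -> embed -> upsert -> query.
# Assumes the Pinecone Python client (v3+) and the OpenAI embeddings API.
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("my-docs-index")  # placeholder for a pre-created index

def split(text: str, chunk_size: int = 1000) -> list[str]:
    # Naive fixed-size splitter; real splitters respect sentence boundaries.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(texts: list[str]) -> list[list[float]]:
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

def learn(doc_id: str, text: str) -> None:
    # After this upsert completes, the file would be in the "LEARNED" state.
    chunks = split(text)
    index.upsert(vectors=[
        (f"{doc_id}-{i}", vec, {"text": chunk})
        for i, (chunk, vec) in enumerate(zip(chunks, embed(chunks)))
    ])

def forget(doc_id: str, n_chunks: int) -> None:
    # Mirrors "deleting a file also deletes its embeddings from Pinecone".
    index.delete(ids=[f"{doc_id}-{i}" for i in range(n_chunks)])

def search(query: str, top_k: int = 4) -> str:
    [query_vec] = embed([query])
    result = index.query(vector=query_vec, top_k=top_k, include_metadata=True)
    # The node's output: the most relevant chunks joined into one string.
    return "\n\n".join(m.metadata["text"] for m in result.matches)
```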
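
URL + Search differs only in how the raw text is obtained before the same split/embed/upsert/query pipeline runs. Here is a minimal sketch of that scraping step, assuming the requests and beautifulsoup4 packages; the URLs are placeholders.

```python
# Minimal sketch of the scraping step that URL + Search adds in front of the
# split/embed/upsert/query pipeline shown above.
import requests
from bs4 import BeautifulSoup

def scrape(url: str) -> str:
    # Fetch the page and strip the markup down to visible text.
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # drop non-content elements
    return soup.get_text(separator="\n", strip=True)

urls = ["https://example.com/page-1", "https://example.com/page-2"]
pages = [scrape(u) for u in urls]
# Each page then goes through split() / embed() / index.upsert() exactly as
# in the Doc + Search sketch, using the URL as the document id.
```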
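
Text + Search skips the external index entirely: the embeddings live only in memory for the duration of the run. A minimal sketch, assuming the OpenAI embeddings API and NumPy, with the chunk size as a placeholder:

```python
# Minimal sketch of Text + Search: the same split/embed steps, but the vectors
# stay in memory (a NumPy array) instead of going to a Pinecone index.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def search_in_memory(long_text: str, query: str, top_k: int = 4) -> str:
    chunks = [long_text[i:i + 1000] for i in range(0, len(long_text), 1000)]
    chunk_vecs = embed(chunks)  # kept only in memory, never persisted
    [query_vec] = embed([query])
    # Cosine similarity between the query and every chunk.
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    best = np.argsort(sims)[::-1][:top_k]
    return "\n\n".join(chunks[i] for i in best)
```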
