Data Loaders vs. Offline Data Loaders

This page outlines the differences between online and offline data loaders.

We offer two mechanisms to load data to an LLM, tailored to different applications:

  1. Data Loaders: read documents or stream datasets online, every time you run your flow. If you only want to retrieve a segment of the data for the LLM, you can connect the data loader to a Vector Database.

  2. Offline Data Loaders: upload documents, URLs, or data to a vector database offline, when you drop data in the node, and retrieve the most relevant data online, every time you run your flow.

The difference between these mechanisms lies in their online vs. offline nature.

  • A Data Loader + a Vector DB:

    • Offline: 1) set up the parameters or 2) upload the file

    • Online: 1) load the data, 2) chunk it, 3) compute embeddings, 4) upload the embeddings to the vector DB, and 5) make a query in the vector DB.

  • An Offline Data Loader:

    • Offline: 1) set up the parameters or 2) upload the file, 3) load the data, 4) chunk it, 5) compute embeddings, 6) upload the embeddings to the vector DB

    • Online: 1) make a query in the vector DB.
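The split of steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the product's implementation: `ToyVectorDB`, `chunk`, `embed`, `index`, `run_flow_online`, and `OfflineFlow` are hypothetical names, and the bag-of-words "embedding" is a stand-in for a real embedding model.

```python
import math
from collections import Counter


def chunk(text, size=5):
    """Split text into chunks of `size` words (hypothetical chunking rule)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: word counts. A stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorDB:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.rows = []  # list of (embedding, chunk) pairs

    def upload(self, chunks):
        for c in chunks:
            self.rows.append((embed(c), c))

    def query(self, question, k=1):
        q = embed(question)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[0]), reverse=True)
        return [c for _, c in ranked[:k]]


def index(source):
    """Load the data, chunk it, compute embeddings, upload to the vector DB."""
    db = ToyVectorDB()
    db.upload(chunk(source()))
    return db


def run_flow_online(source, question):
    """Data Loader + Vector DB: indexing and the query both happen on every run."""
    return index(source).query(question)


class OfflineFlow:
    """Offline Data Loader: index once up front, only query at run time."""

    def __init__(self, source):
        self.db = index(source)  # offline step, done once

    def run(self, question):
        return self.db.query(question)  # the only online step


# Usage: the source is a callable so that the underlying data can change.
docs = {"text": "cats sleep all day long dogs bark at the mailman"}
source = lambda: docs["text"]

offline = OfflineFlow(source)
print(run_flow_online(source, "what do dogs do"))
print(offline.run("what do dogs do"))
```

In this sketch, changing `docs["text"]` after `OfflineFlow` is created affects `run_flow_online` on its next call, while `offline.run` keeps answering from the data indexed earlier.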

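With an Offline Data Loader, document search is much faster for your flow at inference, because only the query runs online, but the search data will be static: it reflects what was uploaded offline.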

(A Data Loader) + (A Vector Database) + (Offline upload) = (Offline Data Loader)

or

(A Data Loader) + (A Vector Database) = (Offline Data Loader) + (Online upload)

The following table outlines some practical examples:
