Data Loaders

Integrating an LLM with your own data is essential for building an application. To make this possible, we let you import data from many kinds of sources. Our data loaders do the following:

Load the data from the source.
Convert the data to text or arrays.
Split the data into smaller segments (with overlapping content).
Return the list of segments of the data.
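
The splitting step above can be sketched in a few lines of Python. This is only an illustration, not the loaders' actual implementation: the function name split_with_overlap, the character-based sizing, and the default chunk_size and overlap values are all assumptions made for the example.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into segments of at most chunk_size characters.

    Each segment repeats the last `overlap` characters of the previous one.
    (Hypothetical helper; sizes are in characters, not tokens.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    segments = []
    for start in range(0, len(text), step):
        segments.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current segment already reaches the end of the text
    return segments


segments = split_with_overlap("abcdefghij", chunk_size=4, overlap=1)
print(segments)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one segment, which tends to help retrieval from a vector database.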

Because a data loader returns a list of segments rather than a single text, it cannot be connected directly to an LLM unless noted otherwise below. Connect it to a vector database instead (see Vector Databases).

We offer support for the following data loaders:

String: loads a body of text that is too large to fit into an LLM prompt as one of your inputs.
1. Parameters: Text (exposed to the API)
2. Outputs: List of text segments (connects to a vector database).

Upload: lets you upload files (such as .txt, .csv, .html, .pdf, .py, .md, and others) and converts them into text data.
1. Parameters: Files uploaded by the user.
2. Outputs: List of text segments (connects to a vector database).

WebScrapper: loads a URL and scrapes its HTML into markdown text.
1. Parameters: URL, Modality (full HTML or meta-data) (exposed to the API)
2. Outputs (in HTML mode): List of text segments (connects to a vector database).
3. Outputs (in meta-data mode): Returns the meta-data of the website as text (can be connected to an LLM).
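
To give a sense of what meta-data mode might return, here is a rough sketch using only Python's standard html.parser module. It is not the WebScrapper's real implementation: the class MetaDataParser, the function extract_metadata, and the "key: value" output format are assumptions for illustration.

```python
from html.parser import HTMLParser


class MetaDataParser(HTMLParser):
    """Collects the <title> text and <meta name/property ... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Standard tags use "name"; Open Graph tags use "property".
            key = attrs.get("name") or attrs.get("property")
            content = attrs.get("content")
            if key and content:
                self.metadata[key] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_metadata(html: str) -> str:
    """Return the page meta-data as plain text, one 'key: value' pair per line."""
    parser = MetaDataParser()
    parser.feed(html)
    parser.close()
    return "\n".join(f"{key}: {value.strip()}" for key, value in parser.metadata.items())


page = (
    "<html><head><title>Example</title>"
    '<meta name="description" content="A demo page">'
    "</head><body><p>Hi</p></body></html>"
)
print(extract_metadata(page))
```

Because the result is a single short text rather than a list of segments, it is small enough to pass straight into an LLM prompt.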

Google Search: performs a Google search, scrapes the HTML of the top results, and returns the text segments of the most relevant ones.
1. Parameters: API Key for SerpAPI.
2. Inputs: search criteria (Text from user input or LLM output)
3. Outputs: List of text segments (connects to a vector database).

Notion: loads the page and subpages of a Notion database as markdown text.
1. Parameters: Client secret and database ID (see https://developers.notion.com/docs/create-a-notion-integration for how to get them).
2. Outputs: List of text segments (connects to a vector database).

MongoDB: loads documents of a MongoDB collection as a list of JSONs.
1. Parameters: database, collection, and URI.
2. Inputs: MongoDB query in PyMongo format (Text from user input or LLM output).
3. Outputs: List of text segments (connects to a vector database).
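
As an illustration of the query input, the sketch below assumes the text arrives as a JSON-encoded filter document, which is then parsed into the dictionary form PyMongo expects. The helper parse_mongo_query and the field names status and age are hypothetical; how the node actually parses its input is not specified here.

```python
import json


def parse_mongo_query(query_text: str) -> dict:
    """Parse a JSON-encoded MongoDB filter into a Python dict (hypothetical helper)."""
    query = json.loads(query_text)
    if not isinstance(query, dict):
        raise ValueError("A MongoDB filter must be a JSON object")
    return query


# Example filter: documents whose status is "active" and whose age is at least 21.
query = parse_mongo_query('{"status": "active", "age": {"$gte": 21}}')
print(query)

# With PyMongo installed and a collection handle available, the filter
# would typically be passed to find():
#   documents = list(collection.find(query))
```

Validating that the parsed value is an object is worthwhile when the query text comes from an LLM output, since the model may return a list or a bare string.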

Postgres: loads rows of a Postgres database as a list of JSONs.
1. Parameters: database, username, password, host URL, and port.
2. Inputs: SQL query (Text from user input or LLM output).
3. Outputs: List of text segments (connects to a vector database).
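
The idea of turning query results into a list of JSONs can be sketched as follows. To keep the example runnable without a database server, it uses Python's built-in sqlite3 module as a stand-in for Postgres; the function rows_as_json and the users table are assumptions for illustration, and a real Postgres connection would go through a driver and its cursor instead.

```python
import json
import sqlite3


def rows_as_json(connection: sqlite3.Connection, sql: str) -> list[str]:
    """Run a query and return each row as a JSON string keyed by column name."""
    cursor = connection.execute(sql)
    columns = [description[0] for description in cursor.description]
    return [json.dumps(dict(zip(columns, row))) for row in cursor.fetchall()]


# In-memory SQLite database standing in for a Postgres instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

json_rows = rows_as_json(conn, "SELECT id, name FROM users ORDER BY id")
print(json_rows)
```

Each JSON string carries its column names with it, so a row remains self-describing after it is stored as a segment in a vector database.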

Airtable: loads rows of an Airtable database as a list of JSONs.
1. Parameters: API key (see https://support.airtable.com/docs/creating-and-using-api-keys-and-access-tokens), base ID, and table ID (see https://support.airtable.com/docs/finding-airtable-ids).
2. Outputs: List of text segments (connects to a vector database).

YouTube Node: transcribes videos from YouTube. You can then send the transcript to LLM nodes for further processing or summarization. A full video walkthrough explains how to use it.
Slack (coming soon).
Big Query (coming soon).
Zoom (coming soon).
Zendesk (coming soon).

Last updated 10 months ago