Document Readers

Reading and aggregating data from a large document is difficult to do with semantic search tooling alone.

If you want to analyze a large body of text (e.g. to summarize it, answer questions about it, or extract insights), you need to read the entire document and build up conclusions along the way. There are two common strategies:

  • Parallelized: splits the document into chunks, calls several LLMs in parallel, and tasks each LLM to read a chunk of the document.

  • Sequential: splits the document into chunks, calls a single LLM, and tasks that LLM to read each chunk of the document in order.
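The two strategies can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: `split_into_chunks`, `read_parallel`, `read_sequential`, and the two stand-in "LLM" functions are hypothetical names, and chunking by a fixed character count is an assumption made only to keep the example small.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def read_parallel(chunks: List[str], llm: Callable[[str], str],
                  max_workers: int = 4) -> List[str]:
    """Parallelized: every chunk is read by its own independent LLM call."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        return list(pool.map(llm, chunks))


def read_sequential(chunks: List[str], llm: Callable[[str, str], str]) -> str:
    """Sequential: one LLM reads the chunks in order, carrying its notes forward."""
    notes = ""
    for chunk in chunks:
        notes = llm(notes, chunk)
    return notes


# Stand-in "LLMs" (hypothetical) so the sketch runs without any model.
def fake_llm(chunk: str) -> str:
    return chunk.upper()


def fake_llm_with_notes(notes: str, chunk: str) -> str:
    return notes + chunk.upper()


chunks = split_into_chunks("abcdefghij", chunk_size=4)
parallel_results = read_parallel(chunks, fake_llm)
sequential_result = read_sequential(chunks, fake_llm_with_notes)
```

The parallel variant returns one result per chunk, which still has to be merged. The sequential variant returns a single result that already reflects every chunk, at the cost of running the calls one after another.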

To solve these types of tasks, we offer four Document Reader nodes:

| Node | Functionality | Application |
| --- | --- | --- |
| Doc Q&A | Reads a data source and answers a question or aggregates information on it. | Database analysis, summarization, insight extraction. |
| Summarizer | Reviews a data source and builds a summary of the content. | Summarization. |
| Transcriber | Reads a data source and transcribes it using user instructions. | Database aggregation, document translation, code documentation. |
| Translator | Reads a data source and translates it from one language to another. | Translation. |

  • Doc Q&A: This node splits a document into pieces, runs the LLM over the first piece to generate a partial result, and then refines that partial result with each subsequent piece.

    • Inputs:

      • User Request or Question.

      • Data loader.

    • Outputs:

      • LLM completion after reading all the pages.

    • Facts:

      • This node is well suited for tasks like insight extraction and question answering.

      • Note that this node does not need a Vector DB.

  • Summarizer: This node splits a document into different pieces, runs the LLM over each piece, and then builds a summary.

    • Inputs:

      • Data loader.

    • Outputs:

      • A summary of the data loader's content.

    • Facts:

      • Note that this node does not need a Vector DB.

  • Transcriber: This node splits a document and builds a transcription of it page by page, following a set of user instructions.

    • Inputs:

      • Data loader.

      • User request or task.

    • Outputs:

      • The transcribed content of the data loader.

    • Facts:

      • Note that this node does not need a Vector DB.

  • Translator: This node splits a document and builds a translation of the document page by page to a target language.

    • Inputs:

      • Data loader.

    • Outputs:

      • The translated content of the data loader.

    • Facts:

      • Note that this node does not need a Vector DB.
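To make the differences between the four nodes concrete, the sketch below models each one as a small function over a list of pages. It is an illustration under stated assumptions, not the nodes' actual implementation: the `LLM` type, the prompt wording, and all function names are hypothetical. Two design choices are also assumptions: the Summarizer is shown as "summarize each page, then combine", and the Translator is shown as a Transcriber with a fixed instruction.

```python
from typing import Callable, List

# Hypothetical LLM interface: a prompt goes in, a completion comes out.
LLM = Callable[[str], str]


def doc_qa(pages: List[str], question: str, llm: LLM) -> str:
    """Doc Q&A: refine a partial answer as each new page is read."""
    answer = ""
    for page in pages:
        prompt = f"Question: {question}\nAnswer so far: {answer}\nNew text: {page}"
        answer = llm(prompt)
    return answer


def summarize(pages: List[str], llm: LLM) -> str:
    """Summarizer: summarize each page, then merge the partial summaries."""
    partials = [llm(f"Summarize: {page}") for page in pages]
    return llm("Combine these summaries: " + " ".join(partials))


def transcribe(pages: List[str], instructions: str, llm: LLM) -> List[str]:
    """Transcriber: apply the user instructions to every page, in order."""
    return [llm(f"{instructions}\n{page}") for page in pages]


def translate(pages: List[str], target_language: str, llm: LLM) -> List[str]:
    """Translator: a transcription whose instruction is fixed to translation."""
    instruction = f"Translate the following text to {target_language}."
    return transcribe(pages, instruction, llm)


# Stand-in LLM (hypothetical): records each prompt and returns a call counter.
calls: List[str] = []


def echo_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"<{len(calls)}>"


pages = ["page one", "page two"]
answer = doc_qa(pages, "What is this about?", echo_llm)
summary = summarize(pages, echo_llm)
translated = translate(pages, "Spanish", echo_llm)
```

None of these functions queries an index, which is consistent with the note that the nodes do not need a Vector DB: every page is read directly.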
