The Tools Landscape for LLM Pipelines Orchestration (Part 1)
Micro-orchestration vs. Macro-orchestration
Micro-Orchestration
Prompt Management
Input preprocessing and output postprocessing
Handling of model-specific parameters and configurations
Chaining of multiple LLM calls within a single logical operation
Integration of external tools or APIs at a task-specific level
Tracing and Logging
Macro-Orchestration
Complex Graphical Applications
Stateful Application
Agentic Design
For a long time, I was in love with LangChain, mostly because the documentation was structured to educate the users about LLM pipeline orchestration and showcased how they approached building a solution for implementing those pipelines. To some extent, all the existing frameworks took their own opinionated approach to provide solutions to the complexities around LLM pipeline orchestration.
Getting a wide overview of the different capabilities provided by those frameworks is a real learning experience in terms of what it means to build LLM applications, what the typical difficulties are, and how to address those. There are many overlaps in the capabilities of the different frameworks, but I tend to separate those by their specialties:
Micro-orchestration: I refer to Micro-orchestration as the fine-grained coordination and management of individual LLM interactions and related processes. It is more about the granular details of how data flows into, through, and out of an LLM within a single task or a small set of related tasks.
Macro-orchestration: it is more about the high-level design, coordination, and management of complex workflows that may incorporate multiple LLM interactions, as well as other AI and non-AI components. It focuses on the overall structure and flow of larger systems or applications.
Agentic Design Frameworks: These frameworks focus on creating and managing autonomous or semi-autonomous AI agents that can perform complex tasks, often involving multiple steps, decision-making, and interaction with other agents or systems.
Optimizer frameworks: These frameworks use algorithmic approaches, often inspired by techniques like backpropagation, to optimize prompts, outputs, and overall system performance in LLM applications. The optimization process is typically guided by specific performance metrics or objectives.
As time went on, I realized it is often easier to implement the different utilities provided by micro-orchestration frameworks myself. They tend to over-complicate things, and it can take longer to debug those frameworks for a custom use case than to implement everything from scratch using the underlying APIs. However, it is important not to overlook the capabilities they provide for tracing and logging the different LLM calls.
I believe, however, that it is critical to look at the macro-orchestration frameworks more closely as they provide a higher level of control that is fundamental for building large applications.
Nevertheless, let’s review the utilities provided by micro and macro orchestration frameworks!
Micro-Orchestration
I refer to Micro-orchestration in LLM pipelines as the fine-grained coordination and management of individual LLM interactions and related processes. It is more about the granular details of how data flows into, through, and out of language models within a single task or a small set of closely related tasks. It can involve things like:
Prompt Management
Input preprocessing and output postprocessing
Data connection
Handling of model-specific parameters and configurations
Chaining of multiple LLM calls within a single logical operation
Integration of external tools or APIs at a task-specific level
The best examples of that are LangChain, LlamaIndex, Haystack, Semantic Kernel, and AdalFlow.
Prompt Management
All those frameworks, for better or worse, have a way to structure the prompt inputs to a model. For example, in LangChain, we can wrap a string with the PromptTemplate class:
from langchain_core.prompts import PromptTemplate
prompt_template = PromptTemplate.from_template(
"Tell me a joke about {topic}"
)
prompt_template.invoke({"topic": "cats"})
> StringPromptValue(text='Tell me a joke about cats')
AdalFlow and Haystack, on the other hand, use the Jinja2 package as the templating engine:
from jinja2 import Template
prompt_template = Template("Tell me a joke about {{ topic }}")
prompt_template.render(topic="cats")
> 'Tell me a joke about cats'
This may seem unnecessary in some cases, as we can do pretty much the same thing with the default Python string:
prompt = "Tell me a joke about {topic}"
prompt.format(topic="cats")
> 'Tell me a joke about cats'
However, this can help with maintenance and safer handling of user inputs, as it allows for the enforcement of all the required variables. Let’s take, for example, how we create messages in Haystack:
from haystack.dataclasses import ChatMessage
ChatMessage.from_user("Tell me a joke about {topic}")
> ChatMessage(content='Tell me a joke about {topic}', role=<ChatRole.USER: 'user'>, name=None, meta={})
It is a Python data class that provides a more robust Python object to validate the text input than the simpler:
message = {
"content": "Tell me a joke about {topic}",
"role": "user"
}
For example, in LangChain, we can create a ChatPromptTemplate object that will parse all the information:
from langchain_core.prompts import ChatPromptTemplate
messages = [
("system", "You are an AI assistant."),
("user", "Tell me a joke about {topic}"),
]
prompt = ChatPromptTemplate.from_messages(messages)
> ChatPromptTemplate(input_variables=['topic'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are an AI assistant.')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['topic'], template='Tell me a joke about {topic}'))])
And it becomes easier to manipulate the underlying data. For example, I can more easily access the input variables:
prompt.input_variables
> ['topic']
Also, the class is going to throw an error if the wrong role is injected:
messages = [
("system", "You are an AI assistant."),
("wrong_role", "Tell me a joke about {topic}"),
]
prompt = ChatPromptTemplate.from_messages(messages)
Although it is not groundbreaking, it provides intermediary checks across the code for data validation, so bugs are easier to detect.
In most cases, this allows for better integration of the prompting aspect with the rest of the software. For example, it is used in Langchain to integrate with the other components, such as models:
from langchain_openai import ChatOpenAI
model = ChatOpenAI()
chain = prompt | model
chain.invoke('cat')
> AIMessage(content="Sure, here's a cat joke for you:\n\nWhy was the cat sitting on the computer?\n\nBecause it wanted to keep an eye on the mouse!", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 30, 'prompt_tokens': 23, 'total_tokens': 53, 'completion_tokens_details': {'audio_tokens': 0, 'reasoning_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-040b7a4e-472a-45bc-8881-6dccf689cf74-0', usage_metadata={'input_tokens': 23, 'output_tokens': 30, 'total_tokens': 53})
This provides a shorthand notation for injecting prompts into a model in a controlled manner.
Having more control over the prompt object allows the implementation of prompt-specific operations. For example, here is how we can build a few-shot example prompt in LangChain:
from langchain_core.prompts import FewShotPromptTemplate
from langchain_core.prompts import PromptTemplate
# Define the example template
example_prompt = PromptTemplate.from_template(
"Question: {question}\n{answer}"
)
# Examples
examples = [
{"question": "What's 2+2?", "answer": "2+2 = 4"},
{"question": "What's 3+3?", "answer": "3+3 = 6"}
]
# Build the prompt
prompt = FewShotPromptTemplate(
examples=examples,
example_prompt=example_prompt,
suffix="Question: {input}",
input_variables=["input"],
)
prompt.invoke({"input": "What's 5+2?"}).to_string()
> Question: What's 2+2?
2+2 = 4
Question: What's 3+3?
3+3 = 6
Question: What's 5+2?
And here is how you would do the same thing in Jinja2:
from jinja2 import Template
# Define the example template
example_template = Template("Question: {{ question }}\n{{ answer }}")
# Define the full prompt template
prompt_template = Template(
"""{% for example in examples %}
{{ example_template.render(question=example.question, answer=example.answer) }}
{% endfor %}
Question: {{ input }}"""
)
# Render the prompt
prompt = prompt_template.render(
examples=examples,
example_template=example_template,
input="What's 5+2?"
)
Input preprocessing and output postprocessing
Another important aspect of micro-orchestration is the set of utility functions available to preprocess and post-process the data coming in and out of models. Most frameworks provide functionalities to load local data:
# LlamaIndex
from llama_index.core import SimpleDirectoryReader
loader = SimpleDirectoryReader("./book")
documents = loader.load_data()
# LangChain
from langchain.document_loaders import DirectoryLoader
loader = DirectoryLoader("./book")
documents = loader.load()
# Haystack
from haystack.components.converters import TextFileToDocument
from pathlib import Path
text_converter = TextFileToDocument()
documents = text_converter.run(
sources=[str(p) for p in Path("./book").glob("*.txt")]
)
> {'documents': [Document(id=cdd554d8c6fb6987d37481b471114eadce6457a2ced36dbdc821d8f0dfdb4b32, content: '
Chapter I.]
It is a truth universally acknowledged, that a single man in possession
of a goo...', meta: {'file_path': 'book/pride-and-prejudice.txt'})]}
Those frameworks support a wide variety of data types and file extensions such as .txt, .pdf, .html, .md, .json, .csv, .docx, .xlsx, .pptx, and so on, and they make it easy to inject data into the application. All those frameworks maintain a framework-specific Document class to handle text data and its metadata as it moves around. All those frameworks also provide text-splitting capabilities to split the text into smaller, manageable chunks that can be passed to LLMs:
# LangChain
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = splitter.split_documents(documents)
# LlamaIndex
from llama_index.core.node_parser import SentenceSplitter
splitter = SentenceSplitter(chunk_size=1200, chunk_overlap=100)
nodes = splitter.get_nodes_from_documents(documents)
# Haystack
from haystack.components.preprocessors import DocumentSplitter
splitter = DocumentSplitter(split_by="sentence", split_length=3)
docs = splitter.run(documents['documents'])
# AdalFlow
from adalflow.components.data_process.text_splitter import TextSplitter
splitter = TextSplitter(split_by="word",chunk_size=50, chunk_overlap=1)
docs = splitter.call(documents=docs)
None of those methods are hard to implement, but they are often useful utilities that are worth using.
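As a point of comparison, here is a minimal sketch of a naive character-based splitter with overlap; the real utilities add sentence awareness, separator handling, and metadata propagation on top of this:
# A naive character-based splitter with overlap, for illustration only
def naive_split(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    chunks = []
    start = 0
    step = chunk_size - overlap  # advance by the chunk size minus the overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
chunks = naive_split("It is a truth universally acknowledged... " * 200)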
Post-processors can be very useful to convert the free-form text output of LLMs into structured data that can be used programmatically. All those frameworks contain multiple types of parsers. Here is, for example, how we could parse a JSON-formatted string into a Pydantic model in LangChain and LlamaIndex:
from pydantic import BaseModel, Field
from typing import List
class Actor(BaseModel):
name: str = Field(description="name of an actor")
film_names: List[str] = Field(description="list of names of films they starred in")
json_str = '{"name": "Tom Hanks", "film_names": ["Forrest Gump"]}'
# Langchain
from langchain_core.output_parsers import PydanticOutputParser
parser = PydanticOutputParser(pydantic_object=Actor)
parser.parse(json_str)
# llamaindex
from llama_index.core.output_parsers import PydanticOutputParser
parser = PydanticOutputParser(output_cls=Actor)
parsed = parser.parse(json_str)
> Actor(name='Tom Hanks', film_names=['Forrest Gump'])
In LangChain, we can even use the help of another LLM to correct the format in case the previous call misformatted the output. For example, the following is not a correct JSON string:
misformatted = "{'name': 'Tom Hanks', 'film_names': 'Forrest Gump']"
But we can create a new parser to reformat the output correctly:
from langchain.output_parsers import OutputFixingParser
new_parser = OutputFixingParser.from_llm(
parser=parser, llm=ChatOpenAI()
)
new_parser.parse(misformatted)
> Actor(name='Tom Hanks', film_names=['Forrest Gump'])
Another useful post-processing step is reranking the documents coming from a datastore retrieval, for example.
Here is how we can rerank documents in Haystack to induce more diversity in the provided documents based on a specific query:
from haystack.components.rankers import SentenceTransformersDiversityRanker
ranker = SentenceTransformersDiversityRanker(
model="sentence-transformers/all-MiniLM-L6-v2",
similarity="cosine"
)
ranker.warm_up()
query = "How can I maintain physical fitness?"
docs = ranker.run(query=query, documents=docs['documents'])
Handling of model-specific parameters and configurations
One important aspect of those frameworks is to abstract away the specificity of the third-party APIs or models you choose to build your pipelines with. The way models are used is made uniform across the different APIs. For example, in LangChain, we can instantiate an LLM object that interacts with the OpenAI API:
from langchain_openai import ChatOpenAI
# OpenAI model
llm = ChatOpenAI(
model="gpt-4o-mini",
temperature=0.7,
)
But we can do the same thing for a local model when using llama.cpp:
from langchain_community.llms import LlamaCpp
# Local model using llama.cpp
llm = LlamaCpp(
model_path="./models/mistral-7b.gguf",
temperature=0.7,
max_tokens=500,
n_ctx=2048,
n_gpu_layers=1 # Number of layers to offload to GPU
)
As far as the rest of the code is concerned, we just use an LLM object that is independent from the underlying model, and we can predict without thinking about the specific API:
chain = prompt | llm
chain.invoke('cat')
That is where the value of those frameworks becomes interesting. When we build pipelines, we need to integrate multiple tools together, such as LLM APIs or datastores, and we want to use different LLMs or tools for different cases without needing to adapt the code for it. So, those frameworks provide a uniform platform that takes away the complexity of integration. Most of those frameworks support all the major LLM providers.
Those integrations handle the model-specific configurations at instantiation time.
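For example, here is a rough sketch of swapping OpenAI for Anthropic without touching the rest of the chain (the model name here is just an example and assumes an ANTHROPIC_API_KEY in the environment):
# Swapping the LLM provider without changing the rest of the chain
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
model = ChatAnthropic(model="claude-3-5-sonnet-20240620", temperature=0.7)
chain = prompt | model
chain.invoke({"topic": "cats"})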
Chaining of multiple LLM calls within a single logical operation
One of the novel ideas, when micro-orchestration came around, was the ability to pipe multiple LLM calls in sequence or in parallel to solve more complex problems.
For example, imagine we wanted to implement a pipeline that writes a movie review from a user comment about a movie. First, we would need to extract the movie information from the comment. Let’s do that in AdalFlow:
from adalflow.core import Component, Generator
from adalflow.components.model_client import OpenAIClient
extractor_template = "Extract the movie title and year from: {{ text }}"
class Extractor(Component):
def __init__(self):
super().__init__()
self.doc = Generator(
template=extractor_template,
model_client=OpenAIClient(),
model_kwargs={"model": "gpt-4o-mini"},
)
def call(self, query: str) -> str:
return self.doc(prompt_kwargs={"text": query}).data
extractor = Extractor()
extractor("I watched Oppenheimer last weekend")
> 'The movie title is "Oppenheimer" and it was released in 2023.'
And then we could have a call to write a review from the movie information:
reviewer_template = "Write a review for the movie: {{ movie_details }}"
class Reviewer(Component):
def __init__(self):
super().__init__()
self.doc = Generator(
template=reviewer_template,
model_client=OpenAIClient(),
model_kwargs={"model": "gpt-4o-mini"},
)
def call(self, query: str) -> str:
return self.doc(prompt_kwargs={"movie_details": query}).data
reviewer = Reviewer()
reviewer(extracted)
> Title: "Oppenheimer" (2023) - A Cinematic Masterpiece
Rating: ★★★★★ (5/5)
Christopher Nolan’s "Oppenheimer" holds a special place in the realm of biographical dramas, delivering a profound, thought-provoking, and visually stunning depiction of one of history’s most complicated figures, J. Robert Oppenheimer. The film, which chronicles the life and moral dilemmas faced by the father of the atomic bomb, is nothing short of a cinematic tour de force.
From the very first frame, Nolan effortlessly engages the audience in a world riddled with moral ambiguity and ethical questions surrounding scientific advancement. The screenplay, laden with sharp dialogue and rich character development, is crafted with the precision of a well-tuned instrument, echoing the themes of genius and guilt that haunt Oppenheimer throughout his life...
Instead of calling two different functions, we may want to combine them into one:
from adalflow.core.container import Sequential
sequence = Sequential(Extractor(), Reviewer())
sequence("I watched Oppenheimer last weekend")
> Title: "Oppenheimer" (2023) - A Cinematic Masterpiece
Rating: ★★★★★ (5/5)
Christopher Nolan’s "Oppenheimer" holds a special place in the realm of biographical dramas, delivering a profound, thought-provoking, and visually stunning depiction of one of history’s most complicated figures, J. Robert Oppenheimer. The film, which chronicles the life and moral dilemmas faced by the father of the atomic bomb, is nothing short of a cinematic tour de force.
From the very first frame, Nolan effortlessly engages the audience in a world riddled with moral ambiguity and ethical questions surrounding scientific advancement. The screenplay, laden with sharp dialogue and rich character development, is crafted with the precision of a well-tuned instrument, echoing the themes of genius and guilt that haunt Oppenheimer throughout his life...
The sequence object will behave as if it were any other component that takes an input and generates an output.
Here is how we could do the same thing in Haystack:
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
# Create components
extractor_prompt = PromptBuilder(template=extractor_template)
reviewer_prompt = PromptBuilder(template=reviewer_template)
extractor = OpenAIGenerator(model="gpt-4o-mini")
reviewer = OpenAIGenerator(model="gpt-4o-mini")
# Build pipeline
pipe = Pipeline()
pipe.add_component("extractor_prompt", extractor_prompt)
pipe.add_component("extractor", extractor)
pipe.add_component("reviewer_prompt", reviewer_prompt)
pipe.add_component("reviewer", reviewer)
# Connect components
pipe.connect("extractor_prompt.prompt", "extractor.prompt")
pipe.connect("extractor.replies", "reviewer_prompt.movie_details")
pipe.connect("reviewer_prompt.prompt", "reviewer.prompt")
# Run pipeline
result = pipe.run({
"extractor_prompt": {
"text": "I watched Oppenheimer last weekend"
}
})
result['reviewer']['replies'][0]
> Title: "Oppenheimer" (2023) - A Cinematic Masterpiece
Rating: ★★★★★ (5/5)
Christopher Nolan’s "Oppenheimer" holds a special place in the realm of biographical dramas, delivering a profound, thought-provoking, and visually stunning depiction of one of history’s most complicated figures, J. Robert Oppenheimer. The film, which chronicles the life and moral dilemmas faced by the father of the atomic bomb, is nothing short of a cinematic tour de force.
From the very first frame, Nolan effortlessly engages the audience in a world riddled with moral ambiguity and ethical questions surrounding scientific advancement. The screenplay, laden with sharp dialogue and rich character development, is crafted with the precision of a well-tuned instrument, echoing the themes of genius and guilt that haunt Oppenheimer throughout his life...
In principle, we can write components and compose them to build arbitrarily complex pipelines. For example, let’s assume we want to extract the movie information and then write both a technical analysis and an artistic analysis:
Here is how we could do that in LangChain. Let’s first get the prompts and a data model for the movie information:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
llm = ChatOpenAI(model="gpt-4o-mini")
class MovieInfo(BaseModel):
title: str = Field(description="Movie title")
year: int = Field(description="Release year")
movie_prompt = ChatPromptTemplate.from_template(
"Extract the movie title and year from: {text}"
)
technical_prompt = ChatPromptTemplate.from_template(
"Movie: {title} ({year})\nWrite a technical analysis of the film's production."
)
artistic_prompt = ChatPromptTemplate.from_template(
"Movie: {title} ({year})\nWrite an artistic analysis of the film's themes."
)
And then, let’s write the chain to orchestrate the calls:
from langchain_core.runnables import chain
from langchain_core.output_parsers import StrOutputParser
@chain
def movie_chain(inputs):
info_chain = movie_prompt | llm.with_structured_output(MovieInfo)
technical_chain = technical_prompt | llm | StrOutputParser()
artistic_chain = artistic_prompt | llm | StrOutputParser()
movie_info = info_chain.invoke(inputs)
return {
"technical": technical_chain.invoke(movie_info.dict()),
"artistic": artistic_chain.invoke(movie_info.dict())
}
movie_chain.invoke("I watched Oppenheimer last weekend")
> {'technical': "Oppenheimer," directed by Christopher ...,
'artistic': Christopher Nolan's "Oppenheimer" (2023) is a masterful...}
This new chain just becomes a component that could be used as a subcomponent of an even bigger chain.
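As a rough sketch, we could pipe movie_chain into an additional summarization step; the summary_prompt here is just illustrative:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Combine the two analyses produced by movie_chain into a single short review
summary_prompt = ChatPromptTemplate.from_template(
    "Combine these two analyses into one short review:\n\n"
    "Technical: {technical}\n\nArtistic: {artistic}"
)
bigger_chain = movie_chain | summary_prompt | llm | StrOutputParser()
bigger_chain.invoke("I watched Oppenheimer last weekend")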
Datastore connections
One of the most important applications supported in micro-orchestration is Retrieval-Augmented Generation (RAG). In fact, LlamaIndex and AdalFlow were originally focused solely on it. The idea is to connect LLMs with different data sources to augment their context when answering user questions. All of those frameworks provide integrations with most vector stores, graph databases, NoSQL, and SQL databases. For example, here is the long list of vector stores supported by LangChain: https://python.langchain.com/docs/integrations/vectorstores/
For example, in Haystack, I can integrate an Elasticsearch datastore:
from haystack import Pipeline
from haystack_integrations.components.retrievers.elasticsearch import (
ElasticsearchBM25Retriever
)
from haystack_integrations.document_stores.elasticsearch import (
ElasticsearchDocumentStore
)
document_store = ElasticsearchDocumentStore(
hosts= "http://localhost:9200/"
)
retriever = ElasticsearchBM25Retriever(document_store=document_store)
pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
result = pipeline.run({"retriever": {"query": "Your question"}})
We set up a basic search pipeline by creating an Elasticsearch document store connection to a local instance and initializing a BM25 retriever (a keyword-based search algorithm) connected to that store.
However, I could seamlessly replace this Elasticsearch store with a Weaviate store and a different embedding retriever without having to change the way it integrates with the rest of the codebase:
from haystack_integrations.document_stores.weaviate.document_store import WeaviateDocumentStore
from haystack_integrations.components.retrievers.weaviate import WeaviateEmbeddingRetriever
document_store = WeaviateDocumentStore(url="http://localhost:8080")
retriever = WeaviateEmbeddingRetriever(document_store=document_store)
pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
result = pipeline.run({"retriever": {"query": "Your question"}})
The pipeline becomes independent of the specific implementation of the data store, and we can focus on architecting the pipeline instead of the integration. It is convenient not to have to learn each data store's API to integrate it.
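Indexing follows the same uniform interface regardless of the backing store; here is a minimal sketch with a dummy document:
from haystack import Document
# The same write_documents call works for either document store defined above
documents = [Document(content="LLM pipelines need orchestration frameworks.")]
document_store.write_documents(documents)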
For example, here is how we can integrate a Neo4j graph database into a RAG pipeline in LlamaIndex:
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
graph_store = Neo4jPropertyGraphStore(
username="neo4j",
password="llamaindex",
url="bolt://localhost:7687",
)
index = PropertyGraphIndex(
llm=OpenAI(model="gpt-4o-mini"),
embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
property_graph_store=graph_store,
)
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query("What is this data about?")
Note that we didn’t have to write any Neo4j Cypher query to retrieve the data and pass it to an LLM.
Another very critical application of database connectivity is the ability to capture the history of a chatbot conversation. We cannot implement a chatbot without providing the previous messages of the conversation as context. For example, here is how we can capture the conversation history in a PostgreSQL database in LangChain:
import uuid
from langchain_community.chat_message_histories import (
PostgresChatMessageHistory,
)
url = "postgresql://postgres:mypassword@localhost/chat_history"
history = PostgresChatMessageHistory(
connection_string=url,
session_id=str(uuid.uuid4()),
)
history.add_user_message("hi!")
history.add_ai_message("whats up?")And we can easily retrieve them for context:
history.messages
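As a rough sketch, we could then feed that stored history back into a prompt through a MessagesPlaceholder (the variable names here are illustrative):
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
# Inject the persisted history into the prompt before asking a new question
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{question}"),
])
chat_chain = chat_prompt | ChatOpenAI(model="gpt-4o-mini")
chat_chain.invoke({"history": history.messages, "question": "What did I just say?"})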
Integration of external tools or APIs at a task-specific level
Besides data stores, we may want to augment LLMs with tools. This is usually the most basic use case for LLM-based agents. We feed into the prompts the information on what tools are available, what syntax we need to use to interact with the tools, and what output we expect from them.
Most of those frameworks have integrations with many different tools. Check out LangChain’s list of tools: https://python.langchain.com/docs/integrations/tools/
I am going to prepare three tools in LangChain: Wolfram Alpha, Google Search, and Wikipedia. Here is how I can get the LLM to tell me which tool I should use based on the query:
from langchain_community.utilities import (
WolframAlphaAPIWrapper,
WikipediaAPIWrapper,
)
from langchain_google_community import GoogleSearchAPIWrapper
from langchain_community.tools import (
WikipediaQueryRun,
WolframAlphaQueryRun,
)
from langchain_core.tools import Tool
from langchain_openai import ChatOpenAI
search = GoogleSearchAPIWrapper()
wolfram = WolframAlphaQueryRun(api_wrapper=WolframAlphaAPIWrapper())
wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())
tools = [
wikipedia,
wolfram,
Tool(
name="google_search",
description="Search Google for recent results.",
func=search.run,
)
]
llm = ChatOpenAI(model="gpt-4o-mini")
llm_with_tools = llm.bind_tools(tools)
response = llm_with_tools.invoke("What is the latest version of Langchain?")
response.tool_calls
> [{'name': 'google_search',
'args': {'__arg1': 'latest version of Langchain'},
'id': 'call_MGyRE3fKK9K38g55wahq8k4G',
'type': 'tool_call'}]
Here, we just provided the LLM with a way to query those tools, and it responds with its choice for the query. The content is empty, and it suggests using Google Search with the argument “latest version of Langchain“.
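If we wanted to execute the chosen tool ourselves, a minimal sketch could look like this (assuming the tools list and response from above):
# Look up the tool the LLM picked and execute it with the suggested arguments
tools_by_name = {tool.name: tool for tool in tools}
for tool_call in response.tool_calls:
    chosen_tool = tools_by_name[tool_call["name"]]
    observation = chosen_tool.invoke(tool_call["args"])
    print(observation)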
Of course, we can go a bit further and use more autonomous (agentic) functionality to directly execute the tool calls. For example, here is how we do that in LlamaIndex by using a ReAct agent:
from llama_index.tools.google import GoogleSearchToolSpec
from llama_index.tools.wolfram_alpha import WolframAlphaToolSpec
from llama_index.tools.wikipedia import WikipediaToolSpec
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
search = GoogleSearchToolSpec().to_tool_list()
wolfram = WolframAlphaToolSpec().to_tool_list()
wikipedia = WikipediaToolSpec().to_tool_list()
llm = OpenAI(model="gpt-4o-mini")
agent = ReActAgent.from_tools(
tools=search + wolfram + wikipedia,
llm=llm,
)
agent.query("What is the latest version of Langchain?")
# query google search
> Response(response='The latest version of Langchain is 0.3.7, released on November 1.', source_nodes=[], metadata=None)
Tracing and Logging
When you are building an LLM pipeline, a lot of the data flowing around will be model input and output, including prompts, warnings, and error traces, and it becomes important to capture that data for debugging, evaluation, or optimization purposes. LangChain, LlamaIndex, and Haystack invested quite a bit of effort in making sure they provide production-grade applications that can be monitored effectively.
LlamaIndex and Haystack offer a few integrations with third-party monitoring tools. As an example, let’s build a basic RAG pipeline in Haystack and integrate it with Langfuse. For the retriever, let’s start with an in-memory document store:
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()Let’s load a simple dataset into it:
from datasets import load_dataset
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
embedder = SentenceTransformersDocumentEmbedder(
"sentence-transformers/all-MiniLM-L6-v2"
)
embedder.warm_up()
docs_with_embeddings = embedder.run([
Document(**ds) for ds in dataset]
).get("documents")
document_store.write_documents(docs_with_embeddings)
retriever = InMemoryEmbeddingRetriever(
document_store=document_store, top_k=2
)
Let’s now add this retriever to a pipeline:
from haystack import Pipeline
basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", retriever)Now we need a generator. For the generator, we first need an embedder to convert the text into an embedding:
from haystack.components.embedders import SentenceTransformersTextEmbedder
basic_rag_pipeline.add_component(
"text_embedder",
SentenceTransformersTextEmbedder(
model="sentence-transformers/all-MiniLM-L6-v2"
)
)
Then we need a prompt and an LLM:
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{question}}
Answer:
"""
prompt_builder = PromptBuilder(template=template)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm",
OpenAIGenerator(model="gpt-4o-mini", generation_kwargs={"n": 2})
)
Let’s connect the different components of the pipeline:
The output of the text embedder is used as input to the retriever
The output of the retriever is used as input to the prompt
The final prompt is used as input to the LLM
basic_rag_pipeline.connect(
"text_embedder.embedding", "retriever.query_embedding"
)
basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")Finally, let’s add the tracer:
from haystack_integrations.components.connectors.langfuse import (
LangfuseConnector
)
basic_rag_pipeline.add_component(
"tracer", LangfuseConnector("Basic RAG Pipeline")
)
We can visualize that pipeline:
basic_rag_pipeline.draw(path="pipeline.png")
Now, we can ask a question:
question = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
"text_embedder": {"text": question},
"prompt_builder": {"question": question}}
)
After logging in to the Langfuse dashboard and setting the API keys:
import os
os.environ["LANGFUSE_SECRET_KEY"] = ...
os.environ["LANGFUSE_PUBLIC_KEY"] = ...
os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "true"We can monitor the activities related to the app:
For example, we can see the metadata related to the question and the cost associated with it:
LangChain, on the other hand, implemented its own solution, LangSmith, and it is quite easy to integrate it with LangChain. To integrate it into my code, I just need to sign in to the LangSmith website, get my API key, set up my environment variables,
import os
os.environ["LANGCHAIN_API_KEY"] = ...
os.environ["LANGCHAIN_PROJECT"] = "test-project"
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"and execute a pipeline as usual:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
llm.invoke("Hello, world!")And the traces are available in the LangSmith dashboard:
And I can easily visualize what was executed:
LangSmith is not LangChain-specific and can be integrated with other frameworks.
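For example, here is a minimal sketch of tracing a plain OpenAI call with the langsmith client helpers (wrap_openai and traceable), with the same environment variables set as above:
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI
# Wrap the raw OpenAI client so its calls are logged to LangSmith
client = wrap_openai(OpenAI())
@traceable
def answer(question: str) -> str:
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return result.choices[0].message.content
answer("Hello, world!")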
Macro-Orchestration
Macro-orchestration in LLM pipelines involves the high-level design, coordination, and management of complex workflows that may incorporate multiple LLM interactions, as well as other AI and non-AI components. It focuses on the overall structure and flow of larger systems or applications. It involves things like:
Workflow design and management
State management across multiple steps or processes
Parallel processing and distributed computation
Error handling and recovery at a system level
Scalability and performance optimization of the entire pipeline
Integration of diverse AI services and traditional software components
Example operations: Multi-agent systems, long-running tasks with multiple stages, complex decision trees involving multiple LLM and non-LLM components.
In my opinion, this is the most important type of orchestration to understand for LLM pipelines. LangGraph and Haystack seem to be the most mature on that front. LangGraph is from the same company as LangChain but can work independently from it. LangChain tried to implement a simpler way to orchestrate pipelines with the LangChain Expression Language (LCEL), but it ended up being a mess! LangGraph was a way to correct that. Haystack, from the start, implemented everything with a more graphical approach to orchestration, similar to frameworks such as Airflow or Kubeflow, seemingly mixing micro-orchestration and macro-orchestration.
LlamaIndex implemented Workflows, which is also an elegant approach to that higher-level orchestration. Burr by DAGWorks and GenWorlds are also interesting actors in the space, with GenWorlds having a stronger focus on agents.
Complex Graphical Applications
One of the most important aspects is to provide a high-level process to orchestrate complex applications. Here, we represent the applications as a set of nodes where the different operations happen and edges that connect those operations.
For example, in the context of RAG, we can decompose the operations into the retrieval piece and the generation piece:
We have two operations:
The retrieval operation takes a user query and returns related documents
The generation operation takes the user query and related documents and returns an answer to the query.
The two nodes are connected by an edge that represents the causality between the two operations and the data that needs to flow from one node to the next. We have actually already implemented a similar RAG pipeline with Haystack above in the context of tracing and logging. Let’s implement a similar basic RAG pipeline with LangGraph and LangChain. First, let’s get the generation prompt:
from langchain_core.prompts import ChatPromptTemplate
system_prompt = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Only provide the answer and nothing else!
"""
human_prompt = """
Question: {question}
Context:
{context}
Answer:
"""
rag_prompt = ChatPromptTemplate.from_messages(
[
("system", system_prompt),
("human", human_prompt),
]
)
Now, let’s bind this prompt to an LLM and parse the output as a string:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
llm_engine = ChatOpenAI(model='gpt-4o-mini')
rag_chain = rag_prompt | llm_engine | StrOutputParser()
We are going to establish the application state where we are going to store the different global variables, like the user query, the retrieved documents, and the generated answer:
from typing import List, Optional
from pydantic import BaseModel
class GraphState(BaseModel):
question: Optional[str] = None
generation: Optional[str] = None
documents: List[str] = []
Let’s assume we have a retriever that we can query with the user question, and implement the first node in the graph:
from data_index import retriever
def retriever_node(state: GraphState):
new_documents = retriever.invoke(state.question)
new_documents = [d.page_content for d in new_documents]
state.documents.extend(new_documents)
return {"documents": state.documents}We implement the second node where we implement the generation piece:
def generation_node(state: GraphState):
generation = rag_chain.invoke({
"context": "\n\n".join(state.documents),
"question": state.question,
})
return {"generation": generation}Now we can add the nodes to the graph:
from langgraph.graph import END, StateGraph, START
pipeline = StateGraph(GraphState)
pipeline.add_node('retrieval_node', retriever_node)
pipeline.add_node('generator_node', generation_node)
Let’s connect the different nodes:
# We start by the retrieval
pipeline.add_edge(START, 'retrieval_node')
# We continue to the generation node
pipeline.add_edge('retrieval_node', 'generator_node')
# Once we generated the text, we end the pipeline
pipeline.add_edge('generator_node', END)
rag_pipeline = pipeline.compile()
The pipeline is ready to be run:
inputs = {"question": "What is LangChain?"}
for output in rag_pipeline.stream(inputs, stream_mode='updates'):
for key in output.keys():
print(f"Node: {key}")
> Node: retrieval_node
Node: generator_node
LangChain is a framework designed for developing applications powered by large language models (LLMs). It simplifies the entire application lifecycle, including development, productionization, and deployment, through a set of open-source libraries and tools. Key components of LangChain include model I/O, retrieval strategies, and agents, enabling the creation of context-aware reasoning applications.
For each operation, we abstract away the complexity behind a node and how the data flows in and out of that node. This approach allows us to build ridiculously complex applications by focusing on one node at a time every time we increase the complexity. Check out this newsletter, where I increase the complexity of a RAG pipeline little by little:
Every step of the way, every time I wanted to improve upon the current version of the pipeline, I just needed to focus on a couple of nodes. As the pipeline grows, the complexity of implementing a new feature doesn’t. This graphical approach allows me to scale the complexity of my pipeline in a robust manner without being drowned in an ocean of “if” statements to keep track of the different use cases that need to be handled.
We can grow even more complex applications by abstracting a whole graph as a node in a supergraph. This is intended for extremely large-scale pipelines!
Here is the pseudo-code to achieve this in LangGraph:
subgraph_builder1 = StateGraph(SubgraphState1)
...
subgraph1 = subgraph_builder1.compile()
subgraph_builder2 = StateGraph(SubgraphState2)
...
subgraph2 = subgraph_builder2.compile()
builder = StateGraph(ParentState)
builder.add_node("node1", subgraph1)
builder.add_node("node2", subgraph2)
builder.add_edge(START, "node1")
builder.add_edge("node1", "node2")
builder.add_edge("node2", END)
graph = builder.compile()
LlamaIndex Workflows take a similar approach, but the edges are called “events” and the nodes “steps“. Let’s implement another basic RAG pipeline. An event class is going to carry the necessary information from the retrieval step to the generation step:
from typing import List
from llama_index.core.workflow import Event
class ContextEvent(Event):
query: str
retrieved: List[str]
A workflow is the equivalent of a graph in LangGraph. We are going to give it an LLM and a fake retriever:
from llama_index.core.workflow import Workflow
class RAGPipeline(Workflow):
llm = OpenAI()
retriever = staticmethod(lambda query: ['test'])  # wrapped so self.retriever(query) does not also pass self
We pass it the retrieval step:
from llama_index.core.workflow import StartEvent, step
class RAGPipeline(Workflow):
llm = OpenAI()
retriever = staticmethod(lambda query: ['test'])
@step
async def retrieval_step(self, ev: StartEvent) -> ContextEvent:
query = ev.query
data = self.retriever(query)
return ContextEvent(query=query, retrieved=data)
The StartEvent is a special event indicating the beginning of a flow. From the query, we get the retrieved data from the retriever, and we return a ContextEvent. Let’s get the generation step:
from llama_index.core.workflow import StopEvent
class RAGPipeline(Workflow):
llm = OpenAI()
retriever = staticmethod(lambda query: ['test'])
...
@step
async def generation_step(self, ev: ContextEvent) -> StopEvent:
query = ev.query
retrieved = ev.retrieved
prompt = f"""Given the following information, answer the question.
Context: {'\n\n'.join(retrieved)}
Question: {query}
"""
response = await self.llm.acomplete(prompt)
return StopEvent(result=str(response))
Because the generation step takes a ContextEvent as input, it automatically receives the information returned by the retrieval step, and we don’t need to link the two steps explicitly. The edges are inferred from the step signatures. We can run the pipeline as follows:
rag_pipeline = RAGPipeline(timeout=60, verbose=False)
result = await rag_pipeline.run(query="What are LlamaIndex Workflows")
One difference from the way LangGraph handles things is how local and global information are treated. We have Events and a Context:
from llama_index.core.workflow import StartEvent, step, Context
class RAGPipeline(Workflow):
@step
async def retrieval_step(
self, ctx: Context, ev: StartEvent) -> ContextEvent:
# retrieve from context
query = await ctx.get("query")
data = self.retriever(query)
return ContextEvent(query=query, retrieved=data)
The context is a global state that can be shared across steps, whereas an event only passes local information from step to step. This is helpful when the application becomes complex. In LangGraph, we have to add more attributes to the graph state as we grow the application, and this can quickly become unmanageable. On the other hand, we have to create a distinct event class for each edge in the graph, which can somewhat pollute the codebase with unnecessary lines of code.
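As a rough sketch of that shared context, a step can write a value with ctx.set and a later step can read it back with ctx.get (the step bodies here are just illustrative):
from llama_index.core.workflow import Context, StartEvent, StopEvent, Workflow, step
class RAGPipeline(Workflow):
    retriever = staticmethod(lambda query: ['test'])
    @step
    async def retrieval_step(self, ctx: Context, ev: StartEvent) -> ContextEvent:
        await ctx.set("query", ev.query)          # write to the shared context
        data = self.retriever(ev.query)
        return ContextEvent(query=ev.query, retrieved=data)
    @step
    async def generation_step(self, ctx: Context, ev: ContextEvent) -> StopEvent:
        original_query = await ctx.get("query")   # read it back in a later step
        return StopEvent(result=f"Answering: {original_query}")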
Because Events are edges, we can directly create conditional connections from the step, which can help with the readability of the code. For example, we can decide on different paths depending on the value of the query:
class RAGPipeline(Workflow):
@step
async def retrieval_step(
self, ctx: Context, ev: StartEvent) -> ContextEvent | StopEvent:
query = await ctx.get("query")
if not query:
return StopEvent(result=None)
data = self.retriever(query)
return ContextEvent(query=query, retrieved=data)
The steps can receive different types of events, allowing for conditionality based on different input data. For example, imagine we have a retry loop to restart the retrieval step:
# RetrialEvent is a custom event carrying the query and any previously retrieved documents
class RetrialEvent(Event):
    query: str
    retrieved: List[str]
class RAGPipeline(Workflow):
@step
async def retrieval_step(
self, ev: StartEvent | RetrialEvent) -> ContextEvent:
query = ev.query
retrieved = []
if isinstance(ev, RetrialEvent):
retrieved = ev.retrieved
retrieved.extend(self.retriever(query))
return ContextEvent(query=query, retrieved=retrieved)
Stateful Application
A huge advantage of a stateful application is the ability to checkpoint a current state and recover and replay a previous one. For example, a chatbot is inherently stateful as we need to recover the history of a conversation, and for each user, it might be useful to save the current state of the pipeline execution. Execution replays allow for debugging or state recovery for better user experience in case a bug happens further down the pipeline.
For example, let's persist states in a Postgres database with LangGraph. Let’s first establish a basic RAG pipeline:
from typing import List, Optional
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from pydantic import BaseModel
from langgraph.graph import END, StateGraph, START
system_prompt = """
You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
"""
human_prompt = "Question: {question} \n\nContext: \n{context} \n\nAnswer:"
rag_prompt = ChatPromptTemplate.from_messages(
[("system", system_prompt), ("human", human_prompt)]
)
class GraphState(BaseModel):
question: Optional[str] = None
generation: Optional[str] = None
documents: List[str] = []
retriever = lambda query: ['test']
llm_engine = ChatOpenAI(model='gpt-4o-mini')
rag_chain = rag_prompt | llm_engine | StrOutputParser()
def retriever_node(state: GraphState):
new_documents = retriever(state.question)
state.documents.extend(new_documents)
return {"documents": state.documents}
def generation_node(state: GraphState):
generation = rag_chain.invoke({
"context": "\n\n".join(state.documents),
"question": state.question,
})
return {"generation": generation}
pipeline = StateGraph(GraphState)
pipeline.add_node('retrieval_node', retriever_node)
pipeline.add_node('generator_node', generation_node)
pipeline.add_edge(START, 'retrieval_node')
pipeline.add_edge('retrieval_node', 'generator_node')
pipeline.add_edge('generator_node', END)
Now, let’s connect the Postgres database to the graph when compiling it:
from langgraph.checkpoint.postgres import PostgresSaver
from psycopg import Connection
DB_URI = "postgresql://postgres:postgres@localhost:5432/postgres?sslmode=disable"
connection_kwargs = {"autocommit": True, "prepare_threshold": 0}  # connection settings for the checkpointer
conn = Connection.connect(DB_URI, **connection_kwargs)
checkpointer = PostgresSaver(conn)
checkpointer.setup()
rag_pipeline = pipeline.compile(checkpointer=checkpointer)
Let’s say we have a user with ID user_id_1 who asks a first question:
config = {"configurable": {"thread_id": "user_id_1"}}
inputs = {"question": "First question"}
output = rag_pipeline.invoke(inputs, config)
> {'question': 'First question',
'generation': "I don't know.",
'documents': ['test']}
Then the user asks a second question:
config = {"configurable": {"thread_id": "user_id_1"}}
inputs = {"question": "Second question"}
output = rag_pipeline.invoke(inputs, config)
> {'question': 'Second question',
'generation': "I don't know.",
'documents': ['test', 'test']}
We can now see the history of states for that particular user:
for state in rag_pipeline.get_state_history(config):
print(state)
print("--")
> StateSnapshot(values={'question': 'Second question', 'generation': "I don't know.", 'documents': ['test', 'test']}, next=(), config={'configurable': {'thread_id': 'user_id_1', 'checkpoint_ns': '', 'checkpoint_id': '1efad132-efbe-6fa2-8006-de4ad59bd69d'}}, metadata={'step': 6, 'source': 'loop', 'writes': {'generator_node': {'generation': "I don't know."}}, 'parents': {}, 'thread_id': 'user_id_1'}, created_at='2024-11-27T22:58:57.602089+00:00', parent_config={'configurable': {'thread_id': 'user_id_1', 'checkpoint_ns': '', 'checkpoint_id': '1efad132-eb9b-6bb4-8005-15b0f2cea5ae'}}, tasks=())
--
StateSnapshot(values={'question': 'Second question', 'generation': "I don't know.", 'documents': ['test', 'test']}, next=('generator_node',), config={'configurable': {'thread_id': 'user_id_1', 'checkpoint_ns': '', 'checkpoint_id': '1efad132-eb9b-6bb4-8005-15b0f2cea5ae'}}, metadata={'step': 5, 'source': 'loop', 'writes': {'retrieval_node': {'documents': ['test', 'test']}}, 'parents': {}, 'thread_id': 'user_id_1'}, created_at='2024-11-27T22:58:57.168263+00:00', parent_config={'configurable': {'thread_id': 'user_id_1', 'checkpoint_ns': '', 'checkpoint_id': '1efad132-eb99-6724-8004-f44def5dc32e'}}, tasks=(PregelTask(id='a51a63c5-df2a-9788-e1fb-b0e0e9a661ef', name='generator_node', path=('__pregel_pull', 'generator_node'), error=None, interrupts=(), state=None, result={'generation': "I don't know."}),))
--
...
If I look at the history for another user, it is empty:
config = {"configurable": {"thread_id": "user_id_2"}}
for state in rag_pipeline.get_state_history(config):
print(state)
print("--")
> None
And we can update a previous state of the graph for the first user (previous_state here is one of the StateSnapshot objects from the history above):
branch_config = rag_pipeline.update_state(
previous_state.config,
{"question": "Better first question"}
)
branch_config
> {'configurable': {'thread_id': 'user_id_1',
'checkpoint_ns': '',
'checkpoint_id': '1efad13f-a1f8-6af4-8001-b99c45cb6990'}}
And continue to execute from that branch:
inputs = {"question": "Second question"}
output = rag_pipeline.invoke(inputs, branch_config)
LlamaIndex provides a utility for checkpointing, but it is not clear how we can persist the states to a database:
from llama_index.core.workflow import WorkflowCheckpointer
rag_pipeline = RAGPipeline()
checkpointer = WorkflowCheckpointer(workflow=rag_pipeline)
# to checkpoint a run, use the `run` method from checkpointer
handler = checkpointer.run(query="First question")
await handler
# to view the stored checkpoints of this run
checkpointer.checkpoints[handler.run_id]
# to run from one of the checkpoints, use `run_from` method
checkpoint = checkpointer.checkpoints[handler.run_id][0]
handler = checkpointer.run_from(checkpoint=checkpoint, query="Better first question")
await handler
Agentic Design
Those macro-orchestration frameworks offer a lot of advantages for agentic design. An LLM-based agent is a system that uses an LLM to make decisions about the flow of the application. To an extent, most LLM orchestration frameworks provide some solutions around agentic design. The graph structure of those stateful frameworks allows us to impose a lot of control on the behavior of the agents. Agentic designs can easily output garbage if they are not well implemented, so control is critical!
Frameworks like LangGraph can provide robust control, whereas a framework like Autogen might be too autonomous to implement production-level software.
As an example, let’s implement a router in LangGraph. Imagine we have a RAG system with user queries. Instead of sending all the questions to a vector store, we are going to route each query based on its difficulty. Let’s get a LangChain chain for doing that:
from pydantic import BaseModel, Field
from typing import Literal
from langchain_core.prompts import ChatPromptTemplate
class RouteQuery(BaseModel):
route: Literal["vectorstore", "websearch", "QA_LM"] = Field(
description="Given a user question choose to route it to web search (websearch), a vectorstore (vectorstore), or a QA language model (QA_LM).",
)
system_prompt = """
You are an expert at routing a user question to a vectorstore, a websearch or a simple QA language model.
The vectorstore contains documents related to Langchain.
If you can answer the question without any additional context or if a websearch could not provide additional context, route it to the QA language model.
If you need additional context and it is a question about Langchain, use the vectorstore, otherwise, use websearch.
"""
router_prompt = ChatPromptTemplate.from_messages(
[("system", system_prompt), ("human", "{question}")]
)
router = router_prompt | llm_engine.with_structured_output(RouteQuery)
Now, let’s define the node:
def router_node(state: GraphState):
route_query = router.invoke(state.question)
return route_query.route
Finally, we can build the graph where we are going to take different routes depending on the LLM’s choice:
pipeline = StateGraph(GraphState)
pipeline.add_conditional_edges(
START,
router_node,
{
"vectorstore": 'db_query_rewrite_node',
"websearch": 'websearch_query_rewriting_node',
"QA_LM": 'simple_question_node'
},
)
The conditional_edges allow for different flows depending on the value provided by the router node.
This is a basic decision taken by the LLM with a low level of risk.
Implementing a working design for agents can be tough; that is why there are a few tricks to follow. When we build software involving LLMs, we want to avoid use cases where LLMs do not perform well and utilize their capabilities on tasks they are known to be good at! When we think about agentic design, we need to think about a combination of multiple LLM actors with limited agency and human domain expertise to architect the way agents interact, alongside software that imposes a rigid structure on the different agent interactions. In the following newsletter, I outline some of the main patterns for agentic design:
The way agents interact needs to be simple so information can be conveyed to the agents in a less confusing manner. These are typical ways agents could interact:
For example, let’s implement a supervisor pattern in LangGraph. Let’s have a supervisor that assigns tasks and reviews the results. We have subordinates:
Researcher: Gathers and synthesizes information from the knowledge base using retrieval tools and provides well-cited findings.
Analyst: Processes the gathered information to identify patterns, trends, and key insights from the researcher's findings.
Writer: Creates polished, coherent content by transforming the analyst's insights and researcher's data into well-structured reports or documents.
Let’s get a supervisor chain:
from langchain_core.messages import SystemMessage
from langchain_core.prompts import (
ChatPromptTemplate, MessagesPlaceholder
)
from langchain_openai import ChatOpenAI
supervisor_prompt = ChatPromptTemplate.from_messages([
SystemMessage(content="""You are a supervisor agent responsible for:
1. Breaking down complex tasks into smaller subtasks
2. Assigning tasks to appropriate worker agents
3. Evaluating work quality and requesting revisions if needed
4. Synthesizing results into final outputs
Available worker agents:
- researcher: Finds and extracts relevant information
- analyst: Analyzes data and identifies patterns
- writer: Creates well-written content
"""),
MessagesPlaceholder(variable_name="messages"),
("human", "{input}"),
])
supervisor_chain = supervisor_prompt | ChatOpenAI(temperature=0)
And the agent node would look something like this:
from typing import Annotated, Dict, List, Sequence, TypedDict
class AgentState(TypedDict):
messages: Annotated[Sequence[Dict], "Chat messages"]
task_queue: Annotated[List[str], "Queue of pending tasks"]
current_task: Annotated[str, "Current task being processed"]
results: Annotated[Dict, "Results from worker agents"]
status: Annotated[str, "Current workflow status"]
# Agent functions
def supervisor_agent(state: AgentState) -> AgentState:
"""Supervisor agent that manages the workflow"""
messages = state["messages"]
# If this is the start of a new task, break it down
if state["status"] == "start":
response = supervisor_chain.invoke({
"messages": messages,
"input": "Break down this task into subtasks and assign them to appropriate workers."
})
# Parse subtasks from response
# (in practice, you'd want more robust parsing)
subtasks = [
"research: gather information",
"analyze: process findings",
"write: create report"
]
state["task_queue"] = subtasks
state["status"] = "in_progress"
state["messages"].append({
"role": "assistant", "content": str(response.content)
})
# If we have results to evaluate
elif state["status"] == "review":
response = supervisor_chain.invoke({
"messages": messages,
"input": f"Review the following results and determine if they're satisfactory: {state['results']}"
})
# For demo purposes, always mark as complete
state["status"] = "complete"
state["messages"].append({
"role": "assistant", "content": str(response.content)
})
return stateLet’s get three fake empty agents:
def researcher_agent(state: AgentState) -> AgentState:
# To implement
return state
def analyst_agent(state: AgentState) -> AgentState:
# To implement
return state
def writer_agent(state: AgentState) -> AgentState:
# To implement
return state
Now, we get a router to route the tasks to the correct agents:
# Router function
def route_next(state: AgentState) -> str:
"""Determine the next step in the workflow"""
if state["status"] == "complete":
return "end"
if not state["task_queue"] and state["status"] == "in_progress":
state["status"] = "review"
return "supervisor"
if state["task_queue"]:
current_task = state["task_queue"].pop(0)
state["current_task"] = current_task
if "research" in current_task:
return "researcher"
elif "analyze" in current_task:
return "analyst"
elif "write" in current_task:
return "writer"
return "supervisor"Now let’s compile the graph:
workflow = StateGraph(AgentState)
# Add nodes
workflow.add_node("supervisor", supervisor_agent)
workflow.add_node("researcher", researcher_agent)
workflow.add_node("analyst", analyst_agent)
workflow.add_node("writer", writer_agent)
# Add conditional edges
workflow.add_conditional_edges(
"supervisor",
route_next
)
workflow.add_conditional_edges(
"researcher",
route_next
)
workflow.add_conditional_edges(
"analyst",
route_next
)
workflow.add_conditional_edges(
"writer",
route_next
)
# Set entry point
workflow.set_entry_point("supervisor")
# Compile the graph
supervisor_app = workflow.compile()
There are tasks where LLMs are very unlikely to hallucinate: for example, summarizing text, choosing between a limited set of options, and even writing very simple code. So, the game is about creating agents with very limited agency and simple outputs, such that hallucinations are as unlikely as possible, and utilizing human domain expertise to rule how the different agents interact with each other.
If we need to solve complex problems, then we need to include more of those simple agents, and the complexity will be offloaded onto the human-designed architecture and the software needed to orchestrate the interactions. We often dismiss LLM pipelines as "OpenAI API wrappers", but we underestimate the value of the domain expertise needed to build those wrappers to solve problems in a niche domain. For example, the previous graph could be part of a larger application as a subgraph.
If we build those agents to only accept or emit structured data such as JSON or other data structures, and considering the limited scope of each agent, it is also much easier to build software to validate the data coming in and out of those agents. For example, if an agent is only supposed to extract keywords from text in a JSON format, we can easily programmatically check the validity of the output format and the existence of those keywords in the original text. If an agent is supposed to output code, we can directly compile it and unit test it. We can also implement the reflection pattern with feedback loops where we have special agents in charge of validating the output of other agents.
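As a minimal sketch of that kind of programmatic validation, assuming a keyword-extraction agent that returns a JSON object with a "keywords" field:
import json
def validate_keywords(agent_output: str, source_text: str) -> list:
    # The agent is expected to return something like {"keywords": ["...", "..."]}
    try:
        keywords = json.loads(agent_output)["keywords"]
    except (json.JSONDecodeError, KeyError) as exc:
        raise ValueError("Agent output is not in the expected JSON format") from exc
    # Every keyword must actually appear in the source text
    missing = [kw for kw in keywords if kw.lower() not in source_text.lower()]
    if missing:
        raise ValueError(f"Keywords not found in the source text: {missing}")
    return keywords
validate_keywords('{"keywords": ["orchestration", "LLM"]}',
                  "LLM pipelines need orchestration frameworks.")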
In cases where hallucinations are more likely to occur or when decisions should not be taken automatically, then it is important to implement software with humans in the loop to validate the intermediary outputs. This allows humans to provide feedback to an autonomous system and brings the system back on track if it starts to diverge from what is expected. For example, in the previous supervisor application, let’s induce human feedback:
import json
from langchain.callbacks import AsyncIteratorCallbackHandler
# Extend our state to include human feedback
class SuperState(TypedDict):
inner_state: Annotated[AgentState, "State for the agent subgraph"]
human_feedback: Annotated[str, "Feedback from human reviewer"]
final_output: Annotated[Dict, "Final approved output"]
status: Annotated[str, "Overall process status"]
# Human-in-the-loop review function
async def human_review(state: SuperState, callback: AsyncIteratorCallbackHandler) -> SuperState:
"""Awaits human review and feedback"""
# Prepare review message
review_message = {
"type": "human_feedback_required",
"content": state["final_output"],
"instructions": """
Please review the agents' work and provide feedback:
- Approve: Type 'approve' to accept the output
- Revise: Type 'revise:' followed by specific feedback
- Reject: Type 'reject:' followed by reason
"""
}
# Send to human reviewer
await callback.on_text(json.dumps(review_message, indent=2))
# Wait for response (in real implementation,
# this would be handled by your UI)
response = await callback.on_text("Waiting for human feedback...")
state["human_feedback"] = response
# Update status based on feedback
if response.startswith("approve"):
state["status"] = "complete"
elif response.startswith("revise"):
state["status"] = "revision_needed"
elif response.startswith("reject"):
state["status"] = "rejected"
return stateLet’s say we have a node to prepare the output of the supervisor subgraph:
# Function to prepare content for human review
def prepare_review(state: SuperState) -> SuperState:
agent_state = state["inner_state"]
review_content = {
"research_findings": agent_state["results"].get("research", {}),
"analysis": agent_state["results"].get("analysis", ""),
"final_document": agent_state["results"].get("writing", ""),
"process_log": agent_state["messages"]
}
state["final_output"] = review_content
state["status"] = "review_ready"
return stateLet’s put the app together:
# Router for the super graph
def route_super(state: SuperState) -> str:
"""Determines the next step in the super graph"""
status = state["status"]
if status == "start":
return "agent_workflow"
elif status == "needs_review":
return "prepare_review"
elif status == "review_ready":
return "human_review"
elif status == "revision_needed":
return "agent_workflow"
elif status in ["complete", "rejected"]:
return END
else:
return "prepare_review"
# Create the super graph
super_workflow = StateGraph(SuperState)
# Add nodes - directly adding the compiled supervisor_app as a node
super_workflow.add_node("agent_workflow", supervisor_app)
super_workflow.add_node("prepare_review", prepare_review)
super_workflow.add_node("human_review", human_review)
# Add conditional edges
super_workflow.add_conditional_edges(
"agent_workflow",
route_super
)
super_workflow.add_conditional_edges(
"prepare_review",
route_super
)
super_workflow.add_conditional_edges(
"human_review",
route_super
)
# Set entry point
super_workflow.set_entry_point("agent_workflow")
# Compile super graph
super_app = super_workflow.compile()Let’s stop here!
There are too many packages and functionalities to capture everything in one newsletter! I wanted to give you a sense of the main aspects and the differences between micro-orchestration and macro-orchestration. In my opinion, we can easily get away without relying on any of the micro-orchestration frameworks, but I believe a macro-orchestration framework is critical to building large applications.
—