
Building an Engineering Knowledge Agent with LlamaIndex, Ollama, and Intent-Driven Engineering
- Mark Kendall
Introduction
One of the biggest challenges in modern software development is not writing code — it’s understanding systems.
Developers constantly ask questions like:
How does this integration work?
Where is the architecture documented?
Why was this service designed this way?
Which runbook applies to this incident?
The answers often exist somewhere — in documentation, Git repositories, Jira tickets, or architecture diagrams — but finding them can take hours.
Using Intent-Driven Engineering, combined with tools like LlamaIndex, LangChain, and local LLMs, we can build something powerful:
An Engineering Knowledge Agent.
This article walks through a complete playbook for building a TeamBrain-style AI assistant that allows engineers to ask questions about their systems and receive answers grounded in real documentation.
The entire system runs locally, requires no API keys, and can be generated using a single intent artifact with Claude Code.
What Is an Engineering Knowledge Agent?
An Engineering Knowledge Agent is an AI system that understands your engineering documentation and can answer questions about it.
Instead of manually searching documentation, developers simply ask questions like:
“How does the Nautobot to Salesforce integration work?”
The system retrieves relevant documentation, analyzes it, and produces an answer.
This approach is powered by Retrieval Augmented Generation (RAG).
RAG combines three capabilities:
Document indexing
Semantic search
Large language model reasoning
The result is an AI assistant that answers questions using your actual documentation, not generic training data.
Architecture Overview
The architecture behind this system is surprisingly simple.
It consists of four main components.
1. Documentation Sources
These are the documents you want the system to understand.
Examples include:
architecture documentation
integration specifications
runbooks
design documents
markdown documentation
These documents are stored locally in a data directory.
2. Knowledge Index (LlamaIndex)
LlamaIndex converts documentation into a searchable knowledge base.
The process works like this:
Documents → chunks → embeddings → vector index
This allows the system to retrieve relevant pieces of documentation based on meaning rather than keyword matching.
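The pipeline above can be illustrated with a toy sketch. A real index calls a learned embedding model; here a bag-of-words vector stands in for an embedding purely to show the retrieve-by-similarity idea (the chunks and query are made up for illustration):

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words vector. A real index would call an
    # embedding model such as nomic-embed-text instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Documents -> chunks -> embeddings -> vector index
chunks = [
    "Kafka is used as the transport layer for the integration",
    "A dead letter queue handles message failures",
    "Observability uses structured logging and metrics",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank chunks by similarity to the query vector, not by keyword match.
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(retrieve("which component handles message failures")[0])
# prints: A dead letter queue handles message failures
```

The real system replaces `embed` with a neural embedding model, which is what lets "location data changes" match "webhook events" even with no words in common.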
3. Local LLM (Ollama)
Instead of sending data to cloud APIs, the system uses a local language model.
This example uses:
Ollama + Llama 3
This allows the entire assistant to run locally without exposing internal documentation to external services.
4. Query Interface
Developers interact with the assistant through a simple command-line interface.
They ask questions, and the system retrieves documentation and generates answers.
Implementation Playbook
This section provides a step-by-step guide to implementing the system.
Step 1 — Install Ollama
First, install Ollama by following the instructions on the Ollama website for your platform.
After installation, start a model:
ollama run llama3
On first run this downloads Llama 3; it is the local language model the assistant will use.
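Before wiring anything else up, it helps to confirm the Ollama server is actually listening. Ollama serves an HTTP API on port 11434 by default; the check below uses only the Python standard library (the function name is my own):

```python
import urllib.request
from urllib.error import URLError

def ollama_is_running(host: str = "localhost", port: int = 11434,
                      timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers on the given host and port."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/", timeout=timeout) as resp:
            # The root endpoint replies with a small "Ollama is running" page.
            return resp.status == 200
    except (URLError, OSError):
        return False

print("Ollama reachable:", ollama_is_running())
```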
Step 2 — Create the Project Structure
Create the following directory structure:
teambrain-agent/
    requirements.txt
    data/
The data folder will contain the documentation to index.
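Creating the skeleton can be scripted; a small sketch using only the standard library (the helper name is illustrative):

```python
from pathlib import Path

def scaffold(base: str = "teambrain-agent") -> Path:
    """Create the project skeleton: an empty requirements.txt and a data/ folder."""
    root = Path(base)
    (root / "data").mkdir(parents=True, exist_ok=True)  # documentation to index
    (root / "requirements.txt").touch()                 # dependency list
    return root

# scaffold() would create ./teambrain-agent with the layout shown above.
```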
Step 3 — Install Dependencies
Create a file called requirements.txt with the following contents:
llama-index
llama-index-llms-ollama
llama-index-embeddings-ollama
ollama
Install dependencies:
pip install -r requirements.txt
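After installation, a quick sanity check confirms the packages resolved; the helper below is my own, and the package names are the ones from requirements.txt:

```python
from importlib import metadata

def check_installed(packages: list[str]) -> dict[str, str]:
    """Map each package name to its installed version, or 'NOT INSTALLED'."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = "NOT INSTALLED"
    return versions

required = ["llama-index", "llama-index-llms-ollama",
            "llama-index-embeddings-ollama", "ollama"]
for name, version in check_installed(required).items():
    print(f"{name}: {version}")
```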
Step 4 — Add Documentation
Create a markdown document inside the data directory.
Example: architecture.md
# Nautobot to Salesforce Integration
The Nautobot to Salesforce integration is implemented as a Spring Boot microservice.
The service receives webhook events from Nautobot when location data changes.
The microservice transforms the Nautobot data model into the Salesforce location schema.
Kafka is used as the transport layer to guarantee delivery and reliability.
A dead letter queue handles message failures.
Observability is implemented using structured logging and metrics.
This document will become part of the assistant’s knowledge base.
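Documentation can also be dropped into data/ programmatically, which is handy when pulling pages out of other systems; a minimal sketch (the helper name is illustrative, content abbreviated):

```python
from pathlib import Path

def add_document(data_dir: str, name: str, text: str) -> Path:
    """Write one markdown document into the directory the reader will index."""
    target = Path(data_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target

# add_document("data", "architecture.md",
#              "# Nautobot to Salesforce Integration\n...")
# would create the file shown above.
```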
Step 5 — Build the Knowledge Index
Create a Python script called ingest.py.
This script loads documents and builds the vector index.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core.settings import Settings

# Point LlamaIndex at the local Ollama models.
Settings.llm = Ollama(model="llama3")
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Load every document in data/, embed it, and build the vector index.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Persist the index so the assistant can reload it without re-ingesting.
index.storage_context.persist(persist_dir="./storage")
The script relies on the nomic-embed-text embedding model, so pull it first:
ollama pull nomic-embed-text
Then run the ingestion process:
python ingest.py
This builds the searchable knowledge index.
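By default LlamaIndex chooses its own chunk sizes. If answers come back with too little or too much context, chunking can be tuned before the index is built; a configuration sketch for ingest.py (the values are illustrative, not recommendations):

```python
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.settings import Settings

# Split documents into ~512-token chunks with 50 tokens of overlap so that
# context spanning a chunk boundary is not lost. Add this to ingest.py
# before calling VectorStoreIndex.from_documents().
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
```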
Step 6 — Launch the AI Assistant
Create a file called app.py.
This script loads the index and allows developers to ask questions.
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core.settings import Settings

# Use the same local models that built the index.
Settings.llm = Ollama(model="llama3")
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Reload the persisted index and wrap it in a query engine.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()

print("TeamBrain Assistant Ready")
while True:
    question = input("Ask a question: ")
    if question.lower() in ["exit", "quit"]:
        break
    response = query_engine.query(question)
    print(response)
Run the assistant:
python app.py
Step 7 — Ask Questions
You can now interact with the system.
Example:
Ask a question:
How does the Nautobot to Salesforce integration work?
The assistant will retrieve relevant documentation and generate an answer.
Using Claude Code and Intent-Driven Engineering
Instead of manually creating the project, you can generate the entire system using Claude Code.
Create an intent file that describes the system requirements.
Claude Code will then:
Generate the project structure
Write the code
Install dependencies
Build the knowledge index
Launch the assistant
This demonstrates the power of Intent-Driven Engineering, where developers describe what they want rather than writing everything manually.
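Claude Code does not prescribe a schema for intent artifacts; a plain-markdown file stating the goal and constraints is enough. A hypothetical sketch, condensed from the requirements in this article:

```markdown
# Intent: TeamBrain Engineering Knowledge Agent

## Goal
A local CLI assistant that answers questions grounded in the markdown
documentation stored in ./data.

## Constraints
- Runs fully locally: Ollama with llama3 for generation,
  nomic-embed-text for embeddings; no external API keys.
- Uses LlamaIndex for ingestion, vector indexing, and querying.
- The index persists to ./storage; ingestion and querying are separate scripts.

## Deliverables
- requirements.txt, ingest.py, app.py, and a sample data/architecture.md
```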
Why This Matters
Engineering organizations suffer from knowledge fragmentation.
Information is spread across:
documentation
Git repositories
ticketing systems
internal chats
runbooks
An Engineering Knowledge Agent solves this by creating a central intelligence layer for engineering teams.
Developers can quickly understand systems, troubleshoot problems, and learn architectures.
The result is faster onboarding, fewer production issues, and stronger architectural understanding across teams.
Key Takeaways
AI assistants can be grounded in real engineering documentation using Retrieval Augmented Generation.
LlamaIndex provides a powerful framework for indexing and retrieving documentation.
Ollama allows organizations to run AI models locally without external APIs.
Intent-Driven Engineering enables systems like this to be generated automatically using tools like Claude Code.
Engineering Knowledge Agents can become the foundation for organizational engineering intelligence.