
Hands-on guide · built from real code

How an LLM coding agent works

You hear it everywhere: “agent”, “copilot”, “AI that codes on its own”. Behind the word, the mechanism comes down to one simple idea: a language model placed in a loop, with tools to act. This guide takes it apart using a real agent written in Go, in ~1000 lines with zero dependencies.

Free · Local · Offline

The same mechanism that powers the big coding assistants — here in the clear and open. Plug in a local model and get an agent that reads your files, runs your commands and fixes its own mistakes.

the engine of Cursor, Claude Code, Copilot, Codex
  • No subscription — open-source (MIT)
  • 100% on your machine (LM Studio, Ollama…)
  • Works even without an internet connection
01

What is an agent?

A language model (LLM) on its own can only produce text. Ask it a question and it answers — but it cannot do anything: it cannot read a file, run a command, or check its own work. It is a brain without hands.

An agent is that same model given two things: tools (hands to act on the world) and a loop (the ability to act several times in a row, observing the result of each action before deciding the next). Nothing more. The “magic” of coding agents lives entirely in these two ingredients.

Chatbot

You type → it replies. One pass. Text is the end of the road.

message → reply
Agent

You type → it acts, observes, repeats, until the task is actually done.

message → thinking → action → observation

This whole guide is based on a real, minimal agent. It talks to any OpenAI-compatible server (LM Studio, Ollama, a local model, a cloud API). No framework, no SDK: just Go's standard library, so that every mechanism stays readable and demystified.

02

The agentic loop

The heart of every agent is a loop. On each turn (one user message), the agent repeats a step as long as it has something to do:

🧑 → 🤖 we send the history to the model
💭 decides: reply? call a tool?
🔧 acts: runs the requested tool
📥 observes: the result goes back into the history
↻ repeat
One step = one round-trip with the model. As long as the model asks for a tool, we run it and loop again. When it replies without asking for a tool, the turn is over.

In the code, this is the handleTurn function. Read it like a recipe: ask the model, check whether it wants a tool, if so run it and put the result back into the history, then repeat.

for step := 0; step < a.maxSteps; step++ {
    a.compactContext(ctx)                       // keep the context under control

    msg, err := a.client.Chat(ctx, a.history, a.registry.Schemas())   // 1. ask the model
    if err != nil { /* hand control back cleanly */ return }
    a.history = append(a.history, msg)          // remember its reply

    if len(msg.ToolCalls) > 0 {                 // 2. does the model want to act?
        for _, call := range msg.ToolCalls {
            result := a.runTool(ctx, call.Function.Name, parseJSONArgs(call.Function.Arguments))
            a.history = append(a.history, Message{   // 3. the observation goes back to the model
                Role: "tool", ToolCallID: call.ID,
                Name: call.Function.Name, Content: result,
            })
        }
        continue                                // 4. loop again
    }
    return                                      // no tool requested → turn done
}
Why a step limit? An agent can get it wrong and repeat the same action forever. A bound (maxSteps) guarantees a turn always ends. The code even adds a loop detector: if the agent asks for exactly the same action as the previous step, we stop — it is no longer making progress.
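The loop detector can be sketched in a few lines. This is a minimal illustration rather than the repository's code: the loopDetector type and its Repeated method are hypothetical names, and the sketch assumes an action is identified by its tool name plus its raw JSON arguments.

```go
package main

import "fmt"

// loopDetector remembers the signature of the previous tool call.
// Hypothetical helper for illustration; the real agent may differ.
type loopDetector struct {
	last string
}

// Repeated reports whether this call is identical to the previous one,
// then records it as the new "previous" call.
func (d *loopDetector) Repeated(name, rawArgs string) bool {
	sig := name + "\x00" + rawArgs // tool name + raw JSON arguments
	same := d.last != "" && sig == d.last
	d.last = sig
	return same
}

func main() {
	var d loopDetector
	fmt.Println(d.Repeated("read_file", `{"path":"math.go"}`)) // first call
	fmt.Println(d.Repeated("read_file", `{"path":"math.go"}`)) // identical call
	fmt.Println(d.Repeated("read_file", `{"path":"main.go"}`)) // different arguments
}
```

Comparing only against the previous step keeps the check cheap; catching longer cycles (A, B, A, B…) would need a short window of past signatures instead.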
03

The tools: the agent's hands

A tool is simply a function the model can decide to call. In this agent, a tool is a plain function (context, arguments) → (result, error). It does no display of its own: it does its work and returns a result, full stop. That is what makes it easy to write, test and reuse.

type Tool struct {
    Name        string                  // e.g. "read_file"
    Description string                  // read by the model to know when to use it
    Parameters  map[string]any          // the JSON schema of the arguments
    Confirm     func(args) (bool, string)  // optional guard (risky actions)
    Run         ToolFunc                // (ctx, args) → (result, error)
}

The agent ships three basic tools — that is all you need to code:

📖

read_file

Reads the contents of a text file.

✏️

write_file

Writes or overwrites a file.

execute_shell

Runs a command (build, tests, git…).

How does the model know which tools exist? We describe each one in a schema sent with every request. The model reads the name, the description and the parameters, then it chooses which to call.

r.Register(Tool{
    Name:        "read_file",
    Description: "Reads the contents of a text file.",
    Parameters: map[string]any{
        "type": "object",
        "properties": map[string]any{
            "path": map[string]any{"type": "string", "description": "Path of the file to read"},
        },
        "required": []string{"path"},
    },
    Run: toolReadFile,
})
Adding a tool = adding a capability. Web search, a database query, an API call… you just register a new Tool. The rest of the loop doesn't change. That is the whole power of this design: the tool registry is extensible without touching the engine.
04

Talking to the model

All that's left is to connect the model. The conversation is just a list of messages (system, user, assistant, tool results) that we send back in full at each step: the model has no memory of its own, so the context is the memory.

Function calling: how the model “calls” a tool

The model doesn't run the tool itself. It returns a structured intent — “I want to call read_file with path=math.go” — and it's our code that runs it, then returns the result.

your code → the model: history + list of tools
the model → your code: tool_call: read_file(path="math.go")
your code: run the tool
your code → the model: result: "func Sum(a, b int)…"
The model decides what to call; your code decides how to run it. The separation is clean.
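On the wire, an OpenAI-compatible server returns that intent as a tool call whose arguments are a JSON document encoded inside a string, so they have to be decoded a second time. The sketch below shows that step; decodeArgs is a hypothetical helper name and the sample payload is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall follows the OpenAI-compatible wire format: the arguments
// arrive as a JSON document encoded inside a string.
type ToolCall struct {
	ID       string `json:"id"`
	Type     string `json:"type"`
	Function struct {
		Name      string `json:"name"`
		Arguments string `json:"arguments"`
	} `json:"function"`
}

// decodeArgs turns the raw arguments string into a map.
// Hypothetical helper; an empty string is treated as "no arguments".
func decodeArgs(raw string) (map[string]any, error) {
	args := map[string]any{}
	if raw == "" {
		return args, nil
	}
	if err := json.Unmarshal([]byte(raw), &args); err != nil {
		return nil, fmt.Errorf("invalid tool arguments: %w", err)
	}
	return args, nil
}

func main() {
	// Illustrative payload, shaped like one entry of "tool_calls".
	wire := `{"id":"call_1","type":"function","function":{"name":"read_file","arguments":"{\"path\":\"math.go\"}"}}`

	var call ToolCall
	if err := json.Unmarshal([]byte(wire), &call); err != nil {
		panic(err)
	}
	args, err := decodeArgs(call.Function.Arguments)
	fmt.Println(call.Function.Name, args["path"], err)
}
```

If the model emits malformed JSON, the decoding error can simply be returned to it as the tool result, which gives it a chance to retry with valid arguments.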

Streaming: watching the answer take shape

Rather than waiting for the full answer, we read a stream (SSE) and display each fragment as it arrives. Tool calls, for their part, arrive in pieces that we reassemble by their index.

for _, d := range delta.ToolCalls {
    tc := toolCalls[d.Index]                 // one fragment per tool index
    if d.Function.Name != "" { tc.Function.Name = d.Function.Name }
    tc.Function.Arguments += d.Function.Arguments   // arguments arrive in chunks
}
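The excerpt above leaves out how the accumulator is created. A self-contained sketch of the reassembly, under assumptions, could look like this: the toolCallDelta and toolCall types and the assemble function are illustrative names, and the accumulator is a map of pointers so that each fragment updates the same entry.

```go
package main

import (
	"fmt"
	"sort"
)

// toolCallDelta is one streamed fragment. Field names are illustrative.
type toolCallDelta struct {
	Index     int
	ID        string
	Name      string
	Arguments string
}

// toolCall is a fully reassembled call.
type toolCall struct {
	ID, Name, Arguments string
}

// assemble merges fragments by index and returns the calls in index order.
func assemble(deltas []toolCallDelta) []toolCall {
	acc := map[int]*toolCall{}
	for _, d := range deltas {
		tc, ok := acc[d.Index]
		if !ok { // first fragment seen for this index
			tc = &toolCall{}
			acc[d.Index] = tc
		}
		if d.ID != "" {
			tc.ID = d.ID
		}
		if d.Name != "" {
			tc.Name = d.Name
		}
		tc.Arguments += d.Arguments // arguments arrive in chunks
	}

	indexes := make([]int, 0, len(acc))
	for i := range acc {
		indexes = append(indexes, i)
	}
	sort.Ints(indexes)

	out := make([]toolCall, 0, len(indexes))
	for _, i := range indexes {
		out = append(out, *acc[i])
	}
	return out
}

func main() {
	calls := assemble([]toolCallDelta{
		{Index: 0, ID: "call_1", Name: "read_file", Arguments: `{"pa`},
		{Index: 0, Arguments: `th":"math.go"}`},
	})
	fmt.Printf("%+v\n", calls)
}
```

The arguments string is only valid JSON once the stream has ended, so it should be parsed after the last fragment, not on each one.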

The safety net: what if the model can't call tools?

Not every model supports native function calling. So the agent provides a fallback: if the model writes its intent as text (Action: read_file(path="…")), we detect it with a regular expression. A subtlety: we only accept the Action: prefix at the start of a line — otherwise, when the model recaps its actions, we would re-run them in a loop.

// (?m): ^ anchors at the start of a line — avoids re-running an action quoted
//       in a recap ("1. Action: write_file(...)").
pattern := `(?sm)^[ \t]*Action\s*:\s*(` + strings.Join(names, "|") + `)\s*\(\s*(.*)\)`
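The effect of the line-start anchor is easy to check in isolation. The sketch below reuses the pattern above inside a hypothetical parseAction helper; the regexp.QuoteMeta call on tool names is an addition for safety, not something the excerpt shows.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// parseAction looks for a textual "Action: tool(args)" at the start of a line.
// Hypothetical helper built around the pattern shown above.
func parseAction(text string, names []string) (name, args string, ok bool) {
	quoted := make([]string, len(names))
	for i, n := range names {
		quoted[i] = regexp.QuoteMeta(n) // addition: escape regex metacharacters
	}
	pattern := `(?sm)^[ \t]*Action\s*:\s*(` + strings.Join(quoted, "|") + `)\s*\(\s*(.*)\)`
	m := regexp.MustCompile(pattern).FindStringSubmatch(text)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	names := []string{"read_file", "write_file"}

	// A real intent: "Action:" opens the line.
	fmt.Println(parseAction("Let me look.\nAction: read_file(path=\"math.go\")", names))

	// A recap: the line starts with "1.", so nothing is re-run.
	fmt.Println(parseAction("Recap:\n1. Action: write_file(path=\"a.go\")", names))
}
```

Because of the s flag, the greedy `(.*)` runs to the last closing parenthesis in the text, which lets multi-line arguments through but also means trailing prose containing parentheses would be captured.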
05

Safety rails

Letting a model run shell commands is powerful — and dangerous. A good agent isn't just a loop: it's a cautious loop. Four protections, simple but essential:

1

Confirming risky actions

A command matching a dangerous pattern (rm -rf, sudo, dd, a fork bomb…) asks for human approval before running.

2

Loop detection

If the agent repeats exactly the same action, we stop the turn: it isn't making progress, no point burning tokens.

3

Timeouts

Every command and every model call has a time limit. A stuck command never freezes the agent.

4

Output truncation

A tool's result is capped before being fed back: a huge output won't saturate the context.

var dangerousPatterns = []*regexp.Regexp{
    regexp.MustCompile(`\brm\s+-[a-zA-Z]*[rf]`),   // rm with an -r or -f flag (e.g. rm -rf)
    regexp.MustCompile(`\bdd\s+if=`),
    regexp.MustCompile(`:\s*\(\)\s*\{`),           // fork bomb
    regexp.MustCompile(`\b(shutdown|reboot|halt)\b`),
    regexp.MustCompile(`\bsudo\b`),
    // …
}
An error isn't a disaster. When a tool fails, the agent doesn't crash: it returns the error to the model as an observation. The model reads the message, works out what happened, and can usually fix it on its own at the next step. Self-correction emerges from the loop.
06

Memory & context

At each step the history grows: messages, tool calls, results. But a model's context window is limited. Do nothing and you eventually overflow it. The agent's solution: compact.

When the history exceeds a token budget, we keep the recent messages intact (the work in progress), and we ask the model to summarize the older ones in a few bullet points. The summary replaces the old messages. Long-term memory becomes compact, short-term memory stays precise.

before: 4 old messages + recent messages → after: summary + recent messages (kept)
Older exchanges are condensed into a summary; recent messages stay intact.
if totalTokens(a.history) <= a.maxCtx { return }   // under budget: nothing to do

older  := rest[:keepFrom]                          // the older messages
recent := rest[keepFrom:]                           // the ~60% most recent, kept as-is

summary, err := a.client.Summarize(ctx, older)      // the model summarizes the old ones
// → [initial system] + [summary] + [recent messages]
07

Putting it together

We have all the pieces. The main program wires them up: it creates the tool registry, the model client, and starts a read loop (a REPL). Each user message triggers an agent turn — the loop we just dissected.

And the agent's “personality”? It lives in a system prompt: a few sentences reminding it of its mission, its tools and its rules.

`You are an autonomous coding agent.
Rules:
1. Break missions into steps and call the tools you need.
2. Analyze each tool result; on error, fix it and retry.
3. A destructive action may require user confirmation.
4. When the mission is done, give a short recap and hand control back.`

In one sentence

A coding agent is a loop that sends the history to a model, runs the tools it asks for, returns the results to it, and repeats — all wrapped in safety rails and context management. No magic: just readable engineering.

08

Frequently asked questions

What is the difference between an agent and a chatbot?

A chatbot answers in a single pass: message, then reply. An agent acts, observes the result of each action and loops again, using tools, until the task is actually done.

Do you need a framework to build a coding agent?

No. The agent in this guide is written in Go with the standard library only, in ~1000 lines and zero dependencies. It works with any OpenAI-compatible server (LM Studio, Ollama, vLLM, a cloud API).

Is it free? Does it work offline?

Yes to both. The code is open-source (MIT license), so it's free. And because the agent talks to an OpenAI-compatible server, you can point it at a model running on your machine (LM Studio, Ollama…): no internet connection and no subscription required. It's the same principle as Cursor, Claude Code, Copilot or Codex, but 100% local.

How does a language model execute actions?

It doesn't execute them itself. Through function calling, the model returns a structured intent (tool name and arguments); it's the agent code that actually runs the tool, then returns the result to the model.

How do you stop an agent from looping or running a dangerous action?

With safety rails: a step limit per turn, repeated-action detection, human confirmation for risky commands, timeouts, and truncation of overly long outputs. Errors are fed back to the model so it can self-correct.

How do you handle a limited context window?

By compacting the history: when it exceeds a token budget, recent messages are kept intact while the oldest ones are summarized by the model and replaced with that summary.

09

The code & going further

This whole guide describes a real, complete agent. The code is open, commented line by line, and runs with any OpenAI-compatible server. The best way to understand it: read it, run it, change it.

Nhilo94/comprendre-agent-llm

A coding agent in Go — LLM loop + tools, ~1000 lines, zero dependencies.

Go · stdlib only · OpenAI-compatible

Run the agent in 30 seconds

git clone https://github.com/Nhilo94/comprendre-agent-llm
cd comprendre-agent-llm
go run .            # then pick your model and start chatting

A few ways to extend it

  • Add a web search or API call tool.
  • Replace the summary with a real vector memory.
  • Let the agent plan before acting.
  • Wire in a second agent that reviews the first one's work.