What is maxOS

June 15, 2026 · 5 min read

maxOS is an open-source, AI-native platform by ООО «Гарипофф» that runs on local models. Its core is an agent: it reads and edits files in a project and executes terminal commands, using a language model that runs directly on your machine. Your code, queries, and file contents never leave for a third-party cloud.

The core is open-source under the Apache-2.0 license: github.com/LLC-Garipoff/maxos. Let's start there.

An Agent, Not Autocomplete

Autocomplete suggests the next line. An agent works through the entire task: it analyzes the project, makes edits across multiple files, runs builds and tests, reads the output, and decides what to do next based on it.

Inside maxOS is a simple loop. The model is given a task and a description of the available tools. Instead of answering in plain text, it can respond with a request to call a tool. There are four tools:

read_file — read a file;
list_dir — view directory contents;
write_file — create or overwrite a file;
run_bash — execute a shell command (build, test, run).
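As an illustration, the four tools could be described to the model in the OpenAI-compatible tools format roughly as follows. This is a sketch, not the maxOS source: the parameter names (path, content, command), the description strings, and the defineTool helper are assumptions made for the example.

```typescript
// Shape of one entry in the OpenAI-compatible "tools" array.
type ToolSpec = {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: "string"; description: string }>;
      required: string[];
    };
  };
};

// Hypothetical helper: builds a spec whose parameters are all required strings.
function defineTool(
  name: string,
  description: string,
  params: Record<string, string>,
): ToolSpec {
  const properties: Record<string, { type: "string"; description: string }> = {};
  for (const [key, text] of Object.entries(params)) {
    properties[key] = { type: "string", description: text };
  }
  return {
    type: "function",
    function: {
      name,
      description,
      parameters: { type: "object", properties, required: Object.keys(params) },
    },
  };
}

// Parameter names below are assumptions, not taken from the maxOS source.
const TOOLS: ToolSpec[] = [
  defineTool("read_file", "Read a file", {
    path: "File path relative to the working directory",
  }),
  defineTool("list_dir", "View directory contents", {
    path: "Directory path relative to the working directory",
  }),
  defineTool("write_file", "Create or overwrite a file", {
    path: "File path relative to the working directory",
    content: "Full new contents of the file",
  }),
  defineTool("run_bash", "Execute a shell command (build, test, run)", {
    command: "Shell command to run in the working directory",
  }),
];
```

An array like this would be sent along with the task, so the model knows which calls it is allowed to request.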

maxOS executes the call in the working directory, returns the result to the model, and the loop repeats until the model stops calling tools and returns a final answer. All paths are restricted to the working directory — any call attempting to escape it is rejected. By default, maxOS prompts for confirmation before writing a file or executing a command.
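The loop, the working-directory restriction, and the confirmation step can be sketched as below. It is a minimal illustration under stated assumptions, not the maxOS implementation: the type names, the resolveInside and runAgent functions, the maxSteps limit, and the "denied by user" message are all hypothetical, and the model and tool runner are passed in as plain functions.

```typescript
import * as path from "node:path";

// Hypothetical shapes for this sketch; the real maxOS types may differ.
type ToolCall = { name: string; args: Record<string, string> };
type Reply = { toolCall?: ToolCall; text?: string };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (messages: Message[]) => Promise<Reply>;
type RunTool = (call: ToolCall, workdir: string) => Promise<string>;
type Confirm = (call: ToolCall) => Promise<boolean>;

// Resolve a model-supplied path and reject it if it leaves the working directory.
// This is a lexical check only: symlinks are not followed in this sketch.
function resolveInside(workdir: string, requested: string): string {
  const root = path.resolve(workdir);
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`path escapes working directory: ${requested}`);
  }
  return target;
}

async function runAgent(
  task: string,
  workdir: string,
  model: Model,
  runTool: RunTool,
  confirm: Confirm,
  maxSteps = 20, // assumed safety limit, not mentioned in the article
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: task }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await model(messages);
    const call = reply.toolCall;
    if (!call) return reply.text ?? ""; // no tool call: this is the final answer
    messages.push({ role: "assistant", content: JSON.stringify(call) });

    let result: string;
    try {
      if (typeof call.args.path === "string") resolveInside(workdir, call.args.path);
      // Writes and shell commands need confirmation first.
      const risky = call.name === "write_file" || call.name === "run_bash";
      result = risky && !(await confirm(call))
        ? "denied by user"
        : await runTool(call, workdir);
    } catch (err) {
      // Errors go back to the model as tool output so it can react to them.
      result = `error: ${err instanceof Error ? err.message : String(err)}`;
    }
    messages.push({ role: "tool", content: result });
  }
  throw new Error(`no final answer after ${maxSteps} steps`);
}
```

In this sketch a rejected path or a declined confirmation is reported back to the model as a tool result rather than aborting the run, which lets the model choose another approach.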

That is the entire principle. No hidden cloud orchestrator — the loop, tools, and prompt fit into a few hundred lines with zero runtime dependencies.

Local Models Only

maxOS does not run neural network inference itself. It delegates inference to a local runtime with an OpenAI-compatible API, such as Ollama or llama.cpp. You pull a model into the runtime, for example with Ollama:

ollama pull qwen2.5-coder:7b

maxOS then talks to the runtime over http://localhost. There is no cloud endpoint in this architecture at all: once the model is downloaded and your machine can handle it, the agent runs completely offline.
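A request to the local runtime could look like the sketch below. It assumes Ollama's default address (http://localhost:11434) and its OpenAI-compatible /v1/chat/completions route; a llama.cpp server would typically use a different port. The chat function, its parameters, and the injectable post argument are hypothetical and exist only to keep the example small and testable.

```typescript
// Minimal subset of the OpenAI-compatible chat types used in this sketch.
type ChatMessage = { role: string; content: string | null; tool_calls?: unknown[] };
type HttpResponse = { ok: boolean; status: number; json(): Promise<any> };
type HttpPost = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>;

// Assumed default: Ollama's standard port with its OpenAI-compatible prefix.
const DEFAULT_BASE_URL = "http://localhost:11434/v1";

async function chat(
  model: string,
  messages: ChatMessage[],
  tools: unknown[] = [],
  baseUrl: string = DEFAULT_BASE_URL,
  post: HttpPost = (url, init) => fetch(url, init), // global fetch, Node 18+
): Promise<ChatMessage> {
  const url = `${baseUrl.replace(/\/+$/, "")}/chat/completions`;
  const body = {
    model,
    messages,
    stream: false,
    // Only send "tools" when there is something to offer.
    ...(tools.length > 0 ? { tools } : {}),
  };
  const response = await post(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`local runtime returned HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message; // the assistant message, possibly with tool_calls
}
```

Because the base URL points at localhost, nothing in this call path needs an outbound network connection.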

There is a technical nuance here that required writing a dedicated layer. Some local models properly return tool calls in the structured tool_calls field, as expected by the OpenAI-compatible protocol. However, many — including qwen2.5-coder — write the call as JSON text directly within the response instead. maxOS recognizes both cases: if the structured field is missing, it extracts the call from the text (including formats with <tool_call> tags) and continues the loop. This allows the framework to work with a variety of models rather than just a single "correct" one.
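The two cases described above can be handled along these lines. This is a sketch of the general technique, not the maxOS layer itself: the function names are hypothetical, and accepting a parameters key as an alternative to arguments in the text form is an assumption about what some models emit.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };

// Subset of an OpenAI-compatible assistant message.
type AssistantMessage = {
  content?: string | null;
  tool_calls?: { function: { name: string; arguments: unknown } }[];
};

// Build a ToolCall from a name and raw arguments (JSON string or object).
function buildCall(name: unknown, rawArgs: unknown): ToolCall | null {
  if (typeof name !== "string" || name === "") return null;
  try {
    const args = typeof rawArgs === "string" ? JSON.parse(rawArgs || "{}") : (rawArgs ?? {});
    return typeof args === "object" && args !== null ? { name, args } : null;
  } catch {
    return null; // arguments were not valid JSON
  }
}

// Fallback: find a JSON object in the text, preferring <tool_call> tags.
function parseTextCall(text: string): ToolCall | null {
  const tagged = /<tool_call>([\s\S]*?)<\/tool_call>/.exec(text);
  const body = tagged ? tagged[1] : text;
  const start = body.indexOf("{");
  const end = body.lastIndexOf("}");
  if (start === -1 || end < start) return null;
  try {
    const obj = JSON.parse(body.slice(start, end + 1));
    return buildCall(obj?.name, obj?.arguments ?? obj?.parameters);
  } catch {
    return null; // the braces did not enclose valid JSON
  }
}

function extractToolCall(msg: AssistantMessage): ToolCall | null {
  // Case 1: the runtime returned a structured tool_calls field.
  const first = msg.tool_calls?.[0];
  if (first) return buildCall(first.function.name, first.function.arguments);
  // Case 2: the model wrote the call as JSON inside the text.
  return parseTextCall(msg.content ?? "");
}
```

A null result means the message is treated as ordinary text, which in the agent loop corresponds to the final answer.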

Where Coding Models Connect

maxOS is an orchestrator: prompt, tools, loop, sandbox, confirmations. The actual code is written and edited by the connected model. For coding, models that support tool calling are loaded into the runtime: code-specialized ones such as qwen2.5-coder, or general-purpose ones such as llama3.1. Changing the model takes a single flag (--model) or an environment variable; the rest of the agent remains unchanged.
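Model selection of this kind can be sketched as a small resolver. The --model flag comes from the text above; the environment variable name MAXOS_MODEL, the fallback model, and the precedence order (flag first, then environment, then fallback) are assumptions made for this example.

```typescript
// Hypothetical resolver: flag first, then environment variable, then a fallback.
function resolveModel(
  argv: string[],
  env: Record<string, string | undefined>,
  fallback = "qwen2.5-coder:7b", // assumed default, taken from the pull example
): string {
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    // Form 1: --model <name>
    if (arg === "--model" && argv[i + 1]) return argv[i + 1];
    // Form 2: --model=<name>
    if (arg.startsWith("--model=")) {
      const value = arg.slice("--model=".length);
      if (value) return value;
    }
  }
  // MAXOS_MODEL is an assumed variable name, not confirmed by the article.
  return env.MAXOS_MODEL || fallback;
}
```

In a Node program it would be called as resolveModel(process.argv.slice(2), process.env), and only the resulting model name would change between runs.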

This separation is intentional. The agent loop is bottlenecked by disk and network I/O, not computation, which is why it is written in TypeScript: it allows rapid iteration and easy installation. Performance-critical and system-level components are still in development and are planned for a native Rust core: indexing and searching large repositories, and shipping as a single binary with no Node dependency. The principle stays the same: use the right tool for each job instead of rewriting I/O-bound code in Rust.

Where SpotMax Comes In

SpotMax is the team's desktop application that grew out of the same philosophy: the model helping you should run right next to you, not in someone else's data center. The name itself is Spotlight + max: a tool tightly integrated into the operating system, always at your fingertips.

We originally built it for sales, for our own presentations and pitches: the assistant listens to the conversation and provides real-time prompts that help you keep track of the context and flow. It turned out to be useful more broadly, for both job applicants and HR. Today, it is increasingly used for negotiations with foreign counterparties: SpotMax recognizes and transcribes speech directly on the device, helping guide the conversation without losing the thread. How exactly it listens to a call, transcribes speech, and provides prompts is detailed separately: How SpotMax Works: Transcription, Translation, and Prompts During a Call.

Speech recognition and speaker diarization run locally, on models running on your device. maxOS serves as the shared, open foundation beneath this: the agent layer we open-sourced so it can be audited, verified, and improved.

Why Local?

Cloud-based coding agents send your source code, queries, and often terminal output to a third-party server. For a large portion of work this is unacceptable or impractical: proprietary code, regulated data, NDAs, or simply a poor internet connection. maxOS makes the local path the only path: you can verify this directly in the source code, as it is completely open.

Code, bug reports, and improvements are on GitHub: github.com/LLC-Garipoff/maxos. And for specific improvements to local models, we have a bounty program.