---
title: Installer
section: Ora — System papers
status: review
description: "Provisioning in two phases: a universal base that gives any machine a working AI through one orchestrator, then an additive local-capability phase gated on hardware."
authors:
  - The Ora Foundation
downloads:
  md: /papers/white/installer.md
license: https://creativecommons.org/publicdomain/zero/1.0/
---

# Installer

**The design claim: provisioning is two phases, not one — a universal base that gives every machine a working AI through one orchestrator, and an additive local-capability phase that is gated on detected hardware and validates that a model will actually fit before it downloads it.** The same architecture runs on a Chromebook-class laptop and a 128 GB Mac Studio; only Phase 2 differs, and it differs by what the hardware can carry, not by a different product. The two reliability-critical decisions are made before any model is downloaded: *can this machine run a local model at all*, and *will this specific model fit in this machine's memory*. Getting either wrong produces the worst failure mode in provisioning — a long download that ends in a model that will not load.

There are two ways to provision a local-first AI system. The naive way installs a model first and discovers its problems at load time — wrong size for the RAM, missing companion files, a chat-template mismatch that garbles output. The reliable way evaluates the machine, validates the fit, and only then commits to the download. This document specifies the second: the phase structure, the hardware gate, the tiering, the fit validation, inference-engine selection, integrity verification, and the deployment profiles — as a mechanism a competent engineer could rebuild.

---

## Phase 1 — the universal base

Phase 1 runs on every machine regardless of hardware, and at its end the user has a working browser-based AI at `localhost:5000` with tool execution. Six layers: a Python environment; the workspace directory structure (the vault, the conversation store, the ChromaDB index, `routing-config.json`); the framework library (cloned from the repository); the orchestrator (`boot.py`, `boot.md`, `mind.md`, the tool implementations); the API-key framework (staged); and the universal chat server.

The architectural commitment Phase 1 establishes is the **orchestrator-in-the-loop contract**: the model never touches the machine directly. The model emits a tool request in a defined format; the orchestrator's Python watches for it, executes the corresponding function, and injects the result back into the conversation. `boot.md` is the contract; the orchestrator is the party that honors it; a model without the orchestrator is issuing tool calls into silence. (This is the harness control loop, specified in [Harness Control Loop](/papers/harness-control-loop), established at install time.) The practical consequence: every serious interaction flows through `localhost:5000`, where Python is in the loop — not through a raw model interface.

Phase 1 reaches the model through API routing — OpenRouter required, direct provider APIs optional. An earlier design routed through browser automation against the user's logged-in subscription accounts; that dispatcher was retired (install Chunk 1) because the reliability and cost-management properties of APIs proved decisive over the "use what you already pay for" advantage of browser sessions. The orchestrator's Python-in-the-loop contract is unchanged; only the path from orchestrator to model differs.

---

## The hardware-evaluation gate

After Phase 1, a brief check decides whether Phase 2 runs at all. The rule is explicit: **if total RAM is below 8 GB, stop at Phase 1; if RAM is 8 GB or more and free disk is 5 GB or more, proceed to Phase 2.** Stopping is not a failure state — it is **Tier 0**, the same architecture on different hardware. A Tier-0 machine has commercial-AI intelligence reaching its local files through the orchestrator, every framework installed, and every methodology working identically; it simply has no local model. The gate is designed so the low end is a complete system, not a degraded one.

This is the first reliability decision, and it is made before anything is downloaded. A machine that cannot run a local model is told so and given a complete cloud-routed system instead of a half-installed local one.

---

## Hardware tiers — behavior matched to capability

Phase 2 tiers its behavior to detected hardware. Detection (Phase 2, Layer 1) reads OS, total and available RAM, disk, processor, and GPU. The tiers:

- **Tier 0 (cloud-only, < 8 GB):** Phase 1 system; commercial AI through the orchestrator. No local model.
- **Scout (≈ 8 GB):** a small local model as the primary endpoint. Capable of basic work; flagged for the tool-reliability limit below.
- **Workhorse (≈ 8–64 GB):** a mid-size local model; solid general use and analysis.
- **Sovereign (64 GB+):** large or multiple local models. This tier unlocks the **Model Switcher** (Phase 2, Layer 9), which configures which model fills each pipeline slot — the install-time entry point to the adversarial-diversity choice the [The Adversarial Pipeline](/papers/the-adversarial-pipeline) depends on.

The tiers determine which model class auto-selection targets and which capabilities install; the architecture does not change across them. Tiering is *additive*: Phase 2 registers a local endpoint alongside the Phase-1 system rather than replacing it, so a local model is a new endpoint in the same chat server, never a separate product.

---

## Validating model fit before download

The second reliability decision — *will this model fit* — is made before the download commits, and it is made from the model's parameter count and quantization level, **never from its file size**. The fit math (RAM ≈ parameters × per-quantization factor + overhead, against a budget of 75% of system RAM; the MoE total-vs-active distinction; the dense-vs-MoE capability-per-gigabyte frontier) is specified in [Model Selection](/papers/model-selection) and is not repeated here. What matters architecturally is *when* it runs: at selection, against the detected hardware, so a model that cannot fit is rejected or downsized before the user waits for gigabytes that will not load.

Selection itself is script-driven and transparent. The install script auto-populates model configurations using the Pareto-frontier + intelligence-floor + cost-sort algorithm (again, [Model Selection](/papers/model-selection)) rather than a hand-picked default — so the model chosen for a machine is the best fit on the cost/capability frontier for that hardware, derived rather than guessed.

---

## Inference engine and integrity verification

The inference engine is selected by architecture: **MLX** on Apple Silicon (the safetensors/MLX path), **Ollama** elsewhere (the GGUF path), with vllm-mlx available for the multi-model Sovereign case. Engine selection follows from the format the validated model is available in; a model not available in the machine's format triggers a search for a converted version rather than a failed install.

Download (Phase 2, Layer 5) is followed by **integrity verification before the endpoint registers**: the required companion files must be present (config for parameters, at least one tokenizer file, all weight shards referenced in the index), and the chat template is checked — its absence is a named, non-fatal warning rather than a silent default that garbles output later. Only a model that loads and passes a smoke round-trip registers as an endpoint; a model that fails the load invariant stops Phase 2 with a specific recovery declaration rather than leaving a broken endpoint in the routing config.

The provisioning architecture names its failure modes so the installer can detect and report them rather than producing a confidently-broken system:

- **RAM-from-file-size** — estimating memory from the download size instead of from parameters × quantization. The defining error this architecture is built against; fit is always computed from parameters.
- **XET pointer** — a repository whose listed files are kilobyte pointers, not the real weights, defeating size-based estimation (another reason fit is parameter-based).
- **Chat-template mismatch** — a missing or wrong template producing garbled output; checked and surfaced at install, not discovered in use.
- **Small-model tool reliability** (the Local Model Tool Reliability Trap) — below roughly 13B at 4-bit, tool-call reliability degrades; a Scout-tier selection ships with this caveat attached rather than implied.

A Missing-Information Declaration and a Recovery Declaration are mandatory outputs: any hardware characteristic assumed, any metadata estimated, any test skipped, and any unresolved failure with its specific next action are stated explicitly. A response that acknowledges a limitation is preferred over one that implies more capability than was installed.

---

## Deployment profiles

The post-overhaul installer is script-driven (`scripts/install.py`) and prompts for a **deployment profile** that shapes what Phase 2 provisions:

- **Solo** — a single local (MLX) worker, no API worker pool; the privacy-focused local default, and the only profile fully enabled today.
- **Hybrid** — a local worker plus a small API worker pool for failover.
- **Organization** — pure API workers, no local model, scaling to many concurrent processes (an interim API-only server path exists today; the full profile is gated on the concurrency-architecture work).

The profile is a provisioning-shape decision layered on top of the tiering: tiering says what the hardware can run; the profile says what mix of local and API workers to stand up for the deployment's purpose. A single-user laptop and an organizational inference node run the same installer with different profiles.

---

## Open problems

- **The installer documentation is mid-reconciliation.** The operational layers (`~/ora/installer/`) still cite the retired pre-customization first-boot document as their "canonical source," and the Model-Switcher layer still describes the pre-configuration bucket/slot model that the configuration architecture replaced. The provisioning *architecture* is current; the layer files have residual drift, and bringing them fully current is tracked as its own install-overhaul workstream.
- **Two profiles are scaffolded, not shipped.** Hybrid and Organization are designed and partially stubbed but gated on the concurrency-architecture work; only Solo is fully enabled, with an interim API-only server path standing in for Organization. The architecture admits all three; the implementation does not yet deliver all three.
- **Fit validation trusts model metadata.** Parameter count and quantization are read from the repository's published config; a model whose config misreports them (or omits them, forcing an estimate) can still pass validation and fail at load. The estimate is conservative, but it is an estimate.
- **The tool-reliability floor is a heuristic, not a measured per-model property.** "≈ 13B at 4-bit" is a rule of thumb; a small model that happens to call tools reliably is still flagged, and a larger one that calls them badly is not — the installer warns by size, not by a tested capability.
- **First-boot assumes a coding-agent or scripted execution environment.** The install runs as a script or under a coding agent with terminal access; it cannot provision from a plain chat interface, which bounds who can run it unattended.

None of these blocks the installer from taking a bare machine to a working system. Each marks a seam where the current mechanism is a working approximation rather than a finished answer.
