Sigilix

How it Works

Five minds. One verdict. Zero noise.

Sigilix is not a wrapper around a single language model. It is an ensemble of four domain specialists and a synthesizer, each tuned for a job no generalist can do well. Below is the full system, top to bottom.

01 System architecture

Four specialists. One synthesizer. Built in.

Every Sigilix review begins with retrieval, runs through four specialist models in parallel, and ends with Core — a single synthesizer that owns the verdict. The architecture is intentional: depth is a function of specialization, and signal is a function of synthesis.

[System diagram: diff hunks, surrounding files, PR metadata, and repo conventions feed four specialists, Glyph (Architecture), Warden (Security), Spark (Performance), and Weave (Semantics), whose outputs flow into Core (Synthesizer).]

Retrieve

Diff + repo context fetched with scoped read access.

Specialize

Four domain models run in parallel.

Synthesize

Core resolves contradictions and ranks.
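
A minimal TypeScript sketch of that flow, one fan-out and one fan-in. Every name here is illustrative, not Sigilix's actual API:

```ts
// Hypothetical shapes for the retrieve → specialize → synthesize flow.
type Finding = { path: string; line: number; severity: number; note: string };

interface Specialist {
  name: string;
  review(context: string): Promise<Finding[]>;
}

async function reviewPullRequest(
  specialists: Specialist[],
  retrieve: (specialist: string) => Promise<string>, // scoped read per specialist
  synthesize: (outputs: Finding[][]) => Promise<string>,
): Promise<string> {
  // Specialize: the four domain models run in parallel, each on its own slice.
  const outputs = await Promise.all(
    specialists.map(async (s) => s.review(await retrieve(s.name))),
  );
  // Synthesize: Core resolves contradictions, dedups, ranks, owns the verdict.
  return synthesize(outputs);
}
```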

02 Specialist topology

What each specialist actually does.

Each specialist gets its own model selection, its own system prompt, its own retrieval scope. They are not parallel copies of the same generalist — they are four different jobs running concurrently.
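
One way to picture that separation is as configuration, not cloning. A hypothetical TypeScript shape inferred from the description above:

```ts
// Hypothetical per-specialist configuration; field values are illustrative.
interface SpecialistConfig {
  name: "Glyph" | "Warden" | "Spark" | "Weave";
  model: string;            // its own model selection
  systemPrompt: string;     // its own system prompt
  retrievalScope: string[]; // its own retrieval slice
}

const warden: SpecialistConfig = {
  name: "Warden",
  model: "deepseek-v4-pro", // illustrative model id
  systemPrompt: "Treat every diff as a potential vulnerability surface...",
  retrievalScope: ["diff-hunks", "lockfiles", "cve-feeds"],
};
```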

Glyph

Architecture

DeepSeek v4-Pro

Reads the structural shape of a change. Looks for boundary violations, leaked abstractions, dead code, dependency-graph regressions, and refactors that look local but ripple through the codebase.

Typical catches

  • Cross-package import cycles introduced by a new helper
  • Type widenings that break downstream contracts (sketched below)
  • Resurrected legacy code paths after a partial refactor
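
The second catch above is easy to wave through because the diff looks harmless. An illustrative TypeScript sketch of how a widening breaks a downstream contract:

```ts
// Before the diff, the contract was narrow and downstream code relied on it:
//
//   function normalize(raw: string): "active" | "inactive"
//
// After a "harmless" widening to string, every exhaustive switch downstream
// quietly gains a reachable default branch:
function normalize(raw: string): string {
  return raw.trim().toLowerCase();
}

function render(status: string): string {
  switch (status) {
    case "active":   return "on";
    case "inactive": return "off";
    default:         return "unknown"; // previously dead; now load-bearing
  }
}
```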

Warden

Security

DeepSeek v4-Pro

OWASP-trained, lockfile-aware, and primed to read CVE feeds. Treats every diff as a potential vulnerability surface and validates auth, input, secrets, and supply-chain risk.

Typical catches

  • SQL injection where input is concatenated, not parameterized (sketched below)
  • CVE-class transitive dependencies re-introduced by a routine bump
  • Auth checks against stale cache reads
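
The first catch above is the canonical before/after. A sketch using node-postgres (`pg`); any client that supports parameterized queries behaves the same way:

```ts
import { Pool } from "pg";

const pool = new Pool();

// Vulnerable: user input is concatenated straight into the SQL string.
async function findUserUnsafe(email: string) {
  return pool.query(`SELECT * FROM users WHERE email = '${email}'`);
}

// What Warden expects: input travels as a bound parameter, never as SQL.
async function findUserSafe(email: string) {
  return pool.query("SELECT * FROM users WHERE email = $1", [email]);
}
```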

Spark

Performance

Kimi K2.6

Reads code as a runtime story. Spots N+1 queries hidden behind clean async, accidental algorithmic blowups, hot loops, and resource exhaustion that won't show up until production load.

Typical catches

  • N+1 queries inside Promise.all that look idiomatic (sketched below)
  • Quadratic loops over collections that grow with users
  • Missing indexes on new query patterns
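
The first catch above deserves a concrete shape, because the slow version reads as clean, idiomatic async. Illustrative sketch; `Db` is a hypothetical client interface:

```ts
// Hypothetical minimal query client for the sketch.
interface Db {
  query(text: string, params?: unknown[]): Promise<unknown[]>;
}

// Looks idiomatic: parallel fan-out. Still issues one query per user.
async function loadOrdersNPlusOne(db: Db, userIds: string[]) {
  return Promise.all(
    userIds.map((id) => db.query("SELECT * FROM orders WHERE user_id = $1", [id])),
  );
}

// One round-trip: push the whole set into the query instead of looping over it.
async function loadOrdersBatched(db: Db, userIds: string[]) {
  return db.query("SELECT * FROM orders WHERE user_id = ANY($1)", [userIds]);
}
```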

Weave

Semantics

Kimi K2.6

The naming, contracts, and intent reviewer. Reads function signatures, docstrings, and the gap between what the code says it does and what it actually does.

Typical catches

  • Function names that lie about side effects (sketched below)
  • Type contracts violated under TypeScript's structural typing
  • Tests that assert the wrong invariant
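
The first catch above tends to look like this in the wild. Illustrative sketch; the types and `fetchUser` are hypothetical:

```ts
interface User { id: string; lastSeenAt?: number }
declare function fetchUser(id: string): Promise<User>;

// The name promises a pure read; the body writes a cache and mutates the result.
async function getUser(id: string, cache: Map<string, User>): Promise<User> {
  const user = await fetchUser(id);
  cache.set(id, user);          // hidden write the name never admits to
  user.lastSeenAt = Date.now(); // hidden mutation of the returned object
  return user;
}
```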

Core

Synthesizer

Kimi K2.6

Receives all four specialist outputs, resolves contradictions, drops duplicates, ranks by impact, and writes one coherent review. Owns the verdict (approve / comment / request changes) and is the only voice the PR author hears.

Typical catches

  • Cross-specialist races that no single specialist saw
  • Duplicate findings phrased four different ways
  • Critical findings buried under noise

03 Retrieval layer

Specialists are only as good as their context.

Before any model runs, Sigilix builds a tailored retrieval bundle: the diff hunks, the files those hunks live in, the imports they touch, and the conventions of the surrounding repo. Each specialist gets a different slice — Warden sees lockfiles and CVE feeds; Glyph sees the dependency graph; Spark sees query plans where available.

Diff hunks

Authoritative source. The literal changeset under review.

File context

The surrounding files, type-aware imports, neighboring tests.

Repo conventions

Sample of recent merged PRs to learn the team's idioms.
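
A hypothetical sketch of how those slices could differ per specialist, following the examples above; scope names are illustrative:

```ts
// Every specialist sees the diff; the rest of the bundle varies by job.
const retrievalScopes: Record<string, string[]> = {
  Glyph:  ["diff-hunks", "file-context", "dependency-graph", "repo-conventions"],
  Warden: ["diff-hunks", "file-context", "lockfiles", "cve-feeds"],
  Spark:  ["diff-hunks", "file-context", "query-plans", "repo-conventions"],
  Weave:  ["diff-hunks", "file-context", "neighboring-tests", "repo-conventions"],
};
```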

04 Model choices

Why two providers. Why these two.

The ensemble splits across DeepSeek v4-Pro and Kimi K2.6. We chose them after a month-long bake-off: DeepSeek dominates on dense, proof-style reasoning; Kimi is the stronger writer and synthesizer. Splitting roles across providers also means a single provider outage degrades the review gracefully — never silently.

DeepSeek v4-Pro

Dense reasoning, low hallucination on hard math and library APIs.

Powers Glyph and Warden. Strong on code-graph reasoning, CVE chains, and lockfile resolution. Admits uncertainty instead of fabricating APIs.

Kimi K2.6

Stronger writer, better at synthesizing across signals.

Powers Spark, Weave, and Core. Superior on naming, contract intent, and the final synthesis step where four specialist outputs become one coherent review.

05 Synthesizer pipeline

Core does the work no specialist can do alone.

Once the four specialists return, Core runs a four-step pipeline: collect, cross-reference, calibrate, render. Findings get deduplicated, conflicts get resolved with rationale, severity gets calibrated against repo history, and the final review is rendered as one inline-anchored comment with a single verdict.
  1. Collect

     All specialist outputs are merged into one structured pool, tagged by source.

  2. Cross-reference

     Findings are compared. Duplicates collapse; partial findings that compose into a critical finding (e.g., a race condition) are upgraded.

  3. Calibrate

     Severity is recalibrated against the repo's bar and the PR's risk profile. ‘Style preferences’ are downgraded; production-class bugs are upgraded.

  4. Render

     Findings render as inline-anchored comments with file + line + suggested patch. The review body lists the verdict and a one-paragraph summary.
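
A minimal TypeScript sketch of the middle of that pipeline: collect, cross-reference, rank. All names are hypothetical, and the real calibration step is richer than a sort:

```ts
interface TaggedFinding {
  source: "Glyph" | "Warden" | "Spark" | "Weave";
  path: string;
  line: number;
  severity: number; // higher is worse, post-calibration
  note: string;
}

function synthesize(bySpecialist: TaggedFinding[][]): TaggedFinding[] {
  // Collect: merge all specialist outputs into one pool, tagged by source.
  const pool = bySpecialist.flat();

  // Cross-reference: collapse duplicates that land on the same path and line,
  // keeping the most severe phrasing of each.
  const deduped = new Map<string, TaggedFinding>();
  for (const f of pool) {
    const key = `${f.path}:${f.line}`;
    const prior = deduped.get(key);
    if (!prior || f.severity > prior.severity) deduped.set(key, f);
  }

  // Rank: most severe first, so critical findings are never buried under noise.
  return [...deduped.values()].sort((a, b) => b.severity - a.severity);
}
```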

06 Failure modes

What breaks reviews. How we catch it.

Honest engineering means naming the failure modes before they bite. Below are four ways review systems break in practice, and how Sigilix handles each.

Single-agent hallucination

Generalist models invent functions, props, or APIs that don't exist. Sigilix grounds every claim in retrieved file context and rejects findings that can't be cited to a real path and line.
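
A hypothetical sketch of that grounding gate: a finding survives only if it cites a path and line that actually exist in the retrieved file context.

```ts
interface Finding { path: string; line: number; note: string }

// files maps retrieved paths to their line contents.
function ground(findings: Finding[], files: Map<string, string[]>): Finding[] {
  return findings.filter((f) => {
    const lines = files.get(f.path);
    // Reject any claim that can't be cited to a real path and line.
    return lines !== undefined && f.line >= 1 && f.line <= lines.length;
  });
}
```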

Specialist disagreement

When two specialists conflict (e.g., Spark says ‘ship faster’, Glyph says ‘refactor first’), Core resolves the conflict with the trade-off made explicit, never silently picking a side.

Stale head SHA

If the PR moves while review is in flight, Sigilix detects the drift, drops the in-flight review, and re-runs against the new head — never posting findings against code that no longer exists.
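
A sketch of the drift check. The pull-request endpoint is GitHub's real REST API; the orchestration around it is hypothetical:

```ts
// Fetch the PR's current head SHA from GitHub's REST API.
async function headSha(owner: string, repo: string, pr: number): Promise<string> {
  const res = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/pulls/${pr}`,
    { headers: { Accept: "application/vnd.github+json" } },
  );
  const body = (await res.json()) as { head: { sha: string } };
  return body.head.sha;
}

// Compare the SHA captured at review start against the current head before posting.
async function postIfFresh(
  owner: string, repo: string, pr: number,
  startSha: string,
  post: () => Promise<void>,
  rerun: () => Promise<void>,
): Promise<void> {
  const current = await headSha(owner, repo, pr);
  if (current !== startSha) return rerun(); // head moved: drop the in-flight review
  await post(); // safe: findings still anchor to code that exists
}
```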

Provider degradation

Each specialist has a cross-provider fallback. If a primary model returns 5xx or saturates, the specialist re-runs against a backup model with identical prompts before reporting partial output.
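
A hypothetical sketch of that fallback: identical prompt, backup model, then an honest partial result rather than a silent failure:

```ts
interface Model {
  name: string;
  complete(prompt: string): Promise<string>;
}

async function runWithFallback(
  primary: Model,
  backup: Model,
  prompt: string,
): Promise<{ output: string | null; degraded: boolean }> {
  try {
    return { output: await primary.complete(prompt), degraded: false };
  } catch {
    // Primary returned 5xx or saturated: same prompt against the backup provider.
    try {
      return { output: await backup.complete(prompt), degraded: true };
    } catch {
      return { output: null, degraded: true }; // report partial output honestly
    }
  }
}
```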

07 Why ensemble

Why ensemble beats single-agent.

The single-agent shortcut is real: one model, one prompt, one round-trip per PR. But the cost is uniform mediocrity. An ensemble pays a small latency tax for a categorical quality jump.

Single-agent reviewer

  • Coverage

    One model carries every domain. Strong at none.

  • Hallucination

    Generalist guesses. Cites APIs that don't exist.

  • Synthesis

    Lists every finding it has. No ranking, no dedup.

  • Provider risk

    One outage drops the whole review.

Sigilix ensemble

  • Coverage

    Four specialists, each tuned for a real domain.

  • Hallucination

    Findings are grounded in retrieved context. Ungrounded claims are dropped.

  • Synthesis

    Core dedups, ranks, and synthesizes cross-specialist findings.

  • Provider risk

    Cross-provider fallback per specialist. Graceful degradation.

Read the proof, not just the architecture. See six real reviews.

Last updated 2026-05-04