
Questions we get on every discovery call.

Direct answers. If there's a caveat, we name it.

Approach
Everything begins with an Architecture & Security Audit. Before we write AI code, we map your current system — existing auth boundaries, data tenancy model, where LLM failures would propagate into your core product. The audit runs 2–4 weeks and produces a written report: risk assessment, architecture decisions, and a prioritised feature scope with effort estimates. Most clients move into a fixed-price engagement from there. The audit is the entry point — not a sales step.
Scoping depends on what's already in place. A well-structured backend with a single AI integration point might be a 6–8 week fixed engagement. A multi-tenant system with compliance requirements typically runs 10–14 weeks. We fix the price after the audit — not before — because scope defined without seeing the system is guesswork. We don't do open-ended retainers as a primary model. If the audit surfaces more complexity than expected, we tell you before we're halfway through.
Two principals means every decision is made by the people writing the code — no handoffs, no translators, no junior work on critical paths. The tradeoff is capacity: we run 3–4 clients at a time, not 30. If you need a 15-person delivery factory, we're not the right fit. If you need two engineers with 18+ years of production backend experience owning every architecture call — that's exactly what we are. If you're uncertain whether we're the right fit, the audit is a low-risk way to find out.
Technology
Current integrations: OpenAI, Anthropic (Claude), Mistral, Azure OpenAI, and AWS Bedrock. We build provider abstraction by default — a routing layer normalises API differences so you can swap or weight providers with a config change, no downstream code changes required. No lock-in is an architecture decision, not a promise. Caveat: if your compliance requirements mandate a specific provider or region for data residency, we design around that constraint from the start, which may limit fallback routing options.
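As a rough illustration of what "swap or weight providers with a config change" can look like, here is a minimal sketch. It assumes each provider is wrapped in an adapter with the same call signature. The names `ProviderRouter`, `Route`, `routes_from_config`, `provider_a` and `provider_b` are hypothetical, and the adapters are stand-ins rather than real SDK calls.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# An adapter takes a prompt and returns text. Real adapters would wrap each
# vendor's SDK and normalise its request/response shape (assumption).
Adapter = Callable[[str], str]


@dataclass(frozen=True)
class Route:
    provider: str  # key into the adapter registry
    weight: float  # relative share of traffic; 0 disables the provider


def routes_from_config(config: Dict[str, float]) -> List[Route]:
    """Build routes from a plain mapping, e.g. one loaded from a config file."""
    return [Route(provider=name, weight=weight) for name, weight in config.items()]


class ProviderRouter:
    """Picks a provider by weight; callers never name a vendor directly."""

    def __init__(
        self,
        adapters: Dict[str, Adapter],
        routes: List[Route],
        rng: Optional[random.Random] = None,
    ) -> None:
        unknown = [r.provider for r in routes if r.provider not in adapters]
        if unknown:
            raise ValueError(f"no adapter registered for: {unknown}")
        self._adapters = adapters
        self._routes = [r for r in routes if r.weight > 0]
        if not self._routes:
            raise ValueError("at least one route needs a positive weight")
        self._rng = rng or random.Random()

    def pick(self) -> str:
        weights = [r.weight for r in self._routes]
        return self._rng.choices(self._routes, weights=weights, k=1)[0].provider

    def complete(self, prompt: str) -> str:
        return self._adapters[self.pick()](prompt)


# Stand-in adapters (hypothetical): they only tag the prompt with their name.
ADAPTERS: Dict[str, Adapter] = {
    "provider_a": lambda prompt: f"[provider_a] {prompt}",
    "provider_b": lambda prompt: f"[provider_b] {prompt}",
}

# All traffic goes to provider_a; changing this mapping is the only edit
# needed to move traffic to provider_b.
router = ProviderRouter(ADAPTERS, routes_from_config({"provider_a": 1.0, "provider_b": 0.0}))
reply = router.complete("Summarise this ticket")
```

Downstream code only ever calls `router.complete(...)`, so moving traffic is a change to the weight mapping rather than to any caller. A data-residency constraint would show up here as a smaller set of permitted routes.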
Yes, where the LLM provider supports it. We've worked with Azure OpenAI private endpoints, AWS Bedrock, and open-weight models (Mistral, Llama variants) running on client infrastructure. On-prem LLM deployment adds real complexity (inference latency management, GPU provisioning, model version governance), and we scope that honestly. If your requirement is "no data leaves our VPC," that's a solvable problem, but it affects timeline and the fallback routing options available.
Usually yes, but we audit it first either way. Prototypes that work in staging often have production gaps: direct LLM calls with no fallback, missing tenant isolation, secrets not routed through a vault. We identify what's production-safe and what needs rework before extending. Continuing from a prototype is usually faster than a greenfield start — but only if the foundations hold. The audit tells you which situation you're in before we commit to a scope.
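To make the "direct LLM calls with no fallback" gap concrete, here is a minimal sketch of the kind of wrapper that closes it. It assumes providers are tried in a fixed priority order. `complete_with_fallback`, `AllProvidersFailed` and the two demo providers are hypothetical names for illustration, not part of any vendor SDK.

```python
from typing import Callable, List, Sequence, Tuple

# A provider call takes a prompt and returns text, or raises on failure.
Call = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider has failed for a request."""


def complete_with_fallback(
    prompt: str, providers: Sequence[Tuple[str, Call]]
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, text) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Demo providers (hypothetical): the primary always times out.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def steady_backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


used, text = complete_with_fallback(
    "ping", [("primary", flaky_primary), ("backup", steady_backup)]
)
```

A prototype that calls one provider directly has no equivalent of the second entry in that list, so a provider outage becomes a product outage.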
Security & Compliance
No. We work inside your infrastructure: we don't proxy your LLM calls, store your vectors, or receive your data. We write the code; you run it. LLM API calls go directly from your infrastructure to your chosen provider. If your compliance obligations call for a Data Processing Agreement (GDPR) or a Business Associate Agreement (HIPAA) with the provider, we'll identify which providers offer one and configure the integration accordingly. Your data boundary stays yours.
Access policy is enforced as a pre-filter at the vector DB query layer — not as a post-retrieval filter. The similarity search only runs over documents the requesting user is authorised to see. Every embedding index is namespaced per tenant. Every retrieval operation is logged: tenant ID, user ID, document IDs returned. If a document shouldn't be in the context window for a given user, it's never retrieved — not filtered out after the fact.
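The difference between pre-filtering and post-filtering is easier to see in code. This is a minimal in-memory sketch, not a real vector database client. It assumes each document carries an explicit set of authorised user IDs, and `TenantIndex`, `Doc`, `retrieval_audit` and the sample tenants are hypothetical names chosen for illustration.

```python
import logging
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence

log = logging.getLogger("retrieval_audit")


@dataclass(frozen=True)
class Doc:
    doc_id: str
    embedding: Sequence[float]
    allowed_users: FrozenSet[str]  # assumption: ACL stored alongside the vector


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TenantIndex:
    """One namespace per tenant; searches never cross namespaces."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, List[Doc]] = {}

    def add(self, tenant_id: str, doc: Doc) -> None:
        self._namespaces.setdefault(tenant_id, []).append(doc)

    def search(
        self, tenant_id: str, user_id: str, query: Sequence[float], k: int = 3
    ) -> List[str]:
        # Pre-filter: restrict candidates to this tenant's namespace and to
        # documents the user may see BEFORE any similarity is computed.
        candidates = [
            d for d in self._namespaces.get(tenant_id, []) if user_id in d.allowed_users
        ]
        ranked = sorted(candidates, key=lambda d: cosine(query, d.embedding), reverse=True)
        doc_ids = [d.doc_id for d in ranked[:k]]
        # Audit record for every retrieval: tenant, user, documents returned.
        log.info("retrieval tenant_id=%s user_id=%s doc_ids=%s", tenant_id, user_id, doc_ids)
        return doc_ids


index = TenantIndex()
index.add("acme", Doc("a1", [1.0, 0.0], frozenset({"alice"})))
index.add("acme", Doc("a2", [0.9, 0.1], frozenset({"alice", "bob"})))
index.add("globex", Doc("g1", [1.0, 0.0], frozenset({"alice"})))

bob_top1 = index.search("acme", "bob", [1.0, 0.0], k=1)
```

With `k=1`, a post-retrieval filter would fetch `a1` as the best match and then discard it, leaving Bob with nothing. Because the filter runs first here, Bob gets `a2`, the best match among the documents he is allowed to see, and `a1` never enters the candidate set.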
Handover
You own everything. All code written during an engagement is transferred at handover — no license dependencies, no retained IP. We don't reuse client-specific implementations across engagements. The patterns we apply (circuit breaker design, semantic cache architecture) are industry-standard; the implementation is yours.
Handover is scoped from the start, not improvised at the end. The codebase is written to be readable and extended by your team — named abstractions, tested boundaries, structured logs queryable without tribal knowledge. Every engagement ends with a runbook: how to monitor the AI layer, how to swap LLM providers, how to roll back a feature. We run a working session with your engineers — typically 2–4 hours — covering architecture decisions and the operational playbook. If we've done the job right, you don't need us to keep it running.