COS Platform · Uber Speckit
Internal · Engineering & Product
Spec-driven development, enhanced

From a one-line idea to shipped & marketed — with a paper trail.

Uber Speckit is our structured lifecycle for building a feature: research it, shape it with a human, spec it, gate it, build it, and produce the marketing for it — as a single connected flow. Every phase leaves an artifact, so the "why" is never lost.

The operating principle: agents brief & execute, humans decide.

The flow

Nine phases, each a gate with a deliverable

Two phases make this "Uber": a discovery front-end that researches product-market fit, and a marketing back-end that turns that research into collateral. Everything between is disciplined spec-driven engineering.

01
discover (upgraded)

Research the problem & market, then shape the solution with the human.

02
specify

User stories, functional requirements, measurable success criteria.

03
clarify

Pin the ambiguities that would change the design — before planning.

04
plan

The technical approach, grounded in the real codebase + schema.

05
tasks

Break the plan into ordered, gated units of work.

06
analyze

Adversarial cross-check: coverage, consistency, measurability. A quality gate.

07
remediate

Fix what analyze found — before a line of code is written.

08
implement

Build phase by phase, each gated on tests / evals.

09
marketing (the addition)

Turn the discovery research + spec into real collateral.

The dual-deliverable rule

Every step produces two things: the technical artifact (spec, plan, code…) and a marketing micro-artifact (a positioning seed, a why-now angle, a differentiation note). Marketing isn't bolted on at the end — it accumulates from phase one.
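One way to picture the rule is as a check that runs at each gate. The sketch below is illustrative only: the phase names come from the flow above, while `PhaseOutput`, `next_phase`, and the example artifact names are hypothetical and not part of any Speckit tooling.

```python
from dataclasses import dataclass
from typing import List, Optional

# Phase order as listed in the flow above.
PHASES = ["discover", "specify", "clarify", "plan", "tasks",
          "analyze", "remediate", "implement", "marketing"]


@dataclass
class PhaseOutput:
    """Hypothetical record of what one phase left behind."""
    phase: str
    technical_artifact: str  # e.g. "spec.md"
    marketing_seed: str      # e.g. a positioning or why-now note


def next_phase(completed: List[PhaseOutput]) -> Optional[str]:
    """Return the next phase to run, or None when all nine are done.

    Raises ValueError if phases were run out of order or if any
    phase is missing one of its two deliverables.
    """
    if len(completed) > len(PHASES):
        raise ValueError("more outputs than phases")
    for expected, out in zip(PHASES, completed):
        if out.phase != expected:
            raise ValueError(f"expected {expected}, got {out.phase}")
        if not out.technical_artifact or not out.marketing_seed:
            raise ValueError(f"{out.phase} is missing a deliverable")
    return PHASES[len(completed)] if len(completed) < len(PHASES) else None
```

Under these assumptions a skipped phase (for example, jumping from specify to plan) fails the check instead of passing silently.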

The new discovery phase

Discovery researches product-market fit — and feeds marketing

The old reflex was to jump straight to a spec. Discovery makes us research the market first: is this a real job worth doing, who else does it, and what's the wedge? That research is the same material the marketing phase ships.

PMF research

It studies the market, not just the code

Customer-artifact sourcing, competitive / prior-art scans, Jobs-To-Be-Done (JTBD), and an Opportunity Solution Tree (OST) pass. Together they surface the category gap, the customer's real job in their own words, and the defensible differentiation. Plus a system-reality pass so the design fits what's actually there.

Adaptive interaction

A design conversation, not a questionnaire

How much it asks you scales with appetite × post-research uncertainty — a quick confirm, one design round, or an open design conversation. The agent brings options with trade-offs and a recommendation, proposes the depth, and you dial it up or down. No rote 20-questions.
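As a rough illustration of "appetite × post-research uncertainty", the sketch below multiplies the two and maps the product onto the three interaction depths named above. The numeric appetite scale, the 0 to 1 uncertainty score, the thresholds, and the function names are all assumptions made for the example; the source does not specify how the agent actually computes this.

```python
from enum import Enum
from typing import Optional


class Appetite(Enum):
    # "medium" appears in the walkthrough; SMALL and LARGE are assumed.
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


DEPTHS = ("quick confirm", "one design round", "open design conversation")


def propose_depth(appetite: Appetite, uncertainty: float) -> str:
    """Agent's proposal: appetite x uncertainty, bucketed into a depth.

    `uncertainty` is a hypothetical 0..1 score for how much is still
    unknown after research. Thresholds are illustrative.
    """
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must be between 0 and 1")
    score = appetite.value * uncertainty  # ranges from 0 to 3
    if score < 0.5:
        return DEPTHS[0]
    if score < 1.5:
        return DEPTHS[1]
    return DEPTHS[2]


def final_depth(proposed: str, human_choice: Optional[str] = None) -> str:
    """The human can dial the proposal up or down; their choice wins."""
    if human_choice is None:
        return proposed
    if human_choice not in DEPTHS:
        raise ValueError(f"unknown depth: {human_choice}")
    return human_choice
```

For instance, a medium appetite with moderate uncertainty (0.5) lands on one design round, and the human can still override that to a quick confirm.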

Research → marketing

The research is the marketing raw material

For customer-facing features the full research is persisted verbatim, with sources, in the feature folder. That's not bureaucracy — it's the exact input the marketing phase compiles into positioning, differentiation, and demo copy. Discovery and marketing are two ends of one thread.

Discovery research
Category gap · JTBD · customer voice · competitive proof — captured in full.
Spec marketing-seeds
Each phase appends positioning / why-now / differentiation seeds.
Marketing phase
Compiles it into shipped collateral — a brief, a landing page.

Post-build walkthrough · a real run

How we built Feature 038 — Cross-Artifact Session Merge

A live example, start to finish. The point: see exactly where a human decided and where the agent ran on its own. (038 = the onboarding agent reconciling a roster CSV + a whiteboard photo + sticky notes into one schedule, instead of double-counting them.)

Human — a decision, a judgement call, a correction
Autonomous — the agent did it unattended
Gate — a human go/approve

00 The spark
While testing onboarding, a session-eval showed the agent extracts each artifact independently and never merges them: the same crew counted twice (accuracy 0.915 vs a reconciled 1.000).
Human: Reframed it: "the deliverable is the reconstructed schedule, and it's an interactive multi-artifact build" → "scope it as a speckit feature."
Auto: Quantified the gap with a session eval; created feature 038 + branch.

01 Discover
PMF research, then a design conversation.
Auto: Ran two research streams (competitive prior-art across HR / CRM / record-linkage / ag tools; entity-resolution + incremental-onboarding technique). Found the category gap ("give it your data however messy; nobody ships this") and the verbatim wedge: "coming off the field and entering chicken scratch into a spreadsheet." Persisted full research to the feature folder.
Human: Set appetite: medium. Answered 3 design-round questions: confidence-banded merges, a web side-by-side review, and batched ask-for-more.

02–03 Specify · Clarify
Requirements written; ambiguities pinned.
Auto: Wrote spec.md with 6 user stories, grouped FRs, and 9 measurable success criteria (the session eval as the acceptance test).
Human: Caught a skipped phase: "I thought our flow included clarify after specify?" Then answered 4 clarifications: resolve against the session + existing DB; CURP-conflicts ask a human; all merges reversible; a batched review queue.

04–07 Plan · Tasks · Analyze · Remediate
Approach, breakdown, and the quality gate that earned its keep.
Auto: Wrote plan.md + tasks.md grounded in the real schema. Analyze caught a CRITICAL: the headline gate was measuring a reference merge, not the reconciler being built (plus a band-threshold typo and an over-promised gap class). Remediate fixed all of it, before any code.
Human: "Proceed autonomously through implementation, then marketing." Also caught the agent skipping remediate in its plan-out.

08 Implement
Four phases, each gated on tests / evals.
Auto: P1 reconciler (eval F1 0.982, crews 4→2, 13 unit tests) · P2 survivorship + provenance + undo · P3 review-queue API + gap detector + a React review screen · P4 live wiring + write-by-canonical-id + 4 migrations applied to the production DB + a rolled-back integration test.
Gate: Human approvals at the seams ("merge to main", "proceed to Phase 4", "use the production environment"), plus a domain unblock (the working DB proxy path) when the agent hit a stale credential.

09 Marketing
The discovery research became shipped collateral.
Auto: Built & deployed a real landing page (cos-onboarding.farmagent.fruitscout.ai) from the discovery research (the category gap, the JTBD wedge, the proof numbers), on-brand, with SSL.
Gate: Human said "make it a real hosted site"; the agent did DNS + nginx + Certbot, then reported back.
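The design decisions in the walkthrough (confidence-banded merges, CURP conflicts going to a human, reversible merges) can be sketched roughly as follows. This is a minimal illustration, not the shipped reconciler: the threshold values, band names, and the `MergeLedger` class are hypothetical, since the source does not give the real bands or schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical thresholds; the real band values are not in the source.
AUTO_MERGE_AT = 0.90
REVIEW_AT = 0.60


def band_for(confidence: float, curp_conflict: bool = False) -> str:
    """Map a match confidence to an action band."""
    if curp_conflict:
        return "ask_human"      # identity conflicts are never auto-resolved
    if confidence >= AUTO_MERGE_AT:
        return "auto_merge"
    if confidence >= REVIEW_AT:
        return "review_queue"   # batched for side-by-side review
    return "keep_separate"


@dataclass
class MergeLedger:
    """Records merges so that each one can be undone."""
    canonical: Dict[str, str] = field(default_factory=dict)  # record id -> canonical id
    history: List[Tuple[str, str]] = field(default_factory=list)

    def merge(self, record_id: str, canonical_id: str) -> None:
        if record_id in self.canonical:
            raise ValueError(f"{record_id} is already merged")
        self.canonical[record_id] = canonical_id
        self.history.append((record_id, canonical_id))

    def undo_last(self) -> str:
        """Reverse the most recent merge and return the freed record id."""
        record_id, _ = self.history.pop()
        del self.canonical[record_id]
        return record_id
```

Keeping the conflict check ahead of the confidence check means a high score can never override a conflicting identifier, which matches the "ask a human" clarification above.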

What landed

The final output

Working code, in prod. Reconciler shipped & deployed; 4 reconciliation tables live in the production digital_twin DB.
The full paper trail. discover → spec → clarify → plan → tasks → analyze report, plus the verbatim research, all committed to the feature.
A validated eval. The session-merge eval (naive 0.915 → reconciled 1.000) is now the regression test.
Live marketing. A hosted, on-brand landing page generated from the same research that drove the build.
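To make the double-counting problem behind that eval concrete, here is a toy comparison of a naive merge (concatenate every artifact) and a reconciled one (group by a normalized crew name). The data, the matching rule, and the function names are invented for illustration; they are not the real eval, and they do not reproduce the 0.915 and 1.000 scores.

```python
from typing import Dict, List, Set

Crew = Dict[str, object]  # {"name": str, "members": List[str]}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so 'CREW  B' matches 'Crew B'."""
    return " ".join(text.lower().split())


def naive_crews(artifacts: List[List[Crew]]) -> List[Crew]:
    """Treat every artifact independently: crews are simply concatenated."""
    return [crew for artifact in artifacts for crew in artifact]


def reconciled_crews(artifacts: List[List[Crew]]) -> Dict[str, Set[str]]:
    """Merge crews that share a normalized name, unioning their members."""
    merged: Dict[str, Set[str]] = {}
    for artifact in artifacts:
        for crew in artifact:
            key = normalize(crew["name"])
            merged.setdefault(key, set()).update(
                normalize(m) for m in crew["members"]
            )
    return merged


# Toy artifacts: the same two crews appear in both sources.
roster_csv = [
    {"name": "Crew A", "members": ["Ana Lopez", "Luis Diaz"]},
    {"name": "Crew B", "members": ["Marta Ruiz"]},
]
whiteboard = [
    {"name": "crew a", "members": ["ana lopez"]},
    {"name": "CREW  B", "members": ["Marta Ruiz", "Jose Gil"]},
]

naive = naive_crews([roster_csv, whiteboard])
reconciled = reconciled_crews([roster_csv, whiteboard])
```

Here the naive path reports four crews while the reconciled path reports two. A regression test in this style would assert on the reconciled result so that double counting cannot quietly return.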
Phase 09 output · live
See the marketing page the flow produced: cos-onboarding.farmagent.fruitscout.ai

The division of labor

What the human did
  • Set the appetite and reframed the problem
  • ~7 design & scope decisions (3 design-round + 4 clarifications)
  • ~5 go / approve gates (proceed, merge, prod, host)
  • 2 process catches (skipped clarify; forgotten remediate)
  • 1 domain unblock (the real DB-proxy path)
What the agent did
  • 2 PMF research streams + full persistence
  • 7 lifecycle docs (spec, plan, tasks, analyze, research ×2, discover)
  • 4 implementation phases across ~6 sub-builds
  • ~40 unit/eval validations + a DB integration test
  • 1 prod deploy, 4 migrations, 1 hosted marketing site

The human spent their time on judgement, taste, and approval.
The agent spent its time on research, building, validating, and shipping.