Voice-Driven Dynamic UX — The End State

The architectural direction for how Starlight's interface evolves. This is the longest-arc page in the docs — it's not a feature, it's the shape the product converges toward. None of this is in v1. What matters in v1 is that the architecture doesn't foreclose this direction.

Erik's framing (May 2026): "Voice will be the primary input language. As I'm talking, I would like to see the interface dynamically update itself as possible."

The end state is not "we add a voice interface to the existing app." The end state is that the interface regenerates itself in real time per task and context, with voice as the natural way to express intent.

The thesis, briefly

Starlight is one expression of Erik's broader thesis about software (full memory: erik-software-thesis.md):

The presentation layer is collapsing. Code generation is cheap enough that businesses will stop stitching together 18 SaaS products and start running on single-pane-of-glass apps tailored to their exact workflow.
The capability layer is strengthening. APIs are the substrate. Stripe, Twilio, DoseSpot, Claude, Health Gorilla — all stay specialized. They sell APIs, not SaaS. The user never sees them.
The end state is dynamic. Instead of fixed dashboards designed once and shipped forever, agentic systems hydrate the UI on demand based on what the user is currently trying to do. Voice is the natural input modality; the UI is the natural output modality; both adapt continuously.

Starlight is the proof-of-concept for that thesis applied to healthcare. If we get it right, the pattern replicates across every vertical drowning in SaaS sprawl.

What it looks like, concretely, for Starlight

Today (v1 plan)

A doctor opens the Starlight web app. Top nav: Today · Patients · Calendar · Inbox · Documents · Prescriptions · Billing · Reports · Marketing. They click Today, then click a patient, then click Notes, then click "+ New Note." The interface is fixed and the doctor learns it.

This is a great v1. It is not the end state.

End state

Yogini walks into the office on a Tuesday morning with the iPad. She says:

"What's on my morning?"

The screen renders her schedule for the next four hours, with the AI risk panel summarizing what to pre-read for each visit, and the SMS thread for her first patient pinned at the top because that's where the relevant context lives.

She says:

"Sofia's coming in at 10:30. What do I need to know?"

The schedule fades. Sofia's chart hero card appears with three quick-fact cards (allergies, active Rx, prior visits). The right rail shows Sofia's recent SMS thread with her mom about the cough. The AI risk panel shows the 47-day EpiPen expiration, the overdue well visit, and the two prior viral-URI patterns that match today's chief complaint.

She says:

"Start the visit."

Recording starts. The chart hero compresses to a small banner; the live transcript pane takes the right rail. Yogini's hands are on Sofia, not on a keyboard.

Mid-visit, she says:

"Show me her growth chart."

The transcript pauses; the growth chart appears full-screen for three seconds; she nods and the chart re-collapses; the transcript resumes.

After the visit, she says:

"Sign the note. Send Sofia's mom a summary. Book a recheck in 7 days if she's not better."

Three downstream effects fire. The note is signed. The parent-app summary is generated and pushed. The follow-up task is created. None of those required a click.

Why this is genuinely better, not gimmicky

Cognitive load drops. The doctor stops navigating and starts speaking. The mental model is "what do I need right now?" rather than "which menu has this?"
The interface anticipates. A great executive assistant doesn't ask "would you like to see the calendar or the inbox?" — they put the right thing in front of you because they understand the context. We are building that assistant.
Hands-free matters in clinical work. Pediatric visits with squirmy kids, newborn home visits with a baby on the exam pad, post-procedure check-ins — there are many clinical contexts where keyboard-and-mouse is friction. Voice + dynamic UI is real ergonomic value.
It scales across roles. A nurse, a billing manager, a parent — all interact with the same data through different surfaces. Static UI design forces three separate apps. Dynamic UI lets one app serve all three with the right surface for each.

Why it's a long-arc bet, not a v1 ask

Three reasons:

The technology isn't quite there. Real-time speech-to-intent at clinical-grade latency, with structured tool-calling and UI-regeneration, is bleeding-edge in 2026. It's possible but not boring. v1 demands boring tech for the things that have to work.
The UX research isn't done. What does a doctor want the UI to do when they say "show me her labs"? Probably it depends on what was just on screen. We need real users in real workflows to learn the rules. v1 gives us the substrate to run those experiments later.
The fixed-UI v1 already wins the market. We don't need dynamic UX to displace Atlas.MD. The v1 launch briefing prototype is already a generationally better product. Dynamic UX is the moat we widen later, not the wedge we lead with.

What v1 must not foreclose

The v1 architecture has to stay compatible with a future where the UI is generated, not designed. Concretely:

Every screen state has to be expressible as data. A view is a (currentView, filters, selectedPatientId, …) tuple — not a hand-coded JSX path. The launch-briefing prototype's updateAiContext() function is a primitive version of this; productionize the pattern.
Every action has to be callable. No "you have to click this button to make this happen" — every user action is also a tool/function the AI can invoke, with the same effect. This is the same discipline that makes the ⌘K command palette work, just generalized.
Every server response has to be machine-readable. Don't ship endpoints that return HTML. Return structured data; let the client (or the UI generator) shape it.
The data classification system must thread through. A dynamically generated UI that doesn't know which fields are PHI is a leak waiting to happen. The classification at the schema level becomes load-bearing.
Audit logging is per-action, not per-view. "Yogini viewed Sofia's growth chart at 10:32" is the right granularity, regardless of whether it took a click or a voice command to render.
API contracts are stable and documented. Per Erik's broader thesis, the capability layer (APIs) is the substrate that everything else regenerates against.
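
As a rough illustration of the first two constraints and the audit-logging one, the sketch below treats a screen as plain data and routes every action through one entry point that also writes the audit entry. All names here (ViewState, actions, invoke, the via field) are hypothetical and are not taken from the Starlight codebase; the real tuple has more fields than the three shown.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not Starlight's actual code.

// A screen is plain, serializable data rather than a hand-coded component path.
type ViewState = {
  currentView: "today" | "patientChart" | "growthChart";
  selectedPatientId: string | null;
  filters: Record<string, string>;
};

// Who triggered the action, and through which input surface (an assumed field).
type Actor = { userId: string; via: "click" | "voice" | "commandPalette" };

type AuditEntry = {
  userId: string;
  via: Actor["via"];
  action: string;
  patientId: string | null;
  at: string;
};

// Every user-facing action is a named, pure function over ViewState.
type Action = (state: ViewState, args: Record<string, string>) => ViewState;

const actions: Record<string, Action> = {
  openPatient: (state, args) => ({
    ...state,
    currentView: "patientChart",
    selectedPatientId: args.patientId ?? null,
  }),
  showGrowthChart: (state) => ({ ...state, currentView: "growthChart" }),
};

const auditLog: AuditEntry[] = [];

// Single entry point: a button handler and an AI tool call both land here,
// so the resulting state and the audit entry do not depend on the input surface.
function invoke(
  state: ViewState,
  actor: Actor,
  name: string,
  args: Record<string, string> = {},
): ViewState {
  const action = actions[name];
  if (!action) throw new Error(`Unknown action: ${name}`);
  const next = action(state, args);
  auditLog.push({
    userId: actor.userId,
    via: actor.via,
    action: name,
    patientId: next.selectedPatientId,
    at: new Date().toISOString(),
  });
  return next;
}
```

Because actions are pure functions over serializable state, a click and a voice command that name the same action yield the same ViewState, and the audit log gets one entry per action either way.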

These are mostly already v1 architecture goals — they're just clearer when you know where the product is heading.
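
The data-classification constraint can be sketched in the same spirit: if each schema field carries a label, a generated surface can filter on that label instead of relying on a designer remembering which fields are PHI. The labels, field names, and surface names below are assumptions made for illustration, not Starlight's actual classification scheme.

```typescript
// Hypothetical sketch: labels, fields, and surfaces are assumptions.

type Classification = "public" | "internal" | "phi";

// Classification lives next to the schema, one label per field.
const patientSchema: Record<string, Classification> = {
  id: "internal",
  firstName: "phi",
  allergies: "phi",
  clinicName: "public",
};

type Surface = "clinician" | "analytics";

// Which classifications each surface is allowed to render.
const allowed: Record<Surface, Classification[]> = {
  clinician: ["public", "internal", "phi"],
  analytics: ["public", "internal"],
};

// Returns only the fields the given surface may show.
function project(
  record: Record<string, unknown>,
  schema: Record<string, Classification>,
  surface: Surface,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [field, value] of Object.entries(record)) {
    const label = schema[field];
    // Fail closed: a field with no classification is never rendered.
    if (label !== undefined && allowed[surface].includes(label)) {
      out[field] = value;
    }
  }
  return out;
}
```

The fail-closed branch is the design choice that matters here: a new column that nobody has classified yet stays invisible to every generated surface until someone labels it.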

Voice-as-input technology stack (sketch)

Not in v1, not yet committed, but the rough shape:

Layer	Candidate tech	Notes
Speech-to-text	Whisper (already on the v1 stack for the AI scribe)	Same engine, different consumer
Intent extraction	Claude with tool use	Define the toolset = define the surface area; same pattern as the AI scribe and triage assistant
UI regeneration	React with state derived from intent + chart-RAG	Most of the rendering surface already exists; the trick is making it driven from intent rather than navigation
Latency budget	< 500ms perceptual response time, < 2s for chart pivots	Real engineering discipline required to hit

Notably, nothing here requires a new vendor: Whisper and Claude are already in scope. The work is in the orchestration, not the components.
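
To make "define the toolset = define the surface area" concrete, here is a minimal sketch of a declared toolset, a check that rejects any model-proposed call outside it, and the latency budgets from the table expressed as constants. The tool names, the ToolCall shape, and the helper functions are hypothetical. The definition objects follow the common JSON-Schema style for tool calling, and the actual request to the model is deliberately left out.

```typescript
// Hypothetical sketch: tool names, fields, and the ToolCall shape are assumptions.

// A tool definition in the JSON-Schema style commonly used for model tool calling.
type ToolDefinition = {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
};

const tools: ToolDefinition[] = [
  {
    name: "show_schedule",
    description: "Show the clinician's schedule.",
    input_schema: { type: "object", properties: {}, required: [] },
  },
  {
    name: "open_chart",
    description: "Open one patient's chart.",
    input_schema: {
      type: "object",
      properties: { patientId: { type: "string" } },
      required: ["patientId"],
    },
  },
];

// What the intent layer is assumed to hand back after a spoken request.
type ToolCall = { name: string; input: Record<string, unknown> };

// Rejects calls outside the declared surface area before they touch the UI.
// Simplification: every required field is assumed to be a string.
function validate(call: ToolCall): string | null {
  const tool = tools.find((t) => t.name === call.name);
  if (!tool) return `unknown tool: ${call.name}`;
  for (const field of tool.input_schema.required) {
    if (typeof call.input[field] !== "string") {
      return `missing or invalid field: ${field}`;
    }
  }
  return null;
}

// Budgets from the table above, in milliseconds.
const BUDGET_MS = { perceptual: 500, chartPivot: 2000 } as const;

// True when a measured duration is strictly under the budget for that kind of response.
function withinBudget(kind: keyof typeof BUDGET_MS, elapsedMs: number): boolean {
  return elapsedMs < BUDGET_MS[kind];
}
```

Anything the model proposes that is not in tools is rejected, so the list doubles as the documented surface area of the voice interface.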

Where dynamic UX shows up first

Even before a fully voice-driven interface, the principles can show up incrementally:

The ⌘K command palette is already a degenerate dynamic-UX surface (text input, intent matching, action invocation).
The Ask AI panel with updateAiContext() is already a context-aware assistant.
The post-sign worker queue, which auto-generates the parent summary, routes the Rx, and creates the follow-up task from a single user action, is already an example of "one intent, many downstream effects."

Each of these can be deepened over time without a big-bang dynamic-UX release. The end state is what you get if you keep walking in that direction for years.
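
The "one intent, many downstream effects" pattern can be sketched as a small fan-out, shown below under stated assumptions: the SignedNote shape, the Effect type, and the fanOut helper are hypothetical, and the real post-sign worker queue may be structured quite differently (for example, as durable jobs rather than in-process promises).

```typescript
// Hypothetical sketch: the SignedNote shape and effect names are assumptions.

type SignedNote = { noteId: string; patientId: string };

type Effect = {
  name: string;
  run: (note: SignedNote) => Promise<void>;
};

type EffectResult = { name: string; ok: boolean };

// Runs every effect for one signed note. Each effect is isolated, so a failure
// in one (say, Rx routing) does not cancel the others.
async function fanOut(note: SignedNote, effects: Effect[]): Promise<EffectResult[]> {
  return Promise.all(
    effects.map(async (effect): Promise<EffectResult> => {
      try {
        await effect.run(note);
        return { name: effect.name, ok: true };
      } catch {
        return { name: effect.name, ok: false };
      }
    }),
  );
}
```

Reporting a result per effect keeps partial failure visible: the note can still count as signed while one failed effect is retried on its own.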

Cross-references

Parent Triage — applies the same intent-driven model to the parent app side.
Launch Briefing — clinician app — the v1 surface that's the seed of all this.
Compliance · regulated SDLC — the data classification work that makes dynamic UX safe.
