Architecture
The sober draft. Pruned from the Ideation page; targeted at "what we'll actually build." The pattern follows Polaris's packages/scripts/personas/ very closely; the domain mapping is what makes it ours.
Layered model
┌─────────────────────────────────────────────────────────────┐
│ L4 · CLINICAL ARTIFACT STREAM │
│ Visits · Vaccines · Notes · Rx · Docs · Messages │
│ (derived from the persona's structural shape) │
├─────────────────────────────────────────────────────────────┤
│ L3 · PERSONA SPEC │
│ Family + Guardians + Children + Sentiment + Voice │
├─────────────────────────────────────────────────────────────┤
│ L2 · SAMPLING (deterministic, hierarchical) │
│ familyArchetype → demographics → children → conditions │
├─────────────────────────────────────────────────────────────┤
│ L1 · CATALOGS │
│ familyArchetypes · conditions · regions · names · … │
├─────────────────────────────────────────────────────────────┤
│ L0 · PRIMITIVES │
│ mulberry32 PRNG · weighted-pick · plausibility filters │
└─────────────────────────────────────────────────────────────┘
L0–L2 are deterministic and pure-TypeScript. L3 is the materialized persona object. L4 is the clinical-artifact stream — partially deterministic (vaccines, scheduled well-visits) and partially LLM-generated (note narratives, message threads).
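As a rough illustration of the L0 layer, here is a minimal sketch of the two primitives named in the diagram. `mulberry32` is the widely circulated public-domain 32-bit generator; the `Rng` type and the `weightedPick` name and signature are assumptions made for this example, not the actual module API.

```typescript
// Sketch of two L0 primitives. `Rng` and `weightedPick` are illustrative names.
type Rng = () => number;

// mulberry32: small 32-bit PRNG; the same seed always yields the same stream.
function mulberry32(seed: number): Rng {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Pick one item with probability proportional to its weight.
function weightedPick<T>(rng: Rng, items: readonly T[], weights: readonly number[]): T {
  if (items.length === 0 || items.length !== weights.length) {
    throw new Error("items and weights must be non-empty and the same length");
  }
  const total = weights.reduce((sum, w) => sum + w, 0);
  if (!(total > 0)) throw new Error("weights must sum to a positive number");
  let r = rng() * total;
  for (let i = 0; i < items.length; i++) {
    r -= weights[i];
    if (r < 0) return items[i];
  }
  return items[items.length - 1]; // guard against floating-point drift
}
```

A call such as `weightedPick(mulberry32(42), ["monthly", "annual"], [7, 3])` would then return the same plan on every run, which is the property the higher layers rely on.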
The fundamental object — FamilyPersonaSpec
Polaris generates one persona per "row." We generate one Family per row, with N children and M guardians underneath. This shape is deliberate — it's how the actual Starlight data model works (Family + Guardian + Patient).
export interface FamilyPersonaSpec {
/** Stable id deterministic from cohort seed + index. */
id: string;
cohortSeed: number;
/** Structural archetype that drove generation. */
familyArchetype: FamilyArchetypeSlug;
/** Household-level facts. */
household: Household;
/** One or more guardians. Order is significant: index 0 is "primary." */
guardians: GuardianSpec[];
/** One or more children. Each is a fully-formed PatientSpec. */
children: PatientSpec[];
/** Subscription + billing posture. */
billing: BillingSpec;
/** Generation timestamp. */
generatedAt: string;
}
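One way the "stable id deterministic from cohort seed + index" could work is sketched below. The helper names `mixSeed` and `familyId`, the id format, and the choice of a murmur3-style integer finalizer are all assumptions for illustration; the source only states that the id is a function of seed and index.

```typescript
// Hypothetical helpers: names, id format, and hash choice are illustrative only.

// Mix (cohortSeed, index) into an unsigned 32-bit value so that neighbouring
// indices get unrelated seeds. Uses the murmur3 32-bit finalizer steps.
function mixSeed(cohortSeed: number, index: number): number {
  let h = (cohortSeed ^ Math.imul(index + 1, 0x9e3779b1)) >>> 0;
  h = Math.imul(h ^ (h >>> 16), 0x85ebca6b);
  h = Math.imul(h ^ (h >>> 13), 0xc2b2ae35);
  return (h ^ (h >>> 16)) >>> 0;
}

// Build a readable id that is a pure function of (cohortSeed, index).
function familyId(cohortSeed: number, index: number): string {
  const hash = mixSeed(cohortSeed, index).toString(16).padStart(8, "0");
  return `fam-${cohortSeed}-${index}-${hash}`;
}
```

The same mixing step could plausibly derive the per-child `seed` as well, for example `mixSeed(familySeed, childIndex)`, so that each child's timeline draws are independent of its siblings.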
Household
export interface Household {
region: Region; // state, metro, ZIP-area, urban/rural
primaryLanguage: LanguageCode; // 'en' | 'es' | 'vi' | 'zh' | …
spokenLanguages: LanguageCode[]; // for translation needs
sesProxy: 0 | 1 | 2 | 3; // 0=very-low … 3=affluent (drives plan choice + adherence)
transportationAccess: 'good' | 'limited' | 'rural';
splitHousehold: boolean; // shared custody → two physical homes
}
GuardianSpec
export interface GuardianSpec {
firstName: string;
lastName: string;
relationshipToChildren: GuardianRelationship[]; // per-child relationship
// 'mother' | 'father' | 'step-mother' | 'step-father' |
// 'grandmother' | 'grandfather' | 'foster-parent' |
// 'court-appointed' | 'aunt' | 'uncle' | 'sibling-guardian'
contact: { mobile: string; email: string };
preferredChannel: 'app' | 'sms' | 'email' | 'phone';
/** Voice / sentiment for the LLM harness. */
sentiment: GuardianSentiment;
voice: GuardianVoice;
/** Hard constraints for the harness + UX. */
constraints: GuardianConstraint[];
}
export type GuardianConstraint =
| 'limited-english'
| 'health-literacy-low'
| 'health-literacy-clinician' // parent is themselves a clinician
| 'high-anxiety'
| 'vaccine-hesitant'
| 'financially-strained'
| 'evening-only-availability'
| 'co-parent-conflict' // parents disagree on care
| 'cps-history'
| 'has-court-order';
export interface GuardianSentiment {
engagement: 0 | 1 | 2 | 3; // 0=disengaged … 3=hyper-engaged
trustInMedicine: 0 | 1 | 2 | 3; // 0=actively-skeptical … 3=full-trust
anxietyLevel: 0 | 1 | 2 | 3;
financialPressure: 0 | 1 | 2 | 3;
}
export interface GuardianVoice {
tone: 'warm' | 'direct' | 'anxious' | 'skeptical' | 'exhausted' | 'casual';
readingLevel: 'plain' | 'professional' | 'medical';
backstory: string; // ~2 sentences for LLM-prompt preamble
}
PatientSpec (the child)
export interface PatientSpec {
firstName: string;
lastName: string; // not always same as guardian (step / divorced / adopted)
sex: 'F' | 'M';
dob: string; // ISO date, deterministic from cohort+index
ageBand: AgeBand; // 'newborn' | 'infant' | 'toddler' | 'preschool' | 'school' | 'tween' | 'adolescent'
/** Birth history — present always; richer for newborns/infants. */
birth: BirthHistory;
/** Active conditions / allergies. Coherent with ageBand. */
conditionProfile: ConditionProfile;
/** Vaccine status — what's complete, what's pending, what's declined. */
vaccineStatus: VaccineStatus;
/** Growth-percentile trajectory at the cohort's "today." */
growth: GrowthSnapshot;
/** Constraints that aren't conditions but matter for UX. */
patientConstraints: PatientConstraint[];
/** Per-child seed for deriving stochastic timeline events. */
seed: number;
}
export type AgeBand =
| 'newborn' // 0–28 days
| 'infant' // 29 days–11 months
| 'toddler' // 1–2 years
| 'preschool' // 3–4 years
| 'school' // 5–10 years
| 'tween' // 11–13 years
| 'adolescent'; // 14–17 years
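The band boundaries can be checked mechanically. The sketch below maps a date of birth and the cohort's "today" to a band; `ageBandFor` is a hypothetical helper, and it assumes ISO `YYYY-MM-DD` strings interpreted as UTC.

```typescript
type AgeBand =
  | "newborn" | "infant" | "toddler" | "preschool"
  | "school" | "tween" | "adolescent";

// Hypothetical helper. Returns null for future DOBs, bad dates, and ages 18+.
// Assumption: anything past day 28 and before the first birthday is "infant".
function ageBandFor(dob: string, today: string): AgeBand | null {
  const birth = new Date(`${dob}T00:00:00Z`);
  const now = new Date(`${today}T00:00:00Z`);
  const ageDays = Math.floor((now.getTime() - birth.getTime()) / 86400000);
  if (Number.isNaN(ageDays) || ageDays < 0) return null;

  // Whole years completed, adjusted if this year's birthday has not happened yet.
  let years = now.getUTCFullYear() - birth.getUTCFullYear();
  const beforeBirthday =
    now.getUTCMonth() < birth.getUTCMonth() ||
    (now.getUTCMonth() === birth.getUTCMonth() && now.getUTCDate() < birth.getUTCDate());
  if (beforeBirthday) years -= 1;

  if (ageDays <= 28) return "newborn";
  if (years < 1) return "infant";
  if (years <= 2) return "toddler";
  if (years <= 4) return "preschool";
  if (years <= 10) return "school";
  if (years <= 13) return "tween";
  if (years <= 17) return "adolescent";
  return null; // outside the pediatric panel
}
```

A function like this could serve both directions: deriving `ageBand` from `dob` at generation time, and asserting in tests that a stored `ageBand` still agrees with the stored `dob`.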
export interface BirthHistory {
gestationalAgeWeeks: number; // 24–42
deliveryMode: 'vaginal' | 'csection' | 'vacuum-assisted' | 'forceps';
birthWeightKg: number;
birthLengthCm: number;
apgar1: number;
apgar5: number;
nicuStay: boolean;
feeding: 'exclusive-breast' | 'mixed' | 'formula' | 'transitioned-solids';
bilirubinPeak?: number;
}
export interface ConditionProfile {
allergies: AllergySlug[]; // 'peanut-severe' | 'tree-nut-mod' | 'penicillin' | 'NKDA' | …
chronicConditions: ChronicSlug[]; // 'asthma-mild' | 'asthma-moderate' | 'eczema' | 'adhd' | 't1d' | …
developmental: DevelopmentalSlug[]; // 'speech-delay' | 'autism-mild' | 'iep-on-file' | …
mentalHealth: MentalHealthSlug[]; // 'anxiety' | 'depression' | 'eating-disorder' | …
/** Active medications, each tied to a chronic-condition slug. */
activeRx: RxSpec[];
}
export type PatientConstraint =
| 'cii-rx-on-file' // Schedule II (C-II) controlled-substance prescription, e.g. an ADHD stimulant
| 'epi-pen-on-file'
| 'glucagon-on-file'
| 'iep-on-file'
| '504-on-file'
| 'school-physical-due'
| 'sport-physical-due'
| 'transitioning-to-adult'
| 'recent-er-visit'
| 'prior-cps-flag'
| 'gender-incongruence-conversation'
| 'newborn-home-visit-due'
| 'travel-vaccines-needed'
| 'recent-international-arrival';
export interface VaccineStatus {
posture: 'on-schedule' | 'mild-delay' | 'partial-catch-up-needed' | 'religious-exemption' | 'philosophical-exemption' | 'partial-international';
doses: VaccineDose[]; // structured list of given doses with date + lot (synth)
declinedSlugs: VaccineSlug[]; // explicitly declined
pendingSlugs: VaccineSlug[]; // due / overdue
}
export interface GrowthSnapshot {
weightKg: number;
heightCm: number;
weightPercentile: number; // 0–100
heightPercentile: number;
bmiPercentile?: number;
trajectory: 'tracking' | 'rising' | 'falling' | 'volatile';
}
BillingSpec
export interface BillingSpec {
plan: 'monthly' | 'quarterly' | 'annual';
rateUSD: number;
cardOnFile: boolean;
status: 'active' | 'past-due-30' | 'past-due-60' | 'cancelled-pending';
splitBilling?: { mom: number; dad: number }; // 0..100, must sum to 100
employerSponsored: boolean;
}
Cohort envelope
export interface FamilyCohort {
seed: number;
size: number; // # of families (children = ~1.6× this)
generatedAt: string;
families: FamilyPersonaSpec[];
summary: CohortSummary;
}
export interface CohortSummary {
byFamilyArchetype: Record<FamilyArchetypeSlug, number>;
byPrimaryLanguage: Record<LanguageCode, number>;
byRegion: Record<string, number>;
byChildAgeBand: Record<AgeBand, number>;
byChronicCondition: Record<ChronicSlug, number>;
byVaccinePosture: Record<VaccineStatus['posture'], number>;
byBillingStatus: Record<BillingSpec['status'], number>;
totalFamilies: number;
totalChildren: number;
totalGuardians: number;
}
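The summary is meant to be derivable from `families[]` alone. A minimal sketch of that derivation, covering only the age-band tally and the three totals, might look like the following. `FamilyLike`, `countBy`, and `summarize` are names invented for this example.

```typescript
type AgeBand =
  | "newborn" | "infant" | "toddler" | "preschool"
  | "school" | "tween" | "adolescent";

// Minimal structural stand-in for FamilyPersonaSpec (assumption for the sketch).
interface FamilyLike {
  guardians: readonly unknown[];
  children: readonly { ageBand: AgeBand }[];
}

// Count items per key. Keys with zero occurrences are simply absent.
function countBy<T>(items: readonly T[], keyOf: (item: T) => string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of items) {
    const key = keyOf(item);
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

function summarize(families: readonly FamilyLike[]) {
  const children: { ageBand: AgeBand }[] = [];
  let totalGuardians = 0;
  for (const family of families) {
    totalGuardians += family.guardians.length;
    for (const child of family.children) children.push(child);
  }
  return {
    byChildAgeBand: countBy(children, (c) => c.ageBand),
    totalFamilies: families.length,
    totalChildren: children.length,
    totalGuardians,
  };
}
```

Note that `CohortSummary` declares full `Record<AgeBand, number>` maps, so a real implementation would need to pre-fill every key with zero; this sketch omits empty keys for brevity.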
Sampling (L2) — hierarchical and deterministic
The sampling cascade per family:
seed → mulberry32 RNG
↓
draw familyArchetype (weighted, US distribution)
↓
draw region (population × industry-style affinity, see below)
↓
derive household (region drives language probability, SES proxy)
↓
draw # of children (Poisson-ish capped 1–5, weighted by archetype)
↓
draw guardians (count + relationship from archetype)
↓
for each child:
↓
draw ageBand (uniform-ish across the practice's panel)
↓
derive birth (age-band-appropriate)
↓
draw conditions (age-band plausibility filter applied)
↓
derive vaccines (age + posture + condition-modifying)
↓
derive growth (age + condition + family pattern)
↓
draw billing
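To make one step of the cascade concrete, here is a sketch of the "Poisson-ish capped 1–5" child-count draw. It uses Knuth's multiplication method for the Poisson draw and then clamps the result; the function name `drawChildCount` and the idea of passing a per-archetype `mean` are assumptions, and the real generator may use a lookup table instead.

```typescript
// Hypothetical sketch: a Poisson draw (Knuth's method) clamped to 1..5.
// `mean` would come from the family archetype; that mapping is not shown here.
function drawChildCount(rng: () => number, mean: number): number {
  const limit = Math.exp(-mean);
  let k = 0;
  let p = rng();
  // Multiply uniforms until the product drops to the limit; k counts the steps.
  while (p > limit) {
    k += 1;
    p *= rng();
  }
  // Clamping folds 0 into 1 and anything above 5 into 5, hence "Poisson-ish".
  return Math.min(5, Math.max(1, k));
}
```

Because the loop consumes a variable number of values from `rng`, the order of draws in the cascade matters for reproducibility: reordering two steps changes every downstream value for the same seed.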
Plausibility filters at every step, mirroring Polaris's "no CIO at solopreneur" pattern:
- A 2-year-old can't have ADHD diagnosed (AAP clinical guidance covers evaluation from age 4; DSM-5 itself sets no minimum age).
- A newborn can't have a school physical.
- An adolescent can have HEEADSSS-confidential threads; a 5-year-old cannot.
- A "religious exemption" vaccine posture is incompatible with "fully on schedule."
- T1D requires a chronic Rx (insulin); the generator must co-emit it.
- An immunocompromised flag must defer live vaccines.
- Recent international arrival is incompatible with "fully on US schedule."
Filters are encoded as TypeScript predicates and applied at draw time — exactly the Polaris approach.
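A sketch of what "predicates applied at draw time" could look like for two of the rules above. The rule tables, the `atLeast` combinator, and the function names are hypothetical; the slugs `adhd`, `eczema`, and `t1d` come from the type comments earlier on this page. Age bands are coarser than exact ages, so the ADHD rule here only excludes toddlers and younger; a real filter would likely check the exact age.

```typescript
type AgeBand =
  | "newborn" | "infant" | "toddler" | "preschool"
  | "school" | "tween" | "adolescent";

type ConditionRule = (ageBand: AgeBand) => boolean;

const AGE_ORDER: readonly AgeBand[] = [
  "newborn", "infant", "toddler", "preschool", "school", "tween", "adolescent",
];

// Rule combinator: the condition is plausible only from `min` upward.
const atLeast = (min: AgeBand): ConditionRule => (band) =>
  AGE_ORDER.indexOf(band) >= AGE_ORDER.indexOf(min);

// Hypothetical rule table. Slugs without an entry are allowed at any age.
const CONDITION_RULES = new Map<string, ConditionRule>([
  ["adhd", atLeast("preschool")],
]);

// Hypothetical co-emission table: condition slug -> Rx that must accompany it.
const REQUIRED_RX = new Map<string, string>([["t1d", "insulin"]]);

// Filter the candidate pool BEFORE the weighted draw, so implausible
// combinations are never sampled in the first place.
function plausibleConditions(candidates: readonly string[], ageBand: AgeBand): string[] {
  return candidates.filter((slug) => {
    const rule = CONDITION_RULES.get(slug);
    return rule === undefined || rule(ageBand);
  });
}

// List the prescriptions the generator must emit alongside the drawn conditions.
function requiredRx(conditions: readonly string[]): string[] {
  const out: string[] = [];
  for (const slug of conditions) {
    const rx = REQUIRED_RX.get(slug);
    if (rx !== undefined) out.push(rx);
  }
  return out;
}
```

Filtering the pool before drawing, rather than rejecting after the fact, keeps the number of PRNG calls per child predictable.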
Region weighting
Identical to Polaris's strategy:
- Base weight per state = 2020 Census population in millions. California (~39.5M) is ~68× more likely than Wyoming (~0.58M).
- Family-archetype × region affinity — e.g., "recently-arrived-immigrant-family" has multipliers for TX/CA/FL/NY; "rural-multigenerational" has multipliers for the Mountain West and Appalachia. Affinity is a multiplier, not a hard restriction — every state can still produce every archetype, just at different probabilities.
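The two bullets combine into a single weight per state: population times the archetype's multiplier, defaulting to 1. The sketch below shows that arithmetic with three states only. The populations are rounded 2020 Census figures; the function names and the example multiplier are assumptions.

```typescript
// 2020 Census populations in millions, rounded. Only three states shown.
const STATE_POP_M: Record<string, number> = { CA: 39.5, TX: 29.1, WY: 0.58 };

// Effective weight = base population x affinity multiplier (default 1).
// A multiplier never removes a state; it only scales its probability.
function regionWeights(
  affinity: Record<string, number>,
  basePopM: Record<string, number> = STATE_POP_M,
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const state of Object.keys(basePopM)) {
    out[state] = basePopM[state] * (affinity[state] ?? 1);
  }
  return out;
}

// Weighted pick over the state table; `rng` must return a float in [0, 1).
function pickRegion(rng: () => number, weights: Record<string, number>): string {
  const states = Object.keys(weights);
  const total = states.reduce((sum, s) => sum + weights[s], 0);
  let r = rng() * total;
  for (const s of states) {
    r -= weights[s];
    if (r < 0) return s;
  }
  return states[states.length - 1]; // guard against floating-point drift
}
```

For example, a made-up multiplier of 3 for Wyoming under a "rural-multigenerational" archetype would raise its weight from 0.58 to about 1.74, still far below California's 39.5, which matches the "multiplier, not a hard restriction" rule.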
LLM-generated content (L4)
Most of the persona is structurally generated. Only natural-language texture uses LLMs:
| Field | Generation method |
|---|---|
| voice.backstory (per guardian) | Hand-curated templates, lightly randomized. (Polaris pattern.) |
| Visit-note narratives in the timeline stream | LLM (Claude), conditioned on the persona's structured fields, age, and the visit type. |
| Parent-app message thread content | LLM, conditioned on tone + reading level + topic + the child's actual record. |
| Document content (lab reports, imaging, school physicals) | LLM, conditioned on the structured panel/result. |
| Audio conversation lines (for AI scribe demos) | Hand-curated initially → LLM-generated + TTS later. |
Crucial design rule: the LLM never invents structured fields. The LLM only renders structured fields into prose. This keeps the generator's distribution properties intact even when natural language is added — the structure is the source of truth, the prose is decoration.
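In practice the rule suggests a prompt builder that passes the structured record through verbatim and asks only for rendering. The sketch below builds such a prompt for a parent-app message; it makes no model call. The interface names, field selection, and prompt wording are all assumptions for illustration.

```typescript
// Hypothetical inputs: trimmed-down views of GuardianVoice and the child's record.
interface VoiceInput {
  tone: string;
  readingLevel: string;
  backstory: string;
}

interface RecordFacts {
  ageBand: string;
  chronicConditions: string[];
  allergies: string[];
}

// Build the prompt text only. The facts travel as JSON so the structured
// record stays the single source of truth; the model is asked to render them.
function buildMessagePrompt(voice: VoiceInput, topic: string, facts: RecordFacts): string {
  return [
    "Write one message from a guardian to the pediatric practice.",
    `Topic: ${topic}`,
    `Tone: ${voice.tone}. Reading level: ${voice.readingLevel}.`,
    `Guardian backstory: ${voice.backstory}`,
    "Use ONLY the facts in the JSON below.",
    "Do not add diagnoses, medications, doses, or dates that are not listed.",
    "FACTS:",
    JSON.stringify(facts, null, 2),
  ].join("\n");
}
```

A prompt instruction alone does not enforce the rule, so a post-generation check that the prose mentions no condition or medication slug outside `facts` would be a natural companion.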
Distribution invariants (CI tests)
Mirroring Polaris's test suite. Any cohort of 1,000+ families must satisfy:
- Coverage: every age band has ≥ 50 children. Every primary-language code in the catalog appears at least once. Every census region has at least 5 families.
- Concentration: no single condition slug exceeds 30% of children. No family archetype exceeds 25% of families. No state exceeds 25% of families (California will be the binding case here).
- Coherence: every child's age is within their age-band's range. Every chronic Rx is paired with the corresponding chronic condition. Every vaccine posture is internally consistent (no "fully-on-schedule" + 3-year gaps in doses). Every `splitHousehold: true` family has > 1 guardian and at least one with a custody-related constraint.
- Determinism: same `(seed, size, options)` produces bit-identical `families[]`. PRNG is `mulberry32`. No `Math.random()` allowed.
- Long-tail presence: at least 1 newborn home-visit pattern, at least 1 adolescent HEEADSSS-confidential pattern, at least 1 vaccine-hesitant family, at least 1 split-custody family, at least 1 limited-English family in any cohort ≥ 200.
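Two of these invariants reduce to small pure functions that a test file could call. The sketch below shows a concentration check and a determinism check; `maxShare` and `isDeterministic` are hypothetical helper names, and the generator is passed in as a parameter rather than imported.

```typescript
// Largest share any single key holds, as a fraction of `total`.
function maxShare(counts: Record<string, number>, total: number): number {
  if (total <= 0) return 0;
  let max = 0;
  for (const key of Object.keys(counts)) {
    if (counts[key] > max) max = counts[key];
  }
  return max / total;
}

// Run the generator twice with the same seed and compare serialized output.
// JSON equality is a stand-in for "bit-identical" in this sketch.
function isDeterministic<T>(generate: (seed: number) => T, seed: number): boolean {
  return JSON.stringify(generate(seed)) === JSON.stringify(generate(seed));
}
```

A concentration assertion would then read `maxShare(summary.byChronicCondition, summary.totalChildren) <= 0.30`. One thing the determinism check would surface: if `generatedAt` is taken from the wall clock, two runs will differ, so that field would need to be excluded from the comparison or derived from the inputs.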
Cohort modes
Like Polaris's industriesAllow etc., we'll support stress-testing modes:
export interface GenerateCohortOptions {
seed: number;
size: number; // # of families
familyArchetypeWeights?: Partial<Record<FamilyArchetypeSlug, number>>;
archetypesAllow?: FamilyArchetypeSlug[];
conditionsAllow?: ChronicSlug[];
ageBandWeights?: Partial<Record<AgeBand, number>>;
/** Adversarial mode — bias toward edge cases, suicidality, abuse, drug-interaction, fraud-attempt patterns. */
adversarialBias?: number; // 0..1 — share of cohort that is adversarial
/** Newborn-heavy mode — generate a higher proportion of newborns (for home-visit testing). */
newbornBias?: number; // 0..1
}
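How the weight overrides and allow-lists interact is not spelled out above. One plausible reading, sketched here, is that an override replaces the catalog weight for its slug and an allow-list drops every slug not named. Both the semantics and the name `effectiveWeights` are assumptions.

```typescript
// Assumed semantics: overrides REPLACE catalog weights; allow-list FILTERS slugs.
function effectiveWeights(
  base: Record<string, number>,
  overrides: Partial<Record<string, number>> = {},
  allow?: readonly string[],
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const slug of Object.keys(base)) {
    if (allow && allow.indexOf(slug) === -1) continue; // not on the allow-list
    out[slug] = overrides[slug] ?? base[slug];
  }
  return out;
}
```

With a made-up catalog `{ a: 1, b: 2, c: 3 }`, overrides `{ b: 5 }`, and allow-list `["a", "b"]`, this yields `{ a: 1, b: 5 }`. Resolving the weights once, before the first draw, keeps the per-family sampling code unaware of which mode is active.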
Module layout (code)
Mirrors Polaris closely, in scripts/synthetic-personas/:
scripts/synthetic-personas/
├── README.md
├── package.json (or use root tsx)
├── types.ts — All exported types (above)
├── data.ts — Catalogs: familyArchetypes, conditions, regions, names, vaccines
├── generate.ts — Main generator: cohort + family + child
├── generate.test.ts — Distribution + plausibility tests
├── timeline.ts — Derive visit / vaccine / Rx / message stream from PersonaSpec
├── llm.ts — Optional Claude calls for narrative texture
├── cli.ts — Command-line entrypoint: `pnpm synth --seed=42 --size=500`
└── index.ts — Public API surface
The first PR ships everything except llm.ts — natural-language rendering can come later. The structural cohort generator is the load-bearing piece.
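For the `cli.ts` entrypoint, the `--seed=42 --size=500` form shown in the layout can be parsed without any dependency. This is a hand-rolled sketch; `parseCliArgs` is a hypothetical name, and it handles only the two flags that appear in the example command.

```typescript
// Parse `--key=value` arguments; only `seed` and `size` are read here.
function parseCliArgs(argv: readonly string[]): { seed: number; size: number } {
  const values: Record<string, string> = {};
  for (const arg of argv) {
    const match = /^--([a-z-]+)=(.+)$/i.exec(arg);
    if (match) values[match[1]] = match[2];
  }
  const seed = Number.parseInt(values["seed"] ?? "", 10);
  const size = Number.parseInt(values["size"] ?? "", 10);
  if (!Number.isInteger(seed)) throw new Error("--seed=<integer> is required");
  if (!Number.isInteger(size) || size <= 0) throw new Error("--size=<positive integer> is required");
  return { seed, size };
}
```

In a Node entrypoint this would typically be called as `parseCliArgs(process.argv.slice(2))`. Failing loudly on a missing seed, rather than defaulting to a random one, is in keeping with the determinism rule.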
What this gives us, immediately
- A deterministic 500-family cohort the backend test suite can load on every CI run.
- Stress-test modes for newborn flows, adolescent confidentiality, vaccine hesitancy, custody complexity.
- A staging-environment seed that lets a new engineer spin up a realistic-feeling Starlight without ever needing access to real PHI.
- An evaluation harness for AI features: run any new chart-side AI feature against the cohort and compute precision/recall by condition.
- Demo material that's defensibly synthetic — every screenshot we put on the website or in marketing comes from a cohort row with a deterministic seed; we can reproduce any image and prove it's not a real patient.
The next page, Implementation Plan, sequences the build by gate.