Ideation — Wildly High-Temperature

Posture: generate-many-then-prune. This page is intentionally over-imagined. Some of it is good. Some of it is wrong. The goal is to enumerate what could be before we anchor on what we'll build. Architecture and Implementation pages are the soberer drafts.

What's a "synthetic persona" when the persona is a kid?

The Polaris generator produces a buyer — one identity, one organization, one decision context. A pediatric "persona" is messier and more interesting:

A child is the patient. But the child is rarely the decision-maker.
A family is the unit of subscription, billing, and most communication. Multiple kids per family.
One or more guardians make decisions, hold the app, talk to the doctor, sign forms. Their relationship to each child can vary (mother, father, step-parent, grandparent-as-guardian, foster, court-appointed).
A household has demographics (state, ZIP, SES proxy, language preferences, transportation access).
The clinical timeline is a stream — well-child cadence, sick visits, follow-ups, vaccines, growth measurements, screenings, document uploads.

So a Starlight "persona" is really a Family with N children + M guardians + a clinical history per child + a billing relationship + a relational dynamic. That's the fundamental object.
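
As a rough illustration of that fundamental object, the shape might be sketched like this. Every type and field name below is a hypothetical placeholder rather than the Architecture page's final type system; the point is only that guardian roles hang off each child, not off the family.

```typescript
// Hypothetical shapes: all names are illustrative assumptions.
type GuardianRole =
  | "mother" | "father" | "step-parent"
  | "grandparent-guardian" | "foster" | "court-appointed";

interface Guardian {
  id: string;
  preferredLanguage: string;
}

interface ClinicalEvent {
  kind: "well-visit" | "sick-visit" | "follow-up" | "vaccine"
      | "growth" | "screening" | "document";
  ageMonths: number; // child's age when the event occurred
}

interface Child {
  id: string;
  birthDate: string; // ISO date
  // The relationship is per child: one adult can be "mother" to one
  // child and "step-parent" to another in the same family.
  guardianRoles: Record<string, GuardianRole>; // guardianId -> role
  timeline: ClinicalEvent[];
}

interface Household {
  state: string;
  zip: string;
}

interface Family {
  id: string;
  households: Household[]; // shared custody can mean two
  guardians: Guardian[];   // M guardians
  children: Child[];       // N children
  billingGuardianId: string;
  coParentingDynamic: "aligned" | "mostly-aligned" | "at-odds";
}

// Returns guardian ids that a child references but the family does not list.
function danglingGuardianIds(family: Family): string[] {
  const known = new Set(family.guardians.map((g) => g.id));
  const missing = new Set<string>();
  for (const child of family.children) {
    for (const guardianId of Object.keys(child.guardianRoles)) {
      if (!known.has(guardianId)) missing.add(guardianId);
    }
  }
  return Array.from(missing);
}
```

A referential check like `danglingGuardianIds` is the kind of cheap invariant a generator could run on every family it emits.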

A cornucopia of variation, brainstormed wide

We want a generator that can produce the long tail, not just the textbook center. Here's the wildly-overgenerated list of dimensions to vary, before we sober up.

Family structure

Two-parent nuclear, both biological.
Two-parent nuclear, one step-parent.
Single mother, sole custody.
Single father, sole custody.
Divorced parents, 50/50 shared custody, two households, sometimes two ZIP codes.
Divorced parents, primary + visitation, court order on file.
Grandmother as primary guardian (raising the child because the parents cannot).
Foster family — kid recently placed, records partial.
Adoptive family — international adoption with limited birth history.
Two-mom or two-dad household.
Multigenerational household (parents + grandparents + kids together).
Military family, frequent moves, records from multiple states.
Recently arrived immigrant family: vaccine records from another country, preferred language not English.
Teen mom (pediatric patient herself, with her own infant in the practice).
Family with one high-acuity child AND an always-fine sibling, so parent attention is asymmetric.

Children themselves

Newborn (0–4 weeks, just out of birth hospital, jaundice/feeding watch).
Infant (4 weeks – 12 months).
Toddler.
Preschool (3–5).
School-age (5–10).
Tween (10–13).
Adolescent (13–17, HEEADSSS-confidential, possibly transitioning to adult care soon).
NICU graduate (born premature, currently 1y, complex trajectory).
Internationally adopted child (15-month-old joining a family that already has a 4-year-old bio kid).
Child of the practice's pediatrician — extra-touchy demographic.
Foster child whose records are incomplete.
Identical twins (do their charts diverge over time? yes — different sick visits, different growth percentiles).
Sibling with developmental delay alongside neurotypical sibling.
Child athlete needing UIL physicals every season.
Child who travels internationally (travel-vax needs, returning from somewhere with endemic illness).
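
The age bands in this list share boundaries (3–5 and 5–10 both mention 5), so a generator needs one unambiguous rule. A minimal sketch, assuming half-open intervals in months, a 4-week newborn cutoff rounded to 1 month, and a toddler band of 12–36 months (the list above does not give toddler bounds); all names are hypothetical:

```typescript
type AgeBand =
  | "newborn" | "infant" | "toddler" | "preschool"
  | "school-age" | "tween" | "adolescent";

// Exclusive upper bound of each band, in months, in ascending order.
// The newborn and toddler bounds are assumptions (see lead-in).
const AGE_BAND_UPPER_MONTHS: ReadonlyArray<readonly [AgeBand, number]> = [
  ["newborn", 1],
  ["infant", 12],
  ["toddler", 36],
  ["preschool", 60],
  ["school-age", 120],
  ["tween", 156],
  ["adolescent", 216], // through age 17, i.e. until the 18th birthday
];

// Maps an age in months to its band; returns null for negative ages
// and for ages of 18 years and up, which fall outside this sketch.
function ageBandFor(ageMonths: number): AgeBand | null {
  if (ageMonths < 0) return null;
  for (const [band, upper] of AGE_BAND_UPPER_MONTHS) {
    if (ageMonths < upper) return band;
  }
  return null;
}
```

With this rule a child exactly 5 years old (60 months) is school-age, not preschool. The opposite convention would work equally well as long as it is applied consistently.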

Conditions, allergies, special situations

The textbook list and the long-tail. Mix freely.

Common chronic: asthma (mild intermittent → moderate persistent), eczema (atopic dermatitis), food allergies (peanut, tree nut, milk, egg, sesame, shellfish), seasonal allergies, ADHD, anxiety, depression, mild-to-moderate developmental delay, mild autism spectrum, T1D, growth-failure or short stature.

Less common but each will show up in any reasonable practice: epilepsy, congenital heart defects (post-repair), nephrotic syndrome, sickle-cell trait or disease, cystic fibrosis carrier or affected, juvenile idiopathic arthritis, Tourette syndrome, IBD (Crohn's/UC), migraine, PCOS in adolescents, gender-incongruence consultation needs, eating disorders (HEEADSSS visible), suicidality screening positive.

Acute / time-bounded: otitis media, viral URI, viral GI, hand-foot-mouth, croup, RSV bronchiolitis, strep, pneumonia, mononucleosis, fractures, sports concussion, dental abscess (for the adjacent dental future), insect-borne illness (regional: Lyme in the Northeast, RMSF, West Nile).

Specific situations that drive pediatric workflow: tongue tie referral, undescended testicle, breastfeeding difficulty, postpartum depression flagged in mom (pediatrician catches), infant feeding/weight loss, school-readiness assessment, IEP / 504 plan paperwork, sports clearance, immigration physical, custody-transfer health summary, court-ordered psych eval, child protective services notification.

Vaccine status spectrum

Fully on schedule (boring center of the curve).
One dose late, otherwise on schedule.
Several catch-up doses needed because the family transferred from another practice.
Vaccine-hesitant family — partial schedule, declined some boosters, well-articulated requests for spacing.
Religious / philosophical exemption on file (state-specific rules).
Internationally vaccinated, partial records, needs translation + reconciliation.
Premature infant on a modified schedule.
Immunocompromised — modified schedule, certain live vaccines deferred.
Recently re-engaged after a long gap — needs catch-up plan.

Parent-side variation (the human texture that LLM harnesses ride)

Engagement: super-engaged researcher who reads every paper / engaged + busy / engaged but anxious / disengaged / hostile.
Communication channel: text-first, email-first, phone-only, can-only-respond-in-evenings, doesn't open the parent app.
Language: English / Spanish / Vietnamese / Mandarin / ASL / etc.
Reading level / health literacy: plain / professional / medical (if parent is a clinician).
Anxiety: low / moderate / high (frequent calls about minor things) / catastrophizing.
Trust in medicine: high / mixed / low / actively skeptical (vaccine-hesitant, alt-med-curious).
Financial pressure: none / moderate / high (asks about generic alternatives, splits visits, late on subscription).
Time constraint: flexible / rigid (nights and weekends only).
Co-parenting dynamic: aligned / mostly aligned / actively at odds (one parent vaccinates, one objects).
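
One way these parent-side dimensions could stay closed while still feeding an LLM harness is to keep each as a fixed slug list and render slugs into prompt text only at the edge. The sketch below covers three of the dimensions; the slug spellings, `parseSlug`, and `describeGuardian` are hypothetical names chosen for illustration.

```typescript
// Closed slug lists; spellings are assumptions based on the list above.
const ENGAGEMENT_LEVELS = [
  "researcher", "engaged-busy", "engaged-anxious", "disengaged", "hostile",
] as const;
const ANXIETY_LEVELS = ["low", "moderate", "high", "catastrophizing"] as const;
const TRUST_LEVELS = ["high", "mixed", "low", "skeptical"] as const;

type Engagement = (typeof ENGAGEMENT_LEVELS)[number];
type Anxiety = (typeof ANXIETY_LEVELS)[number];
type Trust = (typeof TRUST_LEVELS)[number];

interface GuardianTexture {
  engagement: Engagement;
  anxiety: Anxiety;
  trustInMedicine: Trust;
  language: string;
}

// Validates an untrusted string (for example, LLM output) against a closed
// slug list. Returns the typed slug, or null if the value is not allowed.
function parseSlug<T extends string>(allowed: readonly T[], value: string): T | null {
  for (const candidate of allowed) {
    if (candidate === value) return candidate;
  }
  return null;
}

// Renders the structural slugs into a one-line prompt fragment.
function describeGuardian(t: GuardianTexture): string {
  return (
    `Guardian texture: engagement=${t.engagement}; anxiety=${t.anxiety}; ` +
    `trust=${t.trustInMedicine}; language=${t.language}.`
  );
}
```

Anything an LLM proposes for a structural field would pass through `parseSlug` first, so free-text drift is rejected rather than stored.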

Clinical timeline patterns

The textbook well-baby trajectory: birth → 1mo → 2mo → 4mo → 6mo → 9mo → 12mo → 15mo → 18mo → 2y → annually.
The "moved in" trajectory: came in at age 4 with partial records from another state.
The "high-acuity year" trajectory: T1D diagnosis, then weekly visits for a quarter, then quarterly steady-state.
The "concerning growth" trajectory: weight dropped across percentile lines, leading to a feeding eval and eventual referral.
The "perfect kid" trajectory — yearly well-visits and that's it.
The "ER-as-PCP" trajectory — gaps with ER visits documented, family in transition.
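
The textbook cadence, and the "moved in" variant of it, can be sketched as a list of visit ages. This is a hedged illustration: it encodes only the cadence written above, assumes "annually" means every 12 months starting at 36 months, and is not clinical guidance. The function and constant names are hypothetical.

```typescript
// Visit ages in months for the cadence above: birth, 1, 2, 4, 6, 9, 12,
// 15, 18, and 24 months. Annual visits after that are generated below.
const EARLY_WELL_VISIT_MONTHS: readonly number[] = [0, 1, 2, 4, 6, 9, 12, 15, 18, 24];

// Returns scheduled well-visit ages (in months) between fromMonths and
// upToMonths inclusive. A non-zero fromMonths models a child who joined
// the practice late, so earlier visits are simply absent from the chart.
function wellVisitAgesMonths(upToMonths: number, fromMonths = 0): number[] {
  const ages: number[] = [];
  for (const m of EARLY_WELL_VISIT_MONTHS) ages.push(m);
  for (let m = 36; m <= upToMonths; m += 12) ages.push(m); // assumed annual cadence
  return ages.filter((m) => m >= fromMonths && m <= upToMonths);
}
```

For example, `wellVisitAgesMonths(72, 48)` yields `[48, 60, 72]`: a child who arrived at age 4 and is followed to age 6. The other trajectories (high-acuity year, ER-as-PCP) would layer extra or missing events on top of this baseline.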

Document and artifact streams

Birth records, hospital discharge summary (auto-routable per the doc auto-routing).
Vaccine records from prior providers.
School physicals (annual, sport-specific, UIL).
Specialist consult letters (cardio, neuro, derm, GI, allergy, ophthalmology, ENT, ortho, psych).
ER visit summaries (when not Starlight-affiliated).
Lab results from external labs (Quest, LabCorp).
Imaging reports (X-rays, ultrasounds, CTs, MRIs).
IEP / 504 / school-team meeting notes.
Court documents (custody, child protective services contacts).
Insurance documents (even DPC families sometimes file for catastrophic).
Parent-uploaded photos (rashes, tongue, ears).
Audio recordings (cough sounds — useful for an asthma case).

Conversational artifacts (for the LLM harness side)

SMS threads with the doctor.
Parent-app message threads.
AI-triage conversations.
Voice memos / phone-call recordings.
Direct doctor calls.

What weird / spicy variants force tests we'd otherwise skip?

The high-temp generator should sometimes sample these:

A 14-year-old comes in alone. Mom is in the waiting room. Visit is HEEADSSS-confidential. App account is on Mom's phone but the conversation isn't visible to her. Test: confidentiality flow.
A divorced family: Mom holds primary custody, but Dad has access to records per court order. Both have parent-app accounts. Test: shared-record access that never leaks one parent's conversations to the other.
A foster child arrived 3 months ago. Records from prior pediatrician haven't transferred. Test: partial-records fallback in the chart UI.
An internationally-adopted infant has a vaccine list in Korean. Test: foreign-language doc handling, translation pipeline.
Twins where one has T1D and the other doesn't. Test: chart switcher / family view that doesn't conflate.
A teen patient discloses suicidality in the parent app. Test: agentic safety monitor firing, doctor SMS within seconds.
A parent who is also a clinician asks technical questions in the app. Test: AI assistant doesn't dumb down.
A parent with limited English asks questions in Spanish. Test: language-aware response.
A parent who hasn't paid the subscription in two months tries to schedule a visit. Test: billing-state UX.
A 17-year-old whose chart is about to age out of pediatrics. Test: transition-of-care workflow.
A child with a Schedule II (C-II) prescription (Concerta for ADHD) where Dad asks for an early refill via the parent app. Test: refusal + clinical-judgment routing.
A child arrives with a parent against whom an abuse allegation is documented elsewhere. Test: mandatory-reporting policy and clinician alert.
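
Two of these variants (the confidential teen visit and the divorced parents with separate accounts) reduce to one question: who may see a given message thread? The sketch below is one possible rule set, written purely as an assumption to make the test cases concrete. The real policy belongs to clinical and legal review, and all names here are hypothetical.

```typescript
type ViewerKind = "guardian" | "patient" | "clinician";

interface Viewer {
  id: string;
  kind: ViewerKind;
  accessibleChildIds: string[]; // charts this person may open
}

interface MessageThread {
  childId: string;
  participantIds: string[];       // who is actually in the conversation
  confidentialToPatient: boolean; // e.g. a HEEADSSS-confidential teen visit
}

// Assumed rules, checked in order:
// 1. No chart access means no thread access.
// 2. Clinicians with chart access can see every thread on that chart.
// 3. Confidential threads are never visible to guardians.
// 4. Everyone else sees only threads they participate in.
function canViewThread(viewer: Viewer, thread: MessageThread): boolean {
  if (viewer.accessibleChildIds.indexOf(thread.childId) === -1) return false;
  if (viewer.kind === "clinician") return true;
  if (thread.confidentialToPatient && viewer.kind === "guardian") return false;
  return thread.participantIds.indexOf(viewer.id) !== -1;
}
```

Under these rules, both parents can open the shared chart, but neither can read the other's thread, and a guardian cannot read a confidential thread even if it was started from their device.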

These are exactly the cases that a dataset built from real production data wouldn't contain on day one, but that we will hit in production. Generating them synthetically is how we find our bugs before our users do.

Wild ideas that may or may not survive

In keeping with high-temperature spirit:

Family-tree generator with marriages / divorces / step-relations so the same kid can show up in two practice records (Dad's account vs Mom's account) when shared custody is generated.
Twin-generator that diverges over time. Identical twins start with identical genetics, but their visit histories should diverge realistically (one gets RSV, the other doesn't; one wears glasses, the other has 20/20).
Disease-trajectory engine. Given an initial condition, plausibly progress it through visits. For example, a kid diagnosed with mild asthma at age 4 has a different visit cadence and medication trajectory than a kid diagnosed at age 14 with exercise-induced asthma (EIA).
Medication-adherence stochasticity. Some families fill on time. Some skip refills. Some split doses. Synthetic data should model these patterns so the AI risk panel can learn to flag them.
Voice + audio synthesis. Couple this with TTS to generate the 6 ElevenLabs-style audio clips needed for AI scribe demos — synthetic patient + parent voices for the recording flow.
Conversation-trajectory generator. For each persona, generate a plausible 3-month parent-app message stream — varies in volume, tone, urgency, content.
"Day in the life" simulator. Generate a Tuesday: 4 visits scheduled, 9 inbox messages, 8 docs to triage. Real synthetic load for the clinician app prototype.
Generator-as-evaluator. When new chart-side AI features are built, run them against a 5,000-persona cohort and compute accuracy / false-positive / false-negative rates per condition. The cohort is the test set.
The "catastrophic case" stress test. Once a quarter, generate 100 persona-cases that should trip every escalation path (acute medical emergency, abuse pattern, suicidality, severe drug interaction, vaccine reaction, allergic reaction). Confirm every path actually triggers.
"Atlas migration" generator. Generate a synthetic incoming Atlas.MD export — partial PDFs, inconsistent fields, free-text dump — and test our migration pipeline against it.
Family persona cohorts for the marketing funnel. Marketing's pipeline shows the prospect → first-charge funnel (clinician-app marketing tab). A family-persona generator can stress-test the full onboarding flow with realistic refusal / drop-off behaviors.
Adversarial persona library. Deliberately try to trip the AI: parents who lie ("yes I gave the meds" — but the chart shows otherwise), parents who use medical jargon to sound informed but get the terms wrong, vaccine-hesitant parents who frame their objections as "just asking questions," teens who use slang and code words, agentic actors trying to extract another patient's PHI through clever prompts. Every adversarial persona is a security test case.
The "Atlas screen-scrape diff" persona. Generate a Starlight chart and the equivalent Atlas chart side-by-side, programmatically, for vs-Atlas marketing imagery — but never with real patient data.
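
The generator-as-evaluator idea above comes down to a per-condition confusion tally over the cohort. A minimal sketch, assuming each persona-case carries a ground-truth label (`expected`) and the feature's output (`flagged`); the type and function names are hypothetical, and rates with a zero denominator are reported as 0 by assumption.

```typescript
interface CaseResult {
  condition: string;
  expected: boolean; // ground truth baked into the synthetic persona
  flagged: boolean;  // what the AI feature under test reported
}

interface ConditionRates {
  cases: number;
  accuracy: number;
  falsePositiveRate: number; // FP / (FP + TN)
  falseNegativeRate: number; // FN / (FN + TP)
}

interface Tally { tp: number; fp: number; tn: number; fn: number; }

// Safe division: an empty denominator yields 0 rather than NaN.
function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

function ratesByCondition(results: readonly CaseResult[]): Map<string, ConditionRates> {
  const tallies = new Map<string, Tally>();
  for (const r of results) {
    let t = tallies.get(r.condition);
    if (t === undefined) {
      t = { tp: 0, fp: 0, tn: 0, fn: 0 };
      tallies.set(r.condition, t);
    }
    if (r.expected && r.flagged) t.tp += 1;
    else if (!r.expected && r.flagged) t.fp += 1;
    else if (!r.expected && !r.flagged) t.tn += 1;
    else t.fn += 1;
  }
  const rates = new Map<string, ConditionRates>();
  tallies.forEach((t, condition) => {
    const cases = t.tp + t.fp + t.tn + t.fn;
    rates.set(condition, {
      cases,
      accuracy: ratio(t.tp + t.tn, cases),
      falsePositiveRate: ratio(t.fp, t.fp + t.tn),
      falseNegativeRate: ratio(t.fn, t.fn + t.tp),
    });
  });
  return rates;
}
```

Because the cohort is synthetic, `expected` is known by construction, which is what makes the cohort usable as a test set.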

What we're explicitly NOT doing

Not anonymizing real PHI. De-identified data always carries some re-identification risk, and HIPAA's de-identification standards are strict. Pure generation derives from no real record, so it has no derivation risk.
Not LLM-generating without seeded structure. A pure "generate 1000 patients" prompt to Claude clusters tightly around the mode. We need a structural skeleton (Polaris-style hierarchical sampling) that forces diversity, then use LLMs only where natural-language texture is needed (backstories, message threads, note narratives).
Not building face-realistic synthetic photos. Synthetic medical imagery (X-rays, photos of a rash) is a separate, harder problem and likely a follow-on system that uses diffusion models — keep this generator to text + structured data.
Not making this a product. The synthetic-persona library is internal infrastructure. The diagnostic-game and CME products use this infrastructure but aren't the same thing.
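
The "structural skeleton" point above can be made concrete with a seeded, hierarchical sampler: pick an archetype first, then sample lower-level fields conditioned on it, with no LLM involved. This is only a sketch. The archetype slugs, all weights, and the count tables are invented placeholders, and `mulberry32` is a commonly used small PRNG swapped in here for illustration, not something the Polaris generator is known to use.

```typescript
type FamilyArchetype =
  | "two-parent" | "single-parent" | "shared-custody"
  | "grandparent-guardian" | "foster";

// Ordered [value, weight] pairs; array order keeps sampling deterministic.
type Weighted<T> = ReadonlyArray<readonly [T, number]>;

// mulberry32: tiny seedable PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Picks one value with probability proportional to its weight.
function weightedPick<T>(options: Weighted<T>, rand: () => number): T {
  let total = 0;
  for (const [, weight] of options) total += weight;
  let remaining = rand() * total;
  let last: T | undefined;
  for (const [value, weight] of options) {
    last = value;
    remaining -= weight;
    if (remaining < 0) return value;
  }
  if (last === undefined) throw new Error("weightedPick needs at least one option");
  return last; // only reached through floating-point rounding
}

// Placeholder weights: long-tail archetypes get explicit, non-zero mass.
const ARCHETYPE_WEIGHTS: Weighted<FamilyArchetype> = [
  ["two-parent", 50], ["single-parent", 20], ["shared-custody", 15],
  ["grandparent-guardian", 8], ["foster", 7],
];

// Second level of the hierarchy: guardian count depends on the archetype.
const GUARDIAN_COUNTS: Record<FamilyArchetype, Weighted<number>> = {
  "two-parent": [[2, 1]],
  "single-parent": [[1, 1]],
  "shared-custody": [[2, 3], [3, 1], [4, 1]],
  "grandparent-guardian": [[1, 3], [2, 2]],
  foster: [[1, 1], [2, 3]],
};

const CHILD_COUNTS: Weighted<number> = [[1, 40], [2, 40], [3, 15], [4, 5]];

interface FamilySkeleton {
  archetype: FamilyArchetype;
  guardianCount: number;
  childCount: number;
}

// Same seed and size always produce the same cohort.
function sampleCohort(seed: number, size: number): FamilySkeleton[] {
  const rand = mulberry32(seed);
  const cohort: FamilySkeleton[] = [];
  for (let i = 0; i < size; i++) {
    const archetype = weightedPick(ARCHETYPE_WEIGHTS, rand);
    const guardianCount = weightedPick(GUARDIAN_COUNTS[archetype], rand);
    const childCount = weightedPick(CHILD_COUNTS, rand);
    cohort.push({ archetype, guardianCount, childCount });
  }
  return cohort;
}
```

An LLM would then be called only to dress each skeleton with backstory or message text, keyed by the same seed so the whole cohort stays reproducible.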

What survives into the architecture

After this overgeneration, the things that earn a slot in the actual architecture:

Survives	Reason
Family-as-first-class entity with N children + M guardians	Fundamental shape; cannot model anything realistic without it.
Hierarchical sampling: family-archetype → demographics → children → conditions → timeline	Polaris-pattern adapted; gives us controlled, reproducible distributions.
Closed-enum slugs (familyArchetype, conditionProfile, vaccineStatus, …)	Type-safety + auditability; avoids LLM drift in the structural layer.
Closed-enum constraints (peanutAllergy, T1D, custodyComplexity, languagePreference, …)	Drives plausibility filters and downstream UI behavior.
Sentiment/voice/backstory per guardian	LLM harness needs prompt material; same Polaris pattern.
Distribution invariants tested in CI	Same Polaris pattern; protects the long tail from being silently dropped.
Clinical timeline as a derived stream	Visit/vaccine/note/document streams generated from the persona's structural shape.
Cohort seed → reproducible test data	Same Polaris pattern; non-negotiable for CI.
Separate "high-temp adversarial" cohort generator	Specific edge-case mode for security/escalation tests.
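
The "distribution invariants tested in CI" row could be enforced with a check as small as the one below: count each slug in a generated cohort and report any that fall under a required floor. This is a sketch under assumptions; the function name and the shape of the floor table are hypothetical, and the actual invariants would be chosen on the Architecture page.

```typescript
// Returns one human-readable violation per slug whose observed count in the
// cohort falls below its required minimum. An empty array means all pass.
function checkMinimumCounts(
  slugs: readonly string[],
  minimums: ReadonlyArray<readonly [string, number]>,
): string[] {
  const counts = new Map<string, number>();
  for (const slug of slugs) {
    counts.set(slug, (counts.get(slug) ?? 0) + 1);
  }
  const violations: string[] = [];
  for (const [slug, minimum] of minimums) {
    const seen = counts.get(slug) ?? 0;
    if (seen < minimum) {
      violations.push(`${slug}: expected at least ${minimum}, saw ${seen}`);
    }
  }
  return violations;
}
```

A CI job would generate the cohort from a fixed seed, run this over each enum dimension, and fail the build if the returned list is non-empty, so a refactor cannot silently drop a long-tail archetype.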

The next page, Architecture, turns these into a concrete type system + module layout. After that, Implementation Plan sequences the build.
