Serelora Agentic AI Foundation
Spencer Wozniak
Technology | January 20, 2026
Agentic AI System Overview
1. High-Level Architecture (Observed)
The system implements a router-orchestrator pattern with specialized expert agents. The architecture consists of:
- Entry Point: ChatBot component (src/app/platform/demo/_components/ChatBot.tsx) - React component that handles user interactions
- Orchestrator: /api/demo/manager/route.ts - Routes requests to expert agents after classification
- Classifier: /api/demo/manager/classify/route.ts - Determines which expert should handle a request
- Expert Agents: Five specialized agents under /api/demo/experts/:
  - general/route.ts - Clinical reasoning and broad medical questions
  - medications/route.ts - Medication-related queries
  - documents/route.ts - Provider documentation queries
  - lab/route.ts - Lab results and diagnostic reports
  - non-medical/route.ts - Greetings and non-clinical interactions
- Supporting Agents: Two utility agents:
  - suggestions/route.ts - Generates follow-up question suggestions
  - autocomplete/route.ts - Completes partial user queries
- System Prompt: src/prompts/system_prompt.txt - Defines the AI assistant's identity and constraints
- Expert Metadata: src/data/experts.json - Defines available experts and their routing rules
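Taken together, these pieces imply a two-step request flow from the client: classify first, then route. The sketch below is a hypothetical client-side illustration of that sequence; the payload shapes and the `expertEndpoint` and `ask` helpers are assumptions, not the actual ChatBot implementation.

```typescript
// Hypothetical sketch of the classify-then-route flow described above.
// Payload shapes and helper names are assumptions for illustration.

interface Patient { id: string; name: string }
interface Message { role: "user" | "assistant" | "system"; content: string }

// Builds the expert endpoint path the orchestrator ultimately calls.
function expertEndpoint(origin: string, expert: string): string {
  return `${origin}/api/demo/experts/${expert}`;
}

async function ask(messages: Message[], patient: Patient): Promise<Response> {
  // 1. Classify the conversation to pick an expert.
  const cls = await fetch("/api/demo/manager/classify", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, patient }),
  });
  const { expert } = await cls.json();

  // 2. Send the same conversation to the orchestrator, now with the
  //    pre-classified expert, which relays to the matching expert route.
  return fetch("/api/demo/manager", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, patient, expert }),
  });
}
```

The key point the sketch illustrates is that classification is the client's responsibility (or at least a separate call), not something the orchestrator does on its own.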
2. Orchestrator / Coordinator
Primary Orchestrator: /api/demo/manager/route.ts
Entry Point: The POST function receives requests with:
- messages: Conversation history array
- patient: Patient object with id and name
- expert: Expert name (required, must be classified first)
- chartContext: Optional context about what the user is viewing
Decision Making: The orchestrator does not make routing decisions itself. It requires the expert field to be pre-classified by the classifier endpoint. If expert is missing, it returns a 400 error.
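This validation step can be sketched as a small pure function. This is a minimal sketch, assuming the request body shape listed above; `validateBody` and the `KNOWN_EXPERTS` list are hypothetical names, though the five expert routes themselves come from the architecture section.

```typescript
// Sketch of the orchestrator's up-front validation: it refuses to route
// unless the client has already classified the request. `validateBody`
// is a hypothetical helper, not the actual route code.

interface ManagerBody {
  messages?: unknown[];
  patient?: { id: string; name: string };
  expert?: string;
  chartContext?: unknown;
}

const KNOWN_EXPERTS = ["general", "medications", "documents", "lab", "non-medical"];

// Returns the HTTP status the handler would respond with.
function validateBody(body: ManagerBody): { status: number; error?: string } {
  if (!body.expert) {
    // The orchestrator does not classify; a missing expert is a client error.
    return { status: 400, error: "expert is required; call /api/demo/manager/classify first" };
  }
  if (!KNOWN_EXPERTS.includes(body.expert)) {
    return { status: 400, error: `unknown expert: ${body.expert}` };
  }
  return { status: 200 };
}
```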
Context Compression: Before forwarding to experts, the orchestrator compresses conversation history if it exceeds token limits:
- Maximum model tokens: 1,047,576
- Maximum output tokens: 256
- Token headroom: 2,000
- Chunk size: 10 messages per summarization
The compressHistory function:
- Estimates tokens using a rough formula: wordCount * 1.5 + messageCount * 4
- If tokens exceed the limit, takes the oldest CHUNK_SIZE messages
- Summarizes them using gpt-5-nano with a 3-5 bullet point prompt
- Replaces the chunk with a single system message containing the summary
- Repeats until within token limits
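The steps above can be sketched as a loop. This is a sketch under the constants and formula stated in the document; the summarizer is injected as a callback so the control flow can be shown without a model client (the actual route calls gpt-5-nano for the 3-5 bullet summaries), and `Msg` and the function signatures are assumptions.

```typescript
// Sketch of the compressHistory loop described above. The summarizer is
// injected so the control flow is visible without a model client.

interface Msg { role: string; content: string }

const MAX_MODEL_TOKENS = 1_047_576;
const MAX_OUTPUT_TOKENS = 256;
const TOKEN_HEADROOM = 2_000;
const CHUNK_SIZE = 10;

// Rough estimate from the document: ~1.5 tokens per word plus a
// 4-token per-message overhead.
function estimateTokens(messages: Msg[]): number {
  const wordCount = messages.reduce(
    (n, m) => n + m.content.split(/\s+/).filter(Boolean).length, 0);
  return wordCount * 1.5 + messages.length * 4;
}

async function compressHistory(
  messages: Msg[],
  summarize: (chunk: Msg[]) => Promise<string>,
  limit = MAX_MODEL_TOKENS - MAX_OUTPUT_TOKENS - TOKEN_HEADROOM,
): Promise<Msg[]> {
  let history = [...messages];
  // Each pass replaces the oldest CHUNK_SIZE messages with one
  // system-role summary, repeating until the estimate fits.
  while (estimateTokens(history) > limit && history.length > CHUNK_SIZE) {
    const chunk = history.slice(0, CHUNK_SIZE);
    const summary = await summarize(chunk);
    history = [{ role: "system", content: `Summary: ${summary}` }, ...history.slice(CHUNK_SIZE)];
  }
  return history;
}
```

Because compression only ever touches the oldest messages, the most recent turns reach the expert verbatim.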
Control Flow to Sub-Agents:
- Compresses message history if needed
- Constructs the expert URL: ${req.nextUrl.origin}/api/demo/experts/${expert}
- Makes an HTTP POST request to the expert endpoint with:
  - messages: Compressed or original message array
  - patient: Patient object
  - chartContext: Chart context object
- Streams the expert's response back to the client using ReadableStream
- If the expert does not return a stream, falls back to a JSON response
Streaming: The orchestrator pipes the expert's streaming response through to the client without modification, maintaining the streaming format.
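A minimal sketch of this pass-through relay, assuming the expert call yields a standard fetch `Response` (the `relay` helper name and header handling are assumptions, not the actual route code):

```typescript
// Sketch of the orchestrator's pass-through relay: stream the expert's
// body untouched when one exists, otherwise buffer and return JSON.
// `relay` is a hypothetical helper for illustration.

async function relay(expertRes: Response): Promise<Response> {
  if (expertRes.body) {
    // Pipe the expert's ReadableStream straight through, preserving the
    // streaming format and content type.
    return new Response(expertRes.body, {
      status: expertRes.status,
      headers: { "Content-Type": expertRes.headers.get("Content-Type") ?? "text/plain" },
    });
  }
  // Fallback: no stream, so buffer the JSON payload and forward it.
  const json = await expertRes.json();
  return new Response(JSON.stringify(json), {
    status: expertRes.status,
    headers: { "Content-Type": "application/json" },
  });
}
```

Forwarding the body stream directly, rather than buffering it, is what lets tokens reach the ChatBot component as the expert generates them.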