Serelora Agentic AI Foundation
Spencer Wozniak
Technology | January 20, 2026
Agentic AI System Overview
1. High-Level Architecture (Observed)
The system implements a router-orchestrator pattern with specialized expert agents. The architecture consists of:
- Entry Point: ChatBot component (src/app/platform/demo/_components/ChatBot.tsx) - React component that handles user interactions
- Orchestrator: /api/demo/manager/route.ts - Routes requests to expert agents after classification
- Classifier: /api/demo/manager/classify/route.ts - Determines which expert should handle a request
- Expert Agents: Five specialized agents under /api/demo/experts/:
  - general/route.ts - Clinical reasoning and broad medical questions
  - medications/route.ts - Medication-related queries
  - documents/route.ts - Provider documentation queries
  - lab/route.ts - Lab results and diagnostic reports
  - non-medical/route.ts - Greetings and non-clinical interactions
- Supporting Agents: Two utility agents:
  - suggestions/route.ts - Generates follow-up question suggestions
  - autocomplete/route.ts - Completes partial user queries
- System Prompt: src/prompts/system_prompt.txt - Defines the AI assistant's identity and constraints
- Expert Metadata: src/data/experts.json - Defines available experts and their routing rules
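Taken together, these pieces imply a two-step request flow from the client: classify first, then route. The sketch below is a hypothetical client-side illustration of that sequence; the payload shapes and the `expertEndpoint` and `ask` helpers are assumptions, not the actual ChatBot implementation.

```typescript
// Hypothetical sketch of the classify-then-route flow described above.
// Payload shapes and helper names are assumptions for illustration.

interface Patient { id: string; name: string }
interface Message { role: "user" | "assistant" | "system"; content: string }

// Builds the expert endpoint path the orchestrator ultimately calls.
function expertEndpoint(origin: string, expert: string): string {
  return `${origin}/api/demo/experts/${expert}`;
}

async function ask(messages: Message[], patient: Patient): Promise<Response> {
  // 1. Classify the conversation to pick an expert.
  const cls = await fetch("/api/demo/manager/classify", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, patient }),
  });
  const { expert } = await cls.json();

  // 2. Send the same conversation to the orchestrator, now with the
  //    pre-classified expert, which relays to the matching expert route.
  return fetch("/api/demo/manager", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, patient, expert }),
  });
}
```

The key point the sketch illustrates is that classification is the client's responsibility (or at least a separate call), not something the orchestrator does on its own.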
2. Orchestrator / Coordinator
Primary Orchestrator: /api/demo/manager/route.ts
Entry Point: The POST function receives requests with:
- messages: Conversation history array
- patient: Patient object with id and name
- expert: Expert name (required, must be classified first)
- chartContext: Optional context about what the user is viewing
Decision Making: The orchestrator does not make routing decisions itself. It requires the expert field to be pre-classified by the classifier endpoint. If expert is missing, it returns a 400 error.
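This validation step can be sketched as a small pure function. This is a minimal sketch, assuming the request body shape listed above; `validateBody` and the `KNOWN_EXPERTS` list are hypothetical names, though the five expert routes themselves come from the architecture section.

```typescript
// Sketch of the orchestrator's up-front validation: it refuses to route
// unless the client has already classified the request. `validateBody`
// is a hypothetical helper, not the actual route code.

interface ManagerBody {
  messages?: unknown[];
  patient?: { id: string; name: string };
  expert?: string;
  chartContext?: unknown;
}

const KNOWN_EXPERTS = ["general", "medications", "documents", "lab", "non-medical"];

// Returns the HTTP status the handler would respond with.
function validateBody(body: ManagerBody): { status: number; error?: string } {
  if (!body.expert) {
    // The orchestrator does not classify; a missing expert is a client error.
    return { status: 400, error: "expert is required; call /api/demo/manager/classify first" };
  }
  if (!KNOWN_EXPERTS.includes(body.expert)) {
    return { status: 400, error: `unknown expert: ${body.expert}` };
  }
  return { status: 200 };
}
```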
Context Compression: Before forwarding to experts, the orchestrator compresses conversation history if it exceeds token limits:
- Maximum model tokens: 1,047,576
- Maximum output tokens: 256
- Token headroom: 2,000
- Chunk size: 10 messages per summarization
The compressHistory function:
- Estimates tokens using a rough formula: wordCount * 1.5 + messageCount * 4
- If tokens exceed the limit, takes the oldest CHUNK_SIZE messages
- Summarizes them using gpt-5-nano with a 3-5 bullet point prompt
- Replaces the chunk with a single system message containing the summary
- Repeats until within token limits
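The steps above can be sketched as a loop. This is a sketch under the constants and formula stated in the document; the summarizer is injected as a callback so the control flow can be shown without a model client (the actual route calls gpt-5-nano for the 3-5 bullet summaries), and `Msg` and the function signatures are assumptions.

```typescript
// Sketch of the compressHistory loop described above. The summarizer is
// injected so the control flow is visible without a model client.

interface Msg { role: string; content: string }

const MAX_MODEL_TOKENS = 1_047_576;
const MAX_OUTPUT_TOKENS = 256;
const TOKEN_HEADROOM = 2_000;
const CHUNK_SIZE = 10;

// Rough estimate from the document: ~1.5 tokens per word plus a
// 4-token per-message overhead.
function estimateTokens(messages: Msg[]): number {
  const wordCount = messages.reduce(
    (n, m) => n + m.content.split(/\s+/).filter(Boolean).length, 0);
  return wordCount * 1.5 + messages.length * 4;
}

async function compressHistory(
  messages: Msg[],
  summarize: (chunk: Msg[]) => Promise<string>,
  limit = MAX_MODEL_TOKENS - MAX_OUTPUT_TOKENS - TOKEN_HEADROOM,
): Promise<Msg[]> {
  let history = [...messages];
  // Each pass replaces the oldest CHUNK_SIZE messages with one
  // system-role summary, repeating until the estimate fits.
  while (estimateTokens(history) > limit && history.length > CHUNK_SIZE) {
    const chunk = history.slice(0, CHUNK_SIZE);
    const summary = await summarize(chunk);
    history = [{ role: "system", content: `Summary: ${summary}` }, ...history.slice(CHUNK_SIZE)];
  }
  return history;
}
```

Because compression only ever touches the oldest messages, the most recent turns reach the expert verbatim.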
Control Flow to Sub-Agents:
- Compresses message history if needed
- Constructs the expert URL: ${req.nextUrl.origin}/api/demo/experts/${expert}
- Makes an HTTP POST request to the expert endpoint with:
  - messages: Compressed or original message array
  - patient: Patient object
  - chartContext: Chart context object
- Streams the expert's response back to the client using ReadableStream
- If the expert does not return a stream, falls back to a JSON response
Streaming: The orchestrator pipes the expert's streaming response through to the client without modification, maintaining the streaming format.
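A minimal sketch of this pass-through relay, assuming the expert call yields a standard fetch `Response` (the `relay` helper name and header handling are assumptions, not the actual route code):

```typescript
// Sketch of the orchestrator's pass-through relay: stream the expert's
// body untouched when one exists, otherwise buffer and return JSON.
// `relay` is a hypothetical helper for illustration.

async function relay(expertRes: Response): Promise<Response> {
  if (expertRes.body) {
    // Pipe the expert's ReadableStream straight through, preserving the
    // streaming format and content type.
    return new Response(expertRes.body, {
      status: expertRes.status,
      headers: { "Content-Type": expertRes.headers.get("Content-Type") ?? "text/plain" },
    });
  }
  // Fallback: no stream, so buffer the JSON payload and forward it.
  const json = await expertRes.json();
  return new Response(JSON.stringify(json), {
    status: expertRes.status,
    headers: { "Content-Type": "application/json" },
  });
}
```

Forwarding the body stream directly, rather than buffering it, is what lets tokens reach the ChatBot component as the expert generates them.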