Serelora Agentic AI Foundation

Spencer Wozniak

Technology | January 20, 2026

Agentic AI System Overview

1. High-Level Architecture (Observed)

The system implements a router-orchestrator pattern with specialized expert agents. The architecture consists of:

  • Entry Point: ChatBot component (src/app/platform/demo/_components/ChatBot.tsx) - React component that handles user interactions
  • Orchestrator: /api/demo/manager/route.ts - Routes requests to expert agents after classification
  • Classifier: /api/demo/manager/classify/route.ts - Determines which expert should handle a request
  • Expert Agents: Five specialized agents under /api/demo/experts/:
    • general/route.ts - Clinical reasoning and broad medical questions
    • medications/route.ts - Medication-related queries
    • documents/route.ts - Provider documentation queries
    • lab/route.ts - Lab results and diagnostic reports
    • non-medical/route.ts - Greetings and non-clinical interactions
  • Supporting Agents: Two utility agents:
    • suggestions/route.ts - Generates follow-up question suggestions
    • autocomplete/route.ts - Completes partial user queries
  • System Prompt: src/prompts/system_prompt.txt - Defines the AI assistant's identity and constraints
  • Expert Metadata: src/data/experts.json - Defines available experts and their routing rules
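The shape of src/data/experts.json is not shown in this overview. As a minimal sketch, assuming each entry pairs an expert's route segment with a routing description the classifier can match against (the field names here are assumptions), it might look like:

```typescript
// Hypothetical shape for src/data/experts.json -- the actual field
// names are not shown in this overview and are assumed for illustration.
interface ExpertMeta {
  name: string;        // route segment under /api/demo/experts/
  description: string; // routing-rule text the classifier matches against
}

const experts: ExpertMeta[] = [
  { name: "general", description: "Clinical reasoning and broad medical questions" },
  { name: "medications", description: "Medication-related queries" },
  { name: "documents", description: "Provider documentation queries" },
  { name: "lab", description: "Lab results and diagnostic reports" },
  { name: "non-medical", description: "Greetings and non-clinical interactions" },
];

// A minimal lookup for validating a classified expert name.
function isKnownExpert(name: string): boolean {
  return experts.some((e) => e.name === name);
}
```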

2. Orchestrator / Coordinator

Primary Orchestrator: /api/demo/manager/route.ts

Entry Point: The POST function receives requests with:

  • messages: Conversation history array
  • patient: Patient object with id and name
  • expert: Expert name (required, must be classified first)
  • chartContext: Optional context about what the user is viewing

Decision Making: The orchestrator does not make routing decisions itself. It requires the expert field to be pre-classified by the classifier endpoint. If expert is missing, it returns a 400 error.
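A minimal sketch of the request body and the 400 guard, assuming plain-object shapes that mirror the field list above (the real handler works with Next.js's NextRequest/NextResponse, and the Message and Patient shapes here are assumptions):

```typescript
// Field names mirror the request body described above; the Message and
// Patient shapes are assumptions for illustration.
interface Message { role: "user" | "assistant" | "system"; content: string }
interface Patient { id: string; name: string }

interface ManagerRequest {
  messages: Message[];
  patient: Patient;
  expert?: string;        // must be pre-classified by the classify endpoint
  chartContext?: unknown; // optional context about what the user is viewing
}

// Returns the HTTP status the handler would respond with: 400 when the
// classifier has not supplied an expert, 200 otherwise.
function validateManagerRequest(body: ManagerRequest): number {
  if (!body.expert) return 400;
  return 200;
}
```

This keeps the orchestrator free of classification logic: routing intelligence lives entirely in the classifier endpoint, and the orchestrator simply refuses unclassified requests.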

Context Compression: Before forwarding to experts, the orchestrator compresses conversation history if it exceeds token limits:

  • Maximum model tokens: 1,047,576
  • Maximum output tokens: 256
  • Token headroom: 2,000
  • Chunk size: 10 messages per summarization

The compressHistory function:

  1. Estimates tokens using a rough formula: wordCount * 1.5 + messageCount * 4
  2. If tokens exceed limit, takes oldest CHUNK_SIZE messages
  3. Summarizes them with gpt-5-nano, prompting for a 3-5 bullet-point summary
  4. Replaces the chunk with a single system message containing the summary
  5. Repeats until within token limits
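The steps above can be sketched as follows. This is a minimal reconstruction, not the actual implementation: the constants and the token formula come from the overview, while the summarization call (gpt-5-nano in the real code) is injected as a callback so the sketch stays self-contained.

```typescript
interface Msg { role: string; content: string }

// Limits quoted in the overview above.
const MAX_MODEL_TOKENS = 1_047_576;
const MAX_OUTPUT_TOKENS = 256;
const TOKEN_HEADROOM = 2_000;
const CHUNK_SIZE = 10;

// Rough estimate from the overview: wordCount * 1.5 + messageCount * 4.
function estimateTokens(messages: Msg[]): number {
  const wordCount = messages.reduce(
    (n, m) => n + m.content.split(/\s+/).filter(Boolean).length, 0);
  return wordCount * 1.5 + messages.length * 4;
}

// Summarization is injected so the sketch is runnable without a model;
// the real compressHistory calls gpt-5-nano for a 3-5 bullet summary.
async function compressHistory(
  messages: Msg[],
  summarize: (chunk: Msg[]) => Promise<string>,
  limit = MAX_MODEL_TOKENS - MAX_OUTPUT_TOKENS - TOKEN_HEADROOM,
): Promise<Msg[]> {
  let history = messages.slice();
  while (estimateTokens(history) > limit && history.length > CHUNK_SIZE) {
    const chunk = history.slice(0, CHUNK_SIZE);        // oldest messages first
    const summary = await summarize(chunk);
    // Replace the chunk with one system message holding the summary.
    history = [{ role: "system", content: summary }, ...history.slice(CHUNK_SIZE)];
  }
  return history;
}
```

Because each pass replaces ten messages with a single summary, repeated passes converge quickly; the loop also stops once fewer than a full chunk remains, so it cannot spin on an incompressible history.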

Control Flow to Sub-Agents:

  1. Compresses message history if needed
  2. Constructs expert URL: ${req.nextUrl.origin}/api/demo/experts/${expert}
  3. Makes HTTP POST request to expert endpoint with:
    • messages: Compressed or original message array
    • patient: Patient object
    • chartContext: Chart context object
  4. Streams the expert's response back to the client using ReadableStream
  5. If expert does not return a stream, falls back to JSON response
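Steps 2 and 5 can be sketched as below; these are illustrative helpers under the assumptions stated in the comments, not the orchestrator's actual code.

```typescript
// Step 2: expert endpoint URL, built from the request origin and the
// classified expert name, as in `${req.nextUrl.origin}/api/demo/experts/${expert}`.
function expertUrl(origin: string, expert: string): string {
  return `${origin}/api/demo/experts/${expert}`;
}

// Step 5: the fallback decision. The parameter stands in for a Fetch API
// Response, whose body is a ReadableStream or null; when no stream is
// present, the orchestrator relays a JSON response instead.
function relayMode(res: { body: object | null }): "stream" | "json" {
  return res.body ? "stream" : "json";
}
```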

Streaming: The orchestrator pipes the expert's streaming response through to the client unmodified, preserving the streaming format end to end.