
7/24 Office

7/24 Office — Self-Evolving AI Agent System: 26 Tools, 3,500 Lines of Pure Python, MCP/Skill Plugins, Three-Layer Memory, Self-Repair, 24/7 Production

Organization: wangziqi06 (independent developer)
Published: March 2026
Type: repo
Report Type: PhD-Level Technical Analysis
Report Date: April 2026


1 Full Title and Attribution

Full Title: 7/24 Office — Self-Evolving AI Agent System

Repository: github.com/wangziqi06/724-office

License: MIT

Status: Production-running, actively developed (March–April 2026)

Stars: 1,136 (as of April 2026)

Languages: Python (100%)

Size: ~3,500 lines of pure Python across 8 files

Dependencies: 3 packages: croniter (cron parsing), lancedb (vector storage), websocket-client (ASR)

Design Philosophy:

Zero framework dependency. Every line is visible and debuggable. No LangChain, no LlamaIndex, no CrewAI — just the standard library + 3 small packages.

The name "7/24 Office" (七二四办公室) references 24/7 availability — the system is designed to run continuously as a personal AI agent that handles scheduling, file management, web search, video processing, memory recall, and self-diagnostics autonomously.

2 Authors and Team

7/24 Office is developed by wangziqi06, an independent developer operating as a solo author with AI co-development tools. The project was built in under 3 months and is running in production 24/7.

The author explicitly positions 7/24 Office as a counter-thesis to framework-heavy agent architectures: "No LangChain, no LlamaIndex, no CrewAI — just the standard library + 3 small packages." The codebase is deliberately compact (~3,500 lines) to remain fully comprehensible by a single developer, with every line visible and debuggable.

The project appears to originate from the Chinese developer ecosystem, with WeChat Work (Enterprise WeChat) as the primary messaging integration and iFlytek-compatible ASR for voice recognition. The README and code comments contain bilingual content (Chinese and English).

Development Methodology

The project was built "solo with AI co-development tools in under 3 months." This positions 7/24 Office as both a product of and a testament to AI-assisted development — the system itself is an AI agent, and it was built with AI agents.

3 Core Contribution

Key Contribution: 7/24 Office demonstrates that a production-grade, self-evolving AI agent system can be built in ~3,500 lines of pure Python with zero framework dependencies, featuring runtime tool creation, three-layer memory, MCP protocol integration, self-repair diagnostics, and 24/7 autonomous operation — proving that the complexity typically associated with agent frameworks (LangChain, LlamaIndex, CrewAI) is largely unnecessary.

What 7/24 Office Provides

  1. Tool Use Loop — OpenAI-compatible function calling with automatic retry, up to 20 iterations per conversation
  2. Three-Layer Memory — Session history (short-term) + LLM-compressed long-term memory + LanceDB vector retrieval (active recall)
  3. MCP Protocol Client — Self-implemented JSON-RPC (no MCP SDK), connects external MCP servers via stdio or HTTP transport
  4. Runtime Tool Creation — The agent can write, save, and load new Python tools at runtime via create_tool
  5. Self-Repair — Daily self-check, session health diagnostics, error log analysis, auto-notification on failure
  6. Cron Scheduling — One-shot and recurring tasks, persistent across restarts, timezone-aware
  7. Multi-Tenant Router — Docker-based auto-provisioning, one container per user, health-checked
  8. Multimodal — Image/video/file/voice/link handling, ASR (speech-to-text), vision via base64
  9. Web Search — Multi-engine (Tavily, web search, GitHub, HuggingFace) with automatic source routing
  10. Video Processing — Trim, add BGM, AI video generation via ffmpeg + API, exposed as tools

Key Innovation: Self-Evolution Through Runtime Tool Creation

The most architecturally significant feature is runtime tool creation — the agent can extend its own capabilities by writing new Python tools that persist across restarts. This creates a genuine self-evolution loop:

User request for new capability
        │
        ▼
Agent writes Python function
with @tool decorator
        │
        ▼
Function saved to plugins/ directory
        │
        ▼
Function loaded via exec() and
registered in tool registry
        │
        ▼
New tool available to LLM
in subsequent conversations
        │
        ▼
Agent can now handle requests
that were previously impossible

This is a meaningful implementation of open-ended tool evolution — the agent's capability space grows over time based on the tasks it encounters.

Architectural Philosophy

The project adheres to five design principles:

| Principle | Implementation |
| --- | --- |
| Zero framework dependency | No LangChain/LlamaIndex/CrewAI; stdlib + 3 packages |
| Single-file tools | Adding a capability = adding one function with the @tool decorator |
| Edge-deployable | Targets Jetson Orin Nano (8GB RAM, ARM64 + GPU); RAM budget <2GB |
| Self-evolving | Runtime tool creation, self-diagnostics, auto-notification |
| Offline-capable | Core works without cloud APIs (except the LLM itself); local embeddings supported |

4 Supported Solutions

| Solution Type | Support Level | Tool(s) Used |
| --- | --- | --- |
| 24/7 personal AI assistant | Primary use case | Full system |
| Task scheduling and automation | Built-in | schedule, list_schedules, remove_schedule |
| File management | Built-in | read_file, write_file, edit_file, list_files |
| Web research | Built-in | web_search (multi-engine: Tavily, web, GitHub, HuggingFace) |
| Video processing | Built-in | trim_video, add_bgm, generate_video |
| Memory and recall | Built-in | search_memory, recall (vector semantic search) |
| System diagnostics | Built-in | self_check, diagnose |
| Media sending | Built-in | send_image, send_file, send_video, send_link |
| Shell execution | Built-in | exec (with timeout, default 60s, max 300s) |
| MCP tool extension | Plugin system | reload_mcp + any MCP-compatible server |
| Custom tool creation | Self-evolution | create_tool, list_custom_tools, remove_tool |
| Voice interaction | Built-in | WebSocket ASR pipeline (iFlytek-compatible) |
| Multi-tenant deployment | Built-in | Docker-based per-user isolation via router.py |

Tool Categorization

| Category | Count | Tools |
| --- | --- | --- |
| Core | 2 | exec, message |
| Files | 4 | read_file, write_file, edit_file, list_files |
| Scheduling | 3 | schedule, list_schedules, remove_schedule |
| Media Send | 4 | send_image, send_file, send_video, send_link |
| Video | 3 | trim_video, add_bgm, generate_video |
| Search | 1 | web_search (multi-engine, auto-routing) |
| Memory | 2 | search_memory, recall |
| Diagnostics | 2 | self_check, diagnose |
| Plugins | 3 | create_tool, list_custom_tools, remove_tool |
| MCP | 1 | reload_mcp |
| Total | 26 | (includes tools registered by MCP servers at runtime) |

5 LLM Integration

OpenAI-Compatible Function Calling

7/24 Office uses the OpenAI chat completions API format with function calling. It is designed to work with any OpenAI-compatible API provider.

Provider Configuration:

{
  "models": {
    "default": "deepseek-chat",
    "providers": {
      "deepseek-chat": {
        "api_base": "https://api.deepseek.com/v1",
        "api_key": "...",
        "model": "deepseek-chat",
        "max_tokens": 8192
      },
      "openai-gpt4": {
        "api_base": "https://api.openai.com/v1",
        "api_key": "...",
        "model": "gpt-4o",
        "max_tokens": 8192
      }
    }
  }
}

Multi-Provider Architecture:

The system supports multiple LLM providers and routes different tasks to different models:

| Task | Model Selection Strategy |
| --- | --- |
| Main conversation | Default provider (configurable) |
| Memory compression | Prefers a cheaper model (e.g., deepseek-chat) to avoid compatibility issues with thinking models |
| Embeddings | Dedicated embedding API (e.g., text-embedding-3-small, 1024 dimensions) |

This is a practical cost optimization — memory compression is a background task that doesn't need the most capable model, so it routes to cheaper providers.
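A sketch of how such task-based routing might look, given the configuration format shown above. The `get_provider` function and the `task` labels are illustrative assumptions, not the project's actual API:

```python
import json

# Same shape as the config.json excerpt above; keys abbreviated for brevity.
CONFIG = json.loads("""{
  "models": {
    "default": "openai-gpt4",
    "providers": {
      "deepseek-chat": {"api_base": "https://api.deepseek.com/v1", "model": "deepseek-chat"},
      "openai-gpt4":  {"api_base": "https://api.openai.com/v1",  "model": "gpt-4o"}
    }
  }
}""")

def get_provider(task="chat"):
    """Route background compression to the cheap provider; everything else
    goes to the configured default."""
    providers = CONFIG["models"]["providers"]
    if task == "compress" and "deepseek-chat" in providers:
        return providers["deepseek-chat"]   # cheap model for memory compression
    return providers[CONFIG["models"]["default"]]

print(get_provider("compress")["model"], get_provider("chat")["model"])
# → deepseek-chat gpt-4o
```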

Tool Use Loop

The core interaction pattern is a synchronous tool use loop with up to 20 iterations:

User message
    │
    ▼
Build system prompt (SOUL.md + AGENT.md + USER.md + time)
    │
    ▼
Inject retrieved memories into system prompt
    │
    ▼
Inject cross-session scheduler context
    │
    ▼
┌──►  Call LLM with messages + tool definitions
│       │
│       ├── No tool_calls → Return text response
│       │
│       └── Has tool_calls → Execute each tool
│               │
│               ▼
│           Append tool results to messages
│               │
└───────────────┘ (up to 20 iterations)
    │
    ▼
Save session (with overflow → memory compression)

Key implementation details:

  • Thread safety: Each session has its own lock (threading.Lock). Concurrent messages to the same session are serialized.
  • Performance tracking: Every chat call logs prep_time, llm_total_time, tool_count, and total_time in milliseconds.
  • Error handling: LLM API errors are caught and return user-friendly error messages. Tool execution errors are caught per-tool and returned as [error] strings to the LLM.
  • Image stripping: Before saving sessions, base64 image URLs are replaced with [image] text markers to prevent API errors in history replay and reduce storage.
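The loop and error-handling rules above can be condensed into a small sketch. The function names, message shapes, and the stub LLM are assumptions for illustration; only the 20-iteration cap and the `[error]` string convention come from the source:

```python
MAX_ITERATIONS = 20

def run_tool_loop(call_llm, execute_tool, messages):
    """Minimal tool-use loop: call the LLM, run any requested tools,
    feed results back, and stop on a plain text reply (or the iteration cap)."""
    for _ in range(MAX_ITERATIONS):
        reply = call_llm(messages)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]          # plain answer: done
        messages.append(reply)
        for call in tool_calls:
            try:
                result = execute_tool(call["name"], call["args"])
            except Exception as e:
                result = f"[error] {e}"      # per-tool errors go back to the LLM
            messages.append({"role": "tool", "content": str(result)})
    return "[error] tool loop exceeded 20 iterations"

# Stub LLM: asks for one tool call, then answers once a tool result appears.
def fake_llm(msgs):
    if not any(m.get("role") == "tool" for m in msgs):
        return {"tool_calls": [{"name": "exec", "args": {"cmd": "date"}}]}
    return {"content": "done", "tool_calls": None}

print(run_tool_loop(fake_llm, lambda name, args: "2026-04-01",
                    [{"role": "user", "content": "hi"}]))  # → done
```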

Raw HTTP via urllib

A notable implementation choice: the system uses Python's urllib.request directly instead of httpx, requests, or any HTTP client library. All API calls — LLM, embedding, search, video generation — use raw urllib.request.Request with manual JSON serialization.

req = urllib.request.Request(url, data=data, headers=headers)
with urllib.request.urlopen(req, timeout=timeout) as resp:
    return json.loads(resp.read())

This is consistent with the zero-dependency philosophy but trades developer ergonomics for a minimal footprint.

Reasoning Content Preservation

The system preserves reasoning_content from models that support chain-of-thought (e.g., DeepSeek's reasoning models):

def _serialize_assistant_msg(msg_data):
    result = {"role": "assistant"}
    result["content"] = msg_data.get("content") or None
    reasoning = msg_data.get("reasoning_content")
    if reasoning:
        result["reasoning_content"] = reasoning
    # ...

If the model returns reasoning content, it's preserved in the session history. If the model uses tool calls but doesn't return reasoning content, a placeholder "ok" is inserted to maintain API compatibility with certain providers.

6 Key Results

Production Metrics

| Metric | Value |
| --- | --- |
| Codebase size | ~3,500 lines across 8 files |
| Built-in tools | 26 |
| Framework dependencies | 0 (no LangChain, LlamaIndex, or CrewAI) |
| Package dependencies | 3 (croniter, lancedb, websocket-client) |
| Production uptime target | 24/7 |
| Development time | <3 months (solo developer + AI co-development) |
| GitHub stars | 1,136 (April 2026) |
| Max tool loop iterations | 20 per conversation |
| Session message limit | 40 (overflow triggers memory compression) |
| Memory deduplication threshold | 0.92 cosine similarity |

Self-Diagnostics Report Format

The self_check tool generates comprehensive system health reports:

| Diagnostic Area | Metrics Collected |
| --- | --- |
| Session activity | Active sessions today; total user/assistant/tool_call counts |
| System health | Disk usage, memory usage, process status |
| Error logs | Last 24h errors from application log |
| Scheduled tasks | Active job count, next trigger times |
| Memory system | Total memories stored, storage size |
| Session health | Empty sessions, high tool_call ratios, potential issues |

File Size Breakdown

| File | Size | Purpose |
| --- | --- | --- |
| tools.py | 48KB | Tool registry + 26 tool implementations + plugin system + MCP bridge |
| xiaowang.py | 22KB | Entry point, HTTP server, callbacks, debounce, ASR pipeline |
| router.py | 17KB | Multi-tenant Docker routing, container lifecycle |
| llm.py | 14KB | LLM API calls, tool use loop, session management |
| memory.py | 13KB | Three-layer memory: compress, deduplicate, retrieve |
| mcp_client.py | 12KB | MCP protocol client (JSON-RPC, stdio/HTTP transport) |
| scheduler.py | 7KB | Cron + one-shot scheduling, persistent jobs |
| self_check_tool.py | 2KB | Self-check diagnostic report generation |
| Total | ~135KB | ~3,500 lines |

7 Reproducibility

Installation

git clone https://github.com/wangziqi06/724-office.git
cd 724-office
cp config.example.json config.json
# Edit config.json with your API keys

pip install croniter lancedb websocket-client
# Optional: pilk (for WeChat silk audio decoding)

mkdir -p workspace/memory workspace/files

python3 xiaowang.py

Configuration Requirements

| Requirement | Necessity | Notes |
| --- | --- | --- |
| OpenAI-compatible LLM API | Required | DeepSeek, OpenAI, Anthropic, or any compatible provider |
| Embedding API | Required (for memory) | OpenAI text-embedding-3-small or compatible |
| Messaging platform | Required (for chat) | WeChat Work credentials (token, guid, api_url) |
| Tavily API key | Optional | For high-quality web search |
| Search API key | Optional | For general web search |
| ASR credentials | Optional | For voice message transcription |
| Video generation API | Optional | For AI video generation |
| MCP servers | Optional | External tool servers (stdio or HTTP) |

Personality Configuration

The system uses three optional Markdown files for personality and behavior:

| File | Purpose | Content Type |
| --- | --- | --- |
| SOUL.md | Agent personality and behavior rules | Character definition, communication style |
| AGENT.md | Operational procedures and troubleshooting | How-to guides, error-handling procedures |
| USER.md | User preferences and context | Personal info, scheduling preferences |

These files are read at every conversation turn and injected into the system prompt, allowing the agent's behavior to be customized without code changes.

Reproducibility Assessment

| Factor | Assessment |
| --- | --- |
| Code availability | Fully open-source, MIT license |
| Minimal dependencies | Only 3 packages, all pip-installable |
| Platform dependency | WeChat Work integration is China-specific; would need adaptation for Slack/Discord/Telegram |
| LLM dependency | Requires an OpenAI-compatible API; behavior varies by model |
| Hardware target | Designed for edge deployment (Jetson Orin Nano, 8GB RAM) |
| Configuration complexity | Multiple API keys required; config.example.json provides a template |
| Documentation | README covers architecture and setup; no extensive docs site |

Limitations on Reproducibility

  1. Messaging platform coupling: The callback handler (handle_callback) is tightly coupled to WeChat Work's message format (cmd codes, msgTypes, fileId/fileAeskey fields). Adapting to other platforms requires rewriting xiaowang.py callback handling.
  2. Chinese ecosystem tools: ASR uses iFlytek-compatible WebSocket protocol; video generation uses a specific API format. These may not be available outside China.
  3. No test suite: Unlike Ouro Loop's 507 tests, 7/24 Office has no automated tests. Production validation relies on manual testing and 24/7 monitoring.
  4. Security surface: The exec tool executes arbitrary shell commands. The create_tool feature uses exec() to load arbitrary Python code. Production deployment requires trust boundaries.

8 Compute and API Costs

Runtime Resource Requirements

| Resource | Requirement |
| --- | --- |
| RAM target | <2GB (designed for edge deployment) |
| CPU | Minimal (event-driven, I/O-bound) |
| GPU | Optional (for local inference or embedding) |
| Disk | LanceDB storage + session JSON files + media files |
| Network | Required for LLM API calls; optional for other features |
| Hardware target | Jetson Orin Nano (8GB RAM, ARM64 + GPU) |

Per-Conversation LLM Costs

| Component | Estimated Tokens | Cost (DeepSeek Chat) | Cost (GPT-4o) |
| --- | --- | --- | --- |
| System prompt (SOUL + AGENT + USER + time) | ~1,000-3,000 | ~$0.001 | ~$0.01 |
| Memory injection (top-K retrieved) | ~200-1,000 | ~$0.001 | ~$0.005 |
| Tool definitions (26 tools) | ~3,000-4,000 | ~$0.003 | ~$0.02 |
| User message + conversation history | ~500-5,000 | ~$0.005 | ~$0.03 |
| Per-iteration LLM response | ~200-2,000 | ~$0.002 | ~$0.01 |
| Tool loop (1-5 iterations typical) | ~2,000-10,000 | ~$0.01 | ~$0.05 |
| Total per conversation | ~7,000-25,000 | ~$0.02 | ~$0.13 |

Background Processing Costs

| Process | Frequency | Estimated Cost |
| --- | --- | --- |
| Memory compression | On session overflow (>40 messages) | ~$0.01-$0.05 per compression |
| Embedding generation | Per memory fact + per user query | ~$0.001 per call |
| Self-check report | Daily (scheduled task) | ~$0.05-$0.10 per report |
| Scheduled task execution | Per cron trigger | ~$0.02-$0.10 per task |

Cost Optimization Strategies

The system implements several cost-reducing patterns:

  1. Cheaper model for compression: Memory compression explicitly prefers deepseek-chat over the default model
  2. Message limit (40): Prevents unbounded context growth; overflow triggers background compression
  3. Image stripping: Base64 images removed from session history to reduce token count
  4. Deduplication (0.92 threshold): Prevents storing near-identical memories
  5. Truncated tool output: read_file caps output at 10,000 characters
  6. Debounce: Merges rapid-fire messages (3-second window) into single LLM calls

24/7 Running Cost Estimate

| Usage Pattern | Daily Conversations | Estimated Daily Cost (DeepSeek) | Estimated Daily Cost (GPT-4o) |
| --- | --- | --- | --- |
| Light (personal) | 10-20 | $0.20-$0.50 | $1.30-$2.60 |
| Moderate | 50-100 | $1.00-$2.50 | $6.50-$13.00 |
| Heavy (production) | 200+ | $4.00+ | $26.00+ |

The edge-deployment design (Jetson Orin Nano target) suggests the hardware cost is a one-time ~$200-$500 investment, with ongoing costs dominated by LLM API fees.

9 Architecture Solution

System Architecture

The system follows a pipeline architecture with clear data flow from messaging platform to LLM to tools:

                    ┌─────────────────┐
                    │  WeChat Work    │
                    │  (Messaging     │
                    │   Platform)     │
                    └────────┬────────┘
                             │ HTTP callback
                    ┌────────▼────────┐
                    │  router.py      │
                    │                 │
                    │ Multi-tenant    │
                    │ Docker routing  │
                    │ Per-user        │
                    │ containers      │
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │  xiaowang.py    │  Entry Point
                    │                 │
                    │ ┌─ HTTP server (ThreadingMixIn)
                    │ ├─ Callback dispatch (cmd/msgType)
                    │ ├─ Debounce (3s window, per-sender)
                    │ ├─ Media download (3 fallback paths)
                    │ ├─ ASR pipeline (WebSocket streaming)
                    │ └─ File persistence (monthly dirs)
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │    llm.py       │  Core Loop
                    │                 │
                    │ ┌─ System prompt construction
                    │ │   (SOUL.md + AGENT.md + USER.md)
                    │ ├─ Memory retrieval + injection
                    │ ├─ Cross-session context bridge
                    │ ├─ Tool use loop (max 20 iterations)
                    │ ├─ Session management (40 msg limit)
                    │ └─ Image stripping for storage
                    └────────┬────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
     ┌────────▼───┐  ┌──────▼──────┐  ┌────▼────────┐
     │  tools.py   │  │ memory.py   │  │scheduler.py │
     │             │  │             │  │             │
     │ 26 built-in │  │ Compress:   │  │ Cron + once │
     │ tools       │  │  LLM extract│  │ jobs.json   │
     │ Plugin dir  │  │ Deduplicate:│  │ Persistent  │
     │ @tool deco  │  │  cosine sim │  │ TZ-aware    │
     │ MCP bridge  │  │ Retrieve:   │  │ Auto-notify │
     └──────┬──────┘  │  vector     │  │ on failure  │
            │         │  search     │  └─────────────┘
     ┌──────▼──────┐  └─────────────┘
     │mcp_client.py│
     │             │
     │ JSON-RPC    │
     │ stdio/HTTP  │
     │ Auto-       │
     │ reconnect   │
     │ Namespace:  │
     │ srv__tool   │
     └─────────────┘

Threading Model

The system uses a multi-threaded architecture (no async/await):

| Thread | Purpose | Lifecycle |
| --- | --- | --- |
| Main thread | HTTP server (ThreadingMixIn spawns per-request threads) | Process lifetime |
| Per-request threads | Handle incoming webhooks → dispatch to callback handler | Per HTTP request |
| Debounce timers | threading.Timer per sender; fires after 3s of silence | Created/cancelled per message |
| Chat lock threads | Serialize concurrent messages to the same session | Per chat call |
| Memory compression | Background thread for LLM-based compression of evicted messages | Daemon, per overflow |
| Scheduler loop | Background thread checking jobs every 10 seconds | Daemon, process lifetime |
| Scheduler triggers | Per-job execution threads | Daemon, per trigger |
| MCP stdio readers | Timeout-wrapped reader threads for subprocess communication | Per MCP request |
| ASR streaming | Audio streaming thread + WebSocket client thread | Per voice message |

The use of threading over asyncio is a deliberate design choice — it avoids the "function coloring problem" (async infecting the entire call stack) and keeps the codebase accessible to developers unfamiliar with async Python.

Data Persistence

project_root/
├── config.json            ← Master configuration
├── jobs.json              ← Persistent scheduler state (atomic writes)
├── sessions/              ← Session history (one JSON file per session)
│   ├── dm_USER_ID.json    ← DM session
│   ├── scheduler.json     ← Scheduler session
│   └── test.json          ← Test session
├── memory_db/             ← LanceDB vector storage
│   └── memories/          ← Vector table files
├── workspace/
│   ├── SOUL.md            ← Agent personality
│   ├── AGENT.md           ← Operational procedures
│   ├── USER.md            ← User context
│   ├── memory/            ← Keyword-searchable memory files
│   │   └── MEMORY.md      ← Long-term memory document
│   └── files/             ← Received/generated media
│       ├── index.json     ← File metadata index
│       └── 2026-03/       ← Monthly organized media
└── plugins/               ← Runtime-created tools
    └── *.py               ← Custom tool files

All persistent state uses atomic writes (write to .tmp then os.replace()) to prevent corruption on crash.
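The atomic-write pattern described above can be sketched as follows; the helper name and the use of `tempfile.mkstemp` (rather than a fixed `.tmp` suffix) are illustrative choices, but the core guarantee is the same: `os.replace()` swaps the file in atomically, so a crash mid-write leaves the old file intact rather than a truncated one.

```python
import json, os, tempfile

def atomic_write_json(path, obj):
    """Write JSON to a temp file in the same directory, then atomically
    replace the target. Readers never observe a half-written file."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())     # ensure bytes hit disk before the swap
        os.replace(tmp, path)        # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)               # clean up the temp file on failure
        raise

atomic_write_json("jobs.json", {"jobs": []})
print(json.load(open("jobs.json")))  # → {'jobs': []}
```

The temp file must live in the same directory as the target: `os.replace()` is only atomic within a single filesystem.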

10 Component Breakdown

Component 1: xiaowang.py — Entry Point (22KB)

The application entry point serving multiple responsibilities:

| Subcomponent | LOC (est.) | Purpose |
| --- | --- | --- |
| Configuration loading | ~30 | JSON config, environment variables, directory setup |
| Module initialization | ~20 | Init messaging, LLM, scheduler, tools, memory in dependency order |
| File persistence | ~50 | Monthly-organized media storage with metadata index |
| ASR pipeline | ~120 | WebSocket streaming speech-to-text with HMAC authentication |
| Message debouncing | ~60 | 3-second per-sender debounce with fragment merging |
| Callback handler | ~120 | Message type dispatch (text, image, video, file, voice, link, location) |
| Media download | ~80 | Three-path media download: enterprise → personal → direct HTTP |
| HTTP server | ~40 | Threaded HTTP server with GET (health) and POST (callback) endpoints |

Debounce Architecture:

Message 1 ──► Buffer[sender_id] = [msg1]     Timer(3s) started
Message 2 ──► Buffer[sender_id] = [msg1,msg2] Timer reset
Message 3 ──► Buffer[sender_id] = [msg1,msg2,msg3] Timer reset
              ... 3 seconds pass ...
Timer fires ──► Flush: merge texts, collect images
              ──► llm.chat(merged_text, session_key, images)
              ──► Split reply into ≤1800-byte chunks
              ──► Send chunks with 0.5s spacing

This prevents rapid-fire messages from creating multiple independent LLM calls, which would be wasteful and produce fragmented responses.
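The per-sender debounce above can be sketched with `threading.Timer`. The class and its interface are assumptions for illustration (the real implementation lives inside xiaowang.py's callback handling); only the reset-on-each-message behavior and the 3-second window come from the source:

```python
import threading

class Debouncer:
    """Per-sender message buffer: each new message resets the timer;
    when the timer finally fires, buffered fragments flush as one call."""
    def __init__(self, flush, window=3.0):
        self.flush, self.window = flush, window
        self.buffers, self.timers = {}, {}
        self.lock = threading.Lock()

    def add(self, sender, text):
        with self.lock:
            self.buffers.setdefault(sender, []).append(text)
            if sender in self.timers:
                self.timers[sender].cancel()          # reset the window
            t = threading.Timer(self.window, self._fire, args=(sender,))
            self.timers[sender] = t
            t.start()

    def _fire(self, sender):
        with self.lock:
            fragments = self.buffers.pop(sender, [])
            self.timers.pop(sender, None)
        self.flush(sender, "\n".join(fragments))      # one merged LLM call

merged = []
d = Debouncer(lambda sender, text: merged.append(text), window=0.1)
d.add("alice", "first")
d.add("alice", "second")
import time; time.sleep(0.3)
print(merged)  # → ['first\nsecond']
```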

Media Download — Three-Path Fallback:

Has fileId + fileAeskey?
    │
    YES → Enterprise download API (wxWorkDownload)
    │     │
    │     ├── Success → return path
    │     └── Fail → continue
    │
Has fileAuthKey?
    │
    YES → Personal download API (wxDownload)
    │     │
    │     ├── Success → return path
    │     └── Fail → continue
    │
Has fileHttpUrl?
    │
    YES → Direct HTTP download (urllib.request.urlretrieve)
    │     │
    │     ├── Success → return path
    │     └── Fail → all methods failed
    │
    NO → all methods failed
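The fallback chain above reduces to a try-each-in-order loop. This is a sketch of the pattern only: the downloader callables, field names, and return convention mirror the diagram but are not the project's actual function signatures.

```python
def download_media(msg, enterprise_dl, personal_dl, http_dl):
    """Try each applicable download path in order; any exception
    falls through to the next. Returns a local path, or None."""
    attempts = []
    if msg.get("fileId") and msg.get("fileAeskey"):
        attempts.append(lambda: enterprise_dl(msg["fileId"], msg["fileAeskey"]))
    if msg.get("fileAuthKey"):
        attempts.append(lambda: personal_dl(msg["fileAuthKey"]))
    if msg.get("fileHttpUrl"):
        attempts.append(lambda: http_dl(msg["fileHttpUrl"]))
    for attempt in attempts:
        try:
            return attempt()
        except Exception:
            continue              # fall through to the next path
    return None                   # all methods failed

def failing(*args): raise IOError("unavailable")

path = download_media({"fileId": "1", "fileAeskey": "k", "fileHttpUrl": "http://x/f"},
                      failing, failing, lambda url: "/tmp/f.bin")
print(path)  # → /tmp/f.bin  (enterprise path failed, HTTP fallback succeeded)
```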

Component 2: llm.py — Core Loop (14KB)

The LLM interaction layer implementing the tool use loop:

| Subcomponent | LOC (est.) | Purpose |
| --- | --- | --- |
| Provider management | ~30 | Load config, get default provider |
| LLM API calling | ~30 | Raw urllib request to chat completions endpoint |
| Session management | ~80 | Load/save sessions, handle overflow, strip images |
| Multimodal building | ~40 | Image-to-base64 encoding, multimodal message construction |
| System prompt | ~60 | SOUL/AGENT/USER loading + scheduler context injection |
| Tool use loop | ~70 | Core loop: LLM call → tool execution → repeat (max 20) |
Cross-Session Context Bridge:

A notable design pattern — the scheduler runs tasks in its own session (scheduler), but the user sees results in their DM session (dm_USER_ID). To maintain context:

def _get_recent_scheduler_context():
    # Read scheduler session file
    # Check freshness (2-hour window)
    # Find last message tool call content
    # Inject into DM session system prompt

This allows the user to respond to scheduled task output (e.g., a self-check report) in their normal chat flow, with the LLM aware of what was sent.

Component 3: tools.py — Tool Registry (48KB)

The largest file, containing the complete tool system:

| Subcomponent | LOC (est.) | Purpose |
| --- | --- | --- |
| Registry + decorator | ~40 | @tool decorator, get_definitions(), execute() |
| Core tools (exec, message) | ~40 | Shell execution with timeout; message sending with chunking |
| File tools | ~70 | read/write/edit/list with workspace-relative paths |
| Scheduler tools | ~30 | CRUD for scheduled tasks |
| Media send tools | ~60 | Image/file/video/link sending via messaging API |
| Video processing tools | ~100 | ffmpeg trim, BGM mixing, AI video generation |
| Web search | ~150 | Multi-engine search (Tavily, web, GitHub, HuggingFace) |
| Memory search tools | ~50 | Keyword search (grep) + semantic search (vector retrieval) |
| Self-check tool | ~50 | System diagnostics report generation |
| Plugin system | ~80 | Plugin loading, runtime tool creation, MCP bridge |

The @tool Decorator:

_registry = {}  # name -> {"fn": callable, "definition": OpenAI function schema}

def tool(name, description, properties, required=None):
    def decorator(fn):
        _registry[name] = {
            "fn": fn,
            "definition": {
                "type": "function",
                "function": {
                    "name": name,
                    "description": description,
                    "parameters": {
                        "type": "object",
                        "properties": properties,
                        **({"required": required} if required else {}),
                    },
                },
            },
        }
        return fn
    return decorator

This single decorator handles both tool registration (for execution) and definition generation (for LLM function calling), keeping tool declaration co-located with implementation.

Multi-Engine Web Search:

The web_search tool implements intelligent source routing:

Query → Auto-detect source
         │
         ├── Contains "huggingface/hf model" → HuggingFace API
         ├── Contains "github.com/github repo" → GitHub API
         ├── Contains "verify/exist/plugin/mcp" → All engines
         └── Default → Dual-engine (Tavily + web)

Each engine:
├── Tavily: Advanced search with AI summary + relevance scores
├── Web: General search API with snippets
├── GitHub: Repo search (stars-sorted) + code search fallback
└── HuggingFace: Model search (downloads-sorted) with pipeline tags

Component 4: memory.py — Three-Layer Memory (13KB)

The memory system implementing the compress-deduplicate-retrieve pipeline:

| Subcomponent | LOC (est.) | Purpose |
| --- | --- | --- |
| Init + public API | ~60 | LanceDB connection, table creation, 4 public functions |
| Embedding | ~30 | OpenAI-compatible embedding API calls |
| Compression | ~100 | LLM-based structured memory extraction |
| Deduplication | ~40 | Cosine similarity against existing memories |
| Storage | ~30 | LanceDB vector table operations |
| Retrieval | ~30 | Vector search + result formatting |
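The retrieval step (vector search plus the "[Relevant Memories]" formatting described in Section 11) can be sketched without LanceDB. The `retrieve` function and the two-dimensional toy embeddings are illustrative assumptions; the real system performs the ranking inside a LanceDB vector search.

```python
def retrieve(query_vec, memories, top_k=5):
    """Rank stored memories by dot-product similarity to the query embedding
    and format the winners as the '[Relevant Memories]' prompt block."""
    scored = sorted(memories,
                    key=lambda m: sum(a * b for a, b in zip(query_vec, m["vector"])),
                    reverse=True)
    lines = ["[Relevant Memories]"]
    for m in scored[:top_k]:
        lines.append(f"- {m['fact']} ({m['timestamp']})")
    return "\n".join(lines)

memories = [
    {"fact": "User prefers meetings at 10am", "timestamp": "2026-03-15",
     "vector": [0.9, 0.1]},
    {"fact": "Project uses React + TypeScript", "timestamp": "2026-03-01",
     "vector": [0.1, 0.9]},
]
print(retrieve([1.0, 0.0], memories, top_k=1))
# → [Relevant Memories]
#   - User prefers meetings at 10am (2026-03-15)
```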

Memory Schema (LanceDB):

{
    "id": "uuid",           # Unique memory identifier
    "fact": "string",       # Complete factual statement
    "keywords": "[json]",   # Keyword array (JSON-serialized)
    "persons": "[json]",    # Person names involved
    "timestamp": "string",  # YYYY-MM-DD HH:MM or empty
    "topic": "string",      # Topic category
    "session_key": "string",# Source session
    "created_at": float,    # Unix timestamp
    "vector": [float*1024]  # 1024-dim embedding
}
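The deduplication check against this table (cosine similarity above 0.92 means "already stored, skip") can be sketched in plain Python; the helper names are illustrative, and the real system compares against LanceDB query results rather than an in-memory list.

```python
import math

DEDUP_THRESHOLD = 0.92  # the project's documented similarity cutoff

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_duplicate(new_vec, existing_vecs, threshold=DEDUP_THRESHOLD):
    """Skip storing a memory whose embedding is nearly identical
    to one already in the table."""
    return any(cosine(new_vec, v) >= threshold for v in existing_vecs)

stored = [[1.0, 0.0, 0.0]]
print(is_duplicate([0.99, 0.05, 0.0], stored))  # near-identical → True
print(is_duplicate([0.0, 1.0, 0.0], stored))    # orthogonal → False
```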

Component 5: mcp_client.py — MCP Protocol Client (12KB)

A self-implemented MCP client with zero SDK dependency:

| Subcomponent | LOC (est.) | Purpose |
| --- | --- | --- |
| MCPServer class | ~200 | Single server lifecycle, JSON-RPC, tool discovery |
| Stdio transport | ~60 | Subprocess stdin/stdout communication with timeout |
| HTTP transport | ~20 | POST JSON-RPC to HTTP endpoint |
| Protocol methods | ~40 | initialize, tools/list, tools/call |
| Module-level API | ~60 | init(), get_all_tool_defs(), execute(), reload(), shutdown() |

MCP Protocol Implementation:

The client implements only the three essential MCP methods:

  1. initialize — handshake with protocol version and client info
  2. tools/list — discover available tools on the server
  3. tools/call — execute a tool with arguments
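These three methods share one JSON-RPC 2.0 framing; on the stdio transport each request is a single newline-delimited JSON object written to the server's stdin. A sketch of the framing only (the `protocolVersion` value and client-info fields are assumptions, not taken from the repo):

```python
import json

def jsonrpc_request(method, params, req_id):
    """Frame an MCP request as newline-delimited JSON-RPC 2.0 (stdio transport)."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params}) + "\n"

# The three methods the client implements:
init = jsonrpc_request("initialize",
                       {"protocolVersion": "2024-11-05",          # assumed version
                        "clientInfo": {"name": "724-office", "version": "1.0"}}, 1)
listing = jsonrpc_request("tools/list", {}, 2)
call = jsonrpc_request("tools/call",
                       {"name": "search", "arguments": {"query": "agents"}}, 3)

print(json.loads(call)["method"])  # → tools/call
```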

Tool Namespacing:

MCP tools are namespaced with double underscore: servername__toolname. This prevents name collisions between MCP servers and built-in tools.

Auto-Reconnect:

On ConnectionError or TimeoutError during tools/call, the client automatically:

  1. Shuts down the current process
  2. Starts a new subprocess
  3. Re-runs initialize and tools/list
  4. Retries the original call

Component 6: scheduler.py — Task Scheduling (7KB)

Persistent task scheduling with cron support:

| Feature | Implementation |
| --- | --- |
| One-shot tasks | delay_seconds → trigger at time.time() + delay |
| Recurring tasks | cron_expr → croniter-based next-trigger calculation |
| One-shot cron | cron_expr + once=True → triggers once at next cron match |
| Persistence | jobs.json with atomic writes |
| Check interval | 10-second polling loop (background thread) |
| Timezone | CST (UTC+8) aware; croniter uses local-timezone datetime |
| Heartbeat | Logs task status every 30 minutes |
| Failure handling | On task failure, sends notification via LLM chat |
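The one-shot path and the polling loop body can be sketched with the stdlib alone; this omits the croniter-based recomputation for recurring jobs and uses illustrative names (`make_job`, `poll`) rather than the project's actual API.

```python
import time

def make_job(message, delay_seconds=None, cron_expr=None):
    """A job is just a dict persisted to jobs.json.
    One-shot jobs trigger at now + delay; cron recomputation is omitted here."""
    return {"message": message, "cron": cron_expr,
            "next_trigger": time.time() + delay_seconds
                            if delay_seconds is not None else None}

def poll(jobs, chat_fn, now=None):
    """One pass of the 10-second polling loop: fire any due job by sending
    its message to the LLM in the 'scheduler' session; keep the rest."""
    now = now or time.time()
    remaining = []
    for job in jobs:
        if job["next_trigger"] is not None and job["next_trigger"] <= now:
            chat_fn(job["message"], "scheduler")   # task text goes to the LLM
        else:
            remaining.append(job)
    return remaining

fired = []
jobs = [make_job("Run self-check and send report to owner", delay_seconds=-1)]  # already due
jobs = poll(jobs, lambda msg, session: fired.append((msg, session)))
print(fired, jobs)
```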

Scheduler → LLM Integration:

When a scheduled task triggers, it calls chat_fn(message, "scheduler") — sending the task's message to the LLM as if it were a user message in the "scheduler" session. The LLM can then use any tool (including message to notify the owner). This creates a powerful automation loop:

Cron trigger → scheduler._trigger()
    → chat_fn("Run self-check and send report to owner", "scheduler")
        → LLM invokes self_check tool
        → LLM invokes message tool with report
        → Owner receives daily diagnostic report

Component 7: router.py — Multi-Tenant Docker Router (17KB)

Docker-based multi-tenant isolation:

| Feature | Implementation |
| --- | --- |
| Per-user containers | Auto-provision a Docker container on first message |
| Health checks | Periodic container health verification |
| Request routing | Forward HTTP callbacks to the appropriate container |
| Container lifecycle | Start, stop, restart, cleanup |
| Resource isolation | Docker-level resource limits per user |

This component enables the system to serve multiple users with complete isolation — each user gets their own container instance with independent sessions, memory, and workspace.

11 Core Mechanisms (Detailed)

Mechanism 1: Three-Layer Memory Pipeline

The memory system is the most architecturally sophisticated component, implementing a three-stage pipeline inspired by human memory models:

Layer 1 — Session Memory (Short-Term):

┌─────────────────────────────────────┐
│ Session file: dm_USER_ID.json       │
│                                     │
│ Last 40 messages (JSON array)       │
│ ├── user messages                   │
│ ├── assistant messages              │
│ ├── tool call results               │
│ └── system context                  │
│                                     │
│ On overflow (>40 messages):         │
│   evicted = messages[:-40]          │
│   messages = messages[-40:]         │
│   compress_async(evicted)           │
│                                     │
│ On load:                            │
│   Skip to first user message        │
│   (clean truncation boundary)       │
└─────────────────────────────────────┘

Layer 2 — Compressed Memory (Long-Term):

┌──────────────────────────────────────────────────┐
│ Background compression thread                     │
│                                                   │
│ 1. Format evicted messages into dialogue text     │
│ 2. Send to LLM with COMPRESS_PROMPT              │
│ 3. LLM extracts structured facts:                │
│    {                                              │
│      "fact": "User prefers meetings at 10am",    │
│      "keywords": ["meeting", "schedule"],         │
│      "persons": ["User"],                         │
│      "timestamp": "2026-03-15 10:00",            │
│      "topic": "preferences"                       │
│    }                                              │
│ 4. Generate embeddings for each fact              │
│ 5. Deduplicate against existing memories          │
│    (cosine similarity > 0.92 → skip)             │
│ 6. Store in LanceDB vector table                  │
└──────────────────────────────────────────────────┘

Layer 3 — Retrieval (Active Recall):

┌──────────────────────────────────────────────────┐
│ On every user message:                            │
│                                                   │
│ 1. Embed user message text                        │
│ 2. Vector search in LanceDB (top-K=5)           │
│ 3. Filter out seed data and low-quality results  │
│ 4. Format as "[Relevant Memories]" block         │
│ 5. Inject into system prompt                      │
│                                                   │
│ Result (injected before LLM call):               │
│ [Relevant Memories]                               │
│ - User prefers meetings at 10am (2026-03-15)     │
│ - Client deadline is April 15 (2026-03-10)       │
│ - Project uses React + TypeScript (2026-03-01)   │
└──────────────────────────────────────────────────┘

Compression Prompt Engineering:

The COMPRESS_PROMPT is carefully designed to extract only long-term-valuable information:

Rules:
- Only extract information with long-term value
  (preferences, plans, contacts, decisions, facts)
- Skip chitchat, greetings, repeated confirmations,
  pure tool call results
- Replace "he/she/I" with specific names
- Replace "tomorrow/next week" with specific dates
- If nothing worth remembering, return empty array []

This filtering is critical — without it, the memory would fill with low-value conversational noise, degrading retrieval quality.

Deduplication via Cosine Similarity:

Before storing a new memory, it is compared against the most similar existing memory using vector cosine similarity. If similarity exceeds 0.92, the new memory is skipped. This prevents near-duplicate accumulation:

def _cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0
    return dot / (norm_a * norm_b)

Note: The implementation uses a pure-Python dot product calculation rather than NumPy, consistent with the minimal-dependency philosophy.
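Built on this helper, the pre-insertion duplicate check might look like the following sketch; the 0.92 threshold comes from the source, while the `is_duplicate` name is an assumption (`_cosine_similarity` is repeated so the sketch runs standalone):

```python
DEDUP_THRESHOLD = 0.92

def _cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0
    return dot / (norm_a * norm_b)

def is_duplicate(new_vec, existing_vecs, threshold=DEDUP_THRESHOLD):
    # Skip storage if any existing memory embedding is nearly identical
    return any(_cosine_similarity(new_vec, v) > threshold for v in existing_vecs)
```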

Mechanism 2: Runtime Tool Creation (Self-Evolution)

The self-evolution mechanism allows the agent to create new tools at runtime:

Agent receives request for capability it doesn't have
    │
    ▼
Agent uses create_tool to write a new Python function
    │
    ▼
Function is written to plugins/ directory as .py file
    │
    ▼
File is loaded via exec() with @tool decorator available
    │
    ▼
New tool registered in _registry with OpenAI function schema
    │
    ▼
Subsequent LLM calls include new tool in tool_defs
    │
    ▼
Agent can now use the new tool in conversations
    │
    ▼
Tool persists across restarts (loaded from plugins/ on startup)

Plugin Loading Mechanism:

def _exec_plugin(code, source=" "):
    exec(compile(code, source, "exec"), {
        "__builtins__": __builtins__,
        "tool": tool,   # The @tool decorator
        "log": log,     # Logger
    })

The exec() call provides a controlled environment with access to:

  • __builtins__ — Python built-ins
  • tool — the decorator for tool registration
  • log — the application logger

Security Considerations:

This is the most security-sensitive component. The agent can write and execute arbitrary Python code. Mitigations include:

  • Single-user mode enforcement (OWNER_IDS whitelist)
  • Workspace-relative path resolution
  • Per-tool error handling (tool crashes don't crash the system)
  • No automatic execution on untrusted input (requires LLM decision)

However, there is no sandboxing, code review, or capability restriction on created tools — a genuinely powerful but potentially dangerous feature.

Mechanism 3: MCP Protocol Bridge

The MCP client bridges external MCP servers into the agent's tool ecosystem:

Agent tool_defs = built-in tools + plugin tools + MCP tools
                           │
                    ┌──────▼──────┐
                    │ LLM decides │
                    │ which tool  │
                    │ to call     │
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
         Built-in     Plugin tool    MCP tool
         tool          from          (server__name)
         (exec,        plugins/       │
         message)      dir            │
              │            │     ┌────▼────┐
              │            │     │ Parse   │
              │            │     │ name    │
              │            │     │ split   │
              │            │     │ on "__" │
              │            │     └────┬────┘
              │            │          │
              │            │     ┌────▼────┐
              │            │     │ Route   │
              │            │     │ to MCP  │
              │            │     │ server  │
              │            │     └────┬────┘
              │            │          │
              │            │     JSON-RPC
              │            │     tools/call
              │            │          │
              │            │     Response
              │            │     content
              ▼            ▼          ▼
         Execute      Execute    Parse MCP
         Python fn    Python fn  response
                                 (text
                                  concat)

MCP to OpenAI Schema Conversion:

# MCP format
{
    "name": "search_notes",
    "description": "Search through notes",
    "inputSchema": {"type": "object", "properties": {...}}
}

# Converted to OpenAI format
{
    "type": "function",
    "function": {
        "name": "notes_server__search_notes",  # Namespaced
        "description": "Search through notes",
        "parameters": {"type": "object", "properties": {...}}
    }
}

The inputSchema from MCP maps directly to parameters in OpenAI format — same JSON Schema structure, just renamed.
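The conversion and its inverse (routing a namespaced call back to its server) can be sketched as follows; the function names are illustrative, not the repo's:

```python
def mcp_to_openai(server_name, mcp_tool):
    # Namespace the tool as "<server>__<name>" so calls can be routed back
    return {
        "type": "function",
        "function": {
            "name": f"{server_name}__{mcp_tool['name']}",
            "description": mcp_tool.get("description", ""),
            "parameters": mcp_tool.get("inputSchema",
                                       {"type": "object", "properties": {}}),
        },
    }

def route_call(tool_name):
    # Inverse operation on the agent side: split once on "__" to find the server
    server, _, name = tool_name.partition("__")
    return server, name
```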

Hot-Reload Support:

The reload_mcp tool allows runtime reconfiguration:

def reload(config):
    old_names = set(_servers.keys())
    shutdown()          # Close all existing connections
    init(config)        # Connect with new config
    new_names = set(_servers.keys())
    added = new_names - old_names
    removed = old_names - new_names
    return added, removed, len(_servers)

Mechanism 4: Self-Repair Diagnostics

The self-check mechanism runs daily (via scheduled task) and generates comprehensive system health reports:

Daily self-check scheduled task
    │
    ▼
tool_self_check() collects:
├── Session activity (today's active sessions, message counts)
├── System health (disk, memory, processes)
├── Error log analysis (last 24h from application log)
├── Scheduler status (active jobs, next trigger times)
├── Memory system status (memory count, storage size)
└── Session health diagnostics
    ├── Empty sessions detection
    ├── High tool_call ratio detection
    └── Potential issue flagging
    │
    ▼
Report formatted as structured text
    │
    ▼
LLM analyzes report + decides on actions
    │
    ▼
message tool sends summary to owner

Session Health Detection:

The self-check scans all sessions for potential issues:

  • Empty sessions — sessions with no messages (possibly corrupted)
  • High tool_call ratio — sessions where tool calls dominate (agent may be stuck in a loop)
  • Stale sessions — sessions not updated recently
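These heuristics can be sketched as a small per-session check; the 0.6 ratio threshold is an assumption, not a value from the source:

```python
def diagnose_session(messages, tool_ratio_limit=0.6):
    # Flag empty sessions and sessions dominated by tool-call traffic
    issues = []
    if not messages:
        issues.append("empty session (possibly corrupted)")
        return issues
    tool_msgs = sum(1 for m in messages if m.get("role") == "tool")
    if tool_msgs / len(messages) > tool_ratio_limit:
        issues.append("high tool_call ratio (agent may be stuck in a loop)")
    return issues
```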

Mechanism 5: Debounce and Message Aggregation

The debounce system prevents rapid-fire messages from creating multiple LLM calls:

# Per-sender buffer with thread-safe access
_debounce_buffers = {}  # sender_id -> [{"text": str, "images": [path]}]
_debounce_timers = {}   # sender_id -> threading.Timer
_debounce_lock = threading.Lock()

Timing diagram:

Time ──────────────────────────────────────────►
     │         │    │              │
     msg1      msg2 msg3          flush
     │         │    │              │
     ├─ timer ─┤    │              │
     │  (3s)   │    │              │
     │         ├────┤              │
     │         timer│              │
     │         (3s) │              │
     │              ├── timer(3s) ─┤
     │              │              │
     └──────────────┴──────────────┘
     Buffer: [msg1, msg2, msg3]
                                   └─► Flush:
                                       merge texts
                                       collect images
                                       single LLM call

This is particularly important for messaging platforms where users often send messages in rapid succession (split thoughts across multiple messages).
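The timer-reset pattern above can be sketched with threading.Timer; the 3-second window matches the diagram, while function names are illustrative:

```python
import threading

DEBOUNCE_SECONDS = 3  # window length shown in the diagram

_buffers = {}   # sender_id -> list of buffered texts
_timers = {}    # sender_id -> threading.Timer
_lock = threading.Lock()

def on_message(sender_id, text, flush_fn, window=DEBOUNCE_SECONDS):
    # Each new message resets the sender's timer; only the last timer fires
    with _lock:
        _buffers.setdefault(sender_id, []).append(text)
        if sender_id in _timers:
            _timers[sender_id].cancel()
        t = threading.Timer(window, _flush, args=(sender_id, flush_fn))
        t.daemon = True
        _timers[sender_id] = t
        t.start()

def _flush(sender_id, flush_fn):
    with _lock:
        texts = _buffers.pop(sender_id, [])
        _timers.pop(sender_id, None)
    if texts:
        flush_fn("\n".join(texts))  # one merged LLM call instead of N
```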

Mechanism 6: Cross-Session Context Bridging

A subtle but important design pattern that solves the problem of context fragmentation across sessions:

Problem: The scheduler runs tasks in the scheduler session, but the user chats in the dm_USER_ID session. When the scheduler sends a self-check report, the user may respond in their DM, but the DM session has no context about what was sent.

Solution:

def _get_recent_scheduler_context():
    # 1. Read scheduler session file
    # 2. Check freshness (2-hour window)
    # 3. Find last message tool call content
    # 4. Truncate to 800 chars
    # 5. Format with timestamp
    # 6. Return for injection into DM system prompt

The DM system prompt includes recent scheduler output:

[Agent recently sent via scheduled task (09:00)]
Today's self-check report:
- Sessions: 15 active
- Errors: 2 warnings
- Memory: 142 facts stored
...(truncated)

This is only injected when:

  • The scheduler session file was modified within the last 2 hours
  • The current session is NOT the scheduler session (prevents circular injection)
  • The scheduler actually sent a message (not just processed internally)

12 Programming Language

Implementation Language: Python (100%)

The entire system is written in Python with a deliberate minimal-dependency philosophy:

Standard Library Usage:

Module Usage
http.server, socketserver HTTP server with ThreadingMixIn
json All serialization (config, sessions, tools, API calls)
os, shutil File operations, directory management
subprocess Shell execution (exec tool), ffmpeg, git
threading Multi-threaded architecture (locks, timers, daemon threads)
urllib.request, urllib.parse, urllib.error All HTTP client operations
base64 Image encoding for multimodal LLM calls
hashlib, hmac HMAC-SHA256 for ASR authentication
logging Structured application logging
time, datetime Timestamps, timezone handling (CST/UTC)
uuid Memory record identifiers
struct Binary data handling (for media processing)
ssl SSL configuration for WebSocket ASR

External Dependencies (3 required packages, plus 1 optional):

Package Purpose Size
croniter Cron expression parsing for scheduler Lightweight
lancedb Embedded vector database for memory Moderate (includes Lance format)
websocket-client WebSocket communication for ASR Lightweight
pilk (optional) WeChat SILK audio format decoding Lightweight

Code Style

The codebase uses a distinctive style that prioritizes readability and debuggability:

  1. Single-file modules — each file is a complete, self-contained module
  2. Module-level state — global variables with explicit init functions (no classes for state management)
  3. %-formatting over f-strings in many places — consistent with C/Unix tradition
  4. Explicit error handling — try/except blocks at every I/O boundary
  5. Comments in mixed languages — English for public API, Chinese for implementation details
  6. No type annotations — consistent with rapid prototyping approach
  7. No abstract base classes — concrete implementations only

Architecture Anti-Patterns (Intentional Tradeoffs)

The codebase makes several deliberate tradeoffs:

Pattern Convention 7/24 Office Choice Rationale
HTTP client requests or httpx urllib.request Zero dependency
Async I/O asyncio threading Simpler mental model
Type safety Type annotations None Rapid prototyping
Dependency injection Constructor injection Global state + init() Fewer abstraction layers
Testing pytest suite Manual testing Production as test environment
Configuration Pydantic/dataclass Raw dict from JSON Minimal boilerplate

These are reasonable tradeoffs for a solo-developer, production-running system where the author is the primary user and maintainer.

13 Memory Management

Memory Architecture Overview

7/24 Office implements a sophisticated three-layer memory system that maps roughly to human memory models:

┌─────────────────────────────────────────────────────┐
│                Human Analogy                         │
│                                                     │
│  Working Memory    ←→   Session (40 messages)       │
│  Episodic Memory   ←→   Compressed (LLM-extracted)  │
│  Semantic Memory   ←→   Retrieved (vector search)   │
└─────────────────────────────────────────────────────┘

Layer 1: Session Memory (Working Memory)

Storage: JSON files in sessions/ directory, one per session key.

Capacity: Last 40 messages per session.

Overflow handling:

if len(messages) > MAX_SESSION_MESSAGES:
    evicted = messages[:-MAX_SESSION_MESSAGES]
    messages = messages[-MAX_SESSION_MESSAGES:]
    mem_mod.compress_async(evicted, session_key)

Truncation boundary cleanup:

After truncation, orphan messages may appear at the start (tool results without matching assistant messages). The system skips to the first user or system message:

while messages and messages[0].get("role") not in ("user", "system"):
    messages.pop(0)

Image handling:

Before saving, base64 image URLs are replaced with [image] text markers:

# Before: {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/..."}}
# After:  {"type": "text", "text": "[image]"}

This prevents:

  1. Session files growing unboundedly with large base64 strings
  2. API errors from LLMs that don't accept image_url in history messages
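The replacement step can be sketched as a pass over the multimodal content parts; the function name is illustrative:

```python
def strip_images(messages):
    # Replace inline base64 image parts with a lightweight text marker
    for m in messages:
        content = m.get("content")
        if isinstance(content, list):
            m["content"] = [
                {"type": "text", "text": "[image]"}
                if part.get("type") == "image_url" else part
                for part in content
            ]
    return messages
```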

Layer 2: Compressed Memory (Episodic/Long-Term)

Trigger: Automatic, whenever session messages exceed 40 and overflow occurs.

Process:

  1. Filter messages — keep only user and assistant text messages (skip tool calls, empty content)
  2. Format as dialogue — render messages as "User: ...\nAssistant: ..." text
  3. LLM extraction — send dialogue to LLM with structured extraction prompt
  4. Parse JSON output — extract array of {fact, keywords, persons, timestamp, topic} objects
  5. Generate embeddings — embed each fact using OpenAI-compatible embedding API
  6. Deduplicate — compare each new fact against existing memories via cosine similarity
  7. Store — insert non-duplicate memories into LanceDB vector table

LLM Output Parsing (Robust):

The parser handles common LLM output variations:

text = content.strip()
if text.startswith("```"):
    # Remove markdown code fences
    lines = [l for l in text.split("\n") if not l.strip().startswith("```")]
    text = "\n".join(lines)

try:
    result = json.loads(text)
except json.JSONDecodeError:
    # Fallback: find content between [ and ]
    start = text.find("[")
    end = text.rfind("]")
    if start >= 0 and end > start:
        result = json.loads(text[start:end + 1])
    else:
        result = []  # nothing parseable; treat as no extracted facts

Layer 3: Retrieved Memory (Active Recall)

Trigger: Every user message, before LLM call.

Process:

  1. Embed user message text
  2. Vector search in LanceDB (default top-K=5)
  3. Filter out seed data and low-quality results
  4. Format as [Relevant Memories] block
  5. Append to system prompt
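Step 4's formatting can be sketched as a small helper; the hit-record shape mirrors the fact objects shown earlier, and the function name is an assumption:

```python
def format_memories(hits):
    # hits: vector-search results, e.g. {"fact": ..., "timestamp": "2026-03-15 10:00"}
    if not hits:
        return ""
    lines = ["[Relevant Memories]"]
    for h in hits:
        date = str(h.get("timestamp", ""))[:10]  # keep the YYYY-MM-DD part
        lines.append(f"- {h['fact']} ({date})" if date else f"- {h['fact']}")
    return "\n".join(lines)
```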

Zero-latency cache:

For hardware/voice channels where latency is critical, the system maintains a pre-computed memory summary cache:

_context_cache = {}  # session_key -> str

def get_cached_context(session_key):
    return _context_cache.get(session_key, "")

Dual Memory Search Interface

The system provides two memory search tools with complementary strengths:

Tool Method Use Case
search_memory Keyword search (grep -r -i) in workspace/memory/ Exact term matching, file-level search
recall Vector semantic search in LanceDB Meaning-based recall, fuzzy matching

The keyword search is scoped by a scope parameter:

  • all — search all memory files
  • long — search only MEMORY.md (persistent long-term document)
  • daily — search only daily log files (matching the 2*.md pattern)
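A sketch of the grep-backed search, assuming the scope-to-pattern mapping described above (the exact flags in the repo may differ):

```python
import subprocess

def search_memory(query, scope="all", memory_dir="workspace/memory"):
    # Case-insensitive recursive keyword search with line numbers
    cmd = ["grep", "-r", "-i", "-n"]
    if scope == "long":
        cmd.append("--include=MEMORY.md")
    elif scope == "daily":
        cmd.append("--include=2*.md")  # daily logs named by date, e.g. 2026....md
    cmd += [query, memory_dir]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
        return out.stdout.strip()  # empty string when nothing matches
    except (OSError, subprocess.TimeoutExpired) as e:
        return f"search failed: {e}"
```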

Memory System Comparison

Aspect LangChain Memory 7/24 Office Memory
Architecture Pluggable abstractions Fixed three-layer pipeline
Dependencies LangChain + vector store adapter LanceDB only
Compression User-implemented Built-in LLM extraction
Deduplication User-implemented Built-in (cosine 0.92)
LOC ~500+ (with abstractions) ~300
Configuration Code-level setup JSON config
Persistence Depends on adapter LanceDB files + JSON sessions

14 Continued Learning

Self-Evolution Through Runtime Tool Creation

The most distinctive learning mechanism in 7/24 Office is runtime tool creation — the agent genuinely extends its own capabilities based on encountered tasks.

Evolution loop:

Conversation 1: "Can you check Bitcoin price?"
    → Agent has no crypto tool
    → Agent creates crypto_price tool via create_tool
    → Tool saved to plugins/crypto_price.py

Conversation 2: "What's the price of ETH?"
    → Agent now has crypto_price tool
    → Executes it directly
    → No tool creation needed

System restart:
    → plugins/ directory scanned
    → crypto_price.py loaded via exec()
    → Tool available immediately

This is a genuine form of open-ended self-evolution — the agent's capability space grows monotonically based on user interactions. Unlike fine-tuning (which requires retraining), this is immediate and persistent.

Limitations:

  1. No tool improvement — existing tools are not automatically refined or optimized
  2. No tool composition — new tools don't automatically compose with existing ones
  3. Quality depends on LLM — the quality of created tools depends on the LLM's code generation capability
  4. No sandboxing — created tools run with full Python permissions
  5. No versioning — no history of tool modifications

Memory-Based Behavioral Adaptation

The three-layer memory system provides implicit behavioral learning:

  1. Preference learning — compressed memories include user preferences ("User prefers meetings at 10am"), which are retrieved and injected into future conversations
  2. Context accumulation — facts about projects, people, and deadlines persist across sessions, allowing the agent to maintain long-term context
  3. Pattern recognition — over time, the memory accumulates patterns that influence the agent's responses (e.g., remembering that a particular approach worked for a type of problem)

Personality Evolution via Markdown Files

The SOUL.md, AGENT.md, and USER.md files provide a manual learning mechanism:

  • SOUL.md — can be updated to refine agent personality and communication style
  • AGENT.md — can be updated with new troubleshooting procedures based on encountered issues
  • USER.md — can be updated with new user context as preferences change

These files are read on every conversation turn, so changes take effect immediately.

Self-Check as Learning Signal

The daily self-check report provides a learning signal:

  1. Error pattern detection — recurring errors in the log suggest systematic issues
  2. Session health metrics — high tool_call ratios may indicate inefficient tool use patterns
  3. Memory growth tracking — memory count trends indicate information accumulation rate

However, the system does not automatically act on these signals — the self-check report is sent to the owner for human review.

Comparison with Other Learning Approaches

System Learning Mechanism Persistence Scope
7/24 Office Runtime tool creation + memory compression Disk (plugins/ + LanceDB) Single instance
LangChain agents No built-in learning Session-only (default) Per-session
AutoGPT Task-based memory File system Per-task
OpenEvolve Evolutionary program improvement Program database Per-experiment
Ouro Loop Reflective log + BOUND evolution JSONL + CLAUDE.md Per-project
Devin Proprietary session memory Cloud Per-workspace

7/24 Office's runtime tool creation is unique in that it provides permanent capability expansion rather than just information retention.

15 Applications

Application 1: 24/7 Personal AI Assistant

The primary use case — a continuously running AI agent that handles daily tasks:

Task Type How It's Handled
Scheduling "Remind me to call client at 3pm" → creates cron task
File management "Save this as a report" → writes to workspace
Research "Find recent papers on RAG" → multi-engine web search
Media processing "Trim this video from 0:30 to 1:20" → ffmpeg operation
Memory "What did we discuss about the project last week?" → vector recall
System status Daily self-check reports sent automatically

Application 2: Edge-Deployed AI Agent

Designed for deployment on embedded hardware:

Target Jetson Orin Nano
RAM 8GB (agent uses <2GB)
CPU ARM64
GPU Available for local inference
Storage Local SSD for LanceDB + sessions
Connectivity WiFi/Ethernet for API calls

This enables AI agent deployment in scenarios where cloud hosting is undesirable (privacy, latency, cost) but LLM API access is available.

Application 3: Self-Evolving Tool Platform

The runtime tool creation enables organic capability growth:

Month 1: 26 built-in tools
Month 2: 26 + 5 custom tools (crypto prices, weather, RSS feeds, ...)
Month 3: 26 + 12 custom tools (project-specific automation, ...)
Month 6: 26 + 30+ custom tools (full personal automation suite)

Each tool is a single Python function with @tool decorator — no framework boilerplate, no configuration files, no deployment pipelines.

Application 4: MCP Tool Aggregator

By connecting multiple MCP servers, 7/24 Office becomes a unified interface to diverse tool ecosystems:

{
  "mcp_servers": {
    "filesystem": {
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem"]
    },
    "github": {
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    },
    "database": {
      "transport": "http",
      "url": "http://localhost:3000/mcp"
    }
  }
}

All MCP tools appear alongside built-in tools in the LLM's tool definitions, creating a unified agent that can interact with file systems, APIs, databases, and custom services through a single conversation interface.

Application 5: Automated Monitoring and Notification

Combining the scheduler with the tool use loop creates a monitoring system:

Scheduled task: "Every hour, check server status and notify if issues"
    │
    ▼
Scheduler triggers → LLM receives message
    │
    ▼
LLM uses exec tool: "curl -s http://server/health"
    │
    ▼
LLM analyzes response → decides if notification needed
    │
    ▼
If issue: LLM uses message tool to alert owner
If OK: LLM does nothing (no notification spam)

Application 6: Multi-Tenant Production Service

Via router.py, the system can serve multiple users with complete isolation:

User A message → Router → Container A (own session, memory, workspace)
User B message → Router → Container B (own session, memory, workspace)
User C message → Router → Container C (auto-provisioned on first message)

Each container runs its own instance of the agent with independent:

  • Session files
  • Memory database
  • Workspace
  • Plugins
  • Scheduled tasks

System Architecture Dependencies Memory Self-Evolution Deployment
7/24 Office 8 files, ~3.5K LOC 3 packages 3-layer (session/compressed/retrieval) Runtime tool creation Edge/cloud, Docker multi-tenant
LangChain Framework (100K+ LOC) 100+ packages Pluggable adapters No built-in Cloud
AutoGPT Agent framework Many File-based Task-based learning Cloud
CrewAI Multi-agent framework LangChain + Shared memory No built-in Cloud
Ouro Loop Methodology framework (3 files) 0 Reflective log (30 entries) BOUND evolution Any agent
Devin Proprietary full agent Proprietary Cloud-based Proprietary Cloud only

7/24 Office occupies a unique niche: it is a complete, production-running AI agent system that is small enough to be fully understood by a single developer, yet capable enough to run 24/7 with self-repair, memory, scheduling, and tool evolution. Its zero-framework approach is a philosophical statement as much as an engineering choice — proving that agent systems don't need massive frameworks to be production-viable.

Open Questions and Future Directions

  1. Tool quality assurance: How to ensure runtime-created tools are correct and safe? Could the self-check system validate plugin health?
  2. Multi-agent coordination: The multi-tenant router isolates users, but could agents collaborate across containers?
  3. Memory optimization: The compression prompt could be improved to extract more structured information (relations, causality, temporal sequences).
  4. Offline LLM support: The edge deployment story would be strengthened by support for local LLMs (Ollama, llama.cpp) — currently the system requires an API endpoint.
  5. Observability: The system logs extensively but lacks structured metrics (Prometheus, OpenTelemetry) for production monitoring.
  6. Security hardening: The exec tool and create_tool feature need sandboxing for multi-user deployments. The current OWNER_IDS whitelist is insufficient for production multi-tenant use.
  7. Internationalization: Adapting the WeChat Work integration to Slack/Discord/Telegram would significantly expand the potential user base.