
7/24 Office

7/24 Office — Self-Evolving AI Agent System: 26 Tools, 3,500 Lines of Pure Python, MCP/Skill Plugins, Three-Layer Memory, Self-Repair, 24/7 Production

Organization: wangziqi06 (independent developer)
Published: March 2026
Type: repo
Report Type: PhD-Level Technical Analysis
Report Date: April 2026


1 Full Title and Attribution

Full Title: 7/24 Office — Self-Evolving AI Agent System

Repository: github.com/wangziqi06/724-office

License: MIT

Status: Production-running, actively developed (March–April 2026)

Stars: 1,136 (as of April 2026)

Languages: Python (100%)

Size: ~3,500 lines of pure Python across 8 files

Dependencies: 3 packages: croniter (cron parsing), lancedb (vector storage), websocket-client (ASR)

Design Philosophy:

Zero framework dependency. Every line is visible and debuggable. No LangChain, no LlamaIndex, no CrewAI — just the standard library + 3 small packages.

The name "7/24 Office" (七二四办公室) references 24/7 availability — the system is designed to run continuously as a personal AI agent that handles scheduling, file management, web search, video processing, memory recall, and self-diagnostics autonomously.

2 Authors and Team

7/24 Office is developed by wangziqi06, an independent developer operating as a solo author with AI co-development tools. The project was built in under 3 months and is running in production 24/7.

The author explicitly positions 7/24 Office as a counter-thesis to framework-heavy agent architectures: "No LangChain, no LlamaIndex, no CrewAI — just the standard library + 3 small packages." The codebase is deliberately compact (~3,500 lines) to remain fully comprehensible by a single developer, with every line visible and debuggable.

The project appears to originate from the Chinese developer ecosystem, with WeChat Work (Enterprise WeChat) as the primary messaging integration and iFlytek-compatible ASR for voice recognition. The README and code comments contain bilingual content (Chinese and English).

Development Methodology

The project was built "solo with AI co-development tools in under 3 months." This positions 7/24 Office as both a product of and a testament to AI-assisted development — the system itself is an AI agent, and it was built with AI agents.

3 Core Contribution

Key Contribution: 7/24 Office demonstrates that a production-grade, self-evolving AI agent system can be built in ~3,500 lines of pure Python with zero framework dependencies, featuring runtime tool creation, three-layer memory, MCP protocol integration, self-repair diagnostics, and 24/7 autonomous operation — proving that the complexity typically associated with agent frameworks (LangChain, LlamaIndex, CrewAI) is largely unnecessary.

What 7/24 Office Provides

  1. Tool Use Loop — OpenAI-compatible function calling with automatic retry, up to 20 iterations per conversation
  2. Three-Layer Memory — Session history (short-term) + LLM-compressed long-term memory + LanceDB vector retrieval (active recall)
  3. MCP Protocol Client — Self-implemented JSON-RPC (no MCP SDK), connects external MCP servers via stdio or HTTP transport
  4. Runtime Tool Creation — The agent can write, save, and load new Python tools at runtime via create_tool
  5. Self-Repair — Daily self-check, session health diagnostics, error log analysis, auto-notification on failure
  6. Cron Scheduling — One-shot and recurring tasks, persistent across restarts, timezone-aware
  7. Multi-Tenant Router — Docker-based auto-provisioning, one container per user, health-checked
  8. Multimodal — Image/video/file/voice/link handling, ASR (speech-to-text), vision via base64
  9. Web Search — Multi-engine (Tavily, web search, GitHub, HuggingFace) with automatic source routing
  10. Video Processing — Trim, add BGM, AI video generation via ffmpeg + API, exposed as tools

Key Innovation: Self-Evolution Through Runtime Tool Creation

The most architecturally significant feature is runtime tool creation — the agent can extend its own capabilities by writing new Python tools that persist across restarts. This creates a genuine self-evolution loop:

User request for new capability
        │
        ▼
Agent writes Python function
with @tool decorator
        │
        ▼
Function saved to plugins/ directory
        │
        ▼
Function loaded via exec() and
registered in tool registry
        │
        ▼
New tool available to LLM
in subsequent conversations
        │
        ▼
Agent can now handle requests
that were previously impossible

This is a meaningful implementation of open-ended tool evolution — the agent's capability space grows over time based on the tasks it encounters.

Architectural Philosophy

The project adheres to five design principles:

| Principle | Implementation |
| --- | --- |
| Zero framework dependency | No LangChain/LlamaIndex/CrewAI; stdlib + 3 packages |
| Single-file tools | Adding a capability = adding one function with the @tool decorator |
| Edge-deployable | Targets Jetson Orin Nano (8GB RAM, ARM64 + GPU); RAM budget <2GB |
| Self-evolving | Runtime tool creation, self-diagnostics, auto-notification |
| Offline-capable | Core works without cloud APIs (except the LLM itself); local embeddings supported |

4 Supported Solutions

| Solution Type | Support Level | Tool(s) Used |
| --- | --- | --- |
| 24/7 personal AI assistant | Primary use case | Full system |
| Task scheduling and automation | Built-in | schedule, list_schedules, remove_schedule |
| File management | Built-in | read_file, write_file, edit_file, list_files |
| Web research | Built-in | web_search (multi-engine: Tavily, web, GitHub, HuggingFace) |
| Video processing | Built-in | trim_video, add_bgm, generate_video |
| Memory and recall | Built-in | search_memory, recall (vector semantic search) |
| System diagnostics | Built-in | self_check, diagnose |
| Media sending | Built-in | send_image, send_file, send_video, send_link |
| Shell execution | Built-in | exec (with timeout, default 60s, max 300s) |
| MCP tool extension | Plugin system | reload_mcp + any MCP-compatible server |
| Custom tool creation | Self-evolution | create_tool, list_custom_tools, remove_tool |
| Voice interaction | Built-in | WebSocket ASR pipeline (iFlytek-compatible) |
| Multi-tenant deployment | Built-in | Docker-based per-user isolation via router.py |

Tool Categorization

| Category | Count | Tools |
| --- | --- | --- |
| Core | 2 | exec, message |
| Files | 4 | read_file, write_file, edit_file, list_files |
| Scheduling | 3 | schedule, list_schedules, remove_schedule |
| Media Send | 4 | send_image, send_file, send_video, send_link |
| Video | 3 | trim_video, add_bgm, generate_video |
| Search | 1 | web_search (multi-engine, auto-routing) |
| Memory | 2 | search_memory, recall |
| Diagnostics | 2 | self_check, diagnose |
| Plugins | 3 | create_tool, list_custom_tools, remove_tool |
| MCP | 1 | reload_mcp |
| Total | 26 | (includes tools registered by MCP servers at runtime) |

5 LLM Integration

OpenAI-Compatible Function Calling

7/24 Office uses the OpenAI chat completions API format with function calling. It is designed to work with any OpenAI-compatible API provider.

Provider Configuration:

{
  "models": {
    "default": "deepseek-chat",
    "providers": {
      "deepseek-chat": {
        "api_base": "https://api.deepseek.com/v1",
        "api_key": "...",
        "model": "deepseek-chat",
        "max_tokens": 8192
      },
      "openai-gpt4": {
        "api_base": "https://api.openai.com/v1",
        "api_key": "...",
        "model": "gpt-4o",
        "max_tokens": 8192
      }
    }
  }
}

Multi-Provider Architecture:

The system supports multiple LLM providers and routes different tasks to different models:

| Task | Model Selection Strategy |
| --- | --- |
| Main conversation | Default provider (configurable) |
| Memory compression | Prefers a cheaper model (e.g., deepseek-chat) to avoid compatibility issues with thinking models |
| Embeddings | Dedicated embedding API (e.g., text-embedding-3-small, 1024 dimensions) |

This is a practical cost optimization — memory compression is a background task that doesn't need the most capable model, so it routes to cheaper providers.
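A sketch of how such task-based routing might look, given the configuration format shown above. The `get_provider` function and the `task` labels are illustrative assumptions, not the project's actual API:

```python
import json

# Same shape as the config.json excerpt above; keys abbreviated for brevity.
CONFIG = json.loads("""{
  "models": {
    "default": "openai-gpt4",
    "providers": {
      "deepseek-chat": {"api_base": "https://api.deepseek.com/v1", "model": "deepseek-chat"},
      "openai-gpt4":  {"api_base": "https://api.openai.com/v1",  "model": "gpt-4o"}
    }
  }
}""")

def get_provider(task="chat"):
    """Route background compression to the cheap provider; everything else
    goes to the configured default."""
    providers = CONFIG["models"]["providers"]
    if task == "compress" and "deepseek-chat" in providers:
        return providers["deepseek-chat"]   # cheap model for memory compression
    return providers[CONFIG["models"]["default"]]

print(get_provider("compress")["model"], get_provider("chat")["model"])
# → deepseek-chat gpt-4o
```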

Tool Use Loop

The core interaction pattern is a synchronous tool use loop with up to 20 iterations:

User message
    │
    ▼
Build system prompt (SOUL.md + AGENT.md + USER.md + time)
    │
    ▼
Inject retrieved memories into system prompt
    │
    ▼
Inject cross-session scheduler context
    │
    ▼
┌──►  Call LLM with messages + tool definitions
│       │
│       ├── No tool_calls → Return text response
│       │
│       └── Has tool_calls → Execute each tool
│               │
│               ▼
│           Append tool results to messages
│               │
└───────────────┘ (up to 20 iterations)
    │
    ▼
Save session (with overflow → memory compression)

Key implementation details:

  • Thread safety: Each session has its own lock (threading.Lock). Concurrent messages to the same session are serialized.
  • Performance tracking: Every chat call logs prep_time, llm_total_time, tool_count, and total_time in milliseconds.
  • Error handling: LLM API errors are caught and return user-friendly error messages. Tool execution errors are caught per-tool and returned as [error] strings to the LLM.
  • Image stripping: Before saving sessions, base64 image URLs are replaced with [image] text markers to prevent API errors in history replay and reduce storage.
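The loop and error-handling rules above can be condensed into a small sketch. The function names, message shapes, and the stub LLM are assumptions for illustration; only the 20-iteration cap and the `[error]` string convention come from the source:

```python
MAX_ITERATIONS = 20

def run_tool_loop(call_llm, execute_tool, messages):
    """Minimal tool-use loop: call the LLM, run any requested tools,
    feed results back, and stop on a plain text reply (or the iteration cap)."""
    for _ in range(MAX_ITERATIONS):
        reply = call_llm(messages)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]          # plain answer: done
        messages.append(reply)
        for call in tool_calls:
            try:
                result = execute_tool(call["name"], call["args"])
            except Exception as e:
                result = f"[error] {e}"      # per-tool errors go back to the LLM
            messages.append({"role": "tool", "content": str(result)})
    return "[error] tool loop exceeded 20 iterations"

# Stub LLM: asks for one tool call, then answers once a tool result appears.
def fake_llm(msgs):
    if not any(m.get("role") == "tool" for m in msgs):
        return {"tool_calls": [{"name": "exec", "args": {"cmd": "date"}}]}
    return {"content": "done", "tool_calls": None}

print(run_tool_loop(fake_llm, lambda name, args: "2026-04-01",
                    [{"role": "user", "content": "hi"}]))  # → done
```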

Raw HTTP via urllib

A notable implementation choice: the system uses Python's urllib.request directly instead of httpx, requests, or any HTTP client library. All API calls — LLM, embedding, search, video generation — use raw urllib.request.Request with manual JSON serialization.

req = urllib.request.Request(url, data=data, headers=headers)
with urllib.request.urlopen(req, timeout=timeout) as resp:
    return json.loads(resp.read())

This is consistent with the zero-dependency philosophy but trades developer ergonomics for a minimal footprint.

Reasoning Content Preservation

The system preserves reasoning_content from models that support chain-of-thought (e.g., DeepSeek's reasoning models):

def _serialize_assistant_msg(msg_data):
    result = {"role": "assistant"}
    result["content"] = msg_data.get("content") or None
    reasoning = msg_data.get("reasoning_content")
    if reasoning:
        result["reasoning_content"] = reasoning
    # ...

If the model returns reasoning content, it's preserved in the session history. If the model uses tool calls but doesn't return reasoning content, a placeholder "ok" is inserted to maintain API compatibility with certain providers.

6 Key Results

Production Metrics

| Metric | Value |
| --- | --- |
| Codebase size | ~3,500 lines across 8 files |
| Built-in tools | 26 |
| Framework dependencies | 0 (no LangChain, LlamaIndex, or CrewAI) |
| Package dependencies | 3 (croniter, lancedb, websocket-client) |
| Production uptime target | 24/7 |
| Development time | <3 months (solo developer + AI co-development) |
| GitHub stars | 1,136 (April 2026) |
| Max tool loop iterations | 20 per conversation |
| Session message limit | 40 (overflow triggers memory compression) |
| Memory deduplication threshold | 0.92 cosine similarity |

Self-Diagnostics Report Format

The self_check tool generates comprehensive system health reports:

| Diagnostic Area | Metrics Collected |
| --- | --- |
| Session activity | Active sessions today; total user/assistant/tool_call counts |
| System health | Disk usage, memory usage, process status |
| Error logs | Last 24h errors from application log |
| Scheduled tasks | Active job count, next trigger times |
| Memory system | Total memories stored, storage size |
| Session health | Empty sessions, high tool_call ratios, potential issues |

File Size Breakdown

| File | Size | Purpose |
| --- | --- | --- |
| tools.py | 48KB | Tool registry + 26 tool implementations + plugin system + MCP bridge |
| xiaowang.py | 22KB | Entry point, HTTP server, callbacks, debounce, ASR pipeline |
| router.py | 17KB | Multi-tenant Docker routing, container lifecycle |
| llm.py | 14KB | LLM API calls, tool use loop, session management |
| memory.py | 13KB | Three-layer memory: compress, deduplicate, retrieve |
| mcp_client.py | 12KB | MCP protocol client (JSON-RPC, stdio/HTTP transport) |
| scheduler.py | 7KB | Cron + one-shot scheduling, persistent jobs |
| self_check_tool.py | 2KB | Self-check diagnostic report generation |
| Total | ~135KB | ~3,500 lines |

7 Reproducibility

Installation

git clone https://github.com/wangziqi06/724-office.git
cd 724-office
cp config.example.json config.json
# Edit config.json with your API keys

pip install croniter lancedb websocket-client
# Optional: pilk (for WeChat silk audio decoding)

mkdir -p workspace/memory workspace/files

python3 xiaowang.py

Configuration Requirements

| Requirement | Necessity | Notes |
| --- | --- | --- |
| OpenAI-compatible LLM API | Required | DeepSeek, OpenAI, Anthropic, or any compatible provider |
| Embedding API | Required (for memory) | OpenAI text-embedding-3-small or compatible |
| Messaging platform | Required (for chat) | WeChat Work credentials (token, guid, api_url) |
| Tavily API key | Optional | For high-quality web search |
| Search API key | Optional | For general web search |
| ASR credentials | Optional | For voice message transcription |
| Video generation API | Optional | For AI video generation |
| MCP servers | Optional | External tool servers (stdio or HTTP) |

Personality Configuration

The system uses three optional Markdown files for personality and behavior:

| File | Purpose | Content Type |
| --- | --- | --- |
| SOUL.md | Agent personality and behavior rules | Character definition, communication style |
| AGENT.md | Operational procedures and troubleshooting | How-to guides, error-handling procedures |
| USER.md | User preferences and context | Personal info, scheduling preferences |

These files are read at every conversation turn and injected into the system prompt, allowing the agent's behavior to be customized without code changes.

Reproducibility Assessment

| Factor | Assessment |
| --- | --- |
| Code availability | Fully open-source, MIT license |
| Minimal dependencies | Only 3 packages, all pip-installable |
| Platform dependency | WeChat Work integration is China-specific; would need adaptation for Slack/Discord/Telegram |
| LLM dependency | Requires an OpenAI-compatible API; behavior varies by model |
| Hardware target | Designed for edge deployment (Jetson Orin Nano, 8GB RAM) |
| Configuration complexity | Multiple API keys required; config.example.json provides a template |
| Documentation | README covers architecture and setup; no extensive docs site |

Limitations on Reproducibility

  1. Messaging platform coupling: The callback handler (handle_callback) is tightly coupled to WeChat Work's message format (cmd codes, msgTypes, fileId/fileAeskey fields). Adapting to other platforms requires rewriting xiaowang.py callback handling.
  2. Chinese ecosystem tools: ASR uses iFlytek-compatible WebSocket protocol; video generation uses a specific API format. These may not be available outside China.
  3. No test suite: Unlike Ouro Loop's 507 tests, 7/24 Office has no automated tests. Production validation relies on manual testing and 24/7 monitoring.
  4. Security surface: The exec tool executes arbitrary shell commands. The create_tool feature uses exec() to load arbitrary Python code. Production deployment requires trust boundaries.

8 Compute and API Costs

Runtime Resource Requirements

| Resource | Requirement |
| --- | --- |
| RAM target | <2GB (designed for edge deployment) |
| CPU | Minimal (event-driven, I/O-bound) |
| GPU | Optional (for local inference or embedding) |
| Disk | LanceDB storage + session JSON files + media files |
| Network | Required for LLM API calls; optional for other features |
| Hardware target | Jetson Orin Nano (8GB RAM, ARM64 + GPU) |

Per-Conversation LLM Costs

| Component | Estimated Tokens | Cost (DeepSeek Chat) | Cost (GPT-4o) |
| --- | --- | --- | --- |
| System prompt (SOUL + AGENT + USER + time) | ~1,000-3,000 | ~$0.001 | ~$0.01 |
| Memory injection (top-K retrieved) | ~200-1,000 | ~$0.001 | ~$0.005 |
| Tool definitions (26 tools) | ~3,000-4,000 | ~$0.003 | ~$0.02 |
| User message + conversation history | ~500-5,000 | ~$0.005 | ~$0.03 |
| Per-iteration LLM response | ~200-2,000 | ~$0.002 | ~$0.01 |
| Tool loop (1-5 iterations typical) | ~2,000-10,000 | ~$0.01 | ~$0.05 |
| Total per conversation | ~7,000-25,000 | ~$0.02 | ~$0.13 |

Background Processing Costs

| Process | Frequency | Estimated Cost |
| --- | --- | --- |
| Memory compression | On session overflow (>40 messages) | ~$0.01-$0.05 per compression |
| Embedding generation | Per memory fact + per user query | ~$0.001 per call |
| Self-check report | Daily (scheduled task) | ~$0.05-$0.10 per report |
| Scheduled task execution | Per cron trigger | ~$0.02-$0.10 per task |

Cost Optimization Strategies

The system implements several cost-reducing patterns:

  1. Cheaper model for compression: Memory compression explicitly prefers deepseek-chat over the default model
  2. Message limit (40): Prevents unbounded context growth; overflow triggers background compression
  3. Image stripping: Base64 images removed from session history to reduce token count
  4. Deduplication (0.92 threshold): Prevents storing near-identical memories
  5. Truncated tool output: read_file caps output at 10,000 characters
  6. Debounce: Merges rapid-fire messages (3-second window) into single LLM calls

24/7 Running Cost Estimate

| Usage Pattern | Daily Conversations | Estimated Daily Cost (DeepSeek) | Estimated Daily Cost (GPT-4o) |
| --- | --- | --- | --- |
| Light (personal) | 10-20 | $0.20-$0.50 | $1.30-$2.60 |
| Moderate | 50-100 | $1.00-$2.50 | $6.50-$13.00 |
| Heavy (production) | 200+ | $4.00+ | $26.00+ |

The edge-deployment design (Jetson Orin Nano target) suggests the hardware cost is a one-time ~$200-$500 investment, with ongoing costs dominated by LLM API fees.

9 Architecture Solution

System Architecture

The system follows a pipeline architecture with clear data flow from messaging platform to LLM to tools:

                    ┌─────────────────┐
                    │  WeChat Work    │
                    │  (Messaging     │
                    │   Platform)     │
                    └────────┬────────┘
                             │ HTTP callback
                    ┌────────▼────────┐
                    │  router.py      │
                    │                 │
                    │ Multi-tenant    │
                    │ Docker routing  │
                    │ Per-user        │
                    │ containers      │
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │  xiaowang.py    │  Entry Point
                    │                 │
                    │ ┌─ HTTP server (ThreadingMixIn)
                    │ ├─ Callback dispatch (cmd/msgType)
                    │ ├─ Debounce (3s window, per-sender)
                    │ ├─ Media download (3 fallback paths)
                    │ ├─ ASR pipeline (WebSocket streaming)
                    │ └─ File persistence (monthly dirs)
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │    llm.py       │  Core Loop
                    │                 │
                    │ ┌─ System prompt construction
                    │ │   (SOUL.md + AGENT.md + USER.md)
                    │ ├─ Memory retrieval + injection
                    │ ├─ Cross-session context bridge
                    │ ├─ Tool use loop (max 20 iterations)
                    │ ├─ Session management (40 msg limit)
                    │ └─ Image stripping for storage
                    └────────┬────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
     ┌────────▼───┐  ┌──────▼──────┐  ┌────▼────────┐
     │  tools.py   │  │ memory.py   │  │scheduler.py │
     │             │  │             │  │             │
     │ 26 built-in │  │ Compress:   │  │ Cron + once │
     │ tools       │  │  LLM extract│  │ jobs.json   │
     │ Plugin dir  │  │ Deduplicate:│  │ Persistent  │
     │ @tool deco  │  │  cosine sim │  │ TZ-aware    │
     │ MCP bridge  │  │ Retrieve:   │  │ Auto-notify │
     └──────┬──────┘  │  vector     │  │ on failure  │
            │         │  search     │  └─────────────┘
     ┌──────▼──────┐  └─────────────┘
     │mcp_client.py│
     │             │
     │ JSON-RPC    │
     │ stdio/HTTP  │
     │ Auto-       │
     │ reconnect   │
     │ Namespace:  │
     │ srv__tool   │
     └─────────────┘

Threading Model

The system uses a multi-threaded architecture (no async/await):

| Thread | Purpose | Lifecycle |
| --- | --- | --- |
| Main thread | HTTP server (ThreadingMixIn spawns per-request threads) | Process lifetime |
| Per-request threads | Handle incoming webhooks → dispatch to callback handler | Per HTTP request |
| Debounce timers | threading.Timer per sender; fires after 3s of silence | Created/cancelled per message |
| Chat lock threads | Serialize concurrent messages to the same session | Per chat call |
| Memory compression | Background thread for LLM-based compression of evicted messages | Daemon, per overflow |
| Scheduler loop | Background thread checking jobs every 10 seconds | Daemon, process lifetime |
| Scheduler triggers | Per-job execution threads | Daemon, per trigger |
| MCP stdio readers | Timeout-wrapped reader threads for subprocess communication | Per MCP request |
| ASR streaming | Audio streaming thread + WebSocket client thread | Per voice message |

The use of threading over asyncio is a deliberate design choice — it avoids the "function coloring problem" (async infecting the entire call stack) and keeps the codebase accessible to developers unfamiliar with async Python.

Data Persistence

project_root/
├── config.json            ← Master configuration
├── jobs.json              ← Persistent scheduler state (atomic writes)
├── sessions/              ← Session history (one JSON file per session)
│   ├── dm_USER_ID.json    ← DM session
│   ├── scheduler.json     ← Scheduler session
│   └── test.json          ← Test session
├── memory_db/             ← LanceDB vector storage
│   └── memories/          ← Vector table files
├── workspace/
│   ├── SOUL.md            ← Agent personality
│   ├── AGENT.md           ← Operational procedures
│   ├── USER.md            ← User context
│   ├── memory/            ← Keyword-searchable memory files
│   │   └── MEMORY.md      ← Long-term memory document
│   └── files/             ← Received/generated media
│       ├── index.json     ← File metadata index
│       └── 2026-03/       ← Monthly organized media
└── plugins/               ← Runtime-created tools
    └── *.py               ← Custom tool files

All persistent state uses atomic writes (write to .tmp then os.replace()) to prevent corruption on crash.
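The atomic-write pattern described above can be sketched as follows; the helper name and the use of `tempfile.mkstemp` (rather than a fixed `.tmp` suffix) are illustrative choices, but the core guarantee is the same: `os.replace()` swaps the file in atomically, so a crash mid-write leaves the old file intact rather than a truncated one.

```python
import json, os, tempfile

def atomic_write_json(path, obj):
    """Write JSON to a temp file in the same directory, then atomically
    replace the target. Readers never observe a half-written file."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())     # ensure bytes hit disk before the swap
        os.replace(tmp, path)        # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)               # clean up the temp file on failure
        raise

atomic_write_json("jobs.json", {"jobs": []})
print(json.load(open("jobs.json")))  # → {'jobs': []}
```

The temp file must live in the same directory as the target: `os.replace()` is only atomic within a single filesystem.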

10 Component Breakdown

Component 1: xiaowang.py — Entry Point (22KB)

The application entry point serving multiple responsibilities:

| Subcomponent | LOC (est.) | Purpose |
| --- | --- | --- |
| Configuration loading | ~30 | JSON config, environment variables, directory setup |
| Module initialization | ~20 | Init messaging, LLM, scheduler, tools, memory in dependency order |
| File persistence | ~50 | Monthly-organized media storage with metadata index |
| ASR pipeline | ~120 | WebSocket streaming speech-to-text with HMAC authentication |
| Message debouncing | ~60 | 3-second per-sender debounce with fragment merging |
| Callback handler | ~120 | Message type dispatch (text, image, video, file, voice, link, location) |
| Media download | ~80 | Three-path media download: enterprise → personal → direct HTTP |
| HTTP server | ~40 | Threaded HTTP server with GET (health) and POST (callback) endpoints |

Debounce Architecture:

Message 1 ──► Buffer[sender_id] = [msg1]     Timer(3s) started
Message 2 ──► Buffer[sender_id] = [msg1,msg2] Timer reset
Message 3 ──► Buffer[sender_id] = [msg1,msg2,msg3] Timer reset
              ... 3 seconds pass ...
Timer fires ──► Flush: merge texts, collect images
              ──► llm.chat(merged_text, session_key, images)
              ──► Split reply into ≤1800-byte chunks
              ──► Send chunks with 0.5s spacing

This prevents rapid-fire messages from creating multiple independent LLM calls, which would be wasteful and produce fragmented responses.
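The per-sender debounce above can be sketched with `threading.Timer`. The class and its interface are assumptions for illustration (the real implementation lives inside xiaowang.py's callback handling); only the reset-on-each-message behavior and the 3-second window come from the source:

```python
import threading

class Debouncer:
    """Per-sender message buffer: each new message resets the timer;
    when the timer finally fires, buffered fragments flush as one call."""
    def __init__(self, flush, window=3.0):
        self.flush, self.window = flush, window
        self.buffers, self.timers = {}, {}
        self.lock = threading.Lock()

    def add(self, sender, text):
        with self.lock:
            self.buffers.setdefault(sender, []).append(text)
            if sender in self.timers:
                self.timers[sender].cancel()          # reset the window
            t = threading.Timer(self.window, self._fire, args=(sender,))
            self.timers[sender] = t
            t.start()

    def _fire(self, sender):
        with self.lock:
            fragments = self.buffers.pop(sender, [])
            self.timers.pop(sender, None)
        self.flush(sender, "\n".join(fragments))      # one merged LLM call

merged = []
d = Debouncer(lambda sender, text: merged.append(text), window=0.1)
d.add("alice", "first")
d.add("alice", "second")
import time; time.sleep(0.3)
print(merged)  # → ['first\nsecond']
```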

Media Download — Three-Path Fallback:

Has fileId + fileAeskey?
    │
    YES → Enterprise download API (wxWorkDownload)
    │     │
    │     ├── Success → return path
    │     └── Fail → continue
    │
Has fileAuthKey?
    │
    YES → Personal download API (wxDownload)
    │     │
    │     ├── Success → return path
    │     └── Fail → continue
    │
Has fileHttpUrl?
    │
    YES → Direct HTTP download (urllib.request.urlretrieve)
    │     │
    │     ├── Success → return path
    │     └── Fail → all methods failed
    │
    NO → all methods failed
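The fallback chain above reduces to a try-each-in-order loop. This is a sketch of the pattern only: the downloader callables, field names, and return convention mirror the diagram but are not the project's actual function signatures.

```python
def download_media(msg, enterprise_dl, personal_dl, http_dl):
    """Try each applicable download path in order; any exception
    falls through to the next. Returns a local path, or None."""
    attempts = []
    if msg.get("fileId") and msg.get("fileAeskey"):
        attempts.append(lambda: enterprise_dl(msg["fileId"], msg["fileAeskey"]))
    if msg.get("fileAuthKey"):
        attempts.append(lambda: personal_dl(msg["fileAuthKey"]))
    if msg.get("fileHttpUrl"):
        attempts.append(lambda: http_dl(msg["fileHttpUrl"]))
    for attempt in attempts:
        try:
            return attempt()
        except Exception:
            continue              # fall through to the next path
    return None                   # all methods failed

def failing(*args): raise IOError("unavailable")

path = download_media({"fileId": "1", "fileAeskey": "k", "fileHttpUrl": "http://x/f"},
                      failing, failing, lambda url: "/tmp/f.bin")
print(path)  # → /tmp/f.bin  (enterprise path failed, HTTP fallback succeeded)
```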

Component 2: llm.py — Core Loop (14KB)

The LLM interaction layer implementing the tool use loop:

| Subcomponent | LOC (est.) | Purpose |
| --- | --- | --- |
| Provider management | ~30 | Load config, get default provider |
| LLM API calling | ~30 | Raw urllib request to chat completions endpoint |
| Session management | ~80 | Load/save sessions, handle overflow, strip images |
| Multimodal building | ~40 | Image-to-base64 encoding, multimodal message construction |
| System prompt | ~60 | SOUL/AGENT/USER loading + scheduler context injection |
| Tool use loop | ~70 | Core loop: LLM call → tool execution → repeat (max 20) |
Cross-Session Context Bridge:

A notable design pattern — the scheduler runs tasks in its own session (scheduler), but the user sees results in their DM session (dm_USER_ID). To maintain context:

def _get_recent_scheduler_context():
    # Read scheduler session file
    # Check freshness (2-hour window)
    # Find last message tool call content
    # Inject into DM session system prompt

This allows the user to respond to scheduled task output (e.g., a self-check report) in their normal chat flow, with the LLM aware of what was sent.

Component 3: tools.py — Tool Registry (48KB)

The largest file, containing the complete tool system:

| Subcomponent | LOC (est.) | Purpose |
| --- | --- | --- |
| Registry + decorator | ~40 | @tool decorator, get_definitions(), execute() |
| Core tools (exec, message) | ~40 | Shell execution with timeout; message sending with chunking |
| File tools | ~70 | read/write/edit/list with workspace-relative paths |
| Scheduler tools | ~30 | CRUD for scheduled tasks |
| Media send tools | ~60 | Image/file/video/link sending via messaging API |
| Video processing tools | ~100 | ffmpeg trim, BGM mixing, AI video generation |
| Web search | ~150 | Multi-engine search (Tavily, web, GitHub, HuggingFace) |
| Memory search tools | ~50 | Keyword search (grep) + semantic search (vector retrieval) |
| Self-check tool | ~50 | System diagnostics report generation |
| Plugin system | ~80 | Plugin loading, runtime tool creation, MCP bridge |

The @tool Decorator:

_registry = {}  # name -> {"fn": callable, "definition": OpenAI function schema}

def tool(name, description, properties, required=None):
    def decorator(fn):
        _registry[name] = {
            "fn": fn,
            "definition": {
                "type": "function",
                "function": {
                    "name": name,
                    "description": description,
                    "parameters": {
                        "type": "object",
                        "properties": properties,
                        **({"required": required} if required else {}),
                    },
                },
            },
        }
        return fn
    return decorator

This single decorator handles both tool registration (for execution) and definition generation (for LLM function calling), keeping tool declaration co-located with implementation.

Multi-Engine Web Search:

The web_search tool implements intelligent source routing:

Query → Auto-detect source
         │
         ├── Contains "huggingface/hf model" → HuggingFace API
         ├── Contains "github.com/github repo" → GitHub API
         ├── Contains "verify/exist/plugin/mcp" → All engines
         └── Default → Dual-engine (Tavily + web)

Each engine:
├── Tavily: Advanced search with AI summary + relevance scores
├── Web: General search API with snippets
├── GitHub: Repo search (stars-sorted) + code search fallback
└── HuggingFace: Model search (downloads-sorted) with pipeline tags

Component 4: memory.py — Three-Layer Memory (13KB)

The memory system implementing the compress-deduplicate-retrieve pipeline:

| Subcomponent | LOC (est.) | Purpose |
| --- | --- | --- |
| Init + public API | ~60 | LanceDB connection, table creation, 4 public functions |
| Embedding | ~30 | OpenAI-compatible embedding API calls |
| Compression | ~100 | LLM-based structured memory extraction |
| Deduplication | ~40 | Cosine similarity against existing memories |
| Storage | ~30 | LanceDB vector table operations |
| Retrieval | ~30 | Vector search + result formatting |
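The retrieval step (vector search plus the "[Relevant Memories]" formatting described in Section 11) can be sketched without LanceDB. The `retrieve` function and the two-dimensional toy embeddings are illustrative assumptions; the real system performs the ranking inside a LanceDB vector search.

```python
def retrieve(query_vec, memories, top_k=5):
    """Rank stored memories by dot-product similarity to the query embedding
    and format the winners as the '[Relevant Memories]' prompt block."""
    scored = sorted(memories,
                    key=lambda m: sum(a * b for a, b in zip(query_vec, m["vector"])),
                    reverse=True)
    lines = ["[Relevant Memories]"]
    for m in scored[:top_k]:
        lines.append(f"- {m['fact']} ({m['timestamp']})")
    return "\n".join(lines)

memories = [
    {"fact": "User prefers meetings at 10am", "timestamp": "2026-03-15",
     "vector": [0.9, 0.1]},
    {"fact": "Project uses React + TypeScript", "timestamp": "2026-03-01",
     "vector": [0.1, 0.9]},
]
print(retrieve([1.0, 0.0], memories, top_k=1))
# → [Relevant Memories]
#   - User prefers meetings at 10am (2026-03-15)
```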

Memory Schema (LanceDB):

{
    "id": "uuid",           # Unique memory identifier
    "fact": "string",       # Complete factual statement
    "keywords": "[json]",   # Keyword array (JSON-serialized)
    "persons": "[json]",    # Person names involved
    "timestamp": "string",  # YYYY-MM-DD HH:MM or empty
    "topic": "string",      # Topic category
    "session_key": "string",# Source session
    "created_at": float,    # Unix timestamp
    "vector": [float*1024]  # 1024-dim embedding
}
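The deduplication check against this table (cosine similarity above 0.92 means "already stored, skip") can be sketched in plain Python; the helper names are illustrative, and the real system compares against LanceDB query results rather than an in-memory list.

```python
import math

DEDUP_THRESHOLD = 0.92  # the project's documented similarity cutoff

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_duplicate(new_vec, existing_vecs, threshold=DEDUP_THRESHOLD):
    """Skip storing a memory whose embedding is nearly identical
    to one already in the table."""
    return any(cosine(new_vec, v) >= threshold for v in existing_vecs)

stored = [[1.0, 0.0, 0.0]]
print(is_duplicate([0.99, 0.05, 0.0], stored))  # near-identical → True
print(is_duplicate([0.0, 1.0, 0.0], stored))    # orthogonal → False
```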

Component 5: mcp_client.py — MCP Protocol Client (12KB)

A self-implemented MCP client with zero SDK dependency:

| Subcomponent | LOC (est.) | Purpose |
| --- | --- | --- |
| MCPServer class | ~200 | Single server lifecycle, JSON-RPC, tool discovery |
| Stdio transport | ~60 | Subprocess stdin/stdout communication with timeout |
| HTTP transport | ~20 | POST JSON-RPC to HTTP endpoint |
| Protocol methods | ~40 | initialize, tools/list, tools/call |
| Module-level API | ~60 | init(), get_all_tool_defs(), execute(), reload(), shutdown() |

MCP Protocol Implementation:

The client implements only the three essential MCP methods:

  1. initialize — handshake with protocol version and client info
  2. tools/list — discover available tools on the server
  3. tools/call — execute a tool with arguments
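These three methods share one JSON-RPC 2.0 framing; on the stdio transport each request is a single newline-delimited JSON object written to the server's stdin. A sketch of the framing only (the `protocolVersion` value and client-info fields are assumptions, not taken from the repo):

```python
import json

def jsonrpc_request(method, params, req_id):
    """Frame an MCP request as newline-delimited JSON-RPC 2.0 (stdio transport)."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params}) + "\n"

# The three methods the client implements:
init = jsonrpc_request("initialize",
                       {"protocolVersion": "2024-11-05",          # assumed version
                        "clientInfo": {"name": "724-office", "version": "1.0"}}, 1)
listing = jsonrpc_request("tools/list", {}, 2)
call = jsonrpc_request("tools/call",
                       {"name": "search", "arguments": {"query": "agents"}}, 3)

print(json.loads(call)["method"])  # → tools/call
```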

Tool Namespacing:

MCP tools are namespaced with double underscore: servername__toolname. This prevents name collisions between MCP servers and built-in tools.

Auto-Reconnect:

On ConnectionError or TimeoutError during tools/call, the client automatically:

  1. Shuts down the current process
  2. Starts a new subprocess
  3. Re-runs initialize and tools/list
  4. Retries the original call

Component 6: scheduler.py — Task Scheduling (7KB)

Persistent task scheduling with cron support:

| Feature | Implementation |
| --- | --- |
| One-shot tasks | delay_seconds → trigger at time.time() + delay |
| Recurring tasks | cron_expr → croniter-based next-trigger calculation |
| One-shot cron | cron_expr + once=True → triggers once at next cron match |
| Persistence | jobs.json with atomic writes |
| Check interval | 10-second polling loop (background thread) |
| Timezone | CST (UTC+8) aware; croniter uses local-timezone datetime |
| Heartbeat | Logs task status every 30 minutes |
| Failure handling | On task failure, sends notification via LLM chat |
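The one-shot path and the polling loop body can be sketched with the stdlib alone; this omits the croniter-based recomputation for recurring jobs and uses illustrative names (`make_job`, `poll`) rather than the project's actual API.

```python
import time

def make_job(message, delay_seconds=None, cron_expr=None):
    """A job is just a dict persisted to jobs.json.
    One-shot jobs trigger at now + delay; cron recomputation is omitted here."""
    return {"message": message, "cron": cron_expr,
            "next_trigger": time.time() + delay_seconds
                            if delay_seconds is not None else None}

def poll(jobs, chat_fn, now=None):
    """One pass of the 10-second polling loop: fire any due job by sending
    its message to the LLM in the 'scheduler' session; keep the rest."""
    now = now or time.time()
    remaining = []
    for job in jobs:
        if job["next_trigger"] is not None and job["next_trigger"] <= now:
            chat_fn(job["message"], "scheduler")   # task text goes to the LLM
        else:
            remaining.append(job)
    return remaining

fired = []
jobs = [make_job("Run self-check and send report to owner", delay_seconds=-1)]  # already due
jobs = poll(jobs, lambda msg, session: fired.append((msg, session)))
print(fired, jobs)
```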

Scheduler → LLM Integration:

When a scheduled task triggers, it calls chat_fn(message, "scheduler") — sending the task's message to the LLM as if it were a user message in the "scheduler" session. The LLM can then use any tool (including message to notify the owner). This creates a powerful automation loop:

Cron trigger → scheduler._trigger()
    → chat_fn("Run self-check and send report to owner", "scheduler")
        → LLM invokes self_check tool
        → LLM invokes message tool with report
        → Owner receives daily diagnostic report

Component 7: router.py — Multi-Tenant Docker Router (17KB)

Docker-based multi-tenant isolation:

| Feature | Implementation |
| --- | --- |
| Per-user containers | Auto-provision a Docker container on first message |
| Health checks | Periodic container health verification |
| Request routing | Forward HTTP callbacks to the appropriate container |
| Container lifecycle | Start, stop, restart, cleanup |
| Resource isolation | Docker-level resource limits per user |

This component enables the system to serve multiple users with complete isolation — each user gets their own container instance with independent sessions, memory, and workspace.

11 Core Mechanisms (Detailed)

Mechanism 1: Three-Layer Memory Pipeline

The memory system is the most architecturally sophisticated component, implementing a three-stage pipeline inspired by human memory models:

Layer 1 — Session Memory (Short-Term):

┌─────────────────────────────────────┐
│ Session file: dm_USER_ID.json       │
│                                     │
│ Last 40 messages (JSON array)       │
│ ├── user messages                   │
│ ├── assistant messages              │
│ ├── tool call results               │
│ └── system context                  │
│                                     │
│ On overflow (>40 messages):         │
│   evicted = messages[:-40]          │
│   messages = messages[-40:]         │
│   compress_async(evicted)           │
│                                     │
│ On load:                            │
│   Skip to first user message        │
│   (clean truncation boundary)       │
└─────────────────────────────────────┘

Layer 2 — Compressed Memory (Long-Term):

┌──────────────────────────────────────────────────┐
│ Background compression thread                     │
│                                                   │
│ 1. Format evicted messages into dialogue text     │
│ 2. Send to LLM with COMPRESS_PROMPT              │
│ 3. LLM extracts structured facts:                │
│    {                                              │
│      "fact": "User prefers meetings at 10am",    │
│      "keywords": ["meeting", "schedule"],         │
│      "persons": ["User"],                         │
│      "timestamp": "2026-03-15 10:00",            │
│      "topic": "preferences"                       │
│    }                                              │
│ 4. Generate embeddings for each fact              │
│ 5. Deduplicate against existing memories          │
│    (cosine similarity > 0.92 → skip)             │
│ 6. Store in LanceDB vector table                  │
└──────────────────────────────────────────────────┘

Layer 3 — Retrieval (Active Recall):

┌──────────────────────────────────────────────────┐
│ On every user message:                            │
│                                                   │
│ 1. Embed user message text                        │
│ 2. Vector search in LanceDB (top-K=5)           │
│ 3. Filter out seed data and low-quality results  │
│ 4. Format as "[Relevant Memories]" block         │
│ 5. Inject into system prompt                      │
│                                                   │
│ Result (injected before LLM call):               │
│ [Relevant Memories]                               │
│ - User prefers meetings at 10am (2026-03-15)     │
│ - Client deadline is April 15 (2026-03-10)       │
│ - Project uses React + TypeScript (2026-03-01)   │
└──────────────────────────────────────────────────┘

Compression Prompt Engineering:

The COMPRESS_PROMPT is carefully designed to extract only long-term-valuable information:

Rules:
- Only extract information with long-term value
  (preferences, plans, contacts, decisions, facts)
- Skip chitchat, greetings, repeated confirmations,
  pure tool call results
- Replace "he/she/I" with specific names
- Replace "tomorrow/next week" with specific dates
- If nothing worth remembering, return empty array []

This filtering is critical — without it, the memory would fill with low-value conversational noise, degrading retrieval quality.

Deduplication via Cosine Similarity:

Before storing a new memory, it is compared against the most similar existing memory using vector cosine similarity. If similarity exceeds 0.92, the new memory is skipped. This prevents near-duplicate accumulation:

def _cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0
    return dot / (norm_a * norm_b)

Note: The implementation uses a pure-Python dot product calculation rather than NumPy, consistent with the minimal-dependency philosophy.
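Built on this helper, the pre-insertion duplicate check might look like the following sketch; the 0.92 threshold comes from the source, while the `is_duplicate` name is an assumption (`_cosine_similarity` is repeated so the sketch runs standalone):

```python
DEDUP_THRESHOLD = 0.92

def _cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0
    return dot / (norm_a * norm_b)

def is_duplicate(new_vec, existing_vecs, threshold=DEDUP_THRESHOLD):
    # Skip storage if any existing memory embedding is nearly identical
    return any(_cosine_similarity(new_vec, v) > threshold for v in existing_vecs)
```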

Mechanism 2: Runtime Tool Creation (Self-Evolution)

The self-evolution mechanism allows the agent to create new tools at runtime:

Agent receives request for capability it doesn't have
    │
    ▼
Agent uses create_tool to write a new Python function
    │
    ▼
Function is written to plugins/ directory as .py file
    │
    ▼
File is loaded via exec() with @tool decorator available
    │
    ▼
New tool registered in _registry with OpenAI function schema
    │
    ▼
Subsequent LLM calls include new tool in tool_defs
    │
    ▼
Agent can now use the new tool in conversations
    │
    ▼
Tool persists across restarts (loaded from plugins/ on startup)

Plugin Loading Mechanism:

def _exec_plugin(code, source=" "):
    exec(compile(code, source, "exec"), {
        "__builtins__": __builtins__,
        "tool": tool,   # The @tool decorator
        "log": log,     # Logger
    })

The exec() call provides a controlled environment with access to:

  • __builtins__ — Python built-ins
  • tool — the decorator for tool registration
  • log — the application logger

Security Considerations:

This is the most security-sensitive component. The agent can write and execute arbitrary Python code. Mitigations include:

  • Single-user mode enforcement (OWNER_IDS whitelist)
  • Workspace-relative path resolution
  • Per-tool error handling (tool crashes don't crash the system)
  • No automatic execution on untrusted input (requires LLM decision)

However, there is no sandboxing, code review, or capability restriction on created tools — a genuinely powerful but potentially dangerous feature.

Mechanism 3: MCP Protocol Bridge

The MCP client bridges external MCP servers into the agent's tool ecosystem:

Agent tool_defs = built-in tools + plugin tools + MCP tools
                           │
                    ┌──────▼──────┐
                    │ LLM decides │
                    │ which tool  │
                    │ to call     │
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
         Built-in     Plugin tool    MCP tool
         tool          from          (server__name)
         (exec,        plugins/       │
         message)      dir            │
              │            │     ┌────▼────┐
              │            │     │ Parse   │
              │            │     │ name    │
              │            │     │ split   │
              │            │     │ on "__" │
              │            │     └────┬────┘
              │            │          │
              │            │     ┌────▼────┐
              │            │     │ Route   │
              │            │     │ to MCP  │
              │            │     │ server  │
              │            │     └────┬────┘
              │            │          │
              │            │     JSON-RPC
              │            │     tools/call
              │            │          │
              │            │     Response
              │            │     content
              ▼            ▼          ▼
         Execute      Execute    Parse MCP
         Python fn    Python fn  response
                                 (text
                                  concat)

MCP to OpenAI Schema Conversion:

# MCP format
{
    "name": "search_notes",
    "description": "Search through notes",
    "inputSchema": {"type": "object", "properties": {...}}
}

# Converted to OpenAI format
{
    "type": "function",
    "function": {
        "name": "notes_server__search_notes",  # Namespaced
        "description": "Search through notes",
        "parameters": {"type": "object", "properties": {...}}
    }
}

The inputSchema from MCP maps directly to parameters in OpenAI format — same JSON Schema structure, just renamed.
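The conversion and its inverse (routing a namespaced call back to its server) can be sketched as follows; the function names are illustrative, not the repo's:

```python
def mcp_to_openai(server_name, mcp_tool):
    # Namespace the tool as "<server>__<name>" so calls can be routed back
    return {
        "type": "function",
        "function": {
            "name": f"{server_name}__{mcp_tool['name']}",
            "description": mcp_tool.get("description", ""),
            "parameters": mcp_tool.get("inputSchema",
                                       {"type": "object", "properties": {}}),
        },
    }

def route_call(tool_name):
    # Inverse operation on the agent side: split once on "__" to find the server
    server, _, name = tool_name.partition("__")
    return server, name
```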

Hot-Reload Support:

The reload_mcp tool allows runtime reconfiguration:

def reload(config):
    old_names = set(_servers.keys())
    shutdown()          # Close all existing connections
    init(config)        # Connect with new config
    new_names = set(_servers.keys())
    added = new_names - old_names
    removed = old_names - new_names
    return added, removed, len(_servers)

Mechanism 4: Self-Repair Diagnostics

The self-check mechanism runs daily (via scheduled task) and generates comprehensive system health reports:

Daily self-check scheduled task
    │
    ▼
tool_self_check() collects:
├── Session activity (today's active sessions, message counts)
├── System health (disk, memory, processes)
├── Error log analysis (last 24h from application log)
├── Scheduler status (active jobs, next trigger times)
├── Memory system status (memory count, storage size)
└── Session health diagnostics
    ├── Empty sessions detection
    ├── High tool_call ratio detection
    └── Potential issue flagging
    │
    ▼
Report formatted as structured text
    │
    ▼
LLM analyzes report + decides on actions
    │
    ▼
message tool sends summary to owner

Session Health Detection:

The self-check scans all sessions for potential issues:

  • Empty sessions — sessions with no messages (possibly corrupted)
  • High tool_call ratio — sessions where tool calls dominate (agent may be stuck in a loop)
  • Stale sessions — sessions not updated recently
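These heuristics can be sketched as a small per-session check; the 0.6 ratio threshold is an assumption, not a value from the source:

```python
def diagnose_session(messages, tool_ratio_limit=0.6):
    # Flag empty sessions and sessions dominated by tool-call traffic
    issues = []
    if not messages:
        issues.append("empty session (possibly corrupted)")
        return issues
    tool_msgs = sum(1 for m in messages if m.get("role") == "tool")
    if tool_msgs / len(messages) > tool_ratio_limit:
        issues.append("high tool_call ratio (agent may be stuck in a loop)")
    return issues
```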

Mechanism 5: Debounce and Message Aggregation

The debounce system prevents rapid-fire messages from creating multiple LLM calls:

# Per-sender buffer with thread-safe access
_debounce_buffers = {}  # sender_id -> [{"text": str, "images": [path]}]
_debounce_timers = {}   # sender_id -> threading.Timer
_debounce_lock = threading.Lock()

Timing diagram:

Time ──────────────────────────────────────────►
     │         │    │              │
     msg1      msg2 msg3          flush
     │         │    │              │
     ├─ timer ─┤    │              │
     │  (3s)   │    │              │
     │         ├────┤              │
     │         timer│              │
     │         (3s) │              │
     │              ├── timer(3s) ─┤
     │              │              │
     └──────────────┴──────────────┘
     Buffer: [msg1, msg2, msg3]
                                   └─► Flush:
                                       merge texts
                                       collect images
                                       single LLM call

This is particularly important for messaging platforms where users often send messages in rapid succession (split thoughts across multiple messages).
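The timer-reset pattern above can be sketched with threading.Timer; the 3-second window matches the diagram, while function names are illustrative:

```python
import threading

DEBOUNCE_SECONDS = 3  # window length shown in the diagram

_buffers = {}   # sender_id -> list of buffered texts
_timers = {}    # sender_id -> threading.Timer
_lock = threading.Lock()

def on_message(sender_id, text, flush_fn, window=DEBOUNCE_SECONDS):
    # Each new message resets the sender's timer; only the last timer fires
    with _lock:
        _buffers.setdefault(sender_id, []).append(text)
        if sender_id in _timers:
            _timers[sender_id].cancel()
        t = threading.Timer(window, _flush, args=(sender_id, flush_fn))
        t.daemon = True
        _timers[sender_id] = t
        t.start()

def _flush(sender_id, flush_fn):
    with _lock:
        texts = _buffers.pop(sender_id, [])
        _timers.pop(sender_id, None)
    if texts:
        flush_fn("\n".join(texts))  # one merged LLM call instead of N
```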

Mechanism 6: Cross-Session Context Bridging

A subtle but important design pattern that solves the problem of context fragmentation across sessions:

Problem: The scheduler runs tasks in the scheduler session, but the user chats in the dm_USER_ID session. When the scheduler sends a self-check report, the user may respond in their DM, but the DM session has no context about what was sent.

Solution:

def _get_recent_scheduler_context():
    # 1. Read scheduler session file
    # 2. Check freshness (2-hour window)
    # 3. Find last message tool call content
    # 4. Truncate to 800 chars
    # 5. Format with timestamp
    # 6. Return for injection into DM system prompt

The DM system prompt includes recent scheduler output:

[Agent recently sent via scheduled task (09:00)]
Today's self-check report:
- Sessions: 15 active
- Errors: 2 warnings
- Memory: 142 facts stored
...(truncated)

This is only injected when:

  • The scheduler session file was modified within the last 2 hours
  • The current session is NOT the scheduler session (prevents circular injection)
  • The scheduler actually sent a message (not just processed internally)

12 Programming Language

Implementation Language: Python (100%)

The entire system is written in Python with a deliberate minimal-dependency philosophy:

Standard Library Usage:

Module Usage
http.server, socketserver HTTP server with ThreadingMixIn
json All serialization (config, sessions, tools, API calls)
os, shutil File operations, directory management
subprocess Shell execution (exec tool), ffmpeg, git
threading Multi-threaded architecture (locks, timers, daemon threads)
urllib.request, urllib.parse, urllib.error All HTTP client operations
base64 Image encoding for multimodal LLM calls
hashlib, hmac HMAC-SHA256 for ASR authentication
logging Structured application logging
time, datetime Timestamps, timezone handling (CST/UTC)
uuid Memory record identifiers
struct Binary data handling (for media processing)
ssl SSL configuration for WebSocket ASR

External Dependencies (3 required packages, plus 1 optional):

Package Purpose Size
croniter Cron expression parsing for scheduler Lightweight
lancedb Embedded vector database for memory Moderate (includes Lance format)
websocket-client WebSocket communication for ASR Lightweight
pilk (optional) WeChat SILK audio format decoding Lightweight

Code Style

The codebase uses a distinctive style that prioritizes readability and debuggability:

  1. Single-file modules — each file is a complete, self-contained module
  2. Module-level state — global variables with explicit init functions (no classes for state management)
  3. %-formatting over f-strings in many places — consistent with C/Unix tradition
  4. Explicit error handling — try/except blocks at every I/O boundary
  5. Comments in mixed languages — English for public API, Chinese for implementation details
  6. No type annotations — consistent with rapid prototyping approach
  7. No abstract base classes — concrete implementations only

Architecture Anti-Patterns (Intentional Tradeoffs)

The codebase makes several deliberate tradeoffs:

Pattern Convention 7/24 Office Choice Rationale
HTTP client requests or httpx urllib.request Zero dependency
Async I/O asyncio threading Simpler mental model
Type safety Type annotations None Rapid prototyping
Dependency injection Constructor injection Global state + init() Fewer abstraction layers
Testing pytest suite Manual testing Production as test environment
Configuration Pydantic/dataclass Raw dict from JSON Minimal boilerplate

These are reasonable tradeoffs for a solo-developer, production-running system where the author is the primary user and maintainer.

13 Memory Management

Memory Architecture Overview

7/24 Office implements a sophisticated three-layer memory system that maps roughly to human memory models:

┌─────────────────────────────────────────────────────┐
│                Human Analogy                         │
│                                                     │
│  Working Memory    ←→   Session (40 messages)       │
│  Episodic Memory   ←→   Compressed (LLM-extracted)  │
│  Semantic Memory   ←→   Retrieved (vector search)   │
└─────────────────────────────────────────────────────┘

Layer 1: Session Memory (Working Memory)

Storage: JSON files in sessions/ directory, one per session key.

Capacity: Last 40 messages per session.

Overflow handling:

if len(messages) > MAX_SESSION_MESSAGES:
    evicted = messages[:-MAX_SESSION_MESSAGES]
    messages = messages[-MAX_SESSION_MESSAGES:]
    mem_mod.compress_async(evicted, session_key)

Truncation boundary cleanup:

After truncation, orphan messages may appear at the start (tool results without matching assistant messages). The system skips to the first user or system message:

while messages and messages[0].get("role") not in ("user", "system"):
    messages.pop(0)

Image handling:

Before saving, base64 image URLs are replaced with [image] text markers:

# Before: {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/..."}}
# After:  {"type": "text", "text": "[image]"}

This prevents:

  1. Session files growing unboundedly with large base64 strings
  2. API errors from LLMs that don't accept image_url in history messages
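The replacement step can be sketched as a pass over the multimodal content parts; the function name is illustrative:

```python
def strip_images(messages):
    # Replace inline base64 image parts with a lightweight text marker
    for m in messages:
        content = m.get("content")
        if isinstance(content, list):
            m["content"] = [
                {"type": "text", "text": "[image]"}
                if part.get("type") == "image_url" else part
                for part in content
            ]
    return messages
```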

Layer 2: Compressed Memory (Episodic/Long-Term)

Trigger: Automatic, whenever session messages exceed 40 and overflow occurs.

Process:

  1. Filter messages — keep only user and assistant text messages (skip tool calls, empty content)
  2. Format as dialogue — render messages as "User: ...\nAssistant: ..." text
  3. LLM extraction — send dialogue to LLM with structured extraction prompt
  4. Parse JSON output — extract array of {fact, keywords, persons, timestamp, topic} objects
  5. Generate embeddings — embed each fact using OpenAI-compatible embedding API
  6. Deduplicate — compare each new fact against existing memories via cosine similarity
  7. Store — insert non-duplicate memories into LanceDB vector table

LLM Output Parsing (Robust):

The parser handles common LLM output variations:

text = content.strip()
if text.startswith("```"):
    # Remove markdown code fences
    lines = [l for l in text.split("\n") if not l.strip().startswith("```")]
    text = "\n".join(lines)

try:
    result = json.loads(text)
except json.JSONDecodeError:
    # Fallback: find content between [ and ]
    start = text.find("[")
    end = text.rfind("]")
    if start >= 0 and end > start:
        result = json.loads(text[start:end + 1])
    else:
        result = []  # nothing parseable; treat as no extracted facts

Layer 3: Retrieved Memory (Active Recall)

Trigger: Every user message, before LLM call.

Process:

  1. Embed user message text
  2. Vector search in LanceDB (default top-K=5)
  3. Filter out seed data and low-quality results
  4. Format as [Relevant Memories] block
  5. Append to system prompt
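Step 4's formatting can be sketched as a small helper; the hit-record shape mirrors the fact objects shown earlier, and the function name is an assumption:

```python
def format_memories(hits):
    # hits: vector-search results, e.g. {"fact": ..., "timestamp": "2026-03-15 10:00"}
    if not hits:
        return ""
    lines = ["[Relevant Memories]"]
    for h in hits:
        date = str(h.get("timestamp", ""))[:10]  # keep the YYYY-MM-DD part
        lines.append(f"- {h['fact']} ({date})" if date else f"- {h['fact']}")
    return "\n".join(lines)
```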

Zero-latency cache:

For hardware/voice channels where latency is critical, the system maintains a pre-computed memory summary cache:

_context_cache = {}  # session_key -> str

def get_cached_context(session_key):
    return _context_cache.get(session_key, "")

Dual Memory Search Interface

The system provides two memory search tools with complementary strengths:

Tool Method Use Case
search_memory Keyword search (grep -r -i) in workspace/memory/ Exact term matching, file-level search
recall Vector semantic search in LanceDB Meaning-based recall, fuzzy matching

The keyword search is scoped by a scope parameter:

  • all — search all memory files
  • long — search only MEMORY.md (persistent long-term document)
  • daily — search only daily log files (matching the 2*.md pattern)
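A sketch of the grep-backed search, assuming the scope-to-pattern mapping described above (the exact flags in the repo may differ):

```python
import subprocess

def search_memory(query, scope="all", memory_dir="workspace/memory"):
    # Case-insensitive recursive keyword search with line numbers
    cmd = ["grep", "-r", "-i", "-n"]
    if scope == "long":
        cmd.append("--include=MEMORY.md")
    elif scope == "daily":
        cmd.append("--include=2*.md")  # daily logs named by date, e.g. 2026....md
    cmd += [query, memory_dir]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
        return out.stdout.strip()  # empty string when nothing matches
    except (OSError, subprocess.TimeoutExpired) as e:
        return f"search failed: {e}"
```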

Memory System Comparison

Aspect LangChain Memory 7/24 Office Memory
Architecture Pluggable abstractions Fixed three-layer pipeline
Dependencies LangChain + vector store adapter LanceDB only
Compression User-implemented Built-in LLM extraction
Deduplication User-implemented Built-in (cosine 0.92)
LOC ~500+ (with abstractions) ~300
Configuration Code-level setup JSON config
Persistence Depends on adapter LanceDB files + JSON sessions

14 Continued Learning

Self-Evolution Through Runtime Tool Creation

The most distinctive learning mechanism in 7/24 Office is runtime tool creation — the agent genuinely extends its own capabilities based on encountered tasks.

Evolution loop:

Conversation 1: "Can you check Bitcoin price?"
    → Agent has no crypto tool
    → Agent creates crypto_price tool via create_tool
    → Tool saved to plugins/crypto_price.py

Conversation 2: "What's the price of ETH?"
    → Agent now has crypto_price tool
    → Executes it directly
    → No tool creation needed

System restart:
    → plugins/ directory scanned
    → crypto_price.py loaded via exec()
    → Tool available immediately

This is a genuine form of open-ended self-evolution — the agent's capability space grows monotonically based on user interactions. Unlike fine-tuning (which requires retraining), this is immediate and persistent.

Limitations:

  1. No tool improvement — existing tools are not automatically refined or optimized
  2. No tool composition — new tools don't automatically compose with existing ones
  3. Quality depends on LLM — the quality of created tools depends on the LLM's code generation capability
  4. No sandboxing — created tools run with full Python permissions
  5. No versioning — no history of tool modifications

Memory-Based Behavioral Adaptation

The three-layer memory system provides implicit behavioral learning:

  1. Preference learning — compressed memories include user preferences ("User prefers meetings at 10am"), which are retrieved and injected into future conversations
  2. Context accumulation — facts about projects, people, and deadlines persist across sessions, allowing the agent to maintain long-term context
  3. Pattern recognition — over time, the memory accumulates patterns that influence the agent's responses (e.g., remembering that a particular approach worked for a type of problem)

Personality Evolution via Markdown Files

The SOUL.md, AGENT.md, and USER.md files provide a manual learning mechanism:

  • SOUL.md — can be updated to refine agent personality and communication style
  • AGENT.md — can be updated with new troubleshooting procedures based on encountered issues
  • USER.md — can be updated with new user context as preferences change

These files are read on every conversation turn, so changes take effect immediately.

Self-Check as Learning Signal

The daily self-check report provides a learning signal:

  1. Error pattern detection — recurring errors in the log suggest systematic issues
  2. Session health metrics — high tool_call ratios may indicate inefficient tool use patterns
  3. Memory growth tracking — memory count trends indicate information accumulation rate

However, the system does not automatically act on these signals — the self-check report is sent to the owner for human review.

Comparison with Other Learning Approaches

System Learning Mechanism Persistence Scope
7/24 Office Runtime tool creation + memory compression Disk (plugins/ + LanceDB) Single instance
LangChain agents No built-in learning Session-only (default) Per-session
AutoGPT Task-based memory File system Per-task
OpenEvolve Evolutionary program improvement Program database Per-experiment
Ouro Loop Reflective log + BOUND evolution JSONL + CLAUDE.md Per-project
Devin Proprietary session memory Cloud Per-workspace

7/24 Office's runtime tool creation is unique in that it provides permanent capability expansion rather than just information retention.

15 Applications

Application 1: 24/7 Personal AI Assistant

The primary use case — a continuously running AI agent that handles daily tasks:

Task Type How It's Handled
Scheduling "Remind me to call client at 3pm" → creates cron task
File management "Save this as a report" → writes to workspace
Research "Find recent papers on RAG" → multi-engine web search
Media processing "Trim this video from 0:30 to 1:20" → ffmpeg operation
Memory "What did we discuss about the project last week?" → vector recall
System status Daily self-check reports sent automatically

Application 2: Edge-Deployed AI Agent

Designed for deployment on embedded hardware:

Target Jetson Orin Nano
RAM 8GB (agent uses <2GB)
CPU ARM64
GPU Available for local inference
Storage Local SSD for LanceDB + sessions
Connectivity WiFi/Ethernet for API calls

This enables AI agent deployment in scenarios where cloud hosting is undesirable (privacy, latency, cost) but LLM API access is available.

Application 3: Self-Evolving Tool Platform

The runtime tool creation enables organic capability growth:

Month 1: 26 built-in tools
Month 2: 26 + 5 custom tools (crypto prices, weather, RSS feeds, ...)
Month 3: 26 + 12 custom tools (project-specific automation, ...)
Month 6: 26 + 30+ custom tools (full personal automation suite)

Each tool is a single Python function with @tool decorator — no framework boilerplate, no configuration files, no deployment pipelines.

Application 4: MCP Tool Aggregator

By connecting multiple MCP servers, 7/24 Office becomes a unified interface to diverse tool ecosystems:

{
  "mcp_servers": {
    "filesystem": {
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem"]
    },
    "github": {
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    },
    "database": {
      "transport": "http",
      "url": "http://localhost:3000/mcp"
    }
  }
}

All MCP tools appear alongside built-in tools in the LLM's tool definitions, creating a unified agent that can interact with file systems, APIs, databases, and custom services through a single conversation interface.

Application 5: Automated Monitoring and Notification

Combining the scheduler with the tool use loop creates a monitoring system:

Scheduled task: "Every hour, check server status and notify if issues"
    │
    ▼
Scheduler triggers → LLM receives message
    │
    ▼
LLM uses exec tool: "curl -s http://server/health"
    │
    ▼
LLM analyzes response → decides if notification needed
    │
    ▼
If issue: LLM uses message tool to alert owner
If OK: LLM does nothing (no notification spam)

Application 6: Multi-Tenant Production Service

Via router.py, the system can serve multiple users with complete isolation:

User A message → Router → Container A (own session, memory, workspace)
User B message → Router → Container B (own session, memory, workspace)
User C message → Router → Container C (auto-provisioned on first message)

Each container runs its own instance of the agent with independent:

  • Session files
  • Memory database
  • Workspace
  • Plugins
  • Scheduled tasks

System Architecture Dependencies Memory Self-Evolution Deployment
7/24 Office 8 files, ~3.5K LOC 3 packages 3-layer (session/compressed/retrieval) Runtime tool creation Edge/cloud, Docker multi-tenant
LangChain Framework (100K+ LOC) 100+ packages Pluggable adapters No built-in Cloud
AutoGPT Agent framework Many File-based Task-based learning Cloud
CrewAI Multi-agent framework LangChain + Shared memory No built-in Cloud
Ouro Loop Methodology framework (3 files) 0 Reflective log (30 entries) BOUND evolution Any agent
Devin Proprietary full agent Proprietary Cloud-based Proprietary Cloud only

7/24 Office occupies a unique niche: it is a complete, production-running AI agent system that is small enough to be fully understood by a single developer, yet capable enough to run 24/7 with self-repair, memory, scheduling, and tool evolution. Its zero-framework approach is a philosophical statement as much as an engineering choice — proving that agent systems don't need massive frameworks to be production-viable.

Open Questions and Future Directions

  1. Tool quality assurance: How to ensure runtime-created tools are correct and safe? Could the self-check system validate plugin health?
  2. Multi-agent coordination: The multi-tenant router isolates users, but could agents collaborate across containers?
  3. Memory optimization: The compression prompt could be improved to extract more structured information (relations, causality, temporal sequences).
  4. Offline LLM support: The edge deployment story would be strengthened by support for local LLMs (Ollama, llama.cpp) — currently the system requires an API endpoint.
  5. Observability: The system logs extensively but lacks structured metrics (Prometheus, OpenTelemetry) for production monitoring.
  6. Security hardening: The exec tool and create_tool feature need sandboxing for multi-user deployments. The current OWNER_IDS whitelist is insufficient for production multi-tenant use.
  7. Internationalization: Adapting the WeChat Work integration to Slack/Discord/Telegram would significantly expand the potential user base.