7/24 Office
Self-Evolving AI Agent System — 26 Tools, 3500 Lines Pure Python, MCP/Skill Plugins, Three-Layer Memory, Self-Repair, 24/7 Production
Organization: wangziqi06 (independent developer)
Published: March 2026
Type: repo
Report Type: PhD-Level Technical Analysis
Report Date: April 2026
Table of Contents
- Full Title and Attribution
- Authors and Team
- Core Contribution
- Supported Solutions
- LLM Integration
- Key Results
- Reproducibility
- Compute and API Costs
- Architecture Solution
- Component Breakdown
- Core Mechanisms (Detailed)
- Programming Language
- Memory Management
- Continued Learning
- Applications
1 Full Title and Attribution
Full Title: 7/24 Office — Self-Evolving AI Agent System
Repository: github.com/wangziqi06/724-office
License: MIT
Status: Production-running, actively developed (March–April 2026)
Stars: 1,136 (as of April 2026)
Languages: Python (100%)
Size: ~3,500 lines of pure Python across 8 files
Dependencies: 3 packages: croniter (cron parsing), lancedb (vector storage), websocket-client (ASR)
Design Philosophy:
Zero framework dependency. Every line is visible and debuggable. No LangChain, no LlamaIndex, no CrewAI — just the standard library + 3 small packages.
The name "7/24 Office" (七二四办公室) references 24/7 availability — the system is designed to run continuously as a personal AI agent that handles scheduling, file management, web search, video processing, memory recall, and self-diagnostics autonomously.
2 Authors and Team
7/24 Office is developed by wangziqi06, an independent developer operating as a solo author with AI co-development tools. The project was built in under 3 months and is running in production 24/7.
The author explicitly positions 7/24 Office as a counter-thesis to framework-heavy agent architectures: "No LangChain, no LlamaIndex, no CrewAI — just the standard library + 3 small packages." The codebase is deliberately compact (~3,500 lines) to remain fully comprehensible by a single developer, with every line visible and debuggable.
The project appears to originate from the Chinese developer ecosystem, with WeChat Work (Enterprise WeChat) as the primary messaging integration and iFlytek-compatible ASR for voice recognition. The README and code comments contain bilingual content (Chinese and English).
Development Methodology
The project was built "solo with AI co-development tools in under 3 months." This positions 7/24 Office as both a product of and a testament to AI-assisted development — the system itself is an AI agent, and it was built with AI agents.
3 Core Contribution
Key Contribution: 7/24 Office demonstrates that a production-grade, self-evolving AI agent system can be built in ~3,500 lines of pure Python with zero framework dependencies, featuring runtime tool creation, three-layer memory, MCP protocol integration, self-repair diagnostics, and 24/7 autonomous operation — making a strong case that much of the complexity typically associated with agent frameworks (LangChain, LlamaIndex, CrewAI) is unnecessary.
What 7/24 Office Provides
- Tool Use Loop — OpenAI-compatible function calling with automatic retry, up to 20 iterations per conversation
- Three-Layer Memory — Session history (short-term) + LLM-compressed long-term memory + LanceDB vector retrieval (active recall)
- MCP Protocol Client — Self-implemented JSON-RPC (no MCP SDK), connects external MCP servers via stdio or HTTP transport
- Runtime Tool Creation — The agent can write, save, and load new Python tools at runtime via create_tool
- Self-Repair — Daily self-check, session health diagnostics, error log analysis, auto-notification on failure
- Cron Scheduling — One-shot and recurring tasks, persistent across restarts, timezone-aware
- Multi-Tenant Router — Docker-based auto-provisioning, one container per user, health-checked
- Multimodal — Image/video/file/voice/link handling, ASR (speech-to-text), vision via base64
- Web Search — Multi-engine (Tavily, web search, GitHub, HuggingFace) with automatic source routing
- Video Processing — Trim, add BGM, AI video generation via ffmpeg + API, exposed as tools
Key Innovation: Self-Evolution Through Runtime Tool Creation
The most architecturally significant feature is runtime tool creation — the agent can extend its own capabilities by writing new Python tools that persist across restarts. This creates a genuine self-evolution loop:
User request for new capability
│
▼
Agent writes Python function
with @tool decorator
│
▼
Function saved to plugins/ directory
│
▼
Function loaded via exec() and
registered in tool registry
│
▼
New tool available to LLM
in subsequent conversations
│
▼
Agent can now handle requests
that were previously impossible
This is a meaningful implementation of open-ended tool evolution — the agent's capability space grows over time based on the tasks it encounters.
Architectural Philosophy
The project adheres to five design principles:
| Principle | Implementation |
|---|---|
| Zero framework dependency | No LangChain/LlamaIndex/CrewAI; stdlib + 3 packages |
| Single-file tools | Adding a capability = adding one function with @tool decorator |
| Edge-deployable | Targets Jetson Orin Nano (8GB RAM, ARM64 + GPU); RAM budget <2GB |
| Self-evolving | Runtime tool creation, self-diagnostics, auto-notification |
| Offline-capable | Core works without cloud APIs (except LLM itself); local embeddings supported |
4 Supported Solutions
| Solution Type | Support Level | Tool(s) Used |
|---|---|---|
| 24/7 personal AI assistant | Primary use case | Full system |
| Task scheduling and automation | Built-in | schedule, list_schedules, remove_schedule |
| File management | Built-in | read_file, write_file, edit_file, list_files |
| Web research | Built-in | web_search (multi-engine: Tavily, web, GitHub, HuggingFace) |
| Video processing | Built-in | trim_video, add_bgm, generate_video |
| Memory and recall | Built-in | search_memory, recall (vector semantic search) |
| System diagnostics | Built-in | self_check, diagnose |
| Media sending | Built-in | send_image, send_file, send_video, send_link |
| Shell execution | Built-in | exec (with timeout, default 60s, max 300s) |
| MCP tool extension | Plugin system | reload_mcp + any MCP-compatible server |
| Custom tool creation | Self-evolution | create_tool, list_custom_tools, remove_tool |
| Voice interaction | Built-in | WebSocket ASR pipeline (iFlytek-compatible) |
| Multi-tenant deployment | Built-in | Docker-based per-user isolation via router.py |
Tool Categorization
| Category | Count | Tools |
|---|---|---|
| Core | 2 | exec, message |
| Files | 4 | read_file, write_file, edit_file, list_files |
| Scheduling | 3 | schedule, list_schedules, remove_schedule |
| Media Send | 4 | send_image, send_file, send_video, send_link |
| Video | 3 | trim_video, add_bgm, generate_video |
| Search | 1 | web_search (multi-engine, auto-routing) |
| Memory | 2 | search_memory, recall |
| Diagnostics | 2 | self_check, diagnose |
| Plugins | 3 | create_tool, list_custom_tools, remove_tool |
| MCP | 1 | reload_mcp |
| Total | 26 | (includes tools registered by MCP servers at runtime) |
5 LLM Integration
OpenAI-Compatible Function Calling
7/24 Office uses the OpenAI chat completions API format with function calling. It is designed to work with any OpenAI-compatible API provider.
Provider Configuration:
{
"models": {
"default": "deepseek-chat",
"providers": {
"deepseek-chat": {
"api_base": "https://api.deepseek.com/v1",
"api_key": "...",
"model": "deepseek-chat",
"max_tokens": 8192
},
"openai-gpt4": {
"api_base": "https://api.openai.com/v1",
"api_key": "...",
"model": "gpt-4o",
"max_tokens": 8192
}
}
}
}
Multi-Provider Architecture:
The system supports multiple LLM providers and routes different tasks to different models:
| Task | Model Selection Strategy |
|---|---|
| Main conversation | default provider (configurable) |
| Memory compression | Prefers cheaper model (e.g., deepseek-chat) to avoid compatibility issues with thinking models |
| Embeddings | Dedicated embedding API (e.g., text-embedding-3-small, 1024 dimensions) |
This is a practical cost optimization — memory compression is a background task that doesn't need the most capable model, so it routes to cheaper providers.
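A sketch of this routing idea (the config keys mirror the example above; the helper function and task-preference table are illustrative assumptions, not the repo's code):

```python
# Hypothetical per-task provider routing over the config shown above.
CONFIG = {
    "models": {
        "default": "deepseek-chat",
        "providers": {
            "deepseek-chat": {"api_base": "https://api.deepseek.com/v1", "model": "deepseek-chat"},
            "openai-gpt4": {"api_base": "https://api.openai.com/v1", "model": "gpt-4o"},
        },
    }
}

# Background tasks prefer a cheaper model when it is configured.
TASK_PREFERENCES = {"memory_compression": "deepseek-chat"}

def pick_provider(task="chat"):
    providers = CONFIG["models"]["providers"]
    name = TASK_PREFERENCES.get(task, CONFIG["models"]["default"])
    # Fall back to the default if the preferred provider is missing.
    return name if name in providers else CONFIG["models"]["default"]

print(pick_provider("memory_compression"))  # prints deepseek-chat
```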
Tool Use Loop
The core interaction pattern is a synchronous tool use loop with up to 20 iterations:
User message
│
▼
Build system prompt (SOUL.md + AGENT.md + USER.md + time)
│
▼
Inject retrieved memories into system prompt
│
▼
Inject cross-session scheduler context
│
▼
┌──► Call LLM with messages + tool definitions
│ │
│ ├── No tool_calls → Return text response
│ │
│ └── Has tool_calls → Execute each tool
│ │
│ ▼
│ Append tool results to messages
│ │
└───────────────┘ (up to 20 iterations)
│
▼
Save session (with overflow → memory compression)
Key implementation details:
- Thread safety: Each session has its own lock (threading.Lock). Concurrent messages to the same session are serialized.
- Performance tracking: Every chat call logs prep_time, llm_total_time, tool_count, and total_time in milliseconds.
- Error handling: LLM API errors are caught and return user-friendly error messages. Tool execution errors are caught per-tool and returned as [error] strings to the LLM.
- Image stripping: Before saving sessions, base64 image URLs are replaced with [image] text markers to prevent API errors in history replay and to reduce storage.
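The loop and the per-tool error capture can be sketched as follows. All names are illustrative (the report does not show the repo's actual function signatures), and the LLM is stubbed out:

```python
MAX_ITERATIONS = 20  # matches the cap described above

def run_tool_loop(call_llm, execute_tool, messages):
    # Synchronous loop: call the model, execute any requested tools,
    # feed results back as tool messages, repeat until plain text.
    for _ in range(MAX_ITERATIONS):
        reply = call_llm(messages)            # OpenAI-style response dict
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply.get("content")       # no tool calls → done
        messages.append({"role": "assistant", "tool_calls": tool_calls})
        for call in tool_calls:
            try:
                result = execute_tool(call["name"], call["arguments"])
            except Exception as e:            # per-tool error capture
                result = f"[error] {e}"
            messages.append({"role": "tool", "content": str(result)})
    return "[error] tool loop exceeded 20 iterations"

# Stub model: requests one tool call, then answers using its result.
def fake_llm(messages):
    if any(m["role"] == "tool" for m in messages):
        return {"content": "It is " + messages[-1]["content"]}
    return {"tool_calls": [{"name": "clock", "arguments": {}}]}

result = run_tool_loop(fake_llm, lambda n, a: "12:00",
                       [{"role": "user", "content": "time?"}])
print(result)  # prints It is 12:00
```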
Raw HTTP via urllib
A notable implementation choice: the system uses Python's urllib.request directly instead of httpx, requests, or any HTTP client library. All API calls — LLM, embedding, search, video generation — use raw urllib.request.Request with manual JSON serialization.
import json
import urllib.request

data = json.dumps(payload).encode("utf-8")  # manual JSON serialization
req = urllib.request.Request(url, data=data, headers=headers)
with urllib.request.urlopen(req, timeout=timeout) as resp:
    return json.loads(resp.read())
This is consistent with the zero-dependency philosophy but trades developer ergonomics for minimal footprint.
Reasoning Content Preservation
The system preserves reasoning_content from models that support chain-of-thought (e.g., DeepSeek's reasoning models):
def _serialize_assistant_msg(msg_data):
result = {"role": "assistant"}
result["content"] = msg_data.get("content") or None
reasoning = msg_data.get("reasoning_content")
if reasoning:
result["reasoning_content"] = reasoning
# ...
If the model returns reasoning content, it's preserved in the session history. If the model uses tool calls but doesn't return reasoning content, a placeholder "ok" is inserted to maintain API compatibility with certain providers.
6 Key Results
Production Metrics
| Metric | Value |
|---|---|
| Codebase size | ~3,500 lines across 8 files |
| Built-in tools | 26 |
| Framework dependencies | 0 (no LangChain, LlamaIndex, or CrewAI) |
| Package dependencies | 3 (croniter, lancedb, websocket-client) |
| Production uptime target | 24/7 |
| Development time | <3 months (solo developer + AI co-development) |
| GitHub stars | 1,136 (April 2026) |
| Max tool loop iterations | 20 per conversation |
| Session message limit | 40 (overflow triggers memory compression) |
| Memory deduplication threshold | 0.92 cosine similarity |
Self-Diagnostics Report Format
The self_check tool generates comprehensive system health reports:
| Diagnostic Area | Metrics Collected |
|---|---|
| Session activity | Active sessions today, total user/assistant/tool_call counts |
| System health | Disk usage, memory usage, process status |
| Error logs | Last 24h errors from application log |
| Scheduled tasks | Active job count, next trigger times |
| Memory system | Total memories stored, storage size |
| Session health | Empty sessions, high tool_call ratios, potential issues |
File Size Breakdown
| File | Size | Purpose |
|---|---|---|
| tools.py | 48KB | Tool registry + 26 tool implementations + plugin system + MCP bridge |
| xiaowang.py | 22KB | Entry point, HTTP server, callbacks, debounce, ASR pipeline |
| router.py | 17KB | Multi-tenant Docker routing, container lifecycle |
| llm.py | 14KB | LLM API calls, tool use loop, session management |
| memory.py | 13KB | Three-layer memory: compress, deduplicate, retrieve |
| mcp_client.py | 12KB | MCP protocol client (JSON-RPC, stdio/HTTP transport) |
| scheduler.py | 7KB | Cron + one-shot scheduling, persistent jobs |
| self_check_tool.py | 2KB | Self-check diagnostic report generation |
| Total | ~135KB | ~3,500 lines |
7 Reproducibility
Installation
git clone https://github.com/wangziqi06/724-office.git
cd 724-office
cp config.example.json config.json
# Edit config.json with your API keys
pip install croniter lancedb websocket-client
# Optional: pilk (for WeChat silk audio decoding)
mkdir -p workspace/memory workspace/files
python3 xiaowang.py
Configuration Requirements
| Requirement | Necessity | Notes |
|---|---|---|
| OpenAI-compatible LLM API | Required | DeepSeek, OpenAI, Anthropic, or any compatible provider |
| Embedding API | Required (for memory) | OpenAI text-embedding-3-small or compatible |
| Messaging platform | Required (for chat) | WeChat Work credentials (token, guid, api_url) |
| Tavily API key | Optional | For high-quality web search |
| Search API key | Optional | For general web search |
| ASR credentials | Optional | For voice message transcription |
| Video generation API | Optional | For AI video generation |
| MCP servers | Optional | External tool servers (stdio or HTTP) |
Personality Configuration
The system uses three optional Markdown files for personality and behavior:
| File | Purpose | Content Type |
|---|---|---|
| SOUL.md | Agent personality and behavior rules | Character definition, communication style |
| AGENT.md | Operational procedures and troubleshooting | How-to guides, error handling procedures |
| USER.md | User preferences and context | Personal info, scheduling preferences |
These files are read at every conversation turn and injected into the system prompt, allowing the agent's behavior to be customized without code changes.
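A minimal sketch of that per-turn assembly. The file names and the read-on-every-turn behavior come from the report; the ordering, separators, and function name are assumptions:

```python
import datetime
import pathlib
import tempfile

def build_system_prompt(workspace):
    # Concatenate the three optional personality files plus the current
    # time. Re-reading on every turn means edits take effect immediately.
    parts = []
    for name in ("SOUL.md", "AGENT.md", "USER.md"):
        path = pathlib.Path(workspace) / name
        if path.exists():                     # all three files are optional
            parts.append(path.read_text(encoding="utf-8"))
    parts.append("Current time: " +
                 datetime.datetime.now().isoformat(timespec="minutes"))
    return "\n\n".join(parts)

# Editing SOUL.md changes behavior on the next turn — no restart needed.
ws = tempfile.mkdtemp()
pathlib.Path(ws, "SOUL.md").write_text("Be concise.", encoding="utf-8")
prompt = build_system_prompt(ws)
```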
Reproducibility Assessment
| Factor | Assessment |
|---|---|
| Code availability | Fully open-source, MIT license |
| Minimal dependencies | Only 3 packages, all pip-installable |
| Platform dependency | WeChat Work integration is China-specific; would need adaptation for Slack/Discord/Telegram |
| LLM dependency | Requires an OpenAI-compatible API; behavior varies by model |
| Hardware target | Designed for edge deployment (Jetson Orin Nano, 8GB RAM) |
| Configuration complexity | Multiple API keys required; config.example.json provides template |
| Documentation | README covers architecture and setup; no extensive docs site |
Limitations on Reproducibility
- Messaging platform coupling: The callback handler (handle_callback) is tightly coupled to WeChat Work's message format (cmd codes, msgTypes, fileId/fileAeskey fields). Adapting to other platforms requires rewriting xiaowang.py callback handling.
- Chinese ecosystem tools: ASR uses an iFlytek-compatible WebSocket protocol; video generation uses a specific API format. These may not be available outside China.
- No test suite: Unlike Ouro Loop's 507 tests, 7/24 Office has no automated tests. Production validation relies on manual testing and 24/7 monitoring.
- Security surface: The exec tool executes arbitrary shell commands, and the create_tool feature uses exec() to load arbitrary Python code. Production deployment requires trust boundaries.
8 Compute and API Costs
Runtime Resource Requirements
| Resource | Requirement |
|---|---|
| RAM target | <2GB (designed for edge deployment) |
| CPU | Minimal (event-driven, I/O-bound) |
| GPU | Optional (for local inference or embedding) |
| Disk | LanceDB storage + session JSON files + media files |
| Network | Required for LLM API calls; optional for other features |
| Hardware target | Jetson Orin Nano (8GB RAM, ARM64 + GPU) |
Per-Conversation LLM Costs
| Component | Estimated Tokens | Cost (DeepSeek Chat) | Cost (GPT-4o) |
|---|---|---|---|
| System prompt (SOUL + AGENT + USER + time) | ~1,000-3,000 | ~$0.001 | ~$0.01 |
| Memory injection (top-K retrieved) | ~200-1,000 | ~$0.001 | ~$0.005 |
| Tool definitions (26 tools) | ~3,000-4,000 | ~$0.003 | ~$0.02 |
| User message + conversation history | ~500-5,000 | ~$0.005 | ~$0.03 |
| Per-iteration LLM response | ~200-2,000 | ~$0.002 | ~$0.01 |
| Tool loop (1-5 iterations typical) | ~2,000-10,000 | ~$0.01 | ~$0.05 |
| Total per conversation | ~7,000-25,000 | ~$0.02 | ~$0.13 |
Background Processing Costs
| Process | Frequency | Estimated Cost |
|---|---|---|
| Memory compression | On session overflow (>40 messages) | ~$0.01-$0.05 per compression |
| Embedding generation | Per memory fact + per user query | ~$0.001 per call |
| Self-check report | Daily (scheduled task) | ~$0.05-$0.10 per report |
| Scheduled task execution | Per cron trigger | ~$0.02-$0.10 per task |
Cost Optimization Strategies
The system implements several cost-reducing patterns:
- Cheaper model for compression: Memory compression explicitly prefers deepseek-chat over the default model
- Message limit (40): Prevents unbounded context growth; overflow triggers background compression
- Image stripping: Base64 images removed from session history to reduce token count
- Deduplication (0.92 threshold): Prevents storing near-identical memories
- Truncated tool output: read_file caps output at 10,000 characters
- Debounce: Merges rapid-fire messages (3-second window) into single LLM calls
24/7 Running Cost Estimate
| Usage Pattern | Daily Conversations | Estimated Daily Cost (DeepSeek) | Estimated Daily Cost (GPT-4o) |
|---|---|---|---|
| Light (personal) | 10-20 | $0.20-$0.50 | $1.30-$2.60 |
| Moderate | 50-100 | $1.00-$2.50 | $6.50-$13.00 |
| Heavy (production) | 200+ | $4.00+ | $26.00+ |
The edge-deployment design (Jetson Orin Nano target) suggests the hardware cost is a one-time ~$200-$500 investment, with ongoing costs dominated by LLM API fees.
9 Architecture Solution
System Architecture
The system follows a pipeline architecture with clear data flow from messaging platform to LLM to tools:
┌─────────────────┐
│ WeChat Work │
│ (Messaging │
│ Platform) │
└────────┬────────┘
│ HTTP callback
┌────────▼────────┐
│ router.py │
│ │
│ Multi-tenant │
│ Docker routing │
│ Per-user │
│ containers │
└────────┬────────┘
│
┌────────▼────────┐
│ xiaowang.py │ Entry Point
│ │
│ ┌─ HTTP server (ThreadingMixIn)
│ ├─ Callback dispatch (cmd/msgType)
│ ├─ Debounce (3s window, per-sender)
│ ├─ Media download (3 fallback paths)
│ ├─ ASR pipeline (WebSocket streaming)
│ └─ File persistence (monthly dirs)
└────────┬────────┘
│
┌────────▼────────┐
│ llm.py │ Core Loop
│ │
│ ┌─ System prompt construction
│ │ (SOUL.md + AGENT.md + USER.md)
│ ├─ Memory retrieval + injection
│ ├─ Cross-session context bridge
│ ├─ Tool use loop (max 20 iterations)
│ ├─ Session management (40 msg limit)
│ └─ Image stripping for storage
└────────┬────────┘
│
┌──────────────┼──────────────┐
│ │ │
┌────────▼───┐ ┌──────▼──────┐ ┌────▼────────┐
│ tools.py │ │ memory.py │ │scheduler.py │
│ │ │ │ │ │
│ 26 built-in │ │ Compress: │ │ Cron + once │
│ tools │ │ LLM extract│ │ jobs.json │
│ Plugin dir │ │ Deduplicate:│ │ Persistent │
│ @tool deco │ │ cosine sim │ │ TZ-aware │
│ MCP bridge │ │ Retrieve: │ │ Auto-notify │
└──────┬──────┘ │ vector │ │ on failure │
│ │ search │ └─────────────┘
┌──────▼──────┐ └─────────────┘
│mcp_client.py│
│ │
│ JSON-RPC │
│ stdio/HTTP │
│ Auto- │
│ reconnect │
│ Namespace: │
│ srv__tool │
└─────────────┘
Threading Model
The system uses a multi-threaded architecture (no async/await):
| Thread | Purpose | Lifecycle |
|---|---|---|
| Main thread | HTTP server (ThreadingMixIn spawns per-request threads) | Process lifetime |
| Per-request threads | Handle incoming webhooks → dispatch to callback handler | Per HTTP request |
| Debounce timers | threading.Timer per sender; fires after 3s of silence | Created/cancelled per message |
| Chat lock threads | Serialize concurrent messages to same session | Per chat call |
| Memory compression | Background thread for LLM-based compression of evicted messages | Daemon, per overflow |
| Scheduler loop | Background thread checking jobs every 10 seconds | Daemon, process lifetime |
| Scheduler triggers | Per-job execution threads | Daemon, per trigger |
| MCP stdio readers | Timeout-wrapped reader threads for subprocess communication | Per MCP request |
| ASR streaming | Audio streaming thread + WebSocket client thread | Per voice message |
The use of threading over asyncio is a deliberate design choice — it avoids the "function coloring problem" (async infecting the entire call stack) and keeps the codebase accessible to developers unfamiliar with async Python.
Data Persistence
project_root/
├── config.json ← Master configuration
├── jobs.json ← Persistent scheduler state (atomic writes)
├── sessions/ ← Session history (one JSON file per session)
│ ├── dm_USER_ID.json ← DM session
│ ├── scheduler.json ← Scheduler session
│ └── test.json ← Test session
├── memory_db/ ← LanceDB vector storage
│ └── memories/ ← Vector table files
├── workspace/
│ ├── SOUL.md ← Agent personality
│ ├── AGENT.md ← Operational procedures
│ ├── USER.md ← User context
│ ├── memory/ ← Keyword-searchable memory files
│ │ └── MEMORY.md ← Long-term memory document
│ └── files/ ← Received/generated media
│ ├── index.json ← File metadata index
│ └── 2026-03/ ← Monthly organized media
└── plugins/ ← Runtime-created tools
└── *.py ← Custom tool files
All persistent state uses atomic writes (write to .tmp then os.replace()) to prevent corruption on crash.
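The pattern is small enough to sketch completely. This version is an illustration (the helper name is mine), but the write-to-.tmp-then-os.replace() mechanics are exactly what the report describes:

```python
import json
import os
import tempfile

def atomic_write_json(path, obj):
    # Write to a sibling .tmp file, then swap it in with os.replace(),
    # which is atomic on POSIX — a crash mid-write leaves the previous
    # file intact instead of a truncated JSON document.
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False, indent=2)
    os.replace(tmp, path)

path = os.path.join(tempfile.mkdtemp(), "jobs.json")
atomic_write_json(path, {"jobs": [{"id": "selfcheck", "cron": "0 9 * * *"}]})
```

Readers of jobs.json or a session file therefore always see either the old complete state or the new complete state, never a partial write.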
10 Component Breakdown
Component 1: xiaowang.py — Entry Point (22KB)
The application entry point serving multiple responsibilities:
| Subcomponent | LOC (est.) | Purpose |
|---|---|---|
| Configuration loading | ~30 | JSON config, environment variables, directory setup |
| Module initialization | ~20 | Init messaging, LLM, scheduler, tools, memory in dependency order |
| File persistence | ~50 | Monthly-organized media storage with metadata index |
| ASR pipeline | ~120 | WebSocket streaming speech-to-text with HMAC authentication |
| Message debouncing | ~60 | 3-second per-sender debounce with fragment merging |
| Callback handler | ~120 | Message type dispatch (text, image, video, file, voice, link, location) |
| Media download | ~80 | Three-path media download: enterprise → personal → direct HTTP |
| HTTP server | ~40 | Threaded HTTP server with GET (health) and POST (callback) endpoints |
Debounce Architecture:
Message 1 ──► Buffer[sender_id] = [msg1] Timer(3s) started
Message 2 ──► Buffer[sender_id] = [msg1,msg2] Timer reset
Message 3 ──► Buffer[sender_id] = [msg1,msg2,msg3] Timer reset
... 3 seconds pass ...
Timer fires ──► Flush: merge texts, collect images
──► llm.chat(merged_text, session_key, images)
──► Split reply into ≤1800-byte chunks
──► Send chunks with 0.5s spacing
This prevents rapid-fire messages from creating multiple independent LLM calls, which would be wasteful and produce fragmented responses.
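A self-contained sketch of the per-sender timer-reset pattern (class and method names are illustrative; the demo shortens the 3-second window so it runs quickly):

```python
import threading
import time

class Debouncer:
    # Each new fragment resets the sender's timer; when it finally fires,
    # all buffered fragments are flushed as one merged message.
    def __init__(self, flush, window=3.0):
        self.flush, self.window = flush, window
        self.buffers, self.timers = {}, {}
        self.lock = threading.Lock()

    def on_message(self, sender, text):
        with self.lock:
            self.buffers.setdefault(sender, []).append(text)
            if sender in self.timers:
                self.timers[sender].cancel()   # reset the silence window
            t = threading.Timer(self.window, self._fire, args=(sender,))
            self.timers[sender] = t
            t.start()

    def _fire(self, sender):
        with self.lock:
            parts = self.buffers.pop(sender, [])
            self.timers.pop(sender, None)
        self.flush(sender, "\n".join(parts))   # one merged LLM call

merged = {}
d = Debouncer(lambda s, text: merged.__setitem__(s, text), window=0.05)
d.on_message("alice", "first")
d.on_message("alice", "second")
time.sleep(0.2)  # let the timer fire → merged["alice"] holds both fragments
```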
Media Download — Three-Path Fallback:
Has fileId + fileAeskey?
│
YES → Enterprise download API (wxWorkDownload)
│ │
│ ├── Success → return path
│ └── Fail → continue
│
Has fileAuthKey?
│
YES → Personal download API (wxDownload)
│ │
│ ├── Success → return path
│ └── Fail → continue
│
Has fileHttpUrl?
│
YES → Direct HTTP download (urllib.request.urlretrieve)
│ │
│ ├── Success → return path
│ └── Fail → all methods failed
│
NO → all methods failed
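The fallback chain above reduces to a priority-ordered loop. A hedged sketch with stub downloaders (field names come from the diagram; the function and the downloader callables are illustrative):

```python
def download_media(msg, downloaders):
    # Try each download path whose required field is present, in priority
    # order; the first success wins, any failure falls through.
    order = [("fileId", "enterprise"),      # enterprise API (needs fileAeskey too)
             ("fileAuthKey", "personal"),   # personal API
             ("fileHttpUrl", "direct")]     # plain HTTP download
    for key, name in order:
        if not msg.get(key):
            continue
        try:
            return downloaders[name](msg)   # → local file path
        except Exception:
            continue                        # fall through to next method
    return None                             # all methods failed

# Demo: enterprise path fails, personal path succeeds.
msg = {"fileId": "x", "fileAeskey": "y", "fileAuthKey": "z"}
def enterprise_down(m):
    raise IOError("enterprise API down")
path = download_media(msg, {"enterprise": enterprise_down,
                            "personal": lambda m: "/tmp/f.bin"})
print(path)  # prints /tmp/f.bin
```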
Component 2: llm.py — Core Loop (14KB)
The LLM interaction layer implementing the tool use loop:
| Subcomponent | LOC (est.) | Purpose |
|---|---|---|
| Provider management | ~30 | Load config, get default provider |
| LLM API calling | ~30 | Raw urllib request to chat completions endpoint |
| Session management | ~80 | Load/save sessions, handle overflow, strip images |
| Multimodal building | ~40 | Image-to-base64 encoding, multimodal message construction |
| System prompt | ~60 | SOUL/AGENT/USER loading + scheduler context injection |
| Tool use loop | ~70 | Core loop: LLM call → tool execution → repeat (max 20) |
Cross-Session Context Bridge:
A notable design pattern — the scheduler runs tasks in its own session (scheduler), but the user sees results in their DM session (dm_USER_ID). To maintain context:
def _get_recent_scheduler_context():
# Read scheduler session file
# Check freshness (2-hour window)
# Find last message tool call content
# Inject into DM session system prompt
This allows the user to respond to scheduled task output (e.g., a self-check report) in their normal chat flow, with the LLM aware of what was sent.
Component 3: tools.py — Tool Registry (48KB)
The largest file, containing the complete tool system:
| Subcomponent | LOC (est.) | Purpose |
|---|---|---|
| Registry + decorator | ~40 | @tool decorator, get_definitions(), execute() |
| Core tools (exec, message) | ~40 | Shell execution with timeout; message sending with chunking |
| File tools | ~70 | read/write/edit/list with workspace-relative paths |
| Scheduler tools | ~30 | CRUD for scheduled tasks |
| Media send tools | ~60 | Image/file/video/link sending via messaging API |
| Video processing tools | ~100 | ffmpeg trim, BGM mixing, AI video generation |
| Web search | ~150 | Multi-engine search (Tavily, web, GitHub, HuggingFace) |
| Memory search tools | ~50 | Keyword search (grep) + semantic search (vector retrieval) |
| Self-check tool | ~50 | System diagnostics report generation |
| Plugin system | ~80 | Plugin loading, runtime tool creation, MCP bridge |
The @tool Decorator:
def tool(name, description, properties, required=None):
def decorator(fn):
_registry[name] = {
"fn": fn,
"definition": {
"type": "function",
"function": {
"name": name,
"description": description,
"parameters": {
"type": "object",
"properties": properties,
**({"required": required} if required else {}),
},
},
},
}
return fn
return decorator
This single decorator handles both tool registration (for execution) and definition generation (for LLM function calling), keeping tool declaration co-located with implementation.
Multi-Engine Web Search:
The web_search tool implements intelligent source routing:
Query → Auto-detect source
│
├── Contains "huggingface/hf model" → HuggingFace API
├── Contains "github.com/github repo" → GitHub API
├── Contains "verify/exist/plugin/mcp" → All engines
└── Default → Dual-engine (Tavily + web)
Each engine:
├── Tavily: Advanced search with AI summary + relevance scores
├── Web: General search API with snippets
├── GitHub: Repo search (stars-sorted) + code search fallback
└── HuggingFace: Model search (downloads-sorted) with pipeline tags
Component 4: memory.py — Three-Layer Memory (13KB)
The memory system implementing the compress-deduplicate-retrieve pipeline:
| Subcomponent | LOC (est.) | Purpose |
|---|---|---|
| Init + public API | ~60 | LanceDB connection, table creation, 4 public functions |
| Embedding | ~30 | OpenAI-compatible embedding API calls |
| Compression | ~100 | LLM-based structured memory extraction |
| Deduplication | ~40 | Cosine similarity against existing memories |
| Storage | ~30 | LanceDB vector table operations |
| Retrieval | ~30 | Vector search + result formatting |
Memory Schema (LanceDB):
{
"id": "uuid", # Unique memory identifier
"fact": "string", # Complete factual statement
"keywords": "[json]", # Keyword array (JSON-serialized)
"persons": "[json]", # Person names involved
"timestamp": "string", # YYYY-MM-DD HH:MM or empty
"topic": "string", # Topic category
"session_key": "string",# Source session
"created_at": float, # Unix timestamp
"vector": [float*1024] # 1024-dim embedding
}
Component 5: mcp_client.py — MCP Protocol Client (12KB)
A self-implemented MCP client with zero SDK dependency:
| Subcomponent | LOC (est.) | Purpose |
|---|---|---|
| MCPServer class | ~200 | Single server lifecycle, JSON-RPC, tool discovery |
| Stdio transport | ~60 | Subprocess stdin/stdout communication with timeout |
| HTTP transport | ~20 | POST JSON-RPC to HTTP endpoint |
| Protocol methods | ~40 | initialize, tools/list, tools/call |
| Module-level API | ~60 | init(), get_all_tool_defs(), execute(), reload(), shutdown() |
MCP Protocol Implementation:
The client implements only the three essential MCP methods:
- initialize — handshake with protocol version and client info
- tools/list — discover available tools on the server
- tools/call — execute a tool with arguments
Tool Namespacing:
MCP tools are namespaced with double underscore: servername__toolname. This prevents name collisions between MCP servers and built-in tools.
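The dispatch rule follows directly from the naming scheme. A small illustrative dispatcher (not the repo's code):

```python
def dispatch(tool_name, builtins, mcp_servers):
    # 'servername__toolname' routes to that MCP server; any other name
    # hits the built-in registry, so the two namespaces cannot collide.
    if "__" in tool_name:
        server, _, short_name = tool_name.partition("__")
        return mcp_servers[server].call(short_name)
    return builtins[tool_name]()

class EchoServer:
    def call(self, name):
        return f"mcp:{name}"

print(dispatch("fetch__get_page", {}, {"fetch": EchoServer()}))  # prints mcp:get_page
print(dispatch("self_check", {"self_check": lambda: "ok"}, {}))  # prints ok
```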
Auto-Reconnect:
On ConnectionError or TimeoutError during tools/call, the client automatically:
1. Shuts down the current process
2. Starts a new subprocess
3. Re-runs initialize and tools/list
4. Retries the original call
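The four steps amount to a single-retry wrapper around the transport. A sketch with a stub server (all method names are assumptions about the client's internals):

```python
def call_with_reconnect(server, tool, args):
    # One retry after rebuilding the connection on transport failures;
    # other exceptions (e.g. tool errors) propagate unchanged.
    try:
        return server.tools_call(tool, args)
    except (ConnectionError, TimeoutError):
        server.shutdown()     # 1. kill the wedged subprocess
        server.start()        # 2. spawn a fresh one
        server.initialize()   # 3. redo handshake + tools/list
        return server.tools_call(tool, args)  # 4. retry the original call

class FlakyServer:
    # Fails on the first call, succeeds after a reconnect.
    def __init__(self):
        self.calls = 0
    def tools_call(self, tool, args):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("pipe closed")
        return "ok"
    def shutdown(self): pass
    def start(self): pass
    def initialize(self): pass

result = call_with_reconnect(FlakyServer(), "echo", {})
print(result)  # prints ok
```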
Component 6: scheduler.py — Task Scheduling (7KB)
Persistent task scheduling with cron support:
| Feature | Implementation |
|---|---|
| One-shot tasks | delay_seconds → trigger at time.time() + delay |
| Recurring tasks | cron_expr → croniter-based next-trigger calculation |
| One-shot cron | cron_expr + once=True → triggers once at next cron match |
| Persistence | jobs.json with atomic writes |
| Check interval | 10-second polling loop (background thread) |
| Timezone | China Standard Time (UTC+8) aware — croniter uses local-timezone datetimes |
| Heartbeat | Log task status every 30 minutes |
| Failure handling | On task failure, sends notification via LLM chat |
Scheduler → LLM Integration:
When a scheduled task triggers, it calls chat_fn(message, "scheduler") — sending the task's message to the LLM as if it were a user message in the "scheduler" session. The LLM can then use any tool (including message to notify the owner). This creates a powerful automation loop:
Cron trigger → scheduler._trigger()
→ chat_fn("Run self-check and send report to owner", "scheduler")
→ LLM invokes self_check tool
→ LLM invokes message tool with report
→ Owner receives daily diagnostic report
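The 10-second polling loop's core check can be sketched without croniter for the one-shot case (job fields and function name are illustrative; cron jobs would compute next_run from the cron expression instead):

```python
import time

def due_jobs(jobs, now=None):
    # Each job stores an absolute next_run timestamp; the polling loop
    # fires anything at or past due and (for one-shot jobs) removes it.
    now = now if now is not None else time.time()
    return [j for j in jobs if j["next_run"] <= now]

jobs = [
    {"id": "selfcheck", "next_run": 100.0, "once": False},
    {"id": "reminder", "next_run": 9e9, "once": True},
]
print([j["id"] for j in due_jobs(jobs, now=200.0)])  # prints ['selfcheck']
```

Each due job's message then goes through chat_fn(message, "scheduler"), so "executing a task" is just another LLM conversation with the full tool registry available.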
Component 7: router.py — Multi-Tenant Docker Router (17KB)
Docker-based multi-tenant isolation:
| Feature | Implementation |
|---|---|
| Per-user containers | Auto-provision Docker container on first message |
| Health checks | Periodic container health verification |
| Request routing | Forward HTTP callbacks to appropriate container |
| Container lifecycle | Start, stop, restart, cleanup |
| Resource isolation | Docker-level resource limits per user |
This component enables the system to serve multiple users with complete isolation — each user gets their own container instance with independent sessions, memory, and workspace.
11 Core Mechanisms (Detailed)
Mechanism 1: Three-Layer Memory Pipeline
The memory system is the most architecturally sophisticated component, implementing a three-stage pipeline inspired by human memory models:
Layer 1 — Session Memory (Short-Term):
┌─────────────────────────────────────┐
│ Session file: dm_USER_ID.json │
│ │
│ Last 40 messages (JSON array) │
│ ├── user messages │
│ ├── assistant messages │
│ ├── tool call results │
│ └── system context │
│ │
│ On overflow (>40 messages): │
│ evicted = messages[:-40] │
│ messages = messages[-40:] │
│ compress_async(evicted) │
│ │
│ On load: │
│ Skip to first user message │
│ (clean truncation boundary) │
└─────────────────────────────────────┘
Layer 2 — Compressed Memory (Long-Term):
┌──────────────────────────────────────────────────┐
│ Background compression thread │
│ │
│ 1. Format evicted messages into dialogue text │
│ 2. Send to LLM with COMPRESS_PROMPT │
│ 3. LLM extracts structured facts: │
│ { │
│ "fact": "User prefers meetings at 10am", │
│ "keywords": ["meeting", "schedule"], │
│ "persons": ["User"], │
│ "timestamp": "2026-03-15 10:00", │
│ "topic": "preferences" │
│ } │
│ 4. Generate embeddings for each fact │
│ 5. Deduplicate against existing memories │
│ (cosine similarity > 0.92 → skip) │
│ 6. Store in LanceDB vector table │
└──────────────────────────────────────────────────┘
Layer 3 — Retrieval (Active Recall):
┌──────────────────────────────────────────────────┐
│ On every user message: │
│ │
│ 1. Embed user message text │
│ 2. Vector search in LanceDB (top-K=5) │
│ 3. Filter out seed data and low-quality results │
│ 4. Format as "[Relevant Memories]" block │
│ 5. Inject into system prompt │
│ │
│ Result (injected before LLM call): │
│ [Relevant Memories] │
│ - User prefers meetings at 10am (2026-03-15) │
│ - Client deadline is April 15 (2026-03-10) │
│ - Project uses React + TypeScript (2026-03-01) │
└──────────────────────────────────────────────────┘
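Setting the LanceDB specifics aside, the retrieval layer reduces to a top-K similarity search plus prompt formatting. A brute-force stand-in with hypothetical helper names, using the same pure-Python cosine the repo uses for deduplication:

```python
def _cosine(a, b):
    """Pure-Python cosine similarity, as in the dedup path."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def recall(query_vec, memories, k=5):
    """Top-k facts by similarity (brute-force stand-in for the LanceDB search)."""
    ranked = sorted(memories, key=lambda m: _cosine(query_vec, m["vector"]),
                    reverse=True)
    return ranked[:k]

def format_memories(hits):
    """Render retrieved facts as the [Relevant Memories] prompt block."""
    if not hits:
        return ""
    return "[Relevant Memories]\n" + "\n".join(
        f"- {h['fact']} ({h['timestamp']})" for h in hits
    )
```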
Compression Prompt Engineering:
The COMPRESS_PROMPT is carefully designed to extract only long-term-valuable information:
Rules:
- Only extract information with long-term value
(preferences, plans, contacts, decisions, facts)
- Skip chitchat, greetings, repeated confirmations,
pure tool call results
- Replace "he/she/I" with specific names
- Replace "tomorrow/next week" with specific dates
- If nothing worth remembering, return empty array []
This filtering is critical — without it, the memory would fill with low-value conversational noise, degrading retrieval quality.
Deduplication via Cosine Similarity:
Before storing a new memory, it is compared against the most similar existing memory using vector cosine similarity. If similarity exceeds 0.92, the new memory is skipped. This prevents near-duplicate accumulation:
def _cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0
    return dot / (norm_a * norm_b)
Note: The implementation uses a pure-Python dot product calculation rather than NumPy, consistent with the minimal-dependency philosophy.
Mechanism 2: Runtime Tool Creation (Self-Evolution)
The self-evolution mechanism allows the agent to create new tools at runtime:
Agent receives request for capability it doesn't have
│
▼
Agent uses create_tool to write a new Python function
│
▼
Function is written to plugins/ directory as .py file
│
▼
File is loaded via exec() with @tool decorator available
│
▼
New tool registered in _registry with OpenAI function schema
│
▼
Subsequent LLM calls include new tool in tool_defs
│
▼
Agent can now use the new tool in conversations
│
▼
Tool persists across restarts (loaded from plugins/ on startup)
Plugin Loading Mechanism:
def _exec_plugin(code, source="&lt;plugin&gt;"):
    exec(compile(code, source, "exec"), {
        "__builtins__": __builtins__,
        "tool": tool,   # The @tool decorator
        "log": log,     # Logger
    })
The exec() call provides a controlled environment with access to:
- __builtins__ — Python built-ins
- tool — the decorator for tool registration
- log — the application logger
Security Considerations:
This is the most security-sensitive component. The agent can write and execute arbitrary Python code. Mitigations include:
- Single-user mode enforcement (OWNER_IDS whitelist)
- Workspace-relative path resolution
- Per-tool error handling (tool crashes don't crash the system)
- No automatic execution on untrusted input (requires LLM decision)
However, there is no sandboxing, code review, or capability restriction on created tools — a genuinely powerful but potentially dangerous feature.
Mechanism 3: MCP Protocol Bridge
The MCP client bridges external MCP servers into the agent's tool ecosystem:
Agent tool_defs = built-in tools + plugin tools + MCP tools
│
┌──────▼──────┐
│ LLM decides │
│ which tool │
│ to call │
└──────┬──────┘
│
┌────────────┼────────────┐
│ │ │
Built-in Plugin tool MCP tool
tool from (server__name)
(exec, plugins/ │
message) dir │
│ │ ┌────▼────┐
│ │ │ Parse │
│ │ │ name │
│ │ │ split │
│ │ │ on "__" │
│ │ └────┬────┘
│ │ │
│ │ ┌────▼────┐
│ │ │ Route │
│ │ │ to MCP │
│ │ │ server │
│ │ └────┬────┘
│ │ │
│ │ JSON-RPC
│ │ tools/call
│ │ │
│ │ Response
│ │ content
▼ ▼ ▼
Execute Execute Parse MCP
Python fn Python fn response
(text
concat)
MCP to OpenAI Schema Conversion:
# MCP format
{
    "name": "search_notes",
    "description": "Search through notes",
    "inputSchema": {"type": "object", "properties": {...}}
}

# Converted to OpenAI format
{
    "type": "function",
    "function": {
        "name": "notes_server__search_notes",  # Namespaced
        "description": "Search through notes",
        "parameters": {"type": "object", "properties": {...}}
    }
}
The `inputSchema` from MCP maps directly to `parameters` in OpenAI format — same JSON Schema structure, just renamed.
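The conversion is mechanical enough to fit in one function. `mcp_to_openai` and `route` below are hypothetical names for the two directions (namespacing on the way in, splitting on `__` on the way out):

```python
def mcp_to_openai(server_name, mcp_tool):
    """Convert an MCP tool description to an OpenAI function definition.

    Names are prefixed with "<server>__" so tool calls can later be routed
    back to the right MCP server.
    """
    return {
        "type": "function",
        "function": {
            "name": f"{server_name}__{mcp_tool['name']}",
            "description": mcp_tool.get("description", ""),
            "parameters": mcp_tool.get("inputSchema",
                                       {"type": "object", "properties": {}}),
        },
    }

def route(tool_name):
    """Recover (server, tool) from a namespaced call name."""
    server, _, name = tool_name.partition("__")
    return server, name
```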
Hot-Reload Support:
The reload_mcp tool allows runtime reconfiguration:
def reload(config):
    old_names = set(_servers.keys())
    shutdown()    # Close all existing connections
    init(config)  # Connect with new config
    new_names = set(_servers.keys())
    added = new_names - old_names
    removed = old_names - new_names
    return added, removed, len(_servers)
Mechanism 4: Self-Repair Diagnostics
The self-check mechanism runs daily (via scheduled task) and generates comprehensive system health reports:
Daily self-check scheduled task
│
▼
tool_self_check() collects:
├── Session activity (today's active sessions, message counts)
├── System health (disk, memory, processes)
├── Error log analysis (last 24h from application log)
├── Scheduler status (active jobs, next trigger times)
├── Memory system status (memory count, storage size)
└── Session health diagnostics
├── Empty sessions detection
├── High tool_call ratio detection
└── Potential issue flagging
│
▼
Report formatted as structured text
│
▼
LLM analyzes report + decides on actions
│
▼
message tool sends summary to owner
Session Health Detection:
The self-check scans all sessions for potential issues:
- Empty sessions — sessions with no messages (possibly corrupted)
- High tool_call ratio — sessions where tool calls dominate (agent may be stuck in a loop)
- Stale sessions — sessions not updated recently
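These per-session checks are cheap to compute from the message list alone. A sketch with a hypothetical 0.6 ratio threshold:

```python
def session_health(messages, tool_ratio_threshold=0.6):
    """Flag common session problems from a message list (thresholds are guesses)."""
    issues = []
    if not messages:
        issues.append("empty session")
        return issues
    # Count messages that are tool results or carry tool calls
    tool_msgs = sum(1 for m in messages
                    if m.get("role") == "tool" or m.get("tool_calls"))
    if tool_msgs / len(messages) > tool_ratio_threshold:
        issues.append("high tool_call ratio (possible loop)")
    return issues
```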
Mechanism 5: Debounce and Message Aggregation
The debounce system prevents rapid-fire messages from creating multiple LLM calls:
# Per-sender buffer with thread-safe access
_debounce_buffers = {} # sender_id -> [{"text": str, "images": [path]}]
_debounce_timers = {} # sender_id -> threading.Timer
_debounce_lock = threading.Lock()
Timing diagram:
Time ──────────────────────────────────────────►
│ │ │ │
msg1 msg2 msg3 flush
│ │ │ │
├─ timer ─┤ │ │
│ (3s) │ │ │
│ ├────┤ │
│ timer│ │
│ (3s) │ │
│ ├── timer(3s) ─┤
│ │ │
└──────────────┴──────────────┘
Buffer: [msg1, msg2, msg3]
└─► Flush:
merge texts
collect images
single LLM call
This is particularly important for messaging platforms where users often send messages in rapid succession (split thoughts across multiple messages).
Mechanism 6: Cross-Session Context Bridging
A subtle but important design pattern that solves the problem of context fragmentation across sessions:
Problem: The scheduler runs tasks in the scheduler session, but the user chats in the dm_USER_ID session. When the scheduler sends a self-check report, the user may respond in their DM, but the DM session has no context about what was sent.
Solution:
def _get_recent_scheduler_context():
    # 1. Read scheduler session file
    # 2. Check freshness (2-hour window)
    # 3. Find last message tool call content
    # 4. Truncate to 800 chars
    # 5. Format with timestamp
    # 6. Return for injection into DM system prompt
The DM system prompt includes recent scheduler output:
[Agent recently sent via scheduled task (09:00)]
Today's self-check report:
- Sessions: 15 active
- Errors: 2 warnings
- Memory: 142 facts stored
...(truncated)
This is only injected when:
- The scheduler session file was modified within the last 2 hours
- The current session is NOT the scheduler session (prevents circular injection)
- The scheduler actually sent a message (not just processed internally)
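Under those conditions the bridge fits in one function. The sketch below mirrors the six numbered steps; the 2-hour window and 800-character limit follow the text, but the session path and tool-call field shapes are assumptions:

```python
import json
import os
import time

FRESH_WINDOW = 2 * 3600  # only bridge context from the last 2 hours

def get_recent_scheduler_context(path="sessions/scheduler.json", max_chars=800):
    """Return the scheduler's last outbound message for DM prompt injection."""
    try:
        if time.time() - os.path.getmtime(path) > FRESH_WINDOW:
            return ""  # stale: don't inject
        with open(path) as f:
            messages = json.load(f)
    except OSError:
        return ""  # no scheduler session yet
    # Walk backwards to find the last `message` tool call the scheduler made
    for m in reversed(messages):
        for call in m.get("tool_calls") or []:
            if call["function"]["name"] == "message":
                text = call["function"]["arguments"][:max_chars]
                stamp = time.strftime("%H:%M", time.localtime(os.path.getmtime(path)))
                return f"[Agent recently sent via scheduled task ({stamp})]\n{text}"
    return ""  # scheduler ran but sent nothing: inject nothing
```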
12 Programming Language
Implementation Language: Python (100%)
The entire system is written in Python with a deliberate minimal-dependency philosophy:
Standard Library Usage:
| Module | Usage |
|---|---|
| `http.server`, `socketserver` | HTTP server with ThreadingMixIn |
| `json` | All serialization (config, sessions, tools, API calls) |
| `os`, `shutil` | File operations, directory management |
| `subprocess` | Shell execution (exec tool), ffmpeg, git |
| `threading` | Multi-threaded architecture (locks, timers, daemon threads) |
| `urllib.request`, `urllib.parse`, `urllib.error` | All HTTP client operations |
| `base64` | Image encoding for multimodal LLM calls |
| `hashlib`, `hmac` | HMAC-SHA256 for ASR authentication |
| `logging` | Structured application logging |
| `time`, `datetime` | Timestamps, timezone handling (CST/UTC) |
| `uuid` | Memory record identifiers |
| `struct` | Binary data handling (for media processing) |
| `ssl` | SSL configuration for WebSocket ASR |
External Dependencies (3 packages):
| Package | Purpose | Size |
|---|---|---|
| `croniter` | Cron expression parsing for scheduler | Lightweight |
| `lancedb` | Embedded vector database for memory | Moderate (includes Lance format) |
| `websocket-client` | WebSocket communication for ASR | Lightweight |
| `pilk` (optional) | WeChat SILK audio format decoding | Lightweight |
Code Style
The codebase uses a distinctive style that prioritizes readability and debuggability:
- Single-file modules — each file is a complete, self-contained module
- Module-level state — global variables with explicit init functions (no classes for state management)
- %-formatting over f-strings in many places — consistent with C/Unix tradition
- Explicit error handling — try/except blocks at every I/O boundary
- Comments in mixed languages — English for public API, Chinese for implementation details
- No type annotations — consistent with rapid prototyping approach
- No abstract base classes — concrete implementations only
Architecture Anti-Patterns (Intentional Tradeoffs)
The codebase makes several deliberate tradeoffs:
| Pattern | Convention | 7/24 Office Choice | Rationale |
|---|---|---|---|
| HTTP client | `requests` or `httpx` | `urllib.request` | Zero dependency |
| Async I/O | `asyncio` | `threading` | Simpler mental model |
| Type safety | Type annotations | None | Rapid prototyping |
| Dependency injection | Constructor injection | Global state + `init()` | Fewer abstraction layers |
| Testing | pytest suite | Manual testing | Production as test environment |
| Configuration | Pydantic/dataclass | Raw dict from JSON | Minimal boilerplate |
These are reasonable tradeoffs for a solo-developer, production-running system where the author is the primary user and maintainer.
13 Memory Management
Memory Architecture Overview
7/24 Office implements a sophisticated three-layer memory system that maps roughly to human memory models:
┌─────────────────────────────────────────────────────┐
│ Human Analogy │
│ │
│ Working Memory ←→ Session (40 messages) │
│ Episodic Memory ←→ Compressed (LLM-extracted) │
│ Semantic Memory ←→ Retrieved (vector search) │
└─────────────────────────────────────────────────────┘
Layer 1: Session Memory (Working Memory)
Storage: JSON files in sessions/ directory, one per session key.
Capacity: Last 40 messages per session.
Overflow handling:
if len(messages) > MAX_SESSION_MESSAGES:
    evicted = messages[:-MAX_SESSION_MESSAGES]
    messages = messages[-MAX_SESSION_MESSAGES:]
    mem_mod.compress_async(evicted, session_key)
Truncation boundary cleanup:
After truncation, orphan messages may appear at the start (tool results without matching assistant messages). The system skips to the first user or system message:
while messages and messages[0].get("role") not in ("user", "system"):
    messages.pop(0)
Image handling:
Before saving, base64 image URLs are replaced with [image] text markers:
# Before: {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/..."}}
# After: {"type": "text", "text": "[image]"}
This prevents:
1. Session files growing unboundedly with large base64 strings
2. API errors from LLMs that don't accept image_url in history messages
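The sanitization pass is a straightforward walk over multimodal content parts. A sketch, assuming the OpenAI-style message shape shown above:

```python
def strip_images(messages):
    """Replace inline base64 image parts with a small text marker before saving."""
    for m in messages:
        content = m.get("content")
        if not isinstance(content, list):
            continue  # plain-text messages are left untouched
        m["content"] = [
            {"type": "text", "text": "[image]"}
            if part.get("type") == "image_url" else part
            for part in content
        ]
    return messages
```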
Layer 2: Compressed Memory (Episodic/Long-Term)
Trigger: Automatic, whenever session messages exceed 40 and overflow occurs.
Process:
- Filter messages — keep only user and assistant text messages (skip tool calls, empty content)
- Format as dialogue — `"User: ...\nAssistant: ..."` format
- LLM extraction — send dialogue to LLM with structured extraction prompt
- Parse JSON output — extract array of `{fact, keywords, persons, timestamp, topic}` objects
- Generate embeddings — embed each fact using an OpenAI-compatible embedding API
- Deduplicate — compare each new fact against existing memories via cosine similarity
- Store — insert non-duplicate memories into LanceDB vector table
LLM Output Parsing (Robust):
The parser handles common LLM output variations:
text = content.strip()
if text.startswith("```"):
    # Remove markdown code fences
    lines = [l for l in text.split("\n") if not l.strip().startswith("```")]
    text = "\n".join(lines)
try:
    result = json.loads(text)
except json.JSONDecodeError:
    # Fallback: find content between [ and ]
    start = text.find("[")
    end = text.rfind("]")
    if start >= 0 and end > start:
        result = json.loads(text[start:end + 1])
    else:
        result = []  # unparseable output: store nothing rather than crash
Layer 3: Retrieved Memory (Active Recall)
Trigger: Every user message, before LLM call.
Process:
- Embed user message text
- Vector search in LanceDB (default top-K=5)
- Filter out seed data and low-quality results
- Format as `[Relevant Memories]` block
- Append to system prompt
Zero-latency cache:
For hardware/voice channels where latency is critical, the system maintains a pre-computed memory summary cache:
_context_cache = {}  # session_key -> str

def get_cached_context(session_key):
    return _context_cache.get(session_key, "")
Dual Memory Search Interface
The system provides two memory search tools with complementary strengths:
| Tool | Method | Use Case |
|---|---|---|
| `search_memory` | Keyword search (`grep -r -i`) in workspace/memory/ | Exact term matching, file-level search |
| `recall` | Vector semantic search in LanceDB | Meaning-based recall, fuzzy matching |
The keyword search is scoped by the `scope` parameter:
- all — search all memory files
- long — search only MEMORY.md (persistent long-term document)
- daily — search only daily log files (matching 2*.md pattern)
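For illustration, a pure-Python equivalent of the scoped search (the real tool shells out to `grep -r -i`; the filename patterns follow the scopes above, and the date-named daily-log example is a guess):

```python
import glob
import os

def search_memory(term, scope="all", root="workspace/memory"):
    """Case-insensitive keyword search over memory files.

    Pure-Python stand-in for the project's grep -r -i invocation.
    """
    patterns = {
        "all": ["*.md"],
        "long": ["MEMORY.md"],
        "daily": ["2*.md"],  # daily logs are date-named, e.g. 20260315.md
    }[scope]
    hits = []
    for pat in patterns:
        for path in glob.glob(os.path.join(root, pat)):
            with open(path, encoding="utf-8") as f:
                for lineno, line in enumerate(f, 1):
                    if term.lower() in line.lower():
                        hits.append((os.path.basename(path), lineno, line.rstrip()))
    return hits
```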
Memory System Comparison
| Aspect | LangChain Memory | 7/24 Office Memory |
|---|---|---|
| Architecture | Pluggable abstractions | Fixed three-layer pipeline |
| Dependencies | LangChain + vector store adapter | LanceDB only |
| Compression | User-implemented | Built-in LLM extraction |
| Deduplication | User-implemented | Built-in (cosine 0.92) |
| LOC | ~500+ (with abstractions) | ~300 |
| Configuration | Code-level setup | JSON config |
| Persistence | Depends on adapter | LanceDB files + JSON sessions |
14 Continued Learning
Self-Evolution Through Runtime Tool Creation
The most distinctive learning mechanism in 7/24 Office is runtime tool creation — the agent genuinely extends its own capabilities based on encountered tasks.
Evolution loop:
Conversation 1: "Can you check Bitcoin price?"
→ Agent has no crypto tool
→ Agent creates crypto_price tool via create_tool
→ Tool saved to plugins/crypto_price.py
Conversation 2: "What's the price of ETH?"
→ Agent now has crypto_price tool
→ Executes it directly
→ No tool creation needed
System restart:
→ plugins/ directory scanned
→ crypto_price.py loaded via exec()
→ Tool available immediately
This is a genuine form of open-ended self-evolution — the agent's capability space grows monotonically based on user interactions. Unlike fine-tuning (which requires retraining), this is immediate and persistent.
Limitations:
- No tool improvement — existing tools are not automatically refined or optimized
- No tool composition — new tools don't automatically compose with existing ones
- Quality depends on LLM — the quality of created tools depends on the LLM's code generation capability
- No sandboxing — created tools run with full Python permissions
- No versioning — no history of tool modifications
Memory-Based Behavioral Adaptation
The three-layer memory system provides implicit behavioral learning:
- Preference learning — compressed memories include user preferences ("User prefers meetings at 10am"), which are retrieved and injected into future conversations
- Context accumulation — facts about projects, people, and deadlines persist across sessions, allowing the agent to maintain long-term context
- Pattern recognition — over time, the memory accumulates patterns that influence the agent's responses (e.g., remembering that a particular approach worked for a type of problem)
Personality Evolution via Markdown Files
The SOUL.md, AGENT.md, and USER.md files provide a manual learning mechanism:
- SOUL.md — can be updated to refine agent personality and communication style
- AGENT.md — can be updated with new troubleshooting procedures based on encountered issues
- USER.md — can be updated with new user context as preferences change
These files are read on every conversation turn, so changes take effect immediately.
Self-Check as Learning Signal
The daily self-check report provides a learning signal:
- Error pattern detection — recurring errors in the log suggest systematic issues
- Session health metrics — high tool_call ratios may indicate inefficient tool use patterns
- Memory growth tracking — memory count trends indicate information accumulation rate
However, the system does not automatically act on these signals — the self-check report is sent to the owner for human review.
Comparison with Other Learning Approaches
| System | Learning Mechanism | Persistence | Scope |
|---|---|---|---|
| 7/24 Office | Runtime tool creation + memory compression | Disk (plugins/ + LanceDB) | Single instance |
| LangChain agents | No built-in learning | Session-only (default) | Per-session |
| AutoGPT | Task-based memory | File system | Per-task |
| OpenEvolve | Evolutionary program improvement | Program database | Per-experiment |
| Ouro Loop | Reflective log + BOUND evolution | JSONL + CLAUDE.md | Per-project |
| Devin | Proprietary session memory | Cloud | Per-workspace |
7/24 Office's runtime tool creation is unique in that it provides permanent capability expansion rather than just information retention.
15 Applications
Application 1: 24/7 Personal AI Assistant
The primary use case — a continuously running AI agent that handles daily tasks:
| Task Type | How It's Handled |
|---|---|
| Scheduling | "Remind me to call client at 3pm" → creates cron task |
| File management | "Save this as a report" → writes to workspace |
| Research | "Find recent papers on RAG" → multi-engine web search |
| Media processing | "Trim this video from 0:30 to 1:20" → ffmpeg operation |
| Memory | "What did we discuss about the project last week?" → vector recall |
| System status | Daily self-check reports sent automatically |
Application 2: Edge-Deployed AI Agent
Designed for deployment on embedded hardware:
| Target | Jetson Orin Nano |
|---|---|
| RAM | 8GB (agent uses <2GB) |
| CPU | ARM64 |
| GPU | Available for local inference |
| Storage | Local SSD for LanceDB + sessions |
| Connectivity | WiFi/Ethernet for API calls |
This enables AI agent deployment in scenarios where cloud hosting is undesirable (privacy, latency, cost) but LLM API access is available.
Application 3: Self-Evolving Tool Platform
The runtime tool creation enables organic capability growth:
Month 1: 26 built-in tools
Month 2: 26 + 5 custom tools (crypto prices, weather, RSS feeds, ...)
Month 3: 26 + 12 custom tools (project-specific automation, ...)
Month 6: 26 + 30+ custom tools (full personal automation suite)
Each tool is a single Python function with @tool decorator — no framework boilerplate, no configuration files, no deployment pipelines.
Application 4: MCP Tool Aggregator
By connecting multiple MCP servers, 7/24 Office becomes a unified interface to diverse tool ecosystems:
{
  "mcp_servers": {
    "filesystem": {
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem"]
    },
    "github": {
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    },
    "database": {
      "transport": "http",
      "url": "http://localhost:3000/mcp"
    }
  }
}
All MCP tools appear alongside built-in tools in the LLM's tool definitions, creating a unified agent that can interact with file systems, APIs, databases, and custom services through a single conversation interface.
Application 5: Automated Monitoring and Notification
Combining the scheduler with the tool use loop creates a monitoring system:
Scheduled task: "Every hour, check server status and notify if issues"
│
▼
Scheduler triggers → LLM receives message
│
▼
LLM uses exec tool: "curl -s http://server/health"
│
▼
LLM analyzes response → decides if notification needed
│
▼
If issue: LLM uses message tool to alert owner
If OK: LLM does nothing (no notification spam)
Application 6: Multi-Tenant Production Service
Via router.py, the system can serve multiple users with complete isolation:
User A message → Router → Container A (own session, memory, workspace)
User B message → Router → Container B (own session, memory, workspace)
User C message → Router → Container C (auto-provisioned on first message)
Each container runs its own instance of the agent with independent:
- Session files
- Memory database
- Workspace
- Plugins
- Scheduled tasks
Comparison with Related Systems
| System | Architecture | Dependencies | Memory | Self-Evolution | Deployment |
|---|---|---|---|---|---|
| 7/24 Office | 8 files, ~3.5K LOC | 3 packages | 3-layer (session/compressed/retrieval) | Runtime tool creation | Edge/cloud, Docker multi-tenant |
| LangChain | Framework (100K+ LOC) | 100+ packages | Pluggable adapters | No built-in | Cloud |
| AutoGPT | Agent framework | Many | File-based | Task-based learning | Cloud |
| CrewAI | Multi-agent framework | LangChain + | Shared memory | No built-in | Cloud |
| Ouro Loop | Methodology framework (3 files) | 0 | Reflective log (30 entries) | BOUND evolution | Any agent |
| Devin | Proprietary full agent | Proprietary | Cloud-based | Proprietary | Cloud only |
7/24 Office occupies a unique niche: it is a complete, production-running AI agent system that is small enough to be fully understood by a single developer, yet capable enough to run 24/7 with self-repair, memory, scheduling, and tool evolution. Its zero-framework approach is a philosophical statement as much as an engineering choice — proving that agent systems don't need massive frameworks to be production-viable.
Open Questions and Future Directions
- Tool quality assurance: How to ensure runtime-created tools are correct and safe? Could the self-check system validate plugin health?
- Multi-agent coordination: The multi-tenant router isolates users, but could agents collaborate across containers?
- Memory optimization: The compression prompt could be improved to extract more structured information (relations, causality, temporal sequences).
- Offline LLM support: The edge deployment story would be strengthened by support for local LLMs (Ollama, llama.cpp) — currently the system requires an API endpoint.
- Observability: The system logs extensively but lacks structured metrics (Prometheus, OpenTelemetry) for production monitoring.
- Security hardening: The `exec` tool and `create_tool` feature need sandboxing for multi-user deployments. The current `OWNER_IDS` whitelist is insufficient for production multi-tenant use.
- Internationalization: Adapting the WeChat Work integration to Slack/Discord/Telegram would significantly expand the potential user base.