
Decision 1: One large model, or two specialized ones?
Two RTX 3090s. 48GB of VRAM total. No NVLink, asymmetric PCIe lanes.
Before committing to any architecture, I needed to see the numbers. The naive approach would be to combine both GPUs for a single large model. But cross-GPU inference without NVLink drops to 5-10 tokens/s — far too slow for interactive coding sessions where latency kills flow state.
So I profiled how engineering time actually breaks down:
- ~80% interactive work — writing code, refactoring, debugging, running tests. This demands sub-second latency.
- ~20% deep analysis — architecture design, schema evolution strategy, complex debugging. This demands reasoning depth, not speed.
Two different cadences. Two different models.

| Model | Role | Hardware | Throughput |
|---|---|---|---|
| Qwen2.5-Coder:32B | Interactive coding — completion, refactoring, debugging | Single GPU (19GB VRAM) | 40-50 tokens/s |
| DeepSeek-R1:70B | Architectural reasoning — full chain-of-thought output | Cross-GPU pipeline (42GB VRAM) | 5-10 tokens/s |
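These throughput figures are easy to verify yourself. A minimal benchmark sketch, assuming the official `ollama` Python client against a local Ollama daemon — Ollama reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds) with every non-streaming response:

```python
import ollama

def measure_tps(model: str, prompt: str) -> float:
    """One non-streaming generation; compute decode throughput from Ollama's stats."""
    r = ollama.generate(model=model, prompt=prompt)
    # eval_count = tokens generated, eval_duration = decode time in nanoseconds
    return r["eval_count"] / r["eval_duration"] * 1e9

for model in ("qwen2.5-coder:32b", "deepseek-r1:70b"):
    tps = measure_tps(model, "Explain Iceberg hidden partitioning in one paragraph.")
    print(f"{model}: {tps:.1f} tokens/s")
```

Expect the exact numbers to move with quantization, context length, and prompt size; the order-of-magnitude gap between single-GPU and cross-GPU is what matters.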
Principle: Don't fight hardware constraints. Let them dictate division of labor. The same principle applies to choosing between microservices and monoliths — the constraint shapes the architecture.
Decision 2: Where does AI enter the workflow — and through what interface?
Model selection is a solved problem once you know your constraints. The harder question is integration: at what points in an engineer's daily workflow should AI have access, and through what interface?
I mapped every access point against a specific context. If a tool didn't have exactly one clear reason to exist, it didn't belong in the system. Four survived.

What a typical day looks like
| Time | Location | Task | Tool | Model |
|---|---|---|---|---|
| Morning | At the workstation | Writing PySpark ETL pipelines | Aider CLI (in project directory) | Qwen:32B |
| Afternoon | At the workstation | Designing Iceberg partition strategy | Python batch script (R1 reasons → Coder implements; sketched below) | R1:70B → Qwen:32B |
| Evening | Remote (Mac, coffee shop) | Building React components | VS Code Remote SSH + Aider | Qwen:32B |
| Commute | Phone | Technical concept lookup | Open WebUI (browser) | Qwen:32B |
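The afternoon hand-off is the only non-obvious plumbing. A minimal sketch of that two-stage batch script, assuming the official `ollama` Python client — the prompts here are illustrative placeholders, not the ones I actually run:

```python
import ollama

def two_stage(task: str) -> str:
    """Stage 1: R1 reasons about the design. Stage 2: Coder implements it."""
    plan = ollama.generate(
        model="deepseek-r1:70b",
        prompt=f"Produce a step-by-step design for this task. Plan only, no code.\n\n{task}",
    )["response"]
    return ollama.generate(
        model="qwen2.5-coder:32b",
        prompt=f"Implement this plan as PySpark code.\n\nPlan:\n{plan}\n\nTask:\n{task}",
    )["response"]

print(two_stage("Choose a partition spec for an Iceberg events table queried by day and region."))
```

The hand-off is deliberately dumb: R1's output is plain text fed into Coder's prompt. There is nothing to orchestrate and nothing to grant authority to.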
Remote access runs through Tailscale — WireGuard-based mesh VPN. No ports exposed, no traffic through third-party servers, no public IP required. The same security posture you'd expect from production infrastructure.
Principle: Every tool in the workflow must justify its existence with a specific, non-overlapping use case. If two tools serve the same context, eliminate one. This isn't minimalism — it's operational discipline.
Decision 3: Should I deploy OpenClaw?
This required the most evaluation time. I don't make decisions by instinct — I make them by exhausting the evidence first.
OpenClaw gained significant traction in early 2026. It provides autonomous computer control through messaging platforms (Telegram, WhatsApp, Slack), can read and write files, execute shell commands, browse the web, and write code to extend its own capabilities.
The capability set, if deployed:

- Full file read/write on the host machine
- Shell command execution
- Web browsing
- A control surface on Telegram, WhatsApp, and Slack
- Self-extension: it writes code to add its own capabilities

Impressive surface area. So I built a risk matrix — because capability without a threat model is just exposure:
| Risk | Severity | Acceptable? |
|---|---|---|
| Unrestricted file read/write across the entire machine | High | No |
| Data exfiltration through third-party skills, documented by Cisco's security team | High | No |
| Local models lack judgment reliability for autonomous file/message operations | Medium | No |
| Self-evolving behavior means incomplete auditability | Medium | No |
Four rows. Four "No"s. When every line in the risk matrix fails, the decision isn't close — you walk away.
The architectural difference between my approach and OpenClaw is not about capability — it's about where the decision boundary sits:

OpenClaw: the model decides → the model acts → a human audits after the fact.
My setup: the model proposes → a human reviews → a human executes.
Principle: The capability boundary of AI is not the permission boundary you should grant it. In any system — software or organizational — the cost of failure scales with the scope of autonomous authority. AI generates. Humans authorize. That boundary should be explicit, not emergent.
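To make that boundary concrete, here is a hypothetical sketch — none of this code is part of my deployed setup, and the helper name is mine. The model's output terminates at a prompt; execution requires an explicit human yes:

```python
import subprocess
import ollama

def propose_and_authorize(task: str) -> None:
    """AI generates a command; a human authorizes it. The boundary is explicit."""
    cmd = ollama.generate(
        model="qwen2.5-coder:32b",
        prompt=f"Suggest one shell command for this task. Output the command only.\n\n{task}",
    )["response"].strip()
    print(f"Proposed: {cmd}")
    # Nothing executes without an explicit human 'y'. The default is refusal.
    if input("Execute? [y/N] ").strip().lower() == "y":
        subprocess.run(cmd, shell=True)
    else:
        print("Declined. Nothing was executed.")
```

The shape matters more than the code: generation and authorization live on opposite sides of an `input()` call, not inside the same autonomous loop.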
The Full Stack: Three Layers, Each With a Role
This local server isn't a replacement for cloud AI. It's one layer in a three-layer system:
| Layer | Tool | Role | Trigger |
|---|---|---|---|
| Cloud CLI | Claude Code | Primary development partner — project-level file access, command execution, strongest reasoning | Default for most development tasks |
| Cloud Web | Gemini | Real-time web search, cross-referencing, concept clarification | When live information or a second perspective is needed |
| Local GPU | Ollama + Aider / Open WebUI | Offline development, private data processing, batch reasoning | When code or data should not leave the local network |
Each layer has a clear trigger condition. The local layer exists specifically for one scenario: when the cost of sending code or data to a third-party server exceeds the cost of running a less capable model locally.
That's not a philosophical stance. It's a risk calculation.
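Expressed as code, the trigger logic is a short decision rule — the predicate names are mine, for illustration only:

```python
def route(sensitive_data: bool, needs_live_info: bool) -> str:
    """Pick a layer by trigger condition, not by which model is 'best'."""
    if sensitive_data:        # code or data must not leave the local network
        return "Local GPU: Ollama + Aider / Open WebUI"
    if needs_live_info:       # live search or a second perspective
        return "Cloud Web: Gemini"
    return "Cloud CLI: Claude Code"  # default development partner
```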
Boundaries: What This System Cannot Do
I'm uncomfortable with systems that don't declare their own boundaries. Any honest architecture document includes limitations:
- No internet access. The local models cannot search, fetch live data, or access anything beyond their training cutoff. They are offline reasoning engines.
- No proactive behavior. They do not monitor, alert, or initiate. They respond when asked.
- Weaker than cloud frontier models. Claude and GPT-4 produce objectively better output for complex reasoning tasks.
Within its defined scope — code generation, architecture analysis, bug diagnosis, technical documentation processing — the system performs adequately. Zero marginal cost. Complete data privacy. Available 24/7 without network dependency.
This was never a choice between "best" and "second-best." It was a choice between convenience with reduced control and slight friction with full sovereignty over data flow.
I need to know where my data goes. I need to know what has authority over my file system. When those answers are ambiguous, I default to the option that gives me more control — even at the cost of capability.