One Server, Three Decisions: What an AI-Integrated Engineering Workflow Actually Looks Like
📋 Case Study


Mar 10, 2026 · By Blake · 12 min read
Most engineers use AI as a chatbot. This is what happens when you treat it as infrastructure — with the same rigor you'd apply to choosing a database or designing a deployment pipeline.

GPUX2

Decision 1: One large model, or two specialized ones?

Two RTX 3090s. 48GB of VRAM total. No NVLink, asymmetric PCIe lanes.

Before committing to any architecture, I needed to see the numbers. The naive approach would be to combine both GPUs for a single large model. But cross-GPU inference without NVLink drops to 5-10 tokens/s — far too slow for interactive coding sessions where latency kills flow state.

So I profiled how engineering time actually breaks down:

  • ~80% interactive work — writing code, refactoring, debugging, running tests. This demands sub-second latency.

  • ~20% deep analysis — architecture design, schema evolution strategy, complex debugging. This demands reasoning depth, not speed.

Two different cadences. Two different models.
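Those two cadences translate directly into wait time. A quick back-of-envelope, using the throughputs measured on this hardware; the response lengths are assumptions for illustration:

```python
# Back-of-envelope: what the measured throughputs mean for wait time.
# Rates are the measured ones; response lengths are illustrative assumptions.

def wait_seconds(tokens: int, tokens_per_s: float) -> float:
    """Seconds to stream a complete response at a given throughput."""
    return tokens / tokens_per_s

# ~300-token code edit at single-GPU speed (~45 tok/s): flow state survives
interactive = wait_seconds(300, 45)    # ~6.7 s

# Same edit at cross-GPU speed (~7.5 tok/s): interactivity is gone
stalled = wait_seconds(300, 7.5)       # 40 s

# ~2000-token reasoning trace at 7.5 tok/s: fine as a batch job
deep = wait_seconds(2000, 7.5)         # ~4.4 min
```

A 40-second stall per edit is the difference between a pair programmer and a ticket queue; a four-minute wait for an architecture review is a non-issue.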

Dual-model allocation

| Model | Role | Hardware | Throughput |
| --- | --- | --- | --- |
| Qwen2.5-Coder:32B | Interactive coding — completion, refactoring, debugging | Single GPU (19GB VRAM) | 40-50 tokens/s |
| DeepSeek-R1:70B | Architectural reasoning — full chain-of-thought output | Cross-GPU pipeline (42GB VRAM) | 5-10 tokens/s |
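One way to realize this split is to run two pinned Ollama instances, one per role. A sketch, assuming Ollama's standard `CUDA_VISIBLE_DEVICES` and `OLLAMA_HOST` environment handling; the ports and GPU indices are arbitrary choices:

```python
# Sketch: pin each Ollama instance to specific GPUs so interactive traffic
# never queues behind the reasoning model. Ports, GPU indices, and the
# env-var approach are assumptions to adapt to your setup.

import os

def serve_env(gpus: str, port: int) -> dict:
    """Environment for one pinned `ollama serve` process."""
    return dict(os.environ,
                CUDA_VISIBLE_DEVICES=gpus,        # which card(s) this process sees
                OLLAMA_HOST=f"127.0.0.1:{port}")  # one port per instance

# Instance A -- GPU 0 only: Qwen2.5-Coder:32B, low-latency interactive work
env_coder = serve_env("0", 11434)

# Instance B -- both GPUs: DeepSeek-R1:70B spans 42GB across the pair
env_reasoner = serve_env("0,1", 11435)

# To launch (requires ollama installed):
#   subprocess.Popen(["ollama", "serve"], env=env_coder)
#   subprocess.Popen(["ollama", "serve"], env=env_reasoner)
```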

Principle: Don't fight hardware constraints. Let them dictate division of labor. The same principle applies to choosing between microservices and monoliths — the constraint shapes the architecture.


Decision 2: Where does AI enter the workflow — and through what interface?

Model selection is a solved problem once you know your constraints. The harder question is integration: at what points in an engineer's daily workflow should AI have access, and through what interface?

I mapped every access point against a specific context. If a tool didn't have exactly one clear reason to exist, it didn't belong in the system. Four survived.

System architecture and access layers

What a typical day looks like

| Time | Location | Task | Tool | Model |
| --- | --- | --- | --- | --- |
| Morning | At the workstation | Writing PySpark ETL pipelines | Aider CLI (in project directory) | Qwen:32B |
| Afternoon | At the workstation | Designing Iceberg partition strategy | Python batch script (R1 reasons → Coder implements) | R1:70B → Qwen:32B |
| Evening | Remote (Mac, coffee shop) | Building React components | VS Code Remote SSH + Aider | Qwen:32B |
| Commute | Phone | Technical concept lookup | Open WebUI (browser) | Qwen:32B |
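The afternoon entry, where R1 reasons and the coder model implements, can be sketched as a small batch script against Ollama's HTTP API. Host, port, model tags, and the prompt wording are assumptions for a local setup:

```python
# Sketch of the "R1 reasons -> Coder implements" batch step: the reasoning
# model drafts a design, which becomes context for the coder model.
# Host/port and prompt wording are assumptions for a local Ollama server.

import json
import urllib.request

OLLAMA = "http://127.0.0.1:11434/api/generate"

def generate(model: str, prompt: str) -> str:
    """Non-streamed call to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    req = urllib.request.Request(OLLAMA, body.encode(),
                                 {"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

def implement_prompt(task: str, design: str) -> str:
    """Fold the reasoning model's output into the coder model's prompt."""
    return (f"Design notes from an architecture review:\n{design}\n\n"
            f"Implement the following based on those notes:\n{task}")

# Example (requires both models pulled and Ollama running):
#   task = "Choose a partition strategy for a daily-event Iceberg table."
#   design = generate("deepseek-r1:70b", "Reason step by step: " + task)
#   print(generate("qwen2.5-coder:32b", implement_prompt(task, design)))
```

Because the slow reasoning pass runs unattended, its 5-10 tokens/s throughput stops mattering; only the final hand-off to the fast model is interactive.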

Remote access runs through Tailscale — WireGuard-based mesh VPN. No ports exposed, no traffic through third-party servers, no public IP required. The same security posture you'd expect from production infrastructure.

Principle: Every tool in the workflow must justify its existence with a specific, non-overlapping use case. If two tools serve the same context, eliminate one. This isn't minimalism — it's operational discipline.


Decision 3: Should I deploy OpenClaw?

This required the most evaluation time. I don't make decisions by instinct — I make them by exhausting the evidence first.

OpenClaw gained significant traction in early 2026. It provides autonomous computer control through messaging platforms (Telegram, WhatsApp, Slack), can read and write files, execute shell commands, browse the web, and write code to extend its own capabilities.

The capability set, if deployed:

OpenClaw permission scope

Impressive surface area. So I built a risk matrix — because capability without a threat model is just exposure:

| Risk | Severity | Acceptable? |
| --- | --- | --- |
| Unrestricted file read/write across the entire machine | High | No |
| Cisco's security team documented data exfiltration in third-party skills | High | No |
| Local models lack judgment reliability for autonomous file/message operations | Medium | No |
| Self-evolving behavior means incomplete auditability | Medium | No |

Four rows. Four "No"s. When every line in the risk matrix fails, the decision isn't close — you walk away.

The architectural difference between my approach and OpenClaw is not about capability — it's about where the decision boundary sits:

My approach vs OpenClaw

Principle: The capability boundary of AI is not the permission boundary you should grant it. In any system — software or organizational — the cost of failure scales with the scope of autonomous authority. AI generates. Humans authorize. That boundary should be explicit, not emergent.
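That boundary can be made literal in code. A minimal sketch of an approval gate, assuming a per-command human callback rather than any particular chat interface:

```python
# Sketch of an explicit authorization boundary: a model may *propose* shell
# commands, but nothing executes until a human approves each one. The
# approval callback is an assumption; wire it to a TTY prompt or chat reply.

import subprocess
from typing import Callable

def run_gated(proposed: list[str],
              approve: Callable[[str], bool]) -> list[str]:
    """Run only the commands a human explicitly approves; return what ran."""
    executed = []
    for cmd in proposed:
        if approve(cmd):                        # human decision, per command
            subprocess.run(cmd, shell=True, check=False)
            executed.append(cmd)
        # a rejected command is dropped, never queued or retried silently
    return executed

# Interactive use (requires a TTY):
#   run_gated(suggestions, lambda c: input(f"Run `{c}`? [y/N] ") == "y")
```

The inversion is the point: in this shape the default is "nothing happens", and every action requires an affirmative human decision, which is the opposite of an autonomous agent's default.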


The Full Stack: Three Layers, Each With a Role

This local server isn't a replacement for cloud AI. It's one layer in a three-layer system:

| Layer | Tool | Role | Trigger |
| --- | --- | --- | --- |
| Cloud CLI | Claude Code | Primary development partner — project-level file access, command execution, strongest reasoning | Default for most development tasks |
| Cloud Web | Gemini | Real-time web search, cross-referencing, concept clarification | When live information or a second perspective is needed |
| Local GPU | Ollama + Aider / Open WebUI | Offline development, private data processing, batch reasoning | When code or data should not leave the local network |

Each layer has a clear trigger condition. The local layer exists specifically for one scenario: when the cost of sending code or data to a third-party server exceeds the cost of running a less capable model locally.

That's not a philosophical stance. It's a risk calculation.
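The trigger conditions are simple enough to express as a routing function. A sketch; the classification flags are assumptions about how a task would be labeled before dispatch:

```python
# Sketch of the three-layer trigger logic as a routing function. Layer names
# match the table above; the request flags are assumptions about how a task
# would be classified upstream.

from dataclasses import dataclass

@dataclass
class Task:
    needs_live_data: bool  # requires web search or current information
    sensitive: bool        # code/data must not leave the local network

def route(task: Task) -> str:
    """Pick the layer whose trigger condition the task satisfies."""
    if task.sensitive:
        return "local-gpu"   # privacy overrides everything else
    if task.needs_live_data:
        return "cloud-web"   # Gemini: live search, second perspective
    return "cloud-cli"       # Claude Code: default development partner

print(route(Task(needs_live_data=False, sensitive=True)))   # local-gpu
```

The ordering encodes the risk calculation: sensitivity is checked first, so a task that is both sensitive and in need of live data stays local and simply does without the search.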


Boundaries: What This System Cannot Do

I'm uncomfortable with systems that don't declare their own boundaries. Any honest architecture document includes limitations:

  • No internet access. The local models cannot search, fetch live data, or access anything beyond their training cutoff. They are offline reasoning engines.

  • No proactive behavior. They do not monitor, alert, or initiate. They respond when asked.

  • Weaker than cloud frontier models. Claude and GPT-4 produce objectively better output for complex reasoning tasks.

Within its defined scope — code generation, architecture analysis, bug diagnosis, technical documentation processing — the system performs adequately. Zero marginal cost. Complete data privacy. Available 24/7 without network dependency.

This was never a choice between "best" and "second-best." It was a choice between convenience with reduced control and slight friction with full sovereignty over data flow.

I need to know where my data goes. I need to know what has authority over my file system. When those answers are ambiguous, I default to the option that gives me more control — even at the cost of capability.
