NVIDIA Nemotron & NemoClaw — Technical Briefing

Executive Summary

NVIDIA announced two products at GTC 2026 (March 16) that directly impact our OpenClaw evaluation:

Nemotron — NVIDIA's open-source LLM family. Hybrid Mamba-Transformer architecture with Mixture-of-Experts. Models range from 4B to ~500B parameters, but only activate 1B–50B at inference time, making them fast and efficient. Free, permissive license. Designed for agentic AI workloads at scale.
NemoClaw — An enterprise security wrapper that installs OpenClaw + Nemotron models inside a sandboxed runtime with network isolation, filesystem restrictions, and privacy routing. Apache 2.0 license. Currently alpha.

Key Insight

Together, they transform the OpenClaw equation. Our original evaluation flagged OpenClaw as a CRITICAL security risk. NemoClaw addresses the top concerns (network exposure, filesystem access, credential leakage). Nemotron provides free local inference, eliminating API costs for routine tasks.

Bottom Line

NemoClaw + Nemotron makes OpenClaw deployable in a way raw OpenClaw never was — but it's alpha software with no third-party audits yet, and our core objections (Jack's allergy safety, prompt-based security) still apply.

NVIDIA Nemotron — The Model Family

What Is It?

Nemotron is NVIDIA's family of open foundation models, spanning four generations since 2024. The current generation (Nemotron 3) introduces a breakthrough hybrid Mamba-Transformer Mixture-of-Experts architecture that delivers frontier-class quality at a fraction of the compute cost.

NVIDIA's strategy is clear: give away the models to sell the hardware. But the models are genuinely good, and the licensing is among the most permissive in the industry.

The Nemotron 3 Lineup

Model	Total Params	Active Params	Context Window	Target Use Case
Nano 4B	4B	~1B	1M tokens	Edge devices, mobile, IoT
Nano 30B	30B	3B	1M tokens	Efficient agent tasks, local workstations
Super 120B	120B	12B	1M tokens	Multi-agent workflows, complex reasoning
Ultra ~500B	~500B	~50B	1M tokens	Frontier reasoning (expected H1 2026)

Architecture Deep Dive

The Nemotron 3 architecture combines three paradigms that have individually proven successful:

1. Mamba-2 Layers (Linear-Time Sequence Processing)

Process sequences in O(n) time instead of O(n²) for standard attention
Excellent for long context windows — 1M tokens becomes practical
Handle sequential reasoning and pattern matching efficiently
23 of 52 layers in Nano 30B are Mamba-2
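
To make the O(n) versus O(n²) distinction concrete, here is a minimal sketch. It is not Mamba-2: it uses a hypothetical scalar recurrence (the names linear_scan, decay, and gain are ours, not NVIDIA's) only to show that a recurrent scan performs one fixed-size state update per token, while full attention scores every pair of tokens.

```python
from typing import List


def linear_scan(xs: List[float], decay: float, gain: float) -> List[float]:
    """Run the scalar recurrence h_t = decay * h_{t-1} + gain * x_t over xs.

    One pass, one update per token: O(n) work with a fixed-size state.
    This is an illustrative toy, not the Mamba-2 layer itself.
    """
    state = 0.0
    outputs = []
    for x in xs:
        state = decay * state + gain * x
        outputs.append(state)
    return outputs


def attention_pair_count(n: int) -> int:
    """Query-key score computations in full (unmasked) self-attention: n * n."""
    return n * n


if __name__ == "__main__":
    print(linear_scan([1.0, 1.0, 1.0], decay=0.5, gain=1.0))
    for n in (1_000, 1_000_000):
        print(f"{n} tokens: {n} scan steps vs {attention_pair_count(n)} attention pairs")
```

At 1M tokens the scan needs about 10^6 updates while full attention needs about 10^12 pair scores, which is why the lineup leans on Mamba-2 layers for long context and keeps only a few attention layers.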

2. Transformer Attention Layers (Precise Associative Recall)

Grouped Query Attention (GQA) with 2 groups for efficiency
Handle tasks requiring precise retrieval from context (names, numbers, code references)
6 of 52 layers in Nano 30B are GQA attention
Placed strategically where precise recall matters most

3. Mixture-of-Experts (Parameter Efficiency)

23 of 52 layers are MoE in Nano 30B
Each MoE layer has multiple specialist "expert" sub-networks
Router selects only 1–2 experts per token — rest stay dormant
Result: 30B total params but only 3B active per token, so each token touches roughly one tenth of the weights (about 10x less compute per token than a dense 30B model)
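
The routing step described above can be sketched as top-k gating. This is a generic illustration of the technique, not NVIDIA's router: the function name route_token, the example logits, and the choice to renormalize the kept weights are all our assumptions.

```python
import math
from typing import List, Tuple


def route_token(router_logits: List[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their softmax weights."""
    # Softmax over all experts (subtract the max for numerical stability).
    peak = max(router_logits)
    exps = [math.exp(v - peak) for v in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only the top-k experts; every other expert is skipped for this token.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


if __name__ == "__main__":
    # Four hypothetical experts; only two of them run for this token.
    for expert_id, weight in route_token([0.1, 2.0, 1.0, -1.0], k=2):
        print(f"expert {expert_id}: weight {weight:.3f}")
```

Only the selected experts run their feed-forward computation for that token, which is where the gap between total and active parameters comes from.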

Novel Innovation — Latent MoE

Compresses token representations before routing to experts. Enables 4x more expert specialists at the same inference cost. Think of it as "expert specialization on a budget."

Multi-Token Prediction (MTP)

Model predicts multiple future tokens simultaneously. Up to 3x wall-clock speedup for structured output (JSON, code, markdown). Particularly valuable for agent tool-calling patterns.

NVFP4 Native Training

First models trained natively in 4-bit floating point precision. Purpose-built for NVIDIA B200 GPUs. 4x memory and compute efficiency vs FP8 on H100. Means smaller GPUs can run larger models.

Training Pipeline

Phase 1 — Pretraining

25 trillion tokens (massive — GPT-4 was reportedly ~13T)
NVFP4 precision on B200 GPU clusters
NVIDIA released 3T+ tokens of the pretraining data publicly

Phase 2 — Supervised Fine-Tuning

7 million samples from a 40M-sample post-training corpus
Curated for instruction following, tool use, and agentic behavior
NVIDIA released 18M samples of this data publicly

Phase 3 — Multi-Environment Reinforcement Learning

21 different environment configurations
1.2 million environment rollouts
10 new training gym environments (open-sourced)
Optimized for multi-step reasoning and real-world task completion

Benchmark Performance

Nemotron 3 Super (120B / 12B active)

Benchmark	Result	What It Measures
PinchBench	85.6% (best open model in class)	Agent reasoning and planning
AIME 2025	Leading in size class	Advanced mathematics
SWE-Bench Verified	Leading in size class	Real-world software engineering
Terminal Bench	Leading in size class	Command-line task completion
Throughput	5x previous Nemotron	Raw inference speed

Historical: Llama-Nemotron 70B vs Competitors

Benchmark	Nemotron 70B	GPT-4o	Claude 3.5 Sonnet
Arena Hard	85.0	79.3	79.2
AlpacaEval 2 LC	57.6	—	—
MT-Bench	8.98	—	—
Aider (coding)	55.0%	72.9%	—

Honest Assessment

Nemotron wins on alignment/chat benchmarks. Claude and GPT-4o still lead on coding and complex reasoning. Nemotron 3 Super is more competitive on coding (leading its size class on SWE-Bench Verified), but a detailed head-to-head comparison against Claude Opus/Sonnet has not yet been published.

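Where Nemotron truly excels: Throughput. When you need many parallel agents doing moderate-complexity tasks, activating only a fraction of the parameters per token should deliver more tokens per second per dollar than dense models of similar quality (NVIDIA cites 5x the throughput of the previous Nemotron).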

Specialized Variants

Variant	Purpose
Nemotron 3 Omni	Multimodal — audio + vision + language in one model
Nemotron 3 VoiceChat	Real-time simultaneous listen-and-respond
Nemotron Nano VL 12B	Vision-language for image understanding
Nemotron RAG	Retrieval and embedding (leading ViDoRe, MTEB leaderboards)
Nemotron Safety	Content moderation and guardrails
Nemotron Speech	Automatic speech recognition and text-to-speech

Licensing

NVIDIA Open Model License:

Use, modify, distribute, commercially deploy — all allowed
Royalty-free, perpetual, worldwide
No attribution required
Weights, training data, AND training recipes all published
One of the most permissive AI model licenses in existence

Availability

Platform	Access
Hugging Face	All models (BF16, FP8 variants)
NVIDIA NIM	API via build.nvidia.com
Ollama	Nemotron 3 Super for local inference
NeMo Framework	Full training and fine-tuning
GitHub	Developer assets at NVIDIA-NeMo/Nemotron

The Nemotron Coalition

Announced at GTC 2026 — a first-of-its-kind global collaboration:

Members: Black Forest Labs, Cursor, LangChain, Mistral AI, Perplexity, Reflection AI, Sarvam, Thinking Machines Lab

Goal: Collaboratively build the next generation of open frontier models across six families:

Nemotron — Language
Cosmos — World models / vision
Isaac GR00T — Robotics
Alpamayo — Autonomous driving
BioNeMo — Biology / chemistry
Earth-2 — Weather / climate

Notable Adopters

Accenture, Cadence, CrowdStrike, Cursor, Deloitte, EY, Oracle Cloud Infrastructure, Palantir, Perplexity, ServiceNow, Siemens, Synopsys, Zoom

NVIDIA NemoClaw — The Security Wrapper

What Is It?

NemoClaw is an open-source software stack that wraps OpenClaw with enterprise-grade security, privacy, and isolation controls. It is not a separate agent — it is OpenClaw running inside NVIDIA's security infrastructure.

Jensen Huang, GTC 2026 keynote: "30 years of NVIDIA computing, distilled into an agent platform."

Peter Steinberger (OpenClaw creator, now at OpenAI): "With NVIDIA and the broader ecosystem, we're building the claws and guardrails that let anyone create powerful, secure AI assistants."

One-Command Install

curl -fsSL https://nvidia.com/nemoclaw.sh | bash

This installs:

OpenClaw agent
Nemotron models (default: Nemotron 3 Super 120B)
OpenShell sandboxed runtime
NVIDIA Agent Toolkit with pre-configured security policies

Security Note

The curl | bash installation pattern is a security anti-pattern. Mitigate by downloading the script to a file, reviewing its content, and only then running it. This alone should not be a blocker, but it's worth flagging.
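
One way to make the review step repeatable is to pin the digest of the script we actually reviewed and refuse to run anything else. The sketch below assumes the installer was first saved to a local file (for example with curl's -o option) instead of being piped into bash. The helper names are ours, and the source does not say whether NVIDIA publishes an official checksum, so the pinned digest here is one we would record ourselves after a manual review.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_expected(data: bytes, expected_hex: str) -> bool:
    """True only if data hashes to the digest we pinned after review."""
    return hmac.compare_digest(sha256_hex(data), expected_hex.strip().lower())


if __name__ == "__main__":
    reviewed = b"echo installing\n"  # stand-in for the installer bytes we reviewed
    pinned = sha256_hex(reviewed)    # recorded once, after the manual review

    print(matches_expected(reviewed, pinned))               # same bytes as reviewed
    print(matches_expected(reviewed + b"# extra", pinned))  # script changed since review
```

A mismatch means the script changed since it was reviewed, so it should be re-read before anything is executed.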

Architecture

Two-component design:

Component	Language	Role
CLI Plugin	TypeScript	Integrates with OpenClaw CLI, user-facing
Blueprint	Python	Orchestrates OpenShell resources, manages sandbox

The Four-Layer Security Model

This is NemoClaw's core value proposition — the direct answer to OpenClaw's CRITICAL security rating.

Layer 1: Network Isolation

Default deny — all network connections blocked unless explicitly whitelisted
Policy defined in openclaw-sandbox.yaml (human-readable, version-controlled)
Unauthorized requests are blocked AND surfaced in a TUI for operator review
Hot-reloadable — change policies without restarting the sandbox
What this fixes: OpenClaw's 40K+ exposed instances problem. NemoClaw never listens on 0.0.0.0.
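
As a rough illustration of default-deny semantics, the sketch below allows a connection only when an explicit rule matches. The AllowRule shape and the example hosts are hypothetical; the real schema of openclaw-sandbox.yaml is not described in this briefing, and actual enforcement happens in the sandbox, not in application code like this.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AllowRule:
    """One allowlist entry: an exact hostname and port (hypothetical shape)."""
    host: str
    port: int


def is_allowed(host: str, port: int, rules: Iterable[AllowRule]) -> bool:
    """Default deny: a connection passes only if some rule matches exactly."""
    target = host.lower().rstrip(".")
    return any(rule.host.lower() == target and rule.port == port for rule in rules)


if __name__ == "__main__":
    rules = [AllowRule("api.example.com", 443)]  # placeholder host, not a real policy
    print(is_allowed("api.example.com", 443, rules))
    print(is_allowed("api.example.com", 80, rules))
    print(is_allowed("evil.example.net", 443, rules))
    print(is_allowed("api.example.com", 443, []))  # empty policy blocks everything
```

The property that matters is the last case: with no rules configured, nothing gets out.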

Layer 2: Filesystem Restrictions

Agent can write ONLY to /sandbox and /tmp
All other filesystem paths are read-only
No access to host filesystem outside the container
What this fixes: OpenClaw's unrestricted file access (could read SSH keys, credentials, browser data)
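
The write rule can be expressed as a small path check. This is only a sketch of the policy logic under our own assumptions (the function name can_write is hypothetical); it does not resolve symlinks, and in NemoClaw the restriction is described as being enforced by the sandbox itself.

```python
import posixpath

# Writable locations named in this briefing.
WRITABLE_ROOTS = ("/sandbox", "/tmp")


def can_write(path: str) -> bool:
    """Return True only for absolute paths that normalize to a writable root."""
    if not path.startswith("/"):
        return False  # relative paths are rejected rather than guessed at
    normalized = posixpath.normpath(path)  # collapses "." and ".." segments
    return any(
        normalized == root or normalized.startswith(root + "/")
        for root in WRITABLE_ROOTS
    )


if __name__ == "__main__":
    for candidate in (
        "/sandbox/notes.txt",
        "/tmp/cache/a.bin",
        "/sandbox/../etc/passwd",
        "/tmpfoo/x",
        "/home/user/.ssh/id_rsa",
    ):
        print(candidate, "->", "write ok" if can_write(candidate) else "read-only")
```

Normalizing before comparing is what stops a path such as /sandbox/../etc/passwd from slipping through a naive prefix check.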

Layer 3: Process Protection

Agent runs inside OpenShell — a K3s-based container sandbox
All blueprints are immutable, versioned, and digest-verified
Executed as subprocesses with restricted capabilities
Designed to block privilege escalation from within the sandbox
What this fixes: OpenClaw's Docker sandbox being OFF by default, arbitrary code execution risk

Layer 4: Inference Routing (Privacy Router)

All model API calls route through OpenShell — agent cannot call external APIs directly
Privacy router makes the key decision for each query:
- Sensitive data → routed to local Nemotron models (never leaves the machine)
- Non-sensitive / high-capability needed → routed to frontier cloud models (Claude, GPT)
Configurable classification rules for what counts as "sensitive"
What this fixes: OpenClaw's credential and PII leakage to external model providers
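
To show what "configurable classification rules" could look like, here is a deliberately simple rule-based router. The patterns, the keyword list, and the route function are our own illustration; this briefing does not document how NemoClaw's classifier actually works.

```python
import re
from typing import List, Pattern

# Hypothetical rules for what counts as "sensitive" in our household.
SENSITIVE_PATTERNS: List[Pattern[str]] = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),     # email address
    re.compile(r"\b(password|api key|diagnosis|medication|allergy)\b", re.I),
]


def route(query: str) -> str:
    """Return "local" for queries matching any sensitive rule, else "cloud"."""
    if any(pattern.search(query) for pattern in SENSITIVE_PATTERNS):
        return "local"
    return "cloud"


if __name__ == "__main__":
    for q in (
        "What is Jack's medication schedule?",
        "My SSN is 123-45-6789",
        "Summarize this article about GPUs",
    ):
        print(f"{route(q):5s} <- {q}")
```

Anything the rules miss goes to the cloud, so the quality of the rule set directly determines how much sensitive data leaks. A stricter variant would default to local and send a query to the cloud only when it is positively classified as non-sensitive.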

System Requirements

Spec	Minimum	Recommended
CPU	4 vCPU	4+ vCPU
RAM	8 GB	16 GB
Disk	20 GB	40 GB
OS	Ubuntu 22.04 LTS+	Ubuntu 22.04 LTS+
Runtime	Node.js 20+, Docker	Node.js 20+, Docker

Hardware agnostic — does not require NVIDIA GPUs (though optimized for them). Supported on: GeForce RTX PCs, RTX PRO workstations, DGX Station, DGX Spark, any Linux machine with Docker.

Release Status

Detail	Value
Announced	March 16, 2026 (GTC keynote)
License	Apache 2.0
GitHub	github.com/NVIDIA/NemoClaw
Stars	~6.7K (first 2 days)
Forks	739
Contributors	~26
Status	Alpha — "Expect rough edges"
Tech Stack	TypeScript 37.7%, Shell 30.6%, JS 25.7%, Python 4.9%

Alpha Warning

NVIDIA's own docs: "Interfaces, APIs, and behavior may change without notice as the design iterates."

Enterprise Partnerships

Being pursued for NemoClaw integrations: Salesforce, Cisco, Google, Adobe, CrowdStrike, SAP, JFrog (supply chain security)

OpenClaw — Quick Refresher

For full details, see our Deep Research Report on Notion.

Attribute	Detail
What	Open-source autonomous AI agent (TypeScript/Node.js)
GitHub Stars	234K+
License	MIT
Creator	Peter Steinberger (now at OpenAI)
Governance	Moving to open-source foundation
Runtime	Long-lived Gateway daemon on port 18789
Messaging	22+ platforms (WhatsApp, Signal, Telegram, Discord, iMessage, Slack, Teams, etc.)
AI Models	20+ providers (Claude, GPT, Gemini, DeepSeek, Ollama, etc.)
Skills	10,700+ community skills on ClawHub
Integrations	50+ (chat, smart home, music, productivity, browser, cron)
Security Rating	CRITICAL — 512 vulns, 8+ critical CVEs, up to 20% malicious marketplace skills

Why We Were Cautious

40K+ instances exposed on public internet — Gateway binds to 0.0.0.0
ClawHavoc attack — 1,184 malicious skills in official marketplace (12–20% compromised)
Prompt-based security — safety rules are instructions, not architectural boundaries
Microsoft's warning: "Not appropriate to run on a standard personal or enterprise workstation"
Jack's allergy rules — cannot safely move from hardcoded logic to prompt-based

The Full Stack: OpenClaw + NemoClaw + Nemotron

How They Fit Together

┌────────────────────────────────────────────────────┐
│                    USER INTERFACE                     │
│     WhatsApp  Signal  Telegram  Discord  iMessage    │
└──────────────────────┬──────────────────────┘
                       │
┌──────────────────────▼──────────────────────┐
│                    OPENCLAW                           │
│     Agent Runtime · Skills · Memory · Integrations   │
│     (TypeScript, Gateway daemon, port 18789)         │
└──────────────────────┬──────────────────────┘
                       │
┌──────────────────────▼──────────────────────┐
│                    NEMOCLAW                           │
│     Security Wrapper · Sandbox · Policy Engine       │
│                                                      │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────┐  │
│  │   Network    │  │  Filesystem  │  │  Process   │  │
│  │  Isolation   │  │ Restrictions │  │ Protection │  │
│  │ (whitelist)  │  │ (/sandbox    │  │ (OpenShell │  │
│  │              │  │  /tmp only)  │  │  K3s)      │  │
│  └─────────────┘  └──────────────┘  └────────────┘  │
│                                                      │
│  ┌────────────────────────────────────────────┐    │
│  │           PRIVACY ROUTER                      │    │
│  │  Sensitive → Local    Non-sensitive → Cloud   │    │
│  └────────────────────────────────────────────┘    │
└──────────────────────┬──────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
┌──────────────┐ ┌──────────┐ ┌──────────────┐
│   NEMOTRON   │ │  Claude  │ │   GPT / etc  │
│  (Local LLM) │ │  (Cloud) │ │   (Cloud)    │
│  Free, fast  │ │  Smart   │ │   Optional   │
│  Private     │ │  Capable │ │              │
└──────────────┘ └──────────┘ └──────────────┘

What Each Layer Provides

Layer	Provides	Without It
OpenClaw	Agent brain, messaging, skills, integrations, always-on daemon	No agent — just raw model APIs
NemoClaw	Security sandbox, network isolation, filesystem lock, privacy routing	OpenClaw runs naked — CRITICAL risk
Nemotron	Free local inference, private data stays local, no API costs for routine tasks	Pay per token to cloud providers for everything

Why This Combination Matters

Before NemoClaw: Deploying OpenClaw required accepting CRITICAL security risk. Our evaluation said "do not deploy without full isolation" — which meant building your own sandbox, firewall rules, Docker hardening, and credential isolation manually.

After NemoClaw

NVIDIA built exactly the isolation we specified. One command gets you a sandboxed OpenClaw with: network whitelist (no more 40K exposed instances), filesystem jail (no SSH key / credential theft), process isolation (container escape made much harder), and privacy routing (sensitive data stays on-device via Nemotron).

With Nemotron: Routine queries (scheduling, reminders, simple lookups, home automation) run on free local models. Only complex reasoning (coding, analysis, financial) routes to Claude. This dramatically reduces API costs and keeps private data off external servers.

Comparison: Our Stack vs the NVIDIA-OpenClaw Stack

Architecture Comparison

Dimension	Our Stack (Claude Code + COO)	NVIDIA Stack (OpenClaw + NemoClaw + Nemotron)
Runtime	Ephemeral CLI sessions	Always-on daemon (24/7)
Interface	Terminal + Discord (limited)	22+ messaging platforms
AI Model	Claude only (Anthropic)	Multi-model (Claude + GPT + Gemini + local Nemotron)
Security Model	No daemon = minimal attack surface	4-layer sandbox (NemoClaw)
Privacy	All queries go to Anthropic API	Privacy router — sensitive stays local
Cost	Pro plan + API usage	Nemotron free locally; API only for complex tasks
Coding	Best-in-class (Claude Code)	Weaker — Nemotron trails Claude on coding
Orchestration	C-suite agent hierarchy (COO/CTO/CFO/CISO/CMO)	Flat — single agent with skills
Memory	File-based + session persistence	SQLite vector + daily logs + MEMORY.md
Smart Home	Home Assistant MCP	Home Assistant (same underlying)
Network Mgmt	UniFi MCP (direct UDM Pro control)	No equivalent
Financial	Monarch Money MCP (real bank data)	No equivalent
Food Safety	Hardcoded allergy rules (rosey-bot)	Prompt-based only — UNACCEPTABLE for Jack
Voice	None	Wake word, push-to-talk, TTS
Music	None	Spotify, Sonos, Shazam
Scheduling	Manual (pending items only)	Cron, scheduled automation
Browser	Firecrawl (scraping)	Full Chromium CDP automation
Messaging	Discord + Mattermost only	WhatsApp, Signal, Telegram, iMessage, Slack, Teams + 16 more

Where Each Stack Wins

Our Stack Wins

Software development and coding tasks
Multi-agent orchestration with domain expertise
Security review (CISO agent reviews before execution)
Financial tracking and analysis
Network infrastructure management
Food safety (hardcoded rules, not prompt-based)
Project isolation and memory management

NVIDIA Stack Wins

Always-on availability (daemon vs CLI sessions)
Messaging ubiquity (22+ platforms vs 2)
Voice interaction
Music control
Scheduled automation / cron
Multi-model flexibility
Privacy (local inference for sensitive data)
Cost efficiency (free local models for routine tasks)
Browser automation

Use Cases for Our Household

High-Value Use Cases (NemoClaw + Nemotron + OpenClaw)

1. Family Messaging Hub

Ashley, Valentina, and family members message the AI via WhatsApp or Signal (apps they already use). No need to install Discord or learn terminal commands. Example: Ashley texts "What's for dinner tonight?" → agent checks meal plan, confirms allergen safety, responds. Privacy router keeps family conversations on local Nemotron — never hits external APIs.

2. Always-On Home Automation

"Turn off the lights at 10pm every night." "If the garage door is open after 11pm, close it and tell me." Scheduled tasks that our ephemeral CLI sessions can't do. Integrates with existing Home Assistant setup.

3. Proactive Scheduling

Morning briefing: weather, calendar, commute, school schedule. Automatic reminders for appointments, medications, school events. Cron-based recurring tasks without human initiation.

4. Voice Interface

Wake word activation for hands-free queries while cooking, driving, etc. Push-to-talk for quick questions. TTS responses — useful when hands are busy.

5. Music Control

"Play Jordan's bedtime playlist on the nursery Sonos." Spotify/Sonos integration through natural language.

6. Local AI for Private Tasks

Journal entries, personal reflections, sensitive family discussions. Nemotron processes locally — nothing leaves the house. Medical questions routed to local model, not cloud APIs.

Use Cases Where Our Stack Remains Superior

Software Development → Claude Code + developer/reviewer agents
Financial Analysis → CFO agent + Monarch Money MCP
Network Management → CTO agent + UniFi MCP
Security Review → CISO agent reviews before execution
Meal Planning → rosey-bot with hardcoded allergy rules (NEVER move to prompt-based)

The Hybrid Approach

Run both stacks, each doing what it's best at:

Task Type	Handled By	Why
Coding, development	Claude Code + COO	Best-in-class coding, agent hierarchy
Finance, budgets	CFO agent + Monarch	Real bank data, structured analysis
Network, infrastructure	CTO agent + UniFi	Direct hardware control
Security review	CISO agent	Architectural review before execution
Meal planning	rosey-bot	Hardcoded allergy safety
Family messaging	OpenClaw + NemoClaw	22+ platforms, always-on
Home automation	OpenClaw + HA	Scheduled, always-on
Voice, music	OpenClaw	No equivalent in our stack
Private/sensitive queries	Nemotron (local)	Never leaves the machine
Quick lookups, reminders	OpenClaw + Nemotron	Free, fast, local

Security Analysis

What NemoClaw Fixes

Original Risk	Rating	NemoClaw Mitigation	Residual Risk
Network exposure (40K+ instances)	CRITICAL	Whitelist-only networking, no 0.0.0.0 binding	LOW — if policy is correctly configured
Filesystem access (SSH keys, creds)	CRITICAL	Writes restricted to /sandbox and /tmp	LOW — host filesystem isolated
Credential leakage to external APIs	HIGH	Privacy router, all API calls through OpenShell	MEDIUM — depends on classification accuracy
Arbitrary code execution	HIGH	OpenShell K3s container, digest-verified blueprints	LOW — container escape is hard
Prompt injection	HIGH	NOT ADDRESSED — still prompt-based security	HIGH — fundamental architectural flaw
Malicious marketplace skills	CRITICAL	PARTIALLY ADDRESSED — JFrog partnership for supply chain	MEDIUM — skill vetting still incomplete
Data at rest (memory stores PII)	HIGH	Sandbox isolation limits what's stored	MEDIUM — data in /sandbox still unencrypted

What NemoClaw Does NOT Fix

Unfixed: Prompt Injection

The fundamental flaw. If a crafted message can hijack the agent's instructions, the sandbox doesn't help because the agent is already authorized to act. NemoClaw limits the blast radius but doesn't prevent the hijack.

Unfixed: Malicious Skills

ClawHub still has vetting issues. JFrog partnership is announced but not implemented. Installing community skills remains risky.

Unfixed: Jack's Allergy Safety (LIFE-SAFETY ISSUE)

Moving hardcoded allergy rules to prompt-based instructions is STILL unacceptable. A prompt injection could override "never recommend foods containing almonds, sesame, milk, eggs, or peanuts." This is a life-safety issue that sandboxing does not address.
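
For contrast with prompt-based rules, this is roughly what "hardcoded" means: a deterministic check in code that no message content can rewrite. It is a sketch only, not rosey-bot's actual implementation; the function names and the small derived-ingredient map are our assumptions, and the map is deliberately incomplete.

```python
from typing import Iterable, List

# Allergens named in this briefing.
BLOCKED_ALLERGENS = ("almond", "sesame", "milk", "egg", "peanut")

# Illustrative and incomplete: derived ingredients mapped to the allergen they contain.
DERIVED = {"tahini": "sesame", "butter": "milk", "marzipan": "almond"}


def allergen_hits(ingredients: Iterable[str]) -> List[str]:
    """Return the sorted list of blocked allergens found in the ingredient names."""
    hits = set()
    for item in ingredients:
        name = item.lower()
        for allergen in BLOCKED_ALLERGENS:
            if allergen in name:  # substring match errs toward blocking
                hits.add(allergen)
        for derived, allergen in DERIVED.items():
            if derived in name:
                hits.add(allergen)
    return sorted(hits)


def is_safe(ingredients: Iterable[str]) -> bool:
    """A meal passes only when no blocked allergen is detected."""
    return not allergen_hits(ingredients)


if __name__ == "__main__":
    print(allergen_hits(["Peanut butter", "rice"]))
    print(is_safe(["rice", "chicken", "broccoli"]))
```

Substring matching over-blocks (eggplant trips the egg rule, for example), which is the safe direction to fail. The key property is that the check runs in code after any model output, so an injected prompt cannot switch it off.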

Unfixed: Alpha Software

No third-party security audits. NVIDIA's security claims are design documents, not battle-tested facts.

Unfixed: curl | bash Install

The installation method itself is a security anti-pattern. Mitigated by reviewing the script before running, but still concerning.

Security Recommendation

If deploying NemoClaw + OpenClaw:

Dedicated VM or container host (not EQR1)
Tailscale-only network access (no public exposure)
CISO agent review of sandbox policies before going live
No shared credentials with our main stack
Allergy-related meal planning stays in rosey-bot — NEVER delegate to OpenClaw
Monitor NemoClaw GitHub for security advisories
Revisit in 3–6 months when third-party audits exist

Hardware & Deployment Options

Option A: Docker on EQR2

Spec	EQR2 Current	Requirement
CPU	TBD	4+ vCPU
RAM	TBD	16 GB recommended
GPU	None required	Optional (Nemotron runs on CPU, faster on GPU)
Disk	TBD	40 GB for NemoClaw + models
Network	Tailscale	Already configured

Pros: Separate machine from main infrastructure, Tailscale already set up

Cons: May not have GPU for fast Nemotron inference

Option B: Dedicated VM on EQR1

Pros: EQR1 has resources, Docker available

Cons: Shares host with critical infrastructure. Adds attack surface to primary machine.

Not Recommended

Isolation is the whole point. Don't put the experiment next to production.

Option C: DGX Spark (New Hardware)

NVIDIA's new personal AI supercomputer. Designed specifically for NemoClaw + Nemotron.

128GB unified memory
Grace Blackwell GPU
Runs Nemotron 3 Super natively
MSRP: ~$3,000 (pre-order)

Pros: Purpose-built, maximum Nemotron performance, dedicated hardware

Cons: $3,000, delivery timeline uncertain, may be overkill for evaluation

Option D: Cloud Instance

Spin up a cloud VM (any provider) with: 4 vCPU, 16GB RAM, 40GB disk, Docker pre-installed, Tailscale joined to our tailnet.

Pros: Zero hardware commitment, easy to tear down

Cons: Monthly cost, data leaves our network (partially offset by privacy router)

Nemotron Model Sizing for Local Inference

Model	VRAM (FP16)	VRAM (Quantized)	CPU-Only?	Speed
Nano 4B	~8 GB	~2–4 GB	Yes (slow)	Fast on any GPU
Nano 30B (3B active)	~6 GB active	~2–3 GB active	Yes (usable)	Good on RTX 3060+
Super 120B (12B active)	~24 GB active	~8–12 GB active	Slow	Needs RTX 4090 or better
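
The FP16 column follows from simple arithmetic: parameters times bits per parameter, divided by 8. The sketch below reproduces those figures and also computes the total-weight footprint, which the table does not show. In a Mixture-of-Experts model any expert may be selected for a given token, so a runtime generally has to keep all expert weights resident or offload them, not just the active ones. The function name and the 1 GB = 1e9 bytes convention are our assumptions, and the figures ignore KV cache and activation memory.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB: parameters x bits / 8 (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


if __name__ == "__main__":
    # (name, total params in billions, active params in billions) from the lineup table
    models = [("Nano 4B", 4, 1), ("Nano 30B", 30, 3), ("Super 120B", 120, 12)]
    for name, total_b, active_b in models:
        print(
            f"{name}: active FP16 {weight_memory_gb(active_b, 16):.0f} GB, "
            f"total FP16 {weight_memory_gb(total_b, 16):.0f} GB, "
            f"total 4-bit {weight_memory_gb(total_b, 4):.0f} GB"
        )
```

By this estimate Nano 30B needs about 60 GB for all weights at FP16 and about 15 GB at 4-bit, so the "active" figures are best read as a guide to per-token compute, not to how much memory the machine needs.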

Sweet Spot

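For our use case: Nano 30B is the sweet spot. Its 3B active params keep per-token compute low and it handles routine tasks well, though all 30B weights still have to fit in memory (or be offloaded), so a quantized build is the realistic choice on modest hardware. Route complex queries to Claude via API.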

Cost Analysis

Current Stack Costs

Item	Monthly Cost
Anthropic Pro Plan	$20/mo
API overages (if any)	Variable
Total	~$20/mo

NemoClaw + Nemotron Added Costs

Item	Monthly Cost
Hardware (if buying DGX Spark)	$3,000 one-time
Hardware (if cloud VM)	$20–50/mo
Hardware (if existing EQR2)	$0
Nemotron models	Free (open-source)
NemoClaw software	Free (Apache 2.0)
OpenClaw software	Free (MIT)
Claude API for complex routing	Reduced — routine queries go to free Nemotron
Total (EQR2 deploy)	~$0 additional
Total (cloud VM)	~$20–50/mo additional

Cost Savings from Privacy Router

With Nemotron handling routine queries locally:

Simple questions, scheduling, reminders → Nemotron (free)
Home automation commands → Nemotron (free)
Family chat responses → Nemotron (free, private)
Only coding, analysis, complex reasoning → Claude API (paid)

Estimated Savings

60–80% of household queries could run locally on Nemotron, significantly reducing API costs if we move beyond the Pro plan flat rate.
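
The savings estimate is straightforward arithmetic once we have real usage numbers. The sketch below uses placeholder inputs (3,000 queries per month at $0.01 per cloud query); both figures are hypothetical and should be replaced with measured values before drawing conclusions.

```python
def monthly_api_cost(queries: int, cost_per_query: float, local_fraction: float) -> float:
    """Cloud spend when local_fraction of queries is served by the local model."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return queries * cost_per_query * (1.0 - local_fraction)


if __name__ == "__main__":
    queries, cost = 3000, 0.01  # hypothetical volume and per-query price
    for fraction in (0.0, 0.6, 0.8):
        print(f"{fraction:.0%} local -> ${monthly_api_cost(queries, cost, fraction):.2f}/mo")
```

The saving scales linearly with the local fraction, so the 60–80% range matters only if the local model's answers are good enough that those queries do not get re-asked in the cloud.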

Recommendation

Short Term (Now — Next 30 Days)

Action: Wait and Watch — Do Not Deploy Yet

NemoClaw is alpha with no third-party security audits
Monitor GitHub for security advisories and maturity signals
Continue building claude-auto for our always-on needs
Track the JFrog supply chain security integration

Medium Term (30–90 Days)

Action: Test Nemotron Locally on EQR1 or EQR2

Install Nemotron Nano 30B via Ollama (low risk, since it is just a local model)
Benchmark it against Claude for routine household queries
Evaluate quality for: scheduling, reminders, home automation, simple Q&A
Determine if it's "good enough" for non-critical tasks

Long Term (90+ Days, After NemoClaw Matures)

Action: Evaluate NemoClaw Deployment If All Conditions Are Met

Third-party security audit is published
JFrog supply chain integration is live
Ashley confirms messaging (WhatsApp/Signal) is a real pain point
NemoClaw reaches beta or stable release
CISO agent review approves the deployment plan

What We Should NEVER Do

Hard Rules

Move Jack's allergy rules from rosey-bot to OpenClaw/NemoClaw
Deploy NemoClaw on EQR1 alongside production infrastructure
Install unvetted skills from ClawHub
Expose any port to the public internet
Share API credentials between our stack and the NemoClaw stack

The Hybrid Future

The ideal end state is a dual-stack architecture:

┌──────────────────────────────────────────────────────┐
│              DANIEL'S AI INFRASTRUCTURE               │
│                                                      │
│  ┌─────────────────────┐  ┌─────────────────────┐  │
│  │   CLAUDE CODE + COO │  │  NEMOCLAW + OPENCLAW  │  │
│  │                     │  │                       │  │
│  │  Coding             │  │  Family messaging     │  │
│  │  Finance            │  │  Home automation      │  │
│  │  Network mgmt       │  │  Voice / music        │  │
│  │  Security review    │  │  Scheduling / cron    │  │
│  │  Project mgmt       │  │  Quick lookups        │  │
│  │  Complex reasoning  │  │  Private queries      │  │
│  │                     │  │                       │  │
│  │  Model: Claude      │  │  Models: Nemotron     │  │
│  │  Interface: CLI     │  │  (local) + Claude     │  │
│  │                     │  │  (cloud, complex)     │  │
│  │  runs on: EQR1      │  │  Interface: WhatsApp  │  │
│  │                     │  │  Signal, Discord      │  │
│  │                     │  │                       │  │
│  │                     │  │  runs on: EQR2 or     │  │
│  │                     │  │  dedicated hardware   │  │
│  └─────────────────────┘  └─────────────────────┘  │
│                                                      │
│  ┌─────────────────────┐                             │
│  │     ROSEY-BOT       │  ← Allergy safety STAYS    │
│  │  Hardcoded rules    │    HERE. Never moves.       │
│  │  Meal plans         │                             │
│  │  Discord channels   │                             │
│  └─────────────────────┘                             │
└──────────────────────────────────────────────────────┘

Each component does what it's best at. No single point of failure. Jack's safety rules stay hardcoded. Private data stays local. Complex work uses the best model available.

Appendix: Sources

NVIDIA Nemotron

NVIDIA NemoClaw

OpenClaw (Prior Research)

Deep Research Report on Notion — 60+ sources documented in research-findings.md