Technical Briefing — OpenClaw Evaluation

NVIDIA Nemotron & NemoClaw

What They Are, Why They Matter, and How They Fit with OpenClaw

Prepared: 2026-03-18 | Author: COO / Claude Code Stack | Audience: Daniel Guterman — Technical Decision-Maker

Executive Summary

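NVIDIA announced two products at GTC 2026 (March 16) that directly affect our OpenClaw evaluation: Nemotron, a family of open foundation models, and NemoClaw, an open-source security wrapper for OpenClaw.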

Key Insight

Together, they transform the OpenClaw equation. Our original evaluation flagged OpenClaw as a CRITICAL security risk. NemoClaw addresses the top concerns (network exposure, filesystem access, credential leakage). Nemotron provides free local inference, eliminating API costs for routine tasks.

Bottom Line

NemoClaw + Nemotron makes OpenClaw deployable in a way raw OpenClaw never was — but it's alpha software with no third-party audits yet, and our core objections (Jack's allergy safety, prompt-based security) still apply.


NVIDIA Nemotron — The Model Family

What Is It?

Nemotron is NVIDIA's family of open foundation models, spanning four generations since 2024. The current generation (Nemotron 3) introduces a breakthrough hybrid Mamba-Transformer Mixture-of-Experts architecture that delivers frontier-class quality at a fraction of the compute cost.

NVIDIA's strategy is clear: give away the models to sell the hardware. But the models are genuinely good, and the licensing is among the most permissive in the industry.

The Nemotron 3 Lineup

Model | Total Params | Active Params | Context Window | Target Use Case
Nano 4B | 4B | ~1B | 1M tokens | Edge devices, mobile, IoT
Nano 30B | 30B | 3B | 1M tokens | Efficient agent tasks, local workstations
Super 120B | 120B | 12B | 1M tokens | Multi-agent workflows, complex reasoning
Ultra ~500B | ~500B | ~50B | 1M tokens | Frontier reasoning (expected H1 2026)
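
To make the total-versus-active distinction concrete, the short sketch below computes what share of each model's parameters is used per token. The counts are copied from the lineup above; the Ultra figures are approximate in the source, so that ratio is approximate as well.

```python
# Total and active parameter counts (in billions) copied from the lineup table.
# The Ultra row is approximate in the source, so its ratio is approximate too.
LINEUP = {
    "Nano 4B": (4, 1),
    "Nano 30B": (30, 3),
    "Super 120B": (120, 12),
    "Ultra ~500B": (500, 50),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for each generated token."""
    return active_b / total_b


for name, (total_b, active_b) in LINEUP.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.0%} of parameters active per token")
```

The three larger models activate roughly a tenth of their weights per token, which is the property the Mixture-of-Experts design is built around.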

Architecture Deep Dive

The Nemotron 3 architecture combines three paradigms that have individually proven successful:

1. Mamba-2 Layers (Linear-Time Sequence Processing)

2. Transformer Attention Layers (Precise Associative Recall)

3. Mixture-of-Experts (Parameter Efficiency)

Novel Innovation — Latent MoE

Compresses token representations before routing to experts. Enables 4x more expert specialists at the same inference cost. Think of it as "expert specialization on a budget."

Multi-Token Prediction (MTP)

The model predicts multiple future tokens simultaneously, giving up to 3x wall-clock speedup for structured output (JSON, code, markdown). This is particularly valuable for agent tool-calling patterns.

NVFP4 Native Training

The first models trained natively in 4-bit floating-point precision, purpose-built for NVIDIA B200 GPUs. The stated gain is 4x memory and compute efficiency versus FP8 on H100, which means smaller GPUs can run larger models.

Training Pipeline

Phase 1 — Pretraining

Phase 2 — Supervised Fine-Tuning

Phase 3 — Multi-Environment Reinforcement Learning

Benchmark Performance

Nemotron 3 Super (120B / 12B active)

Benchmark | Result | What It Measures
PinchBench | 85.6% (best open model in class) | Agent reasoning and planning
AIME 2025 | Leading in size class | Advanced mathematics
SWE-Bench Verified | Leading in size class | Real-world software engineering
Terminal Bench | Leading in size class | Command-line task completion
Throughput | 5x previous Nemotron | Raw inference speed

Historical: Llama-Nemotron 70B vs Competitors

Benchmark | Nemotron 70B | GPT-4o | Claude 3.5 Sonnet
Arena Hard | 85.0 | 79.3 | 79.2
AlpacaEval 2 LC | 57.6 | - | -
MT-Bench | 8.98 | - | -
Aider (coding) | 55.0% | 72.9% | -

Honest Assessment

Nemotron wins on alignment/chat benchmarks. Claude and GPT-4o still lead on coding and complex reasoning. Nemotron 3 Super is more competitive on coding (SWE-Bench leading in class), but a detailed head-to-head comparison against Claude Opus/Sonnet has not yet been published.

Where Nemotron truly excels: Throughput. When you need many parallel agents doing moderate-complexity tasks, Nemotron's MoE architecture delivers more tokens per second per dollar than any competitor.

Specialized Variants

Variant | Purpose
Nemotron 3 Omni | Multimodal: audio + vision + language in one model
Nemotron 3 VoiceChat | Real-time simultaneous listen-and-respond
Nemotron Nano VL 12B | Vision-language for image understanding
Nemotron RAG | Retrieval and embedding (leading ViDoRe, MTEB leaderboards)
Nemotron Safety | Content moderation and guardrails
Nemotron Speech | Automatic speech recognition and text-to-speech

Licensing

NVIDIA Open Model License:

Availability

Platform | Access
Hugging Face | All models (BF16, FP8 variants)
NVIDIA NIM | API via build.nvidia.com
Ollama | Nemotron 3 Super for local inference
NeMo Framework | Full training and fine-tuning
GitHub | Developer assets at NVIDIA-NeMo/Nemotron

The Nemotron Coalition

Announced at GTC 2026 — a first-of-its-kind global collaboration:

Members: Black Forest Labs, Cursor, LangChain, Mistral AI, Perplexity, Reflection AI, Sarvam, Thinking Machines Lab

Goal: Collaboratively build the next generation of open frontier models across six families:

  1. Nemotron — Language
  2. Cosmos — World models / vision
  3. Isaac GR00T — Robotics
  4. Alpamayo — Autonomous driving
  5. BioNeMo — Biology / chemistry
  6. Earth-2 — Weather / climate

Notable Adopters

Accenture, Cadence, CrowdStrike, Cursor, Deloitte, EY, Oracle Cloud Infrastructure, Palantir, Perplexity, ServiceNow, Siemens, Synopsys, Zoom


NVIDIA NemoClaw — The Security Wrapper

What Is It?

NemoClaw is an open-source software stack that wraps OpenClaw with enterprise-grade security, privacy, and isolation controls. It is not a separate agent — it is OpenClaw running inside NVIDIA's security infrastructure.

Jensen Huang, GTC 2026 keynote: "30 years of NVIDIA computing, distilled into an agent platform."

Peter Steinberger (OpenClaw creator, now at OpenAI): "With NVIDIA and the broader ecosystem, we're building the claws and guardrails that let anyone create powerful, secure AI assistants."

One-Command Install

curl -fsSL https://nvidia.com/nemoclaw.sh | bash

This installs:

Security Note

The curl | bash installation pattern is a security anti-pattern. Mitigate by reviewing the script content before running it. This alone should not be a blocker, but it's worth flagging.
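
One way to act on the "review before running" mitigation is to download the script to a file, read it, and pin its SHA-256 digest so later installs can confirm nothing changed. The sketch below is a minimal illustration under that assumption; the function names are hypothetical, and it is not known whether NVIDIA publishes an official checksum for the installer.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the downloaded script as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_pinned_digest(data: bytes, pinned_hex: str) -> bool:
    """Compare the script's digest against one recorded after a manual review."""
    return hmac.compare_digest(sha256_hex(data), pinned_hex.strip().lower())


# Typical flow (not executed here because it needs network access):
#   1. curl -fsSL https://nvidia.com/nemoclaw.sh -o nemoclaw.sh
#   2. Read nemoclaw.sh, then record sha256_hex(open("nemoclaw.sh", "rb").read())
#   3. On later installs, run the script only if matches_pinned_digest(...) is True
```

A pinned digest only proves the script is the one that was reviewed. It says nothing about anything the script itself downloads at run time.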

Architecture

Two-component design:

Component | Language | Role
CLI Plugin | TypeScript | Integrates with OpenClaw CLI, user-facing
Blueprint | Python | Orchestrates OpenShell resources, manages sandbox

The Four-Layer Security Model

This is NemoClaw's core value proposition — the direct answer to OpenClaw's CRITICAL security rating.

Layer 1: Network Isolation

Layer 2: Filesystem Restrictions

Layer 3: Process Protection

Layer 4: Inference Routing (Privacy Router)
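
As a rough illustration of what Layers 1 and 2 enforce, the sketch below checks an outbound URL against a host allowlist and a write path against the /sandbox and /tmp roots mentioned elsewhere in this briefing. The host list, names, and logic are assumptions for illustration only; NemoClaw's actual policy format and enforcement mechanism may differ, and real enforcement happens at the container level rather than in application code.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical policy values for illustration; NemoClaw's real policy format may differ.
ALLOWED_HOSTS = frozenset({"api.anthropic.com", "build.nvidia.com"})
WRITABLE_ROOTS = (PurePosixPath("/sandbox"), PurePosixPath("/tmp"))


def egress_allowed(url: str) -> bool:
    """Layer 1: permit outbound requests only to hosts on the allowlist."""
    host = urlsplit(url).hostname
    return host is not None and host.lower() in ALLOWED_HOSTS


def write_allowed(path: str) -> bool:
    """Layer 2: permit writes only under the sandbox roots."""
    p = PurePosixPath(path)
    if not p.is_absolute() or ".." in p.parts:
        return False  # reject relative paths and parent-directory traversal
    # Note: this lexical check does not resolve symlinks.
    return any(root == p or root in p.parents for root in WRITABLE_ROOTS)
```

Both checks default to deny: anything not explicitly listed is refused, which is the property that matters when comparing this model with raw OpenClaw.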

System Requirements

Spec | Minimum | Recommended
CPU | 4 vCPU | 4+ vCPU
RAM | 8 GB | 16 GB
Disk | 20 GB | 40 GB
OS | Ubuntu 22.04 LTS+ | Ubuntu 22.04 LTS+
Runtime | Node.js 20+, Docker | Node.js 20+, Docker

Hardware agnostic — does not require NVIDIA GPUs (though optimized for them). Supported on: GeForce RTX PCs, RTX PRO workstations, DGX Station, DGX Spark, any Linux machine with Docker.

Release Status

Detail | Value
Announced | March 16, 2026 (GTC keynote)
License | Apache 2.0
GitHub | github.com/NVIDIA/NemoClaw
Stars | ~6.7K (first 2 days)
Forks | 739
Contributors | ~26
Status | Alpha ("Expect rough edges")
Tech Stack | TypeScript 37.7%, Shell 30.6%, JS 25.7%, Python 4.9%

Alpha Warning

NVIDIA's own docs: "Interfaces, APIs, and behavior may change without notice as the design iterates."

Enterprise Partnerships

Being pursued for NemoClaw integrations: Salesforce, Cisco, Google, Adobe, CrowdStrike, SAP, JFrog (supply chain security)


OpenClaw — Quick Refresher

For full details, see our Deep Research Report on Notion.

Attribute | Detail
What | Open-source autonomous AI agent (TypeScript/Node.js)
GitHub Stars | 234K+
License | MIT
Creator | Peter Steinberger (now at OpenAI)
Governance | Moving to open-source foundation
Runtime | Long-lived Gateway daemon on port 18789
Messaging | 22+ platforms (WhatsApp, Signal, Telegram, Discord, iMessage, Slack, Teams, etc.)
AI Models | 20+ providers (Claude, GPT, Gemini, DeepSeek, Ollama, etc.)
Skills | 10,700+ community skills on ClawHub
Integrations | 50+ (chat, smart home, music, productivity, browser, cron)
Security Rating | CRITICAL: 512 vulns, 8+ critical CVEs, 20% malicious marketplace skills

Why We Were Cautious

  1. 40K+ instances exposed on public internet — Gateway binds to 0.0.0.0
  2. ClawHavoc attack — 1,184 malicious skills in official marketplace (12–20% compromised)
  3. Prompt-based security — safety rules are instructions, not architectural boundaries
  4. Microsoft's warning: "Not appropriate to run on a standard personal or enterprise workstation"
  5. Jack's allergy rules — cannot safely move from hardcoded logic to prompt-based
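
The first item is easy to check mechanically. The sketch below, using only Python's standard ipaddress module, flags any bind address that is not loopback. The function name is hypothetical, and whether a non-loopback listener is actually reachable from the internet still depends on firewalls and NAT.

```python
import ipaddress


def bind_is_exposed(bind_address: str) -> bool:
    """True when the address is not loopback, so other hosts may reach the listener."""
    addr = ipaddress.ip_address(bind_address)
    return not addr.is_loopback


# 0.0.0.0 means "all interfaces", which is how a Gateway ends up reachable from outside.
for candidate in ("0.0.0.0", "127.0.0.1", "::1"):
    print(candidate, "exposed" if bind_is_exposed(candidate) else "loopback only")
```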

The Full Stack: OpenClaw + NemoClaw + Nemotron

How They Fit Together

┌────────────────────────────────────────────────────┐
│                    USER INTERFACE                     │
│     WhatsApp  Signal  Telegram  Discord  iMessage    │
└──────────────────────┬──────────────────────┘
                       │
┌──────────────────────▼──────────────────────┐
│                    OPENCLAW                           │
│     Agent Runtime · Skills · Memory · Integrations   │
│     (TypeScript, Gateway daemon, port 18789)         │
└──────────────────────┬──────────────────────┘
                       │
┌──────────────────────▼──────────────────────┐
│                    NEMOCLAW                           │
│     Security Wrapper · Sandbox · Policy Engine       │
│                                                      │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────┐  │
│  │   Network    │  │  Filesystem  │  │  Process   │  │
│  │  Isolation   │  │ Restrictions │  │ Protection │  │
│  │ (whitelist)  │  │ (/sandbox    │  │ (OpenShell │  │
│  │              │  │  /tmp only)  │  │  K3s)      │  │
│  └─────────────┘  └──────────────┘  └────────────┘  │
│                                                      │
│  ┌────────────────────────────────────────────┐    │
│  │           PRIVACY ROUTER                      │    │
│  │  Sensitive → Local    Non-sensitive → Cloud   │    │
│  └────────────────────────────────────────────┘    │
└──────────────────────┬──────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
┌──────────────┐ ┌──────────┐ ┌──────────────┐
│   NEMOTRON   │ │  Claude  │ │   GPT / etc  │
│  (Local LLM) │ │  (Cloud) │ │   (Cloud)    │
│  Free, fast  │ │  Smart   │ │   Optional   │
│  Private     │ │  Capable │ │              │
└──────────────┘ └──────────┘ └──────────────┘

What Each Layer Provides

Layer | Provides | Without It
OpenClaw | Agent brain, messaging, skills, integrations, always-on daemon | No agent, just raw model APIs
NemoClaw | Security sandbox, network isolation, filesystem lock, privacy routing | OpenClaw runs naked (CRITICAL risk)
Nemotron | Free local inference, private data stays local, no API costs for routine tasks | Pay per token to cloud providers for everything

Why This Combination Matters

Before NemoClaw: Deploying OpenClaw required accepting CRITICAL security risk. Our evaluation said "do not deploy without full isolation" — which meant building your own sandbox, firewall rules, Docker hardening, and credential isolation manually.

After NemoClaw: NVIDIA built exactly the isolation we specified. One command gets you a sandboxed OpenClaw with: network whitelist (no more 40K exposed instances), filesystem jail (no SSH key / credential theft), process isolation (no container escape), and privacy routing (sensitive data stays on-device via Nemotron).

With Nemotron: Routine queries (scheduling, reminders, simple lookups, home automation) run on free local models. Only complex reasoning (coding, analysis, financial) routes to Claude. This dramatically reduces API costs and keeps private data off external servers.
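
The routing idea can be sketched in a few lines. This is a toy illustration, not NemoClaw's privacy router: the keyword lists, backend labels, and function names are all hypothetical, and a real classifier would need to be far more careful than substring matching.

```python
from dataclasses import dataclass

# Hypothetical keyword lists; a real classifier would not rely on substring checks.
SENSITIVE_TERMS = ("medical", "journal", "diagnosis", "password")
COMPLEX_TERMS = ("code", "refactor", "analysis", "financial")


@dataclass(frozen=True)
class Route:
    target: str  # "local-nemotron" or "cloud-claude" (illustrative labels)
    reason: str


def route_query(text: str) -> Route:
    """Choose a backend; sensitivity is checked first so private text stays local."""
    lowered = text.lower()
    if any(term in lowered for term in SENSITIVE_TERMS):
        return Route("local-nemotron", "sensitive")
    if any(term in lowered for term in COMPLEX_TERMS):
        return Route("cloud-claude", "complex")
    return Route("local-nemotron", "routine")
```

The ordering is the design choice worth noting: a query that is both sensitive and complex stays local, trading answer quality for privacy. Misclassification is the residual risk, since anything the classifier misses goes to the cloud.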


Comparison: Our Stack vs the NVIDIA-OpenClaw Stack

Architecture Comparison

Dimension | Our Stack (Claude Code + COO) | NVIDIA Stack (OpenClaw + NemoClaw + Nemotron)
Runtime | Ephemeral CLI sessions | Always-on daemon (24/7)
Interface | Terminal + Discord (limited) | 22+ messaging platforms
AI Model | Claude only (Anthropic) | Multi-model (Claude + GPT + Gemini + local Nemotron)
Security Model | No daemon = minimal attack surface | 4-layer sandbox (NemoClaw)
Privacy | All queries go to Anthropic API | Privacy router: sensitive stays local
Cost | Pro plan + API usage | Nemotron free locally; API only for complex tasks
Coding | Best-in-class (Claude Code) | Weaker: Nemotron trails Claude on coding
Orchestration | C-suite agent hierarchy (COO/CTO/CFO/CISO/CMO) | Flat: single agent with skills
Memory | File-based + session persistence | SQLite vector + daily logs + MEMORY.md
Smart Home | Home Assistant MCP | Home Assistant (same underlying)
Network Mgmt | UniFi MCP (direct UDM Pro control) | No equivalent
Financial | Monarch Money MCP (real bank data) | No equivalent
Food Safety | Hardcoded allergy rules (rosey-bot) | Prompt-based only: UNACCEPTABLE for Jack
Voice | None | Wake word, push-to-talk, TTS
Music | None | Spotify, Sonos, Shazam
Scheduling | Manual (pending items only) | Cron, scheduled automation
Browser | Firecrawl (scraping) | Full Chromium CDP automation
Messaging | Discord + Mattermost only | WhatsApp, Signal, Telegram, iMessage, Slack, Teams + 16 more

Where Each Stack Wins

Our Stack Wins

  • Software development and coding tasks
  • Multi-agent orchestration with domain expertise
  • Security review (CISO agent reviews before execution)
  • Financial tracking and analysis
  • Network infrastructure management
  • Food safety (hardcoded rules, not prompt-based)
  • Project isolation and memory management

NVIDIA Stack Wins

  • Always-on availability (daemon vs CLI sessions)
  • Messaging ubiquity (22+ platforms vs 2)
  • Voice interaction
  • Music control
  • Scheduled automation / cron
  • Multi-model flexibility
  • Privacy (local inference for sensitive data)
  • Cost efficiency (free local models for routine tasks)
  • Browser automation

Use Cases for Our Household

High-Value Use Cases (NemoClaw + Nemotron + OpenClaw)

1. Family Messaging Hub

Ashley, Valentina, and family members message the AI via WhatsApp or Signal (apps they already use). No need to install Discord or learn terminal commands. Example: Ashley texts "What's for dinner tonight?" → agent checks meal plan, confirms allergen safety, responds. Privacy router keeps family conversations on local Nemotron — never hits external APIs.

2. Always-On Home Automation

"Turn off the lights at 10pm every night." "If the garage door is open after 11pm, close it and tell me." Scheduled tasks that our ephemeral CLI sessions can't do. Integrates with existing Home Assistant setup.

3. Proactive Scheduling

Morning briefing: weather, calendar, commute, school schedule. Automatic reminders for appointments, medications, school events. Cron-based recurring tasks without human initiation.
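
Scheduled rules like the garage-door example in use case 2 reduce to small time-window checks that an always-on daemon evaluates on a timer. The sketch below is one hypothetical way to express that rule; the 6:00 end of the overnight window is an assumption, since the example only says "after 11pm".

```python
from datetime import time


def garage_should_close(
    door_open: bool,
    now: time,
    curfew: time = time(23, 0),
    morning: time = time(6, 0),  # assumed end of the overnight window
) -> bool:
    """True when the door is open during the overnight window (curfew until morning)."""
    overnight = now >= curfew or now < morning
    return door_open and overnight
```

The window wraps past midnight, which is why the check uses "or" rather than a simple range comparison.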

4. Voice Interface

Wake word activation for hands-free queries while cooking, driving, etc. Push-to-talk for quick questions. TTS responses — useful when hands are busy.

5. Music Control

"Play Jordan's bedtime playlist on the nursery Sonos." Spotify/Sonos integration through natural language.

6. Local AI for Private Tasks

Journal entries, personal reflections, sensitive family discussions. Nemotron processes locally — nothing leaves the house. Medical questions routed to local model, not cloud APIs.

Use Cases Where Our Stack Remains Superior

  1. Software Development → Claude Code + developer/reviewer agents
  2. Financial Analysis → CFO agent + Monarch Money MCP
  3. Network Management → CTO agent + UniFi MCP
  4. Security Review → CISO agent reviews before execution
  5. Meal Planning → rosey-bot with hardcoded allergy rules (NEVER move to prompt-based)

The Hybrid Approach

Run both stacks, each doing what it's best at:

Task Type | Handled By | Why
Coding, development | Claude Code + COO | Best-in-class coding, agent hierarchy
Finance, budgets | CFO agent + Monarch | Real bank data, structured analysis
Network, infrastructure | CTO agent + UniFi | Direct hardware control
Security review | CISO agent | Architectural review before execution
Meal planning | rosey-bot | Hardcoded allergy safety
Family messaging | OpenClaw + NemoClaw | 22+ platforms, always-on
Home automation | OpenClaw + HA | Scheduled, always-on
Voice, music | OpenClaw | No equivalent in our stack
Private/sensitive queries | Nemotron (local) | Never leaves the machine
Quick lookups, reminders | OpenClaw + Nemotron | Free, fast, local
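
The assignment above amounts to a static dispatch table. A minimal sketch, with hypothetical key names, might look like the following; the important property is that an unrecognized task type fails loudly instead of silently falling through to whichever stack happens to be the default.

```python
# Handlers copied from the hybrid-approach table; the dictionary keys are illustrative labels.
HANDLERS = {
    "coding": "Claude Code + COO",
    "finance": "CFO agent + Monarch",
    "network": "CTO agent + UniFi",
    "security_review": "CISO agent",
    "meal_planning": "rosey-bot",
    "family_messaging": "OpenClaw + NemoClaw",
    "home_automation": "OpenClaw + HA",
    "voice_music": "OpenClaw",
    "private_query": "Nemotron (local)",
    "quick_lookup": "OpenClaw + Nemotron",
}


def handler_for(task_type: str) -> str:
    """Look up the owning stack; unknown task types raise instead of defaulting."""
    try:
        return HANDLERS[task_type]
    except KeyError:
        raise ValueError(f"no handler assigned for task type: {task_type!r}") from None
```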

Security Analysis

What NemoClaw Fixes

Original Risk | Rating | NemoClaw Mitigation | Residual Risk
Network exposure (40K+ instances) | CRITICAL | Whitelist-only networking, no 0.0.0.0 binding | LOW (if policy is correctly configured)
Filesystem access (SSH keys, creds) | CRITICAL | Writes confined to /sandbox and /tmp | LOW (host filesystem isolated)
Credential leakage to external APIs | HIGH | Privacy router, all API calls through OpenShell | MEDIUM (depends on classification accuracy)
Arbitrary code execution | HIGH | OpenShell K3s container, digest-verified blueprints | LOW (container escape is hard)
Prompt injection | HIGH | NOT ADDRESSED (still prompt-based security) | HIGH (fundamental architectural flaw)
Malicious marketplace skills | CRITICAL | PARTIALLY ADDRESSED (JFrog partnership for supply chain) | MEDIUM (skill vetting still incomplete)
Data at rest (memory stores PII) | HIGH | Sandbox isolation limits what's stored | MEDIUM (data in /sandbox still unencrypted)

What NemoClaw Does NOT Fix

Unfixed: Prompt Injection

The fundamental flaw. If a crafted message can hijack the agent's instructions, the sandbox doesn't help because the agent is already authorized to act. NemoClaw limits the blast radius but doesn't prevent the hijack.

Unfixed: Malicious Skills

ClawHub still has vetting issues. JFrog partnership is announced but not implemented. Installing community skills remains risky.

Unfixed: Jack's Allergy Safety (LIFE-SAFETY ISSUE)

Moving hardcoded allergy rules to prompt-based instructions is STILL unacceptable. A prompt injection could override "never recommend foods containing almonds, sesame, milk, eggs, or peanuts." This is a life-safety issue that sandboxing does not address.
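
The difference between hardcoded and prompt-based rules is easiest to see in code. The sketch below is not rosey-bot's implementation; it is a hypothetical illustration of a deterministic filter that runs on a meal's ingredient list after any model has produced it, so no message content can alter the rule. The stems are derived from the allergens named above. A production list would also need derived ingredient names (for example whey or tahini), which simple stem matching does not catch.

```python
# Stems for the allergens named in this briefing. Hardcoded on purpose:
# nothing a user or a model says at run time can change this tuple.
ALLERGEN_STEMS = ("almond", "sesame", "milk", "egg", "peanut")


def blocked_ingredients(ingredients: list[str]) -> list[str]:
    """Return every ingredient whose name contains an allergen stem."""
    return [
        item
        for item in ingredients
        if any(stem in item.lower() for stem in ALLERGEN_STEMS)
    ]


def meal_is_safe(ingredients: list[str]) -> bool:
    """A meal passes only when no ingredient is blocked."""
    return not blocked_ingredients(ingredients)
```

Substring matching errs toward false positives ("eggplant" is blocked), which is the acceptable direction for a life-safety check. A prompt-based rule offers no such guarantee in either direction.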

Unfixed: Alpha Software

No third-party security audits. NVIDIA's security claims are design documents, not battle-tested facts.

Unfixed: curl | bash Install

The installation method itself is a security anti-pattern. Mitigated by reviewing the script before running, but still concerning.

Security Recommendation

If deploying NemoClaw + OpenClaw:


Hardware & Deployment Options

Option A: Docker on EQR2

Spec | EQR2 Current | Requirement
CPU | TBD | 4+ vCPU
RAM | TBD | 16 GB recommended
GPU | None required | Optional (Nemotron runs on CPU, faster on GPU)
Disk | TBD | 40 GB for NemoClaw + models
Network | Tailscale | Already configured

Pros: Separate machine from main infrastructure, Tailscale already set up

Cons: May not have GPU for fast Nemotron inference

Option B: Dedicated VM on EQR1

Pros: EQR1 has resources, Docker available

Cons: Shares host with critical infrastructure. Adds attack surface to primary machine.

Not Recommended

Isolation is the whole point. Don't put the experiment next to production.

Option C: DGX Spark (New Hardware)

NVIDIA's new personal AI supercomputer. Designed specifically for NemoClaw + Nemotron.

Pros: Purpose-built, maximum Nemotron performance, dedicated hardware

Cons: $3,000, delivery timeline uncertain, may be overkill for evaluation

Option D: Cloud Instance

Spin up a cloud VM (any provider) with: 4 vCPU, 16GB RAM, 40GB disk, Docker pre-installed, Tailscale joined to our tailnet.

Pros: Zero hardware commitment, easy to tear down

Cons: Monthly cost, data leaves our network (partially offset by privacy router)

Nemotron Model Sizing for Local Inference

Model | VRAM (FP16) | VRAM (Quantized) | CPU-Only? | Speed
Nano 4B | ~8 GB | ~2–4 GB | Yes (slow) | Fast on any GPU
Nano 30B (3B active) | ~6 GB active | ~2–3 GB active | Yes (usable) | Good on RTX 3060+
Super 120B (12B active) | ~24 GB active | ~8–12 GB active | Slow | Needs RTX 4090 or better

Sweet Spot

For our use case: Nano 30B is the sweet spot. 3B active params, runs on modest hardware, handles routine tasks well. Route complex queries to Claude via API.
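
The FP16 column in the sizing table follows from simple arithmetic: parameters times bits per parameter, divided by eight. The sketch below reproduces it under two assumptions: it counts weights only (no KV cache or runtime overhead), and it uses the same parameter counts the table appears to use (total for Nano 4B, active for the two MoE models). Keeping every expert resident in memory would scale with total parameters instead.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_param / 8


# Parameter counts (billions) matching the table's convention; see the note above.
for name, params_b in (("Nano 4B", 4), ("Nano 30B", 3), ("Super 120B", 12)):
    fp16 = weight_memory_gb(params_b, 16)
    int8 = weight_memory_gb(params_b, 8)
    fp4 = weight_memory_gb(params_b, 4)
    print(f"{name}: ~{fp16:.0f} GB FP16, ~{int8:.0f} GB 8-bit, ~{fp4:.1f} GB 4-bit")
```

The quantized ranges in the table sit between the 4-bit and 8-bit results, with some allowance for overhead that this calculation ignores.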


Cost Analysis

Current Stack Costs

Item | Monthly Cost
Anthropic Pro Plan | $20/mo
API overages (if any) | Variable
Total | ~$20/mo

NemoClaw + Nemotron Added Costs

Item | Monthly Cost
Hardware (if buying DGX Spark) | $3,000 one-time
Hardware (if cloud VM) | $20–50/mo
Hardware (if existing EQR2) | $0
Nemotron models | Free (open-source)
NemoClaw software | Free (Apache 2.0)
OpenClaw software | Free (MIT)
Claude API for complex routing | Reduced (routine queries go to free Nemotron)
Total (EQR2 deploy) | ~$0 additional
Total (cloud VM) | ~$20–50/mo additional
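
For a sense of scale, the one-time DGX Spark price can be compared against the cloud VM range with a break-even calculation. This is a rough sketch using only the figures in the table; it ignores electricity, resale value, and the performance gap between the two options.

```python
def breakeven_months(one_time_cost: float, monthly_cost: float) -> float:
    """Months of rental after which a one-time purchase becomes the cheaper option."""
    if monthly_cost <= 0:
        raise ValueError("monthly_cost must be positive")
    return one_time_cost / monthly_cost


# $3,000 DGX Spark versus the $20-50/mo cloud VM range from the table.
fastest = breakeven_months(3000, 50)
slowest = breakeven_months(3000, 20)
print(f"Break-even after {fastest:.0f} to {slowest:.0f} months of cloud VM rental")
```

At 60 to 150 months, the purchase only makes sense if the hardware is wanted for reasons beyond this evaluation.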

Cost Savings from Privacy Router

With Nemotron handling routine queries locally:

Estimated Savings

60–80% of household queries could run locally on Nemotron, significantly reducing API costs if we move beyond the Pro plan flat rate.
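
A back-of-the-envelope version of that estimate is sketched below. The $100 baseline is a made-up figure for illustration, and the calculation assumes every query costs the same, which understates the remaining bill if the queries still sent to Claude are the expensive ones.

```python
def remaining_api_cost(baseline_monthly: float, local_fraction: float) -> float:
    """API spend left after a fraction of queries moves to local inference."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return baseline_monthly * (1.0 - local_fraction)


# Hypothetical $100/mo metered baseline with 60% and 80% of queries handled locally.
for fraction in (0.6, 0.8):
    cost = remaining_api_cost(100.0, fraction)
    print(f"{fraction:.0%} local: about ${cost:.0f}/mo remaining")
```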


Recommendation

Short Term (Now — Next 30 Days)

Action: Wait and Watch — Do Not Deploy Yet
  • NemoClaw is alpha with no third-party security audits
  • Monitor GitHub for security advisories and maturity signals
  • Continue building claude-auto for our always-on needs
  • Track the JFrog supply chain security integration

Medium Term (30–90 Days)

Action: Test Nemotron Locally on EQR1 or EQR2
  • Install Nemotron Nano 30B via Ollama — zero risk, just a local model
  • Benchmark it against Claude for routine household queries
  • Evaluate quality for: scheduling, reminders, home automation, simple Q&A
  • Determine if it's "good enough" for non-critical tasks

Long Term (90+ Days, After NemoClaw Matures)

Action: Evaluate NemoClaw Deployment If All Conditions Are Met
  1. Third-party security audit is published
  2. JFrog supply chain integration is live
  3. Ashley confirms messaging (WhatsApp/Signal) is a real pain point
  4. NemoClaw reaches beta or stable release
  5. CISO agent review approves the deployment plan
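
Because all five conditions must hold, the gate is a simple conjunction. The sketch below uses hypothetical condition names and treats anything missing as unmet, so the default answer is "do not deploy".

```python
# Hypothetical identifiers for the five conditions listed above.
CONDITIONS = (
    "third_party_audit_published",
    "jfrog_integration_live",
    "messaging_pain_point_confirmed",
    "beta_or_stable_release",
    "ciso_review_approved",
)


def unmet_conditions(status: dict[str, bool]) -> list[str]:
    """List conditions that are missing or false; an empty list means evaluation may begin."""
    return [name for name in CONDITIONS if not status.get(name, False)]
```

Returning the unmet items rather than a bare boolean keeps the review conversation focused on what is still outstanding.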

What We Should NEVER Do

Hard Rules
  • Move Jack's allergy rules from rosey-bot to OpenClaw/NemoClaw
  • Deploy NemoClaw on EQR1 alongside production infrastructure
  • Install unvetted skills from ClawHub
  • Expose any port to the public internet
  • Share API credentials between our stack and the NemoClaw stack

The Hybrid Future

The ideal end state is a dual-stack architecture:

┌──────────────────────────────────────────────────────┐
│              DANIEL'S AI INFRASTRUCTURE               │
│                                                      │
│  ┌─────────────────────┐  ┌─────────────────────┐  │
│  │   CLAUDE CODE + COO │  │  NEMOCLAW + OPENCLAW  │  │
│  │                     │  │                       │  │
│  │  Coding             │  │  Family messaging     │  │
│  │  Finance            │  │  Home automation      │  │
│  │  Network mgmt       │  │  Voice / music        │  │
│  │  Security review    │  │  Scheduling / cron    │  │
│  │  Project mgmt       │  │  Quick lookups        │  │
│  │  Complex reasoning  │  │  Private queries      │  │
│  │                     │  │                       │  │
│  │  Model: Claude      │  │  Models: Nemotron     │  │
│  │  Interface: CLI     │  │  (local) + Claude     │  │
│  │                     │  │  (cloud, complex)     │  │
│  │  runs on: EQR1      │  │  Interface: WhatsApp  │  │
│  │                     │  │  Signal, Discord      │  │
│  │                     │  │                       │  │
│  │                     │  │  runs on: EQR2 or     │  │
│  │                     │  │  dedicated hardware   │  │
│  └─────────────────────┘  └─────────────────────┘  │
│                                                      │
│  ┌─────────────────────┐                             │
│  │     ROSEY-BOT       │  ← Allergy safety STAYS    │
│  │  Hardcoded rules    │    HERE. Never moves.       │
│  │  Meal plans         │                             │
│  │  Discord channels   │                             │
│  └─────────────────────┘                             │
└──────────────────────────────────────────────────────┘

Each component does what it's best at. No single point of failure. Jack's safety rules stay hardcoded. Private data stays local. Complex work uses the best model available.


Appendix: Sources

OpenClaw (Prior Research)