NVIDIA NIM: set up and ready to use

You don't need to configure anything. The API key is stored in the stack and is passed automatically to:

Tool	How to use	Endpoint
Open WebUI (`/` on subdomain `ai.`)	Model dropdown at the top left → NVIDIA models appear automatically	OpenAI-compatible
n8n (subdomain `n8n.`)	Use the OpenAI node in the workflow, Base URL = `https://integrate.api.nvidia.com/v1`, API key = `={{ $env.NVIDIA_API_KEY }}`	OpenAI-compatible
CLI / custom scripts	Bearer token from `/opt/hermes-stack/.env` (variable `NVIDIA_API_KEY`)	`https://integrate.api.nvidia.com/v1`
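
For custom scripts, one option is to read only the key from the env file instead of sourcing the whole file. The following is a minimal sketch, assuming the file contains plain `NAME=value` lines (optionally quoted, no `export` prefix); `read_env_var` is a hypothetical helper name, not something the stack ships.

```shell
# Sketch: read one variable from a dotenv-style file without sourcing it.
# read_env_var is a hypothetical helper, not part of the stack.
read_env_var() {
  # $1 = variable name, $2 = path to the .env file
  # Prints the value of the last matching NAME=value line, minus surrounding quotes.
  sed -n "s/^$1=//p" "$2" | tail -n 1 \
    | sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'\$/\1/"
}

# Usage on the server (path and variable name are taken from the table above):
#   NVIDIA_API_KEY="$(read_env_var NVIDIA_API_KEY /opt/hermes-stack/.env)"
```

Reading a single variable keeps the other secrets in the file out of the script's environment.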

Which models are available?

NIM offers, among others (all freely usable with your key):

Chat / Reasoning: meta/llama-3.3-70b-instruct, meta/llama-3.1-nemotron-70b-instruct, nvidia/nemotron-4-340b-instruct, mistralai/mistral-large-2-instruct, qwen/qwen2.5-coder-32b-instruct
Vision: meta/llama-3.2-90b-vision-instruct, microsoft/phi-3.5-vision-instruct
Code: qwen/qwen2.5-coder-32b-instruct, nvidia/usdcode-llama3-70b-instruct
Embeddings: nvidia/nv-embedqa-e5-v5, nvidia/nv-embed-v1
Rerank: nvidia/llama-3.2-nv-rerankqa-1b-v2
Speech / Audio / Video: via separate NIM endpoints; check the NVIDIA console

The full list is at https://build.nvidia.com/models.
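
The list can probably also be queried from the command line. This sketch assumes the endpoint exposes the OpenAI-style `GET /models` route and that `curl` and `jq` are installed; `list_nim_models` and `NIM_BASE_URL` are illustrative names.

```shell
# Sketch: list model IDs via the (assumed) OpenAI-style GET /models route.
NIM_BASE_URL="${NIM_BASE_URL:-https://integrate.api.nvidia.com/v1}"

list_nim_models() {
  # Prints one model ID per line, sorted alphabetically.
  curl -s "$NIM_BASE_URL/models" \
    -H "Authorization: Bearer $NVIDIA_API_KEY" \
    | jq -r '.data[].id' | sort
}

# Usage: show only the Llama models
#   list_nim_models | grep -i llama
```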

Quick test in Open WebUI

Open https://ai.168-119-232-59.sslip.io
Create an account (the first person to sign up becomes admin)
Select a model at the top left → dozens of NVIDIA models should appear
Recommended picks: meta/llama-3.3-70b-instruct for general chat, qwen/qwen2.5-coder-32b-instruct for code

Quick test in n8n

Open https://n8n.168-119-232-59.sslip.io
The first person to sign up creates the owner account
New workflow → OpenAI node → Create Credential:
- API Key field: click the gear icon, "Add Expression" → ={{ $env.NVIDIA_API_KEY }}
- Base URL: https://integrate.api.nvidia.com/v1
In the node's Model field, enter e.g. meta/llama-3.3-70b-instruct → Run.

Quick test directly (curl)

ssh root@168.119.232.59
source /opt/hermes-stack/.env
curl -s https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"meta/llama-3.3-70b-instruct","messages":[{"role":"user","content":"Sag Hallo auf Bayrisch."}]}' | jq -r '.choices[0].message.content'
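
For prompts that contain quotes or line breaks, hand-writing the JSON body gets fragile. A minimal sketch of the same request with the body built by `jq`, assuming `NVIDIA_API_KEY` is already set in the shell; `chat_payload` and `nim_chat` are hypothetical helper names.

```shell
# Sketch: build the request body with jq so special characters in the prompt are escaped.
chat_payload() {
  # $1 = model ID, $2 = user prompt
  jq -n -c --arg model "$1" --arg prompt "$2" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}'
}

nim_chat() {
  # Sends the payload from stdin (-d @-) and prints only the reply text.
  chat_payload "$1" "$2" \
    | curl -s https://integrate.api.nvidia.com/v1/chat/completions \
        -H "Authorization: Bearer $NVIDIA_API_KEY" \
        -H "Content-Type: application/json" \
        -d @- \
    | jq -r '.choices[0].message.content'
}

# Usage:
#   nim_chat meta/llama-3.3-70b-instruct 'Sag "Hallo" auf Bayrisch.'
```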

Troubleshooting

Open WebUI shows no NVIDIA models: log out and back in once, or go to Admin Panel → Settings → Connections → Refresh.
n8n: {{ $env.NVIDIA_API_KEY }} is empty: restart the container after changing the env file:
ssh root@168.119.232.59 "cd /opt/hermes-stack && docker compose up -d --force-recreate n8n"
HTTP 401: check the key in /opt/hermes-stack/.env, then run docker compose up -d --force-recreate open-webui n8n.
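
To tell a rejected key apart from a network problem, it can help to look at the bare HTTP status of a tiny test request. The following is a sketch, not a definitive check: `explain_status` and `nim_status` are hypothetical helpers, and the `max_tokens` field is assumed to be accepted as in other OpenAI-compatible APIs.

```shell
# Sketch: map the HTTP status of a one-token test request to a hint.
explain_status() {
  case "$1" in
    200) echo "OK: key accepted" ;;
    401) echo "401: key rejected, check NVIDIA_API_KEY in /opt/hermes-stack/.env" ;;
    000) echo "no response: check network or DNS" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

nim_status() {
  # Sends a minimal chat request and prints only the HTTP status code.
  # curl reports 000 when it received no HTTP response at all.
  curl -s -o /dev/null -w '%{http_code}' \
    https://integrate.api.nvidia.com/v1/chat/completions \
    -H "Authorization: Bearer $NVIDIA_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"meta/llama-3.3-70b-instruct","max_tokens":1,"messages":[{"role":"user","content":"ping"}]}'
}

# Usage on the server:
#   source /opt/hermes-stack/.env && explain_status "$(nim_status)"
```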