Description
LocalAI version: localai/localai:v3.9.0-gpu-nvidia-cuda-13
Environment, CPU architecture, OS, and Version:
Linux 6.17.0-8-generic #8-Ubuntu SMP PREEMPT_DYNAMIC Fri Nov 14 21:44:46 UTC 2025 x86_64 GNU/Linux
AMD CPU
Nvidia RTX 3060 GPU
Ubuntu 25.10
Kubernetes (RKE2 v1.35.0+rke2r1), using Nvidia k8s-device-plugin with timeSlicing GPU sharing method
Describe the bug
When LocalAI stops a model (either manually via the stop button or automatically via LRU eviction), the backend child process is not killed, so the VRAM it allocated is never freed.
To Reproduce
- Start up a model using any backend (llama or stablediffusion)
- Click the stop model button
- Observe that the VRAM is still shown as allocated in the LocalAI GUI. Also observe that nvidia-smi on the host shows a child process still running and holding the VRAM (see the commands sketched below).
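Roughly the checks behind the observations above, run from a shell inside the pod and on the host; the binary name local-ai is an assumption based on my container image:
```
# Find the LocalAI parent PID (assuming the binary is called local-ai)
LOCALAI_PID=$(pgrep -f local-ai | head -n1)

# After pressing the stop button in the WebUI, list what is still
# running underneath the LocalAI parent:
ps --ppid "$LOCALAI_PID" -o pid,ppid,cmd

# On the host, confirm the backend process still holds VRAM:
nvidia-smi
```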
Expected behavior
When LocalAI stops a model, the child process should stop, and the VRAM should be freed.
Logs
After stopping a model, the GPU memory usage shown in the GUI stays the same, but no models are listed below it:
nvidia-smi shows a child process of LocalAI still running:
Sat Jan 10 17:37:10 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:01:00.0 Off | N/A |
| 0% 56C P8 15W / 170W | 6061MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 445091 C ...ds/cuda13-llama-cpp/lib/ld.so 6052MiB |
+-----------------------------------------------------------------------------------------+
In this case, PID 445091 is a child of the LocalAI parent process.
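To double-check the parent/child relationship, standard procps/psmisc tools can be used with the PID from the nvidia-smi output above:
```
# PPID should point at the LocalAI main process
ps -o pid,ppid,cmd -p 445091
# Show the full ancestry with PIDs
pstree -ps 445091
```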
Additional context
I can "fix" this by restarting the pod, but for some reason LocalAI is not killing its child processes in my setup when a model is unloaded. This ties up VRAM and keeps me from starting other models.