Description
Hello. I'm trying to build from source and can't get LocalAI running on a Tesla P40 vGPU under Proxmox.
LocalAI version:
v3.6.0-215-g1e5b9135
commit 1e5b913 (HEAD -> master, origin/master, origin/HEAD)
Environment, CPU architecture, OS, and Version:
VM in Proxmox with a vGPU passed through.
Profile GRID-P40-24Q (mdev=nvidia-53), num_heads=4, frl_config=60, framebuffer=24576M
4 cores from an AMD Ryzen 9 5900X 12-Core Processor, processor type: host.
Linux LLM-teslap40 6.1.0-38-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.147-1 (2025-08-02) x86_64 GNU/Linux
nvidia-smi
root@LLM-teslap40:/srv/localai/src/LocalAI# nvidia-smi
Fri Oct 31 00:29:41 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 GRID P40-24Q On | 00000000:00:10.0 Off | N/A |
| N/A N/A P0 N/A / N/A | 0MiB / 24576MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Describe the bug
LocalAI doesn't see my GPU.
With the QEMU default GPU attached, I have to set LOCALAI_FORCE_META_BACKEND_CAPABILITY=nvidia to start.
That default GPU is now disabled, but:
12:27AM INF LocalAI version: v3.6.0-215-g1e5b9135 (1e5b9135df80534e96cb0f33fd80b4ec5414cb86)
12:27AM INF Capability automatically detected, set LOCALAI_FORCE_META_BACKEND_CAPABILITY to override Capability=nvidia
12:27AM WRN VRAM is less than 4GB, defaulting to CPU. Set LOCALAI_FORCE_META_BACKEND_CAPABILITY to override
LocalAI doesn't see the full VRAM: the warning says less than 4GB, while nvidia-smi reports 24576MiB.
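For reference, the override mentioned in the warning is what I use with the QEMU default GPU; it is set like this:

# force the nvidia meta backend instead of the auto-detected CPU fallback
LOCALAI_FORCE_META_BACKEND_CAPABILITY=nvidia ./local-ai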
To Reproduce
git clone https://github.com/go-skynet/LocalAI
cd LocalAI
make build
./local-ai
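(Note: the build info in the logs below shows an empty BUILD_TYPE, so this is a plain CPU build. A possible variant, assuming the Makefile's BUILD_TYPE=cublas option still applies to this version, would be:)

# hypothetical: rebuild with cuBLAS/CUDA support instead of the default CPU-only build
make BUILD_TYPE=cublas build
./local-ai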
Logs
root@LLM-teslap40:/srv/localai/src/LocalAI# make build
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
mkdir -p pkg/grpc/proto
./protoc --experimental_allow_proto3_optional -Ibackend/ --go_out=pkg/grpc/proto/ --go_opt=paths=source_relative --go-grpc_out=pkg/grpc/proto/ --go-grpc_opt=paths=source_relative \
backend/backend.proto
I local-ai build info:
I BUILD_TYPE:
I GO_TAGS:
I LD_FLAGS: -s -w -X "github.com/mudler/LocalAI/internal.Version=v3.6.0-215-g1e5b9135" -X "github.com/mudler/LocalAI/internal.Commit=1e5b9135df80534e96cb0f33fd80b4ec5414cb86"
I UPX:
rm -rf local-ai || true
CGO_LDFLAGS="" go build -ldflags "-s -w -X "github.com/mudler/LocalAI/internal.Version=v3.6.0-215-g1e5b9135" -X "github.com/mudler/LocalAI/internal.Commit=1e5b9135df80534e96cb0f33fd80b4ec5414cb86"" -tags "" -o local-ai ./cmd/local-ai
root@LLM-teslap40:/srv/localai/src/LocalAI# ./local-ai
12:48AM INF Starting LocalAI using 4 threads, with models path: /srv/localai/src/LocalAI/models
12:48AM INF LocalAI version: v3.6.0-215-g1e5b9135 (1e5b9135df80534e96cb0f33fd80b4ec5414cb86)
12:48AM INF Capability automatically detected, set LOCALAI_FORCE_META_BACKEND_CAPABILITY to override Capability=nvidia
12:48AM WRN VRAM is less than 4GB, defaulting to CPU. Set LOCALAI_FORCE_META_BACKEND_CAPABILITY to override
12:48AM INF Preloading models from /srv/localai/src/LocalAI/models
Model name: qwen-image
Model name: qwen3-8b
12:48AM INF core/startup process completed!
12:48AM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
12:49AM INF Success ip=192.168.1.134 latency=533.876175ms method=POST status=200 url=/v1/chat/completions
12:49AM INF BackendLoader starting backend=llama-cpp modelID=qwen3-8b o.model=Qwen3-8B.Q4_K_M.gguf
12:49AM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:46609: connect: connection refused\""
12:49AM ERR Failed to load model qwen3-8b with backend llama-cpp error="failed to load model with internal loader: grpc service not ready" modelID=qwen3-8b
12:49AM ERR Stream ended with error: failed to load model with internal loader: grpc service not ready
Additional context
However, it works in a container with the image localai/localai:master-gpu-nvidia-cuda12.
Log:
root@LLM-teslap40:/srv/docker-localai# docker compose up -d
[+] Running 1/1
✔ Container docker-localai-localai-1 Started 3.1s
root@LLM-teslap40:/srv/docker-localai# docker compose logs -f
localai-1 | CPU info:
localai-1 | model name : AMD Ryzen 9 5900X 12-Core Processor
localai-1 | flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor fsrm arch_capabilities
localai-1 | CPU: AVX found OK
localai-1 | CPU: AVX2 found OK
localai-1 | CPU: no AVX512 found
localai-1 | 9:52PM INF Setting logging to info
localai-1 | 9:52PM INF Starting LocalAI using 4 threads, with models path: //models
localai-1 | 9:52PM INF LocalAI version: 5ce982b (5ce982b9c91851c82554d2eebf15f439b1caa7a9)
localai-1 | 9:52PM INF Preloading models from //models
localai-1 |
localai-1 | Model name: dreamshaper
localai-1 |
localai-1 |
localai-1 |
localai-1 | Model name: moondream2-20250414
localai-1 |
localai-1 |
localai-1 |
localai-1 | Model name: qwen3-4b
localai-1 |
localai-1 |
localai-1 | 9:52PM INF core/startup process completed!
localai-1 | 9:52PM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
localai-1 | 9:52PM WRN Client error ip=192.168.1.134 latency=1.520117ms method=GET status=401 url=/
localai-1 | 9:52PM INF Success ip=192.168.1.134 latency="10.529µs" method=GET status=200 url=/static/assets/highlightjs.css
localai-1 | 9:52PM INF Success ip=192.168.1.134 latency="22.58µs" method=GET status=200 url=/static/assets/highlightjs.js
localai-1 | 9:52PM INF Success ip=192.168.1.134 latency="25.468µs" method=GET status=200 url=/static/general.css
...
localai-1 | 9:53PM INF Success ip=192.168.1.134 latency=635.634738ms method=POST status=200 url=/v1/chat/completions
localai-1 | 9:53PM INF BackendLoader starting backend=llama-cpp modelID=qwen3-4b o.model=Qwen3-4B.Q4_K_M.gguf
localai-1 | 9:53PM INF Success ip=192.168.1.134 latency=636.851105ms method=POST status=200 url=/v1/chat/completions
localai-1 | 9:53PM INF Success ip=127.0.0.1 latency="30.249µs" method=GET status=200 url=/readyz
localai-1 | 9:54PM INF Success ip=127.0.0.1 latency="11.11µs" method=GET status=200 url=/readyz
docker-compose.yml
services:
  localai:
    # See https://localai.io/basics/container/#standard-container-images for
    # a list of available container images (or build your own with the provided Dockerfile)
    # Available images with CUDA, ROCm, SYCL, Vulkan
    # Image list (quay.io): https://quay.io/repository/go-skynet/local-ai?tab=tags
    # Image list (dockerhub): https://hub.docker.com/r/localai/localai
    # For images with python backends, use:
    #image: localai/localai:master-cublas-cuda12-ffmpeg
    image: localai/localai:master-gpu-nvidia-cuda12
    restart: always
    # command:
    #   - ${MODEL_NAME:-gemma-3-4b-it-qat}
    #   - ${MULTIMODAL_MODEL:-moondream2-20250414}
    #   - ${IMAGE_MODEL:-sd-1.5-ggml}
    #   - granite-embedding-107m-multilingual
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 60s
      timeout: 10m
      retries: 120
    ports:
      - 8081:8080
    environment:
      - LOCALAI_SINGLE_ACTIVE_BACKEND=true
      # - DEBUG=true
      - API_KEY=apikeysecret
      - REBUILD=true
    volumes:
      - ./volumes/models:/models
      - ./volumes/backends:/backends
      - ./volumes/models:/build/models
      - ./volumes/backends:/build/backends
      - ./volumes/images:/tmp/generated/images
    # For images with python backends, use:
    # image: localai/localai:master-cublas-cuda12-ffmpeg
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
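For completeness, a quick way to check whether the container itself sees the vGPU (using the localai service name from the compose file above):

# run nvidia-smi inside the running LocalAI container
docker compose exec localai nvidia-smi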