Shell Completion
LocalAI provides shell completion support for bash, zsh, and fish shells. Once installed, tab completion works for all CLI commands, subcommands, and flags.
Generating Completion Scripts
Use the completion subcommand to generate a completion script for your shell:
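A minimal sketch, assuming the `completion` subcommand takes the target shell name as an argument:

```shell
# Print a completion script for the given shell to stdout
local-ai completion bash
local-ai completion zsh
local-ai completion fish
```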
Installation
Bash
Add the following to your ~/.bashrc:
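A sketch, assuming the `completion` subcommand prints the script to stdout as shown above:

```shell
# Load local-ai completions in every new bash session
source <(local-ai completion bash)
```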
Or install it system-wide:
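For a system-wide install, the standard `bash-completion` directory can be used (path assumes the `bash-completion` package layout):

```shell
local-ai completion bash | sudo tee /etc/bash_completion.d/local-ai > /dev/null
```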
Zsh
Add the following to your ~/.zshrc:
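A sketch, assuming the `completion` subcommand prints the script to stdout:

```shell
# In ~/.zshrc
source <(local-ai completion zsh)
```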
Or install it to a completions directory:
If shell completions are not already enabled in your zsh environment, add the following to the beginning of your ~/.zshrc:
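These are the standard zsh lines that enable its completion system; they must run before any completion scripts are loaded:

```shell
autoload -Uz compinit
compinit
```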
Fish
Or install it permanently:
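Fish auto-loads completions from its user completions directory, so writing the script there (assuming the `completion` subcommand shown above) installs it permanently:

```shell
local-ai completion fish > ~/.config/fish/completions/local-ai.fish
```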
Usage
After installation, restart your shell or source your shell configuration file. Then type local-ai followed by a tab to see available commands:
Tab completion also works for subcommands and flags:
System Info and Version
LocalAI provides endpoints to inspect the running instance, including available backends, loaded models, and version information.
System Information
- Method: GET
- Endpoint: /system
Returns available backends and currently loaded models.
Response
| Field | Type | Description |
|---|---|---|
| backends | array | List of available backend names (strings) |
| loaded_models | array | List of currently loaded models |
| loaded_models[].id | string | Model identifier |
Usage
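Assuming the server is listening on the default bind address (`:8080`):

```shell
curl http://localhost:8080/system
```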
Example response
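The field names below follow the table above; the backend and model names are illustrative:

```json
{
  "backends": ["llama-cpp", "whisper"],
  "loaded_models": [
    { "id": "my-model" }
  ]
}
```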
Version
- Method: GET
- Endpoint: /version
Returns the LocalAI version and build commit.
Response
| Field | Type | Description |
|---|---|---|
| version | string | Version string in the format version (commit) |
Usage
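Assuming the server is listening on the default bind address (`:8080`):

```shell
curl http://localhost:8080/version
```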
Example response
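The version string follows the `version (commit)` format described above; the values here are illustrative:

```json
{
  "version": "v2.x.y (0123456789abcdef)"
}
```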
Error Responses
| Status Code | Description |
|---|---|
| 500 | Internal server error |
Model compatibility table
Besides llama-based models, LocalAI is also compatible with other architectures. The table below lists all the backends, the compatible model families, and the associated repositories.
Note
LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See the advanced section for more details.
Text Generation & Language Models
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| llama.cpp | LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | yes | GPT and Functions | yes | yes | CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU |
| vLLM | Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12/13, ROCm, Intel |
| transformers | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 12/13, ROCm, Intel, CPU |
| MLX | Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) |
| MLX-VLM | Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) |
| vllm-omni | vLLM Omni multimodal | yes | Multimodal GPT | no | no | CUDA 12/13, ROCm, Intel |
| langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
Audio & Speech Processing
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| whisper.cpp | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU |
| faster-whisper | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel, CPU |
| piper (binding) | Any piper onnx model | no | Text to voice | no | no | CPU |
| coqui | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
| kokoro | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12/13, ROCm, Intel, CPU |
| chatterbox | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 12/13, CPU |
| kitten-tts | Kitten TTS | no | Text-to-speech | no | no | CPU |
| silero-vad with Golang bindings | Silero VAD | no | Voice Activity Detection | no | no | CPU |
| neutts | NeuTTSAir | no | Text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, CPU |
| vibevoice | VibeVoice-Realtime | no | Real-time text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
| pocket-tts | Pocket TTS | no | Lightweight CPU-based text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
| mlx-audio | MLX | no | Text-to-speech | no | no | Metal (Apple Silicon) |
| nemo | NeMo speech models | no | Speech models | no | no | CUDA 12/13, ROCm, Intel, CPU |
| outetts | OuteTTS | no | Text-to-speech with voice cloning | no | no | CUDA 12/13, CPU |
| faster-qwen3-tts | Faster Qwen3 TTS | no | Fast text-to-speech | no | no | CUDA 12/13, ROCm, Intel, CPU |
| qwen-asr | Qwen ASR | no | Automatic speech recognition | no | no | CUDA 12/13, ROCm, Intel, CPU |
| voxcpm | VoxCPM | no | Speech understanding | no | no | CUDA 12/13, Metal, CPU |
| whisperx | WhisperX | no | Enhanced transcription | no | no | CUDA 12/13, ROCm, Intel, CPU |
Image & Video Generation
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| stablediffusion.cpp | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12/13, Intel SYCL, Vulkan, CPU |
| diffusers | SD, various diffusion models,… | no | Image/Video generation | no | no | CUDA 12/13, ROCm, Intel, Metal, CPU |
| transformers-musicgen | MusicGen | no | Audio generation | no | no | CUDA, CPU |
Specialized AI Tasks
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| rfdetr | RF-DETR | no | Object Detection | no | no | CUDA 12/13, Intel, CPU |
| rerankers | Reranking API | no | Reranking | no | no | CUDA 12/13, ROCm, Intel, CPU |
| local-store | Vector database | no | Vector storage | yes | no | CPU |
| huggingface | HuggingFace API models | yes | Various AI tasks | yes | yes | API-based |
Acceleration Support Summary
GPU Acceleration
- NVIDIA CUDA: CUDA 12.0, CUDA 13.0 support across most backends
- AMD ROCm: HIP-based acceleration for AMD GPUs
- Intel oneAPI: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
- Vulkan: Cross-platform GPU acceleration
- Metal: Apple Silicon GPU acceleration (M1/M2/M3+)
Specialized Hardware
- NVIDIA Jetson (L4T CUDA 12): ARM64 support for embedded AI (AGX Orin, Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier)
- NVIDIA Jetson (L4T CUDA 13): ARM64 support for embedded AI (DGX Spark)
- Apple Silicon: Native Metal acceleration for Mac M1/M2/M3+
- Darwin x86: Intel Mac support
CPU Optimization
- AVX/AVX2/AVX512: Advanced vector extensions for x86
- Quantization: 4-bit, 5-bit, 8-bit integer quantization support
- Mixed Precision: F16/F32 mixed precision support
Note: any backend name listed above can be used in the backend field of the model configuration file (See the advanced section).
- * Only for CUDA and OpenVINO CPU/XPU acceleration.
Architecture
LocalAI is an API written in Go that serves as an OpenAI shim, enabling software already developed with OpenAI SDKs to seamlessly integrate with LocalAI. It can be effortlessly adopted as a substitute, even on consumer-grade hardware. This capability is achieved by employing various C++ backends, including ggml, to perform inference on LLMs using the CPU and, if desired, the GPU. Internally, LocalAI backends are just gRPC servers: you can build your own gRPC server and extend LocalAI at runtime, and you can also point LocalAI at external gRPC servers and/or binaries that it will manage internally.
LocalAI uses a mixture of backends written in various languages (C++, Golang, Python, …). You can check the model compatibility table to learn about all the components of LocalAI.
Backstory
As with many typical open source projects, I, mudler, was fiddling around with llama.cpp over my long nights and wanted a way to call it from Go, as I am a Golang developer and use it extensively. So I created LocalAI (initially known as llama-cli) and added an API to it.
But guess what? The more I dived into this rabbit hole, the more I realized that I had stumbled upon something big. With all the fantastic C++ projects floating around the community, it dawned on me that I could piece them together to create a full-fledged OpenAI replacement. So, ta-da! LocalAI was born, and it quickly overshadowed its humble origins.
Now, why did I choose to go with C++ bindings, you ask? Well, I wanted to keep LocalAI snappy and lightweight, allowing it to run like a champ on any system, avoid the penalties of Golang's GC, and, most importantly, build on the shoulders of giants like llama.cpp. Go is good for backends and APIs and is easy to maintain. And hey, don't forget that I'm all about sharing the love. That's why I made LocalAI MIT licensed, so everyone can hop on board and benefit from it.
As if that wasn’t exciting enough, as the project gained traction, mkellerman and Aisuko jumped in to lend a hand. mkellerman helped set up some killer examples, while Aisuko is becoming our community maestro. The community now is growing even more with new contributors and users, and I couldn’t be happier about it!
Oh, and let's not forget the real MVP here: llama.cpp. Without this extraordinary piece of software, LocalAI wouldn't even exist. So, a big shoutout to the community for making this magic happen!
CLI Reference
Complete reference for all LocalAI command-line interface (CLI) parameters and environment variables.
Note: All CLI flags can also be set via environment variables. Environment variables take precedence over CLI flags. See .env files for configuration file support.
Global Flags
| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| -h, --help | | Show context-sensitive help | |
| --log-level | info | Set the level of logs to output [error,warn,info,debug,trace] | $LOCALAI_LOG_LEVEL |
| --debug | false | DEPRECATED - Use --log-level=debug instead. Enable debug logging | $LOCALAI_DEBUG, $DEBUG |
Storage Flags
| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| --models-path | BASEPATH/models | Path containing models used for inferencing | $LOCALAI_MODELS_PATH, $MODELS_PATH |
| --data-path | BASEPATH/data | Path for persistent data (collectiondb, agent state, tasks, jobs). Separates mutable data from configuration | $LOCALAI_DATA_PATH |
| --generated-content-path | /tmp/generated/content | Location for assets generated by backends (e.g. stablediffusion, images, audio, videos) | $LOCALAI_GENERATED_CONTENT_PATH, $GENERATED_CONTENT_PATH |
| --upload-path | /tmp/localai/upload | Path to store uploads from files API | $LOCALAI_UPLOAD_PATH, $UPLOAD_PATH |
| --localai-config-dir | BASEPATH/configuration | Directory for dynamic loading of certain configuration files (currently runtime_settings.json, api_keys.json, and external_backends.json). See Runtime Settings for web-based configuration. | $LOCALAI_CONFIG_DIR |
| --localai-config-dir-poll-interval | | Time duration to poll the LocalAI Config Dir if your system has broken fsnotify events (example: 1m) | $LOCALAI_CONFIG_DIR_POLL_INTERVAL |
| --models-config-file | | YAML file containing a list of model backend configs (alias: --config-file) | $LOCALAI_MODELS_CONFIG_FILE, $CONFIG_FILE |
Backend Flags
| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| --backends-path | BASEPATH/backends | Path containing backends used for inferencing | $LOCALAI_BACKENDS_PATH, $BACKENDS_PATH |
| --backends-system-path | /var/lib/local-ai/backends | Path containing system backends used for inferencing | $LOCALAI_BACKENDS_SYSTEM_PATH, $BACKEND_SYSTEM_PATH |
| --external-backends | | A list of external backends to load from gallery on boot | $LOCALAI_EXTERNAL_BACKENDS, $EXTERNAL_BACKENDS |
| --external-grpc-backends | | A list of external gRPC backends (format: BACKEND_NAME:URI) | $LOCALAI_EXTERNAL_GRPC_BACKENDS, $EXTERNAL_GRPC_BACKENDS |
| --backend-galleries | | JSON list of backend galleries | $LOCALAI_BACKEND_GALLERIES, $BACKEND_GALLERIES |
| --autoload-backend-galleries | true | Automatically load backend galleries on startup | $LOCALAI_AUTOLOAD_BACKEND_GALLERIES, $AUTOLOAD_BACKEND_GALLERIES |
| --parallel-requests | false | Enable backends to handle multiple requests in parallel if they support it (e.g.: llama.cpp or vllm) | $LOCALAI_PARALLEL_REQUESTS, $PARALLEL_REQUESTS |
| --max-active-backends | 0 | Maximum number of active backends (loaded models). When exceeded, the least recently used model is evicted. Set to 0 for unlimited, 1 for single-backend mode | $LOCALAI_MAX_ACTIVE_BACKENDS, $MAX_ACTIVE_BACKENDS |
| --single-active-backend | false | DEPRECATED - Use --max-active-backends=1 instead. Allow only one backend to be run at a time | $LOCALAI_SINGLE_ACTIVE_BACKEND, $SINGLE_ACTIVE_BACKEND |
| --preload-backend-only | false | Do not launch the API services, only the preloaded models/backends are started (useful for multi-node setups) | $LOCALAI_PRELOAD_BACKEND_ONLY, $PRELOAD_BACKEND_ONLY |
| --enable-watchdog-idle | false | Enable watchdog for stopping backends that are idle longer than the watchdog-idle-timeout | $LOCALAI_WATCHDOG_IDLE, $WATCHDOG_IDLE |
| --watchdog-idle-timeout | 15m | Threshold beyond which an idle backend should be stopped | $LOCALAI_WATCHDOG_IDLE_TIMEOUT, $WATCHDOG_IDLE_TIMEOUT |
| --enable-watchdog-busy | false | Enable watchdog for stopping backends that are busy longer than the watchdog-busy-timeout | $LOCALAI_WATCHDOG_BUSY, $WATCHDOG_BUSY |
| --watchdog-busy-timeout | 5m | Threshold beyond which a busy backend should be stopped | $LOCALAI_WATCHDOG_BUSY_TIMEOUT, $WATCHDOG_BUSY_TIMEOUT |
| --watchdog-interval | 500ms | Interval between watchdog checks (e.g., 500ms, 5s, 1m) | $LOCALAI_WATCHDOG_INTERVAL, $WATCHDOG_INTERVAL |
| --force-eviction-when-busy | false | Force eviction even when models have active API calls (default: false for safety). Warning: Enabling this can interrupt active requests | $LOCALAI_FORCE_EVICTION_WHEN_BUSY, $FORCE_EVICTION_WHEN_BUSY |
| --lru-eviction-max-retries | 30 | Maximum number of retries when waiting for busy models to become idle before eviction | $LOCALAI_LRU_EVICTION_MAX_RETRIES, $LRU_EVICTION_MAX_RETRIES |
| --lru-eviction-retry-interval | 1s | Interval between retries when waiting for busy models to become idle (e.g., 1s, 2s) | $LOCALAI_LRU_EVICTION_RETRY_INTERVAL, $LRU_EVICTION_RETRY_INTERVAL |
For more information on VRAM management, see VRAM and Memory Management.
Models Flags
| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| --galleries | | JSON list of galleries | $LOCALAI_GALLERIES, $GALLERIES |
| --autoload-galleries | true | Automatically load galleries on startup | $LOCALAI_AUTOLOAD_GALLERIES, $AUTOLOAD_GALLERIES |
| --preload-models | | A list of models to apply in JSON at start | $LOCALAI_PRELOAD_MODELS, $PRELOAD_MODELS |
| --models | | A list of model configuration URLs to load | $LOCALAI_MODELS, $MODELS |
| --preload-models-config | | A list of models to apply at startup. Path to a YAML config file | $LOCALAI_PRELOAD_MODELS_CONFIG, $PRELOAD_MODELS_CONFIG |
| --load-to-memory | | A list of models to load into memory at startup | $LOCALAI_LOAD_TO_MEMORY, $LOAD_TO_MEMORY |
Note: You can also pass model configuration URLs as positional arguments:
local-ai run MODEL_URL1 MODEL_URL2 ...
Performance Flags
| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| --f16 | false | Enable GPU acceleration | $LOCALAI_F16, $F16 |
| -t, --threads | | Number of threads used for parallel computation. Usage of the number of physical cores in the system is suggested | $LOCALAI_THREADS, $THREADS |
| --context-size | | Default context size for models | $LOCALAI_CONTEXT_SIZE, $CONTEXT_SIZE |
API Flags
| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| --address | :8080 | Bind address for the API server | $LOCALAI_ADDRESS, $ADDRESS |
| --cors | false | Enable CORS (Cross-Origin Resource Sharing) | $LOCALAI_CORS, $CORS |
| --cors-allow-origins | | Comma-separated list of allowed CORS origins | $LOCALAI_CORS_ALLOW_ORIGINS, $CORS_ALLOW_ORIGINS |
| --csrf | false | Enable Fiber CSRF middleware | $LOCALAI_CSRF |
| --upload-limit | 15 | Default upload-limit in MB | $LOCALAI_UPLOAD_LIMIT, $UPLOAD_LIMIT |
| --api-keys | | List of API Keys to enable API authentication. When this is set, all requests must be authenticated with one of these API keys | $LOCALAI_API_KEY, $API_KEY |
| --disable-webui | false | Disables the web user interface. When set to true, the server will only expose API endpoints without serving the web interface | $LOCALAI_DISABLE_WEBUI, $DISABLE_WEBUI |
| --disable-runtime-settings | false | Disables the runtime settings feature. When set to true, the server will not load runtime settings from the runtime_settings.json file and the settings web interface will be disabled | $LOCALAI_DISABLE_RUNTIME_SETTINGS, $DISABLE_RUNTIME_SETTINGS |
| --disable-gallery-endpoint | false | Disable the gallery endpoints | $LOCALAI_DISABLE_GALLERY_ENDPOINT, $DISABLE_GALLERY_ENDPOINT |
| --disable-metrics-endpoint | false | Disable the /metrics endpoint | $LOCALAI_DISABLE_METRICS_ENDPOINT, $DISABLE_METRICS_ENDPOINT |
| --machine-tag | | If not empty, add that string to Machine-Tag header in each response. Useful to track responses from different machines when using multiple P2P federated nodes | $LOCALAI_MACHINE_TAG, $MACHINE_TAG |
Hardening Flags
| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| --disable-predownload-scan | false | If true, disables the best-effort security scanner before downloading any files | $LOCALAI_DISABLE_PREDOWNLOAD_SCAN |
| --opaque-errors | false | If true, all error responses are replaced with blank 500 errors. This is intended only for hardening against information leaks and is normally not recommended | $LOCALAI_OPAQUE_ERRORS |
| --use-subtle-key-comparison | false | If true, API Key validation comparisons will be performed using constant-time comparisons rather than simple equality. This trades off performance on each request for resilience against timing attacks | $LOCALAI_SUBTLE_KEY_COMPARISON |
| --disable-api-key-requirement-for-http-get | false | If true, a valid API key is not required to issue GET requests to portions of the web UI. This should only be enabled in secure testing environments | $LOCALAI_DISABLE_API_KEY_REQUIREMENT_FOR_HTTP_GET |
| --http-get-exempted-endpoints | ^/$,^/browse/?$,^/talk/?$,^/p2p/?$,^/chat/?$,^/image/?$,^/text2image/?$,^/tts/?$,^/static/.*$,^/swagger.*$ | If --disable-api-key-requirement-for-http-get is overridden to true, this is the list of endpoints to exempt. Only adjust this in case of a security incident or as a result of a personal security posture review | $LOCALAI_HTTP_GET_EXEMPTED_ENDPOINTS |
P2P Flags
| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| --p2p | false | Enable P2P mode | $LOCALAI_P2P, $P2P |
| --p2p-dht-interval | 360 | Interval for DHT refresh (used during token generation) | $LOCALAI_P2P_DHT_INTERVAL, $P2P_DHT_INTERVAL |
| --p2p-otp-interval | 9000 | Interval for OTP refresh (used during token generation) | $LOCALAI_P2P_OTP_INTERVAL, $P2P_OTP_INTERVAL |
| --p2ptoken | | Token for P2P mode (optional) | $LOCALAI_P2P_TOKEN, $P2P_TOKEN, $TOKEN |
| --p2p-network-id | | Network ID for P2P mode, can be set arbitrarily by the user for grouping a set of instances | $LOCALAI_P2P_NETWORK_ID, $P2P_NETWORK_ID |
| --federated | false | Enable federated instance | $LOCALAI_FEDERATED, $FEDERATED |
Other Commands
LocalAI supports several subcommands beyond run:
- local-ai models - Manage LocalAI models and definitions
- local-ai backends - Manage LocalAI backends and definitions
- local-ai tts - Convert text to speech
- local-ai sound-generation - Generate audio files from text or audio
- local-ai transcript - Convert audio to text
- local-ai worker - Run workers to distribute workload (llama.cpp-only)
- local-ai util - Utility commands
- local-ai explorer - Run P2P explorer
- local-ai federated - Run LocalAI in federated mode
Use local-ai <command> --help for more information on each command.
Examples
Basic Usage
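An illustrative invocation using flags from the tables above (paths are examples):

```shell
# Start LocalAI with a custom models directory and verbose logging
local-ai run --models-path /opt/localai/models --log-level debug
```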
Environment Variables
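The same configuration expressed through the environment variables listed in the tables above:

```shell
LOCALAI_MODELS_PATH=/opt/localai/models LOCALAI_LOG_LEVEL=debug local-ai run
```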
Advanced Configuration
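A sketch combining performance, API, and watchdog flags documented above (values are illustrative):

```shell
local-ai run \
  --models-path /opt/localai/models \
  --context-size 4096 \
  --threads 8 \
  --address ":9090" \
  --enable-watchdog-idle --watchdog-idle-timeout 10m
```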
Related Documentation
- See Advanced Usage for configuration examples
- See VRAM and Memory Management for memory management options
API Error Reference
This page documents the error responses returned by the LocalAI API. LocalAI supports multiple API formats (OpenAI, Anthropic, Open Responses), each with its own error structure.
Error Response Formats
OpenAI-Compatible Format
Most endpoints return errors using the OpenAI-compatible format:
| Field | Type | Description |
|---|---|---|
| code | integer\|string | HTTP status code or error code string |
| message | string | Human-readable error description |
| type | string | Error category (e.g., invalid_request_error) |
| param | string\|null | The parameter that caused the error, if applicable |
This format is used by: /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/images/generations, /v1/audio/transcriptions, /models, and other OpenAI-compatible endpoints.
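An illustrative error body using the fields from the table above (the top-level `error` wrapper follows the OpenAI convention; the message text is an example), together with a minimal way to extract the message without `jq`:

```shell
body='{"error":{"code":401,"message":"invalid api key","type":"invalid_request_error","param":null}}'
# Pull out the "message" field with plain sed
printf '%s\n' "$body" | sed 's/.*"message":"\([^"]*\)".*/\1/'
```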
Anthropic Format
The /v1/messages endpoint returns errors in Anthropic’s format:
| Field | Type | Description |
|---|---|---|
| type | string | Always "error" for error responses |
| error.type | string | invalid_request_error or api_error |
| error.message | string | Human-readable error description |
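An error body in this format might look like the following (the message is taken from the /v1/messages error table below; treat the exact wording as illustrative):

```json
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "model is required"
  }
}
```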
Open Responses Format
The /v1/responses endpoint returns errors with this structure:
| Field | Type | Description |
|---|---|---|
| type | string | One of: invalid_request, not_found, server_error, model_error, invalid_request_error |
| message | string | Human-readable error description |
| code | string | Optional error code |
| param | string | The parameter that caused the error, if applicable |
HTTP Status Codes
| Code | Meaning | When It Occurs |
|---|---|---|
| 400 | Bad Request | Invalid input, missing required fields, malformed JSON |
| 401 | Unauthorized | Missing or invalid API key |
| 404 | Not Found | Model or resource does not exist |
| 409 | Conflict | Resource already exists (e.g., duplicate token) |
| 422 | Unprocessable Entity | Validation failed (e.g., invalid parameter range) |
| 500 | Internal Server Error | Backend inference failure, unexpected server errors |
Global Error Handling
Authentication Errors (401)
When API keys are configured (via LOCALAI_API_KEY or --api-keys), all requests must include a valid key. Keys can be provided through:
- `Authorization: Bearer <key>` header
- `x-api-key: <key>` header
- `xi-api-key: <key>` header
- `token` cookie
Example of a request without a key, and the error response it produces:
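A sketch of the exchange, assuming the server runs on the default `:8080` with API keys configured:

```shell
curl http://localhost:8080/v1/models
# Returns HTTP 401 with an OpenAI-style error body
```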
The response also includes the header WWW-Authenticate: Bearer.
Request Parsing Errors (400)
All endpoints return a 400 error if the request body cannot be parsed:
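A 400 response body might look like the following (field names from the OpenAI-compatible format above; the message text matches the per-endpoint tables below):

```json
{
  "error": {
    "code": 400,
    "message": "Bad Request",
    "type": "invalid_request_error",
    "param": null
  }
}
```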
Not Found (404)
Requests to undefined routes return:
Opaque Errors Mode
When LOCALAI_OPAQUE_ERRORS=true is set, all error responses return an empty body with only the HTTP status code. This is a security hardening option that prevents information leaks.
Per-Endpoint Error Scenarios
Chat Completions – POST /v1/chat/completions
| Status | Cause | Example Message |
|---|---|---|
| 400 | Invalid or malformed request body | Bad Request |
| 400 | Model not found in configuration | Bad Request |
| 500 | Backend inference failure | Internal Server Error |
See also: Text Generation
Completions – POST /v1/completions
| Status | Cause | Example Message |
|---|---|---|
| 400 | Invalid or malformed request body | Bad Request |
| 500 | Backend inference failure | Internal Server Error |
Embeddings – POST /v1/embeddings
| Status | Cause | Example Message |
|---|---|---|
| 400 | Invalid or malformed request body | Bad Request |
| 400 | Model not found in configuration | Bad Request |
| 500 | Backend inference failure | Internal Server Error |
See also: Embeddings
Image Generation – POST /v1/images/generations
| Status | Cause | Example Message |
|---|---|---|
| 400 | Invalid or malformed request body | Bad Request |
| 400 | Model not found in configuration | Bad Request |
| 500 | Backend inference failure | Internal Server Error |
See also: Image Generation
Image Editing (Inpainting) – POST /v1/images/edits
| Status | Cause | Example Message |
|---|---|---|
| 400 | Invalid or malformed request body | Bad Request |
| 400 | Missing image file | missing image file |
| 400 | Missing mask file | missing mask file |
| 500 | Storage preparation failure | failed to prepare storage |
Audio Transcription – POST /v1/audio/transcriptions
| Status | Cause | Example Message |
|---|---|---|
| 400 | Missing file field in form data | Bad Request |
| 400 | Model not found in configuration | Bad Request |
| 500 | Backend inference failure | Internal Server Error |
See also: Audio to Text
Text to Speech – POST /v1/audio/speech, POST /tts
| Status | Cause | Example Message |
|---|---|---|
| 400 | Invalid or malformed request body | Bad Request |
| 400 | Model not found in configuration | Bad Request |
| 500 | Backend inference failure | Internal Server Error |
See also: Text to Audio
ElevenLabs TTS – POST /v1/text-to-speech/:voice-id
| Status | Cause | Example Message |
|---|---|---|
| 400 | Invalid or malformed request body | Bad Request |
| 400 | Model not found in configuration | Bad Request |
| 500 | Backend inference failure | Internal Server Error |
ElevenLabs Sound Generation – POST /v1/sound-generation
| Status | Cause | Example Message |
|---|---|---|
| 400 | Invalid or malformed request body | Bad Request |
| 400 | Model not found in configuration | Bad Request |
| 500 | Backend inference failure | Internal Server Error |
Reranking – POST /v1/rerank, POST /jina/v1/rerank
| Status | Cause | Example Message |
|---|---|---|
| 400 | Invalid or malformed request body | Bad Request |
| 422 | top_n less than 1 | top_n - should be greater than or equal to 1 |
| 500 | Backend inference failure | Internal Server Error |
See also: Reranker
Anthropic Messages – POST /v1/messages
| Status | Cause | Error Type | Example Message |
|---|---|---|---|
| 400 | Missing model field | invalid_request_error | model is required |
| 400 | Model not in configuration | invalid_request_error | model configuration not found |
| 400 | Missing or invalid max_tokens | invalid_request_error | max_tokens is required and must be greater than 0 |
| 500 | Backend inference failure | api_error | model inference failed: <details> |
| 500 | Prediction failure | api_error | prediction failed: <details> |
Open Responses – POST /v1/responses
| Status | Cause | Error Type | Example Message |
|---|---|---|---|
| 400 | Missing model field | invalid_request | model is required |
| 400 | Model not in configuration | invalid_request | model configuration not found |
| 400 | Failed to parse input | invalid_request | failed to parse input: <details> |
| 400 | background=true without store=true | invalid_request_error | background=true requires store=true |
| 404 | Previous response not found | not_found | previous response not found: <id> |
| 500 | Backend inference failure | model_error | model inference failed: <details> |
| 500 | Prediction failure | model_error | prediction failed: <details> |
| 500 | Tool execution failure | model_error | failed to execute tools: <details> |
| 500 | MCP configuration error | server_error | failed to get MCP config: <details> |
| 500 | No MCP servers available | server_error | no working MCP servers found |
Open Responses – GET /v1/responses/:id
| Status | Cause | Error Type | Example Message |
|---|---|---|---|
| 400 | Missing response ID | invalid_request_error | response ID is required |
| 404 | Response not found | not_found | response not found: <id> |
Open Responses Events – GET /v1/responses/:id/events
| Status | Cause | Error Type | Example Message |
|---|---|---|---|
| 400 | Missing response ID | invalid_request_error | response ID is required |
| 400 | Response was not created with stream | invalid_request_error | cannot stream a response that was not created with stream=true |
| 400 | Invalid starting_after value | invalid_request_error | starting_after must be an integer |
| 404 | Response not found | not_found | response not found: <id> |
| 500 | Failed to retrieve events | server_error | failed to get events: <details> |
Object Detection – POST /v1/detection
| Status | Cause | Example Message |
|---|---|---|
| 400 | Invalid or malformed request body | Bad Request |
| 400 | Model not found in configuration | Bad Request |
| 500 | Backend inference failure | Internal Server Error |
See also: Object Detection
Video Generation – POST /v1/video/generations
| Status | Cause | Example Message |
|---|---|---|
| 400 | Invalid or malformed request body | Bad Request |
| 400 | Model not found in configuration | Bad Request |
| 500 | Backend inference failure | Internal Server Error |
Voice Activity Detection – POST /v1/audio/vad
| Status | Cause | Example Message |
|---|---|---|
| 400 | Invalid or malformed request body | Bad Request |
| 400 | Model not found in configuration | Bad Request |
| 500 | Backend inference failure | Internal Server Error |
Tokenize – POST /v1/tokenize
| Status | Cause | Example Message |
|---|---|---|
| 400 | Invalid or malformed request body | Bad Request |
| 400 | Model not found in configuration | Bad Request |
Models – GET /v1/models, GET /models
| Status | Cause | Example Message |
|---|---|---|
| 500 | Failed to list models | Internal Server Error |
See also: Model Gallery
Handling Errors in Client Code
Python (OpenAI SDK)
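A sketch using the `openai` Python package (v1+); the base URL and model name are assumptions for illustration:

```python
from openai import OpenAI, APIStatusError, APIConnectionError

# Point the SDK at a local LocalAI instance (default bind address :8080)
client = OpenAI(base_url="http://localhost:8080/v1", api_key="your-key")

try:
    resp = client.chat.completions.create(
        model="my-model",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)
except APIStatusError as e:
    # Raised for non-2xx responses; the body follows the
    # OpenAI-compatible error format documented above
    print(f"HTTP {e.status_code}: {e.message}")
except APIConnectionError:
    print("Could not reach the LocalAI server")
```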
curl
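With curl, the status code can be printed after the body to distinguish error classes (server address and model name are illustrative):

```shell
curl -s -w '\n%{http_code}\n' http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "hi"}]}'
```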
Related Configuration
| Environment Variable | Description |
|---|---|
| LOCALAI_API_KEY | Comma-separated list of valid API keys |
| LOCALAI_OPAQUE_ERRORS | Set to true to hide error details (returns empty body with status code only) |
| LOCALAI_SUBTLE_KEY_COMPARISON | Use constant-time key comparison for timing-attack resistance |
LocalAI binaries
LocalAI binaries are available for both Linux and macOS and can be executed directly from your command line. These binaries are continuously updated and hosted on our GitHub Releases page. This method also supports Windows users via the Windows Subsystem for Linux (WSL).
macOS Download
You can download the DMG and install the application:
Note: the DMGs are not signed by Apple, so macOS quarantines them. See https://github.com/mudler/LocalAI/issues/6268 for a workaround; the fix is tracked here: https://github.com/mudler/LocalAI/issues/6244
Otherwise, use the following one-liner command in your terminal to download and run LocalAI on Linux or MacOS:
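The installer script is published on the LocalAI site; as with any piped installer, review it before running:

```shell
curl https://localai.io/install.sh | sh
```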
Alternatively, here are direct links to the binaries:
| OS | Link |
|---|---|
| Linux (amd64) | Download |
| Linux (arm64) | Download |
| MacOS (arm64) | Download |
Details
Binaries do have limited support compared to container images:
- Python-based backends (e.g. diffusers or transformers) are not shipped with binaries
- macOS and Linux arm64 binaries do not ship the TTS nor stablediffusion-cpp backends
- Linux binaries do not ship the stablediffusion-cpp backend
Running on Nvidia ARM64
LocalAI can be run on Nvidia ARM64 devices, such as the Jetson Nano, Jetson Xavier NX, Jetson AGX Orin, and Nvidia DGX Spark. The following instructions will guide you through building and using the LocalAI container for Nvidia ARM64 devices.
Platform Compatibility
- CUDA 12 L4T images: Compatible with Nvidia AGX Orin and similar platforms (Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier)
- CUDA 13 L4T images: Compatible with Nvidia DGX Spark
Prerequisites
- Docker engine installed (https://docs.docker.com/engine/install/ubuntu/)
- Nvidia container toolkit installed (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-ap)
Pre-built Images
Pre-built images are available on quay.io and Docker Hub:
CUDA 12 (for AGX Orin and similar platforms)
CUDA 13 (for DGX Spark)
Build the container
If you need to build the container yourself, use the following commands:
CUDA 12 (for AGX Orin and similar platforms)
CUDA 13 (for DGX Spark)
Usage
Run the LocalAI container on Nvidia ARM64 devices using the following commands, where /data/models is the directory containing the models:
CUDA 12 (for AGX Orin and similar platforms)
CUDA 13 (for DGX Spark)
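An illustrative `docker run` invocation; the image tag is an assumption, so check quay.io for the exact L4T tag matching your CUDA version (the MODELS_PATH variable is documented in the CLI reference above):

```shell
docker run -d --runtime nvidia --gpus all \
  -p 8080:8080 \
  -v /data/models:/models \
  -e MODELS_PATH=/models \
  quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64
```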
Note: replace /data/models with the directory containing your models.