
# macOS Setup Guide

This guide covers setup for caro on macOS, with special attention to Apple Silicon (M1/M2/M3/M4) for GPU acceleration.

## Prerequisites

  • macOS: 10.15 (Catalina) or later
  • Rust: 1.75 or later
  • Homebrew: Package manager for macOS
  • CMake: Build tool, installed via Homebrew below
  • Xcode: Full Xcode installation for the Metal compiler (GPU acceleration on Apple Silicon only)
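If you are not sure what is already on your machine, a quick check before starting can save a rebuild later (these are standard macOS and Homebrew commands, not part of caro):

```bash
# Check which tools are already installed, and their versions
sw_vers -productVersion
rustc --version
cargo --version
brew --version
cmake --version
```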
## 1. Install Rust

```bash
# Install rustup (Rust toolchain installer)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Load Rust environment
source "$HOME/.cargo/env"
# Verify installation
rustc --version
cargo --version
```

## 2. Install Homebrew (if not already installed)

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
## 3. Install CMake

```bash
brew install cmake
# Verify installation
cmake --version
```
## 4. Clone and Build caro

```bash
git clone https://github.com/wildcard/caro.git
cd caro
# Build the project (uses CPU backend by default)
cargo build --release
# Install locally
cargo install --path .
```
## 5. Test the Installation

```bash
# Run a test command
caro "list all files"
# Or using cargo
cargo run --release -- "find text files"
```

## Apple Silicon GPU Acceleration (MLX)

Apple Silicon (M1/M2/M3/M4) chips support GPU-accelerated inference via the MLX framework, providing ~4x faster inference than CPU-only mode.
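To confirm that your machine, and the terminal session itself, is actually running on Apple Silicon rather than under Rosetta, these standard macOS commands are enough (they are a general check, not part of caro):

```bash
# "arm64" means the shell runs natively on Apple Silicon;
# "x86_64" means an Intel Mac or a Rosetta-translated terminal
uname -m

# Prints the chip name, e.g. "Apple M2 Pro"
sysctl -n machdep.cpu.brand_string
```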

The project includes a fully functional stub implementation that:

  • Correctly detects Apple Silicon hardware
  • Downloads and loads the 1.1GB Qwen model
  • Provides instant responses for testing and development
  • Works without any additional dependencies

For real GPU acceleration, you need the Metal compiler from Xcode.
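A quick way to see whether the Metal compiler is already available (with Command Line Tools alone it usually is not):

```bash
# Prints the compiler's path if full Xcode is active;
# fails with "unable to find utility" if only Command Line Tools are installed
xcrun --find metal
```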

### Option 1: Stub Implementation (Recommended for Development)

No additional setup required! The default build works immediately:

```bash
# Build (automatically uses MLX stub on Apple Silicon)
cargo build --release
# Run - will use stub implementation
cargo run --release -- "list files"
```

When to use:

  • Quick testing and development
  • You don’t want to install the full multi-GB Xcode
  • You’re developing non-inference features
  • You want instant responses for integration testing

Performance:

  • Model load: ~500ms (from disk)
  • Response time: ~100ms (simulated inference)
  • Memory: ~1.1GB (model file)

### Option 2: Full GPU Acceleration with Xcode


For production use with real GPU-accelerated inference:

Choose one of these methods:

Method A: App Store (Recommended)

  1. Open App Store
  2. Search for “Xcode”
  3. Click “Get” or “Install”
  4. Wait for download (~15GB) and installation
  5. Open Xcode once to accept license
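If you already drive the App Store from the terminal with the third-party mas CLI, the same install can be scripted; the product ID below is the one listed for Xcode on the Mac App Store, so confirm it with the search first:

```bash
# Optional: script the App Store install with the mas CLI
brew install mas
mas search Xcode          # confirm Xcode's product ID
mas install 497799835     # Xcode's Mac App Store ID at the time of writing
```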

Method B: Command Line

```bash
# Check if Xcode is already installed
xcode-select -p
# If not installed, install Command Line Tools first
xcode-select --install
# Then download Xcode from Apple Developer
open https://developer.apple.com/xcode/
```
Once Xcode is installed, accept the license, make it the active developer directory, and confirm the Metal compiler is reachable:

```bash
# Accept Xcode license
sudo xcodebuild -license accept
# Set Xcode as active developer directory
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
# Verify Metal compiler is available
xcrun --find metal
# Should print a path inside Xcode, e.g. .../Xcode.app/Contents/Developer/.../usr/bin/metal
# Check Metal version
xcrun -sdk macosx metal --version
```
With Metal available, rebuild caro with the MLX feature enabled:

```bash
cd caro
# Clean previous builds
cargo clean
# Build with MLX GPU acceleration
cargo build --release --features embedded-mlx
# This will:
# - Compile mlx-rs (may take 5-10 minutes the first time)
# - Link against the Metal framework
# - Enable GPU acceleration
```
Then run with logging enabled to confirm MLX initializes on the GPU:

```bash
# Run with logging to see MLX initialization
RUST_LOG=info cargo run --release -- "list all files"
# You should see:
# INFO caro::backends::embedded::mlx: MLX GPU initialized
# INFO caro::backends::embedded::mlx: Using Metal device
```

Expected Performance (M4 Pro):

  • Model load: < 2s (MLX optimization)
  • First inference: < 2s
  • Subsequent inference: < 500ms
  • First token latency: < 200ms
  • Memory: ~1.2GB (unified memory)
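To sanity-check these numbers on your own machine, the shell's built-in time gives a rough end-to-end measurement (the prompt is just an example; the first run also includes model load):

```bash
# Run twice: the first run pays the model-load cost, the second shows warm latency
time caro "list all files"
time caro "list all files"
```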

## Troubleshooting

### “Metal compiler not found”

Problem: Metal compiler not found when building with the embedded-mlx feature.

Solution: Install full Xcode (not just Command Line Tools):

```bash
# Check current developer directory
xcode-select -p
# If it shows /Library/Developer/CommandLineTools, you need full Xcode
# Download from the App Store or https://developer.apple.com/xcode/
# After installing, switch to Xcode
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
```

### “xcrun: error: unable to find utility ‘metal’”


Problem: Xcode is installed but not configured as active developer directory.

Solution:

```bash
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
xcrun --find metal # Should now work
```

### Build fails with embedded-mlx

Problem: CMake or Metal compiler issues during mlx-rs compilation.

Solution:

```bash
# Verify all dependencies
cmake --version # Should show 3.x or higher
xcrun --find metal # Should show Metal compiler path
# Clean and rebuild
cargo clean
cargo build --release --features embedded-mlx
# If still failing, use stub implementation:
cargo build --release # Without embedded-mlx feature
```

### Model download fails

Problem: Model fails to download from Hugging Face.

Solution:

```bash
# Check internet connection
curl -I https://huggingface.co
# Manually download model
mkdir -p ~/.cache/caro/models
cd ~/.cache/caro/models
# Download from Hugging Face (1.1GB)
curl -L -o qwen2.5-coder-1.5b-instruct-q4_k_m.gguf \
  "https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF/resolve/main/qwen2.5-coder-1.5b-instruct-q4_k_m.gguf"
# Verify file
ls -lh qwen2.5-coder-1.5b-instruct-q4_k_m.gguf
```
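Beyond checking the size, you can compute a SHA-256 of the download and compare it against the checksum Hugging Face shows on the file's page (the expected value is not reproduced here):

```bash
# Compare this digest manually against the one listed on the Hugging Face file page
shasum -a 256 qwen2.5-coder-1.5b-instruct-q4_k_m.gguf
```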

### Corrupted or missing model file

Problem: Model file corrupted or not found.

Solution:

```bash
# Check model location
ls -lh ~/Library/Caches/caro/models/
# or
ls -lh ~/.cache/caro/models/
# Remove corrupted model
rm ~/Library/Caches/caro/models/*.gguf
# Rerun caro to trigger re-download
cargo run --release -- "test"
```

## Platform Detection

The project automatically detects your platform:

```bash
# Check what backend will be used
cargo test model_variant_detect --lib -- --nocapture
# On Apple Silicon (M1/M2/M3/M4):
# ModelVariant::MLX
# On Intel Mac or other platforms:
# ModelVariant::CPU
```
## Performance Expectations

### Apple Silicon (M1/M2/M3/M4)

| Backend   | First Inference | Subsequent | Model Load | Memory |
| --------- | --------------- | ---------- | ---------- | ------ |
| Stub      | ~100ms          | ~100ms     | ~500ms     | ~1.1GB |
| MLX (GPU) | < 2s            | < 500ms    | < 2s       | ~1.2GB |
| CPU       | ~4s             | ~3s        | ~3s        | ~1.5GB |

### Intel Macs

| Backend | First Inference | Subsequent | Model Load | Memory |
| ------- | --------------- | ---------- | ---------- | ------ |
| CPU     | ~5s             | ~4s        | ~4s        | ~1.5GB |
## System Requirements

Minimum (CPU or stub mode):

  • macOS 10.15+
  • 4GB RAM
  • 2GB free disk space (for model cache)
  • Internet connection (first run only)

Recommended for GPU acceleration:

  • Apple Silicon Mac (M1/M2/M3/M4)
  • 8GB+ RAM
  • macOS 12.0+
  • Xcode 14+ installed
  • 5GB free disk space (plus space for Xcode)
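To check a machine against these requirements, the following standard macOS commands cover the relevant numbers (nothing here is specific to caro):

```bash
# macOS version
sw_vers -productVersion
# Total RAM in GB
echo "$(( $(sysctl -n hw.memsize) / 1073741824 )) GB RAM"
# Free disk space on the home volume
df -h ~
# Xcode version (errors if only Command Line Tools are installed)
xcodebuild -version
```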

## Summary

For a quick start: install Rust and CMake, then build. It works immediately with the stub implementation.

For production GPU acceleration: install Xcode, verify the Metal compiler, and build with --features embedded-mlx.

Both modes are fully functional: the stub provides instant responses for development, while MLX provides real GPU-accelerated inference for production use.