# macOS Setup Guide

This guide covers setup for caro on macOS, with special attention to Apple Silicon (M1/M2/M3/M4) for GPU acceleration.
## Prerequisites

### Required

- macOS: 10.15 (Catalina) or later
- Rust: 1.75 or later
- Homebrew: Package manager for macOS
### Optional (for GPU acceleration)

- Xcode: full Xcode installation for the Metal compiler (Apple Silicon only)
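The prerequisites above can be checked in one pass from the terminal. This is a sketch, not part of caro: the `need` helper is a hypothetical name, and the tool names are the ones this guide installs.

```shell
# Report whether each required tool is on PATH.
# "need" is a hypothetical helper, not a caro command.
need() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1 (see the steps below)"
  fi
}

need rustc   # Rust compiler (1.75+)
need cargo   # Rust build tool
need brew    # Homebrew
need cmake   # needed for native dependencies
```

Anything reported as missing is covered by a step in the Quick Start below.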
## Quick Start (All Macs)

### 1. Install Rust

```sh
# Install rustup (Rust toolchain installer)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Load the Rust environment
source "$HOME/.cargo/env"

# Verify the installation
rustc --version
cargo --version
```

### 2. Install Homebrew (if not already installed)

```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

### 3. Install CMake

```sh
brew install cmake

# Verify the installation
cmake --version
```

### 4. Clone and Build

```sh
git clone https://github.com/wildcard/caro.git
cd caro

# Build the project (uses the CPU backend by default)
cargo build --release

# Install locally
cargo install --path .
```

### 5. Test Installation

```sh
# Run a test command
caro "list all files"

# Or via cargo
cargo run --release -- "find text files"
```

## Apple Silicon GPU Acceleration
Apple Silicon (M1/M2/M3/M4) chips support GPU-accelerated inference via the MLX framework, providing ~4x faster inference compared to CPU-only mode.
### Current Status

The project includes a fully functional stub implementation that:
- Correctly detects Apple Silicon hardware
- Downloads and loads the 1.1GB Qwen model
- Provides instant responses for testing and development
- Works without any additional dependencies
For real GPU acceleration, you need the Metal compiler from Xcode.
### Option 1: Stub Implementation (Recommended for Development)

No additional setup required! The default build works immediately:

```sh
# Build (automatically uses the MLX stub on Apple Silicon)
cargo build --release

# Run - will use the stub implementation
cargo run --release -- "list files"
```

When to use:
- Quick testing and development
- You don’t want to install multi-GB Xcode
- You’re developing non-inference features
- You want instant responses for integration testing
Performance:
- Model load: ~500ms (from disk)
- Response time: ~100ms (simulated inference)
- Memory: ~1.1GB (model file)
### Option 2: Full GPU Acceleration with Xcode

For production use with real GPU-accelerated inference:

#### Step 1: Install Xcode

Choose one of these methods:
Method A: App Store (Recommended)
- Open App Store
- Search for “Xcode”
- Click “Get” or “Install”
- Wait for download (~15GB) and installation
- Open Xcode once to accept license
Method B: Command Line

```sh
# Check if Xcode is already installed
xcode-select -p

# If not installed, install the Command Line Tools first
xcode-select --install

# Then download Xcode from Apple Developer
open https://developer.apple.com/xcode/
```

#### Step 2: Configure Xcode

```sh
# Accept the Xcode license
sudo xcodebuild -license accept

# Set Xcode as the active developer directory
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer

# Verify the Metal compiler is available
xcrun --find metal
# Should print a path inside Xcode's toolchain

# Check the Metal version (run via xcrun; metal is not on PATH)
xcrun metal --version
```

#### Step 3: Build with MLX Feature
```sh
cd caro

# Clean previous builds
cargo clean

# Build with MLX GPU acceleration
cargo build --release --features embedded-mlx

# This will:
# - Compile mlx-rs (may take 5-10 minutes the first time)
# - Link against the Metal framework
# - Enable GPU acceleration
```

#### Step 4: Verify GPU Acceleration

```sh
# Run with logging to see MLX initialization
RUST_LOG=info cargo run --release -- "list all files"

# You should see:
# INFO caro::backends::embedded::mlx: MLX GPU initialized
# INFO caro::backends::embedded::mlx: Using Metal device
```

Expected Performance (M4 Pro):
- Model load: < 2s (MLX optimization)
- First inference: < 2s
- Subsequent inference: < 500ms
- First token latency: < 200ms
- Memory: ~1.2GB (unified memory)
## Troubleshooting

### "metal: command not found"

Problem: the Metal compiler is not found when building with the embedded-mlx feature.

Solution: install full Xcode (not just the Command Line Tools):

```sh
# Check the current developer directory
xcode-select -p

# If it shows /Library/Developer/CommandLineTools, you need full Xcode
# Download from the App Store or https://developer.apple.com/xcode/

# After installing, switch to Xcode
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
```

### "xcrun: error: unable to find utility 'metal'"
Problem: Xcode is installed but not configured as the active developer directory.

Solution:

```sh
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
xcrun --find metal  # Should now work
```

### "mlx-sys build failed"
Problem: CMake or Metal compiler issues during mlx-rs compilation.

Solution:

```sh
# Verify all dependencies
cmake --version     # Should show 3.x or higher
xcrun --find metal  # Should show the Metal compiler path

# Clean and rebuild
cargo clean
cargo build --release --features embedded-mlx

# If it still fails, use the stub implementation:
cargo build --release  # Without the embedded-mlx feature
```

### Model Download Issues
Problem: the model fails to download from Hugging Face.

Solution:

```sh
# Check the internet connection
curl -I https://huggingface.co

# Manually download the model
mkdir -p ~/.cache/caro/models
cd ~/.cache/caro/models

# Download from Hugging Face (1.1GB)
curl -L -o qwen2.5-coder-1.5b-instruct-q4_k_m.gguf \
  "https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF/resolve/main/qwen2.5-coder-1.5b-instruct-q4_k_m.gguf"

# Verify the file
ls -lh qwen2.5-coder-1.5b-instruct-q4_k_m.gguf
```
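If the download keeps failing partway through, a quick size check distinguishes a truncated file from a complete one. This is a sketch: the ~1.1GB figure comes from this guide, and `looks_complete` is a hypothetical helper, not a caro command.

```shell
# A GGUF file far smaller than the expected ~1.1GB is almost certainly a
# truncated download. MIN_BYTES is a rough lower bound, not an exact size.
MIN_BYTES=1000000000

looks_complete() {
  [ -f "$1" ] && [ "$(wc -c < "$1")" -ge "$MIN_BYTES" ]
}

MODEL="$HOME/.cache/caro/models/qwen2.5-coder-1.5b-instruct-q4_k_m.gguf"
if looks_complete "$MODEL"; then
  echo "model file looks complete"
else
  echo "model file missing or truncated; delete it and re-download"
fi
```

A truncated file should be deleted before retrying, or the loader may keep failing on the cached copy.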
### "Failed to load model"

Problem: the model file is corrupted or not found.

Solution:

```sh
# Check the model location
ls -lh ~/Library/Caches/caro/models/
# or
ls -lh ~/.cache/caro/models/

# Remove the corrupted model
rm ~/Library/Caches/caro/models/*.gguf

# Rerun caro to trigger a re-download
cargo run --release -- "test"
```
## Platform Detection

The project automatically detects your platform:

```sh
# Check which backend will be used
cargo test model_variant_detect --lib -- --nocapture

# On Apple Silicon (M1/M2/M3/M4):
# ModelVariant::MLX

# On Intel Mac or other platforms:
# ModelVariant::CPU
```
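The same check can be approximated from the shell with uname(1). This is a sketch of the detection logic, not caro's actual code; `backend_for` is a hypothetical name.

```shell
# Map the machine architecture to the backend caro would pick on macOS.
# uname -m reports arm64 on Apple Silicon Macs; anything else falls back
# to the CPU backend.
backend_for() {
  case "$1" in
    arm64) echo "MLX" ;;
    *)     echo "CPU" ;;
  esac
}

echo "This machine would use the $(backend_for "$(uname -m)") backend"
```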
## Performance Comparison

### Apple Silicon M4 Pro

Section titled “Apple Silicon M4 Pro”| Backend | First Inference | Subsequent | Model Load | Memory |
|---|---|---|---|---|
| Stub | ~100ms | ~100ms | ~500ms | ~1.1GB |
| MLX (GPU) | < 2s | < 500ms | < 2s | ~1.2GB |
| CPU | ~4s | ~3s | ~3s | ~1.5GB |
### Intel Mac

| Backend | First Inference | Subsequent | Model Load | Memory |
|---|---|---|---|---|
| CPU | ~5s | ~4s | ~4s | ~1.5GB |
## System Requirements

### Minimum

- macOS 10.15+
- 4GB RAM
- 2GB free disk space (for model cache)
- Internet connection (first run only)
### Recommended for GPU Acceleration

- Apple Silicon Mac (M1/M2/M3/M4)
- 8GB+ RAM
- macOS 12.0+
- Xcode 14+ installed
- 20GB+ free disk space (Xcode alone is a ~15GB download)
## Summary

For a quick start: just install Rust and CMake, then build. It works immediately with the stub implementation.
For production GPU acceleration: install Xcode, verify the Metal compiler, and build with `--features embedded-mlx`.
Both modes are fully functional: the stub provides instant responses for development, while MLX provides real GPU-accelerated inference for production use.