# macOS Setup Guide

This guide covers setup for caro on macOS, with special attention to Apple Silicon (M1/M2/M3/M4) for GPU acceleration.
## Prerequisites

### Required

- macOS: 10.15 (Catalina) or later
- Rust: 1.75 or later
- Homebrew: Package manager for macOS
### Optional (for GPU acceleration)

- Xcode: full Xcode installation for the Metal compiler (Apple Silicon only)
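The prerequisites above can be checked in one pass from the terminal. This is a sketch, not part of caro: the `need` helper is a hypothetical name, and the tool names are the ones this guide installs.

```shell
# Report whether each required tool is on PATH.
# "need" is a hypothetical helper, not a caro command.
need() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1 (see the steps below)"
  fi
}

need rustc   # Rust compiler (1.75+)
need cargo   # Rust build tool
need brew    # Homebrew
need cmake   # needed for native dependencies
```

Anything reported as missing is covered by a step in the Quick Start below.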
## Quick Start (All Macs)

### 1. Install Rust

```sh
# Install rustup (Rust toolchain installer)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Load the Rust environment
source "$HOME/.cargo/env"

# Verify the installation
rustc --version
cargo --version
```

### 2. Install Homebrew (if not already installed)

```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

### 3. Install CMake

```sh
brew install cmake

# Verify the installation
cmake --version
```

### 4. Clone and Build

```sh
git clone https://github.com/wildcard/caro.git
cd caro

# Build the project (uses the CPU backend by default)
cargo build --release

# Install locally
cargo install --path .
```

### 5. Test Installation

```sh
# Run a test command
caro "list all files"

# Or via cargo
cargo run --release -- "find text files"
```

## Apple Silicon GPU Acceleration
Apple Silicon (M1/M2/M3/M4) chips support GPU-accelerated inference via the MLX framework, providing ~4x faster inference compared to CPU-only mode.
### Current Status

The project includes a fully functional stub implementation that:
- Correctly detects Apple Silicon hardware
- Downloads and loads the 1.1GB Qwen model
- Provides instant responses for testing and development
- Works without any additional dependencies
For real GPU acceleration, you need the Metal compiler from Xcode.
### Option 1: Stub Implementation (Recommended for Development)

No additional setup required! The default build works immediately:

```sh
# Build (automatically uses the MLX stub on Apple Silicon)
cargo build --release

# Run - will use the stub implementation
cargo run --release -- "list files"
```

When to use:
- Quick testing and development
- You don’t want to install multi-GB Xcode
- You’re developing non-inference features
- You want instant responses for integration testing
Performance:
- Model load: ~500ms (from disk)
- Response time: ~100ms (simulated inference)
- Memory: ~1.1GB (model file)
### Option 2: Full GPU Acceleration with Xcode

For production use with real GPU-accelerated inference:

#### Step 1: Install Xcode

Choose one of these methods:
Method A: App Store (Recommended)
- Open App Store
- Search for “Xcode”
- Click “Get” or “Install”
- Wait for download (~15GB) and installation
- Open Xcode once to accept license
Method B: Command Line

```sh
# Check if Xcode is already installed
xcode-select -p

# If not installed, install the Command Line Tools first
xcode-select --install

# Then download Xcode from Apple Developer
open https://developer.apple.com/xcode/
```

#### Step 2: Configure Xcode

```sh
# Accept the Xcode license
sudo xcodebuild -license accept

# Set Xcode as the active developer directory
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer

# Verify the Metal compiler is available
xcrun --find metal
# Should print a path inside Xcode's toolchain

# Check the Metal version (run via xcrun; metal is not on PATH)
xcrun metal --version
```

#### Step 3: Build with MLX Feature
```sh
cd caro

# Clean previous builds
cargo clean

# Build with MLX GPU acceleration
cargo build --release --features embedded-mlx

# This will:
# - Compile mlx-rs (may take 5-10 minutes the first time)
# - Link against the Metal framework
# - Enable GPU acceleration
```

#### Step 4: Verify GPU Acceleration

```sh
# Run with logging to see MLX initialization
RUST_LOG=info cargo run --release -- "list all files"

# You should see:
# INFO caro::backends::embedded::mlx: MLX GPU initialized
# INFO caro::backends::embedded::mlx: Using Metal device
```

Expected Performance (M4 Pro):
- Model load: < 2s (MLX optimization)
- First inference: < 2s
- Subsequent inference: < 500ms
- First token latency: < 200ms
- Memory: ~1.2GB (unified memory)
## Troubleshooting

### "metal: command not found"

Problem: the Metal compiler is not found when building with the embedded-mlx feature.

Solution: install full Xcode (not just the Command Line Tools):

```sh
# Check the current developer directory
xcode-select -p

# If it shows /Library/Developer/CommandLineTools, you need full Xcode
# Download from the App Store or https://developer.apple.com/xcode/

# After installing, switch to Xcode
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
```

### "xcrun: error: unable to find utility 'metal'"
Problem: Xcode is installed but not configured as the active developer directory.

Solution:

```sh
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
xcrun --find metal  # Should now work
```

### "mlx-sys build failed"
Problem: CMake or Metal compiler issues during mlx-rs compilation.

Solution:

```sh
# Verify all dependencies
cmake --version     # Should show 3.x or higher
xcrun --find metal  # Should show the Metal compiler path

# Clean and rebuild
cargo clean
cargo build --release --features embedded-mlx

# If it still fails, use the stub implementation:
cargo build --release  # Without the embedded-mlx feature
```

### Model Download Issues
Problem: the model fails to download from Hugging Face.

Solution:

```sh
# Check the internet connection
curl -I https://huggingface.co

# Manually download the model
mkdir -p ~/.cache/caro/models
cd ~/.cache/caro/models

# Download from Hugging Face (1.1GB)
curl -L -o qwen2.5-coder-1.5b-instruct-q4_k_m.gguf \
  "https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF/resolve/main/qwen2.5-coder-1.5b-instruct-q4_k_m.gguf"

# Verify the file
ls -lh qwen2.5-coder-1.5b-instruct-q4_k_m.gguf
```
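If the download keeps failing partway through, a quick size check distinguishes a truncated file from a complete one. This is a sketch: the ~1.1GB figure comes from this guide, and `looks_complete` is a hypothetical helper, not a caro command.

```shell
# A GGUF file far smaller than the expected ~1.1GB is almost certainly a
# truncated download. MIN_BYTES is a rough lower bound, not an exact size.
MIN_BYTES=1000000000

looks_complete() {
  [ -f "$1" ] && [ "$(wc -c < "$1")" -ge "$MIN_BYTES" ]
}

MODEL="$HOME/.cache/caro/models/qwen2.5-coder-1.5b-instruct-q4_k_m.gguf"
if looks_complete "$MODEL"; then
  echo "model file looks complete"
else
  echo "model file missing or truncated; delete it and re-download"
fi
```

A truncated file should be deleted before retrying, or the loader may keep failing on the cached copy.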
### "Failed to load model"

Problem: the model file is corrupted or not found.

Solution:

```sh
# Check the model location
ls -lh ~/Library/Caches/caro/models/
# or
ls -lh ~/.cache/caro/models/

# Remove the corrupted model
rm ~/Library/Caches/caro/models/*.gguf

# Rerun caro to trigger a re-download
cargo run --release -- "test"
```
## Platform Detection

The project automatically detects your platform:

```sh
# Check which backend will be used
cargo test model_variant_detect --lib -- --nocapture

# On Apple Silicon (M1/M2/M3/M4):
# ModelVariant::MLX

# On Intel Mac or other platforms:
# ModelVariant::CPU
```
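The same check can be approximated from the shell with uname(1). This is a sketch of the detection logic, not caro's actual code; `backend_for` is a hypothetical name.

```shell
# Map the machine architecture to the backend caro would pick on macOS.
# uname -m reports arm64 on Apple Silicon Macs; anything else falls back
# to the CPU backend.
backend_for() {
  case "$1" in
    arm64) echo "MLX" ;;
    *)     echo "CPU" ;;
  esac
}

echo "This machine would use the $(backend_for "$(uname -m)") backend"
```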
## Performance Comparison

### Apple Silicon M4 Pro

Section titled “Apple Silicon M4 Pro”| Backend | First Inference | Subsequent | Model Load | Memory |
|---|---|---|---|---|
| Stub | ~100ms | ~100ms | ~500ms | ~1.1GB |
| MLX (GPU) | < 2s | < 500ms | < 2s | ~1.2GB |
| CPU | ~4s | ~3s | ~3s | ~1.5GB |
### Intel Mac

| Backend | First Inference | Subsequent | Model Load | Memory |
|---|---|---|---|---|
| CPU | ~5s | ~4s | ~4s | ~1.5GB |
## System Requirements

### Minimum

- macOS 10.15+
- 4GB RAM
- 2GB free disk space (for model cache)
- Internet connection (first run only)
### Recommended for GPU Acceleration

- Apple Silicon Mac (M1/M2/M3/M4)
- 8GB+ RAM
- macOS 12.0+
- Xcode 14+ installed
- 20GB+ free disk space (Xcode alone is a ~15GB download)
## Summary

For a quick start: just install Rust and CMake, then build. It works immediately with the stub implementation.
For production GPU acceleration: install Xcode, verify the Metal compiler, and build with `--features embedded-mlx`.
Both modes are fully functional: the stub provides instant responses for development, while MLX provides real GPU-accelerated inference for production use.