Using Local LLMs with 120 AI Chat

Run your AI assistant locally with Ollama on macOS

What You’ll Need

  • A Mac computer (Apple silicon or Intel chip)
  • 120 AI Chat app
  • An internet connection (to download Ollama and the model; after that, chatting works offline)

1. Install Ollama (your local AI engine)

Ollama allows you to run AI models directly on your Mac.

  1. Go to https://ollama.com/download
  2. Click Download for macOS
  3. Open the downloaded .dmg file
  4. Drag Ollama into your Applications folder
  5. Launch Ollama once by double-clicking it; this installs the background service

Ollama doesn't open a full app window; it runs quietly in the background (you may only see a small llama icon in your menu bar). You'll use Terminal to interact with it.

2. Open the Terminal App on macOS

This is where you’ll tell your Mac to download an AI model.

  1. Press Cmd + Space to open Spotlight Search
  2. Type Terminal, then press Enter
  3. A window with a text prompt opens; this is the Terminal, where you'll type commands
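
Before moving on, it's worth checking that the ollama command is available. Type the line below and press Enter; the version number on your Mac will differ from the one in the comment.

  # Print the installed Ollama version (your number will vary, e.g. "ollama version is 0.1.32")
  ollama --version

If Terminal says "command not found", open the Ollama app from Applications once so it can finish setting up, then try again.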

3. Download a Local AI Model Using Ollama

Recommended starter model: mistral

In the Terminal window, type: ollama run mistral

Then press Enter.

  • Ollama will automatically download the mistral model (about 4 GB)
  • When the download finishes, the model is ready to run on your Mac.

You only need to do this once. After that, the model is saved and ready to use.
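
When the download finishes, ollama run drops you straight into a chat prompt in the Terminal, so you can give the model a quick test there before opening the app; type /bye to leave the chat. You can also list everything stored on your Mac with ollama list. The output below is only an example; your sizes and dates will differ.

  # Inside "ollama run mistral" you get an interactive prompt; type /bye to exit
  >>> What is the capital of France?

  # Back at the normal Terminal prompt, list the models saved locally
  ollama list
  # Example output (yours will differ):
  # NAME             ID              SIZE      MODIFIED
  # mistral:latest   1ab49cc0ce7e    4.1 GB    2 minutes ago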

4. Open 120 AI Chat

  1. Open the 120 AI Chat app
  2. Choose your downloaded model — for example, mistral
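
If mistral doesn't appear in the model list, the usual cause is that the Ollama background service isn't running. Ollama exposes your downloaded models through a small local API (by default at http://localhost:11434), which is how apps such as 120 AI Chat typically discover them; the exact integration may differ. You can check the service yourself from Terminal:

  # Ask the local Ollama service which models it knows about (default port 11434)
  curl http://localhost:11434/api/tags
  # A running service replies with JSON that lists your models, e.g. "mistral:latest"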

5. Start chatting!

  • All answers are generated by the model running on your Mac, even without an internet connection
  • Your data stays private on your Mac.
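
If you're curious what "local" means in practice, the sketch below sends a prompt straight to Ollama's local API from Terminal and reads the answer back, with nothing leaving your Mac. This is only an illustration of the setup, not necessarily the exact request 120 AI Chat makes under the hood.

  # Send a prompt directly to the local model and print the JSON reply
  curl http://localhost:11434/api/generate -d '{
    "model": "mistral",
    "prompt": "Say hello in one short sentence.",
    "stream": false
  }'
  # The reply contains a "response" field with the model's answer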

Summary

  1. Install Ollama
  2. Open Terminal (Cmd + Space → “Terminal”)
  3. Type: ollama run mistral
  4. Open 120 AI Chat → Select mistral
  5. Start chatting!
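
A few everyday Ollama commands are worth keeping handy (all typed in Terminal):

  ollama run mistral    # download the model on first use, then chat with it
  ollama list           # show the models stored on your Mac
  ollama rm mistral     # delete a model you no longer need and free up disk space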

A quick guide to understanding LLMs for 120 AI Chat

Most Popular Local LLMs

Mistral (7B): A fast, balanced 4GB model that excels at general chat and quick answers, working well on most modern laptops but with limited coding capabilities.

Llama 3 (8B): A smart and accurate 4.5GB general-purpose assistant that delivers high-quality responses but requires 8GB+ RAM and preferably GPU acceleration.

Mixtral (8x7B): A highly intelligent 12GB+ model designed for deep reasoning and long context tasks, but only suitable for high-end systems due to its heavy resource requirements.

Gemma (2B/7B): Google's efficient 1.5-4GB lightweight model that runs anywhere but sacrifices some intelligence compared to larger alternatives.

Phi-3 (3.8B): A compact 2.5GB model that's surprisingly capable for its size, perfect for quick answers on low-end systems but limited in deep reasoning tasks.

Nous Hermes / OpenHermes: Community-tuned 4-7GB models optimized for helpful chat and writing, fine-tuned on top of existing base models (Mistral and Llama), with system requirements similar to Mistral and Llama 3.
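
Each of these can be fetched with a single ollama run command. The tags below are the ones commonly used in the Ollama library at the time of writing; names and available sizes change, so check https://ollama.com/library for the current list.

  ollama run mistral       # Mistral 7B
  ollama run llama3        # Llama 3 8B
  ollama run mixtral       # Mixtral 8x7B (high-end machines only)
  ollama run gemma:2b      # Gemma 2B (gemma:7b for the larger version)
  ollama run phi3          # Phi-3 3.8B
  ollama run openhermes    # OpenHermes (Nous Hermes variants are also available)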


Minimum System Requirements

For basic models (2B–4B parameters), you'll need 4–8 GB RAM and a modern CPU from 2015 or later.

Mid-range models (7B–8B parameters) require 8–16 GB RAM with an SSD recommended for optimal performance.

Heavy models like Mixtral need 24 GB+ RAM (or a dedicated GPU with a comparable amount of VRAM) to run effectively.

For the best experience overall, use Apple M1/M2/M3 chips or a GPU with 6GB+ VRAM.

Tip: If your machine feels slow, choose smaller models like phi-3, gemma, or mistral (quantized).
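
Not sure how much memory your Mac has? You can check from Terminal (or via the Apple menu → About This Mac) and pick a model accordingly. Note that the standard tags in the Ollama library are generally already quantized, so the sizes quoted above are roughly what gets downloaded.

  # Show your Mac's chip/processor and memory (grep keeps just the relevant lines)
  system_profiler SPHardwareDataType | grep -E "Chip|Processor|Memory"
  # Example output on an Apple silicon Mac: "Chip: Apple M1" and "Memory: 16 GB"

  # If the machine still feels slow, step down to a smaller model
  ollama run phi3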