Using Local LLMs (LM Studio) with 120 AI Chat

Run your AI assistant locally with LM Studio on macOS

What You'll Need

  • A Mac computer (Apple silicon or Intel chip)
  • 120 AI Chat app
  • An internet connection (to download LM Studio and the models)

1. Install LM Studio (your local AI engine)

LM Studio is a user-friendly desktop application that allows you to run AI models directly on your Mac. It provides a Graphical User Interface (GUI) for easy interaction.

  1. Go to https://lmstudio.ai
  2. Click Download for macOS.
  3. Open the downloaded .dmg file.
  4. Drag LM Studio into your Applications folder.
  5. Double-click the app to launch it for the first time.

2. Download a Local AI Model

LM Studio has a built-in browser for discovering and downloading models.

  1. In the LM Studio app, open the Model Search tab (the magnifying-glass icon in the sidebar).
  2. Browse the available models or use the search bar to find a specific one.
  3. Recommended starter model: Mistral 7B Instruct (small, fast, and good for general chat).
  4. Select the model you want, and click the Download button. LM Studio will handle the download and save the model to your Mac.

3. Start a Local Server in LM Studio

  1. In LM Studio, go to the Developer tab (left sidebar).
  2. Select the model you want to load (e.g., Mistral-7B-Instruct).
  3. Click Start Server. By default, the server listens on port 1234 (http://localhost:1234); a quick way to verify it is shown after this list.
  4. Leave LM Studio running in the background — it will act as your local AI engine.
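
If you want to confirm the server is actually running, you can query its OpenAI-compatible endpoint from a small script. Here is a minimal Python sketch, assuming the default port 1234 and that at least one model is loaded:

    # Quick check that the LM Studio local server is reachable (default port 1234).
    # LM Studio exposes an OpenAI-compatible API; /v1/models lists the loaded models.
    import json
    import urllib.request

    with urllib.request.urlopen("http://localhost:1234/v1/models") as resp:
        models = json.load(resp)

    for model in models.get("data", []):
        print(model["id"])

If this prints the name of the model you loaded, the server is ready for 120 AI Chat to connect to.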

4. Open 120 AI Chat

  1. Open the 120 AI Chat app.
  2. Choose your downloaded model, for example, Mistral.

5. Start Chatting!

  • All answers are generated by the model running locally on your Mac (see the sketch below).
  • Your data stays private.
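
For the curious, the sketch below shows what happens under the hood when a chat message is sent: the client posts the conversation to the local server's OpenAI-compatible chat endpoint and reads back the reply. The model name is an assumption; use whatever model you loaded in LM Studio.

    # Illustrative only: one chat request to the local LM Studio server.
    # A client app such as 120 AI Chat talks to this same kind of endpoint.
    import json
    import urllib.request

    payload = {
        "model": "mistral-7b-instruct",  # assumed name; match the model you loaded
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "temperature": 0.7,
    }

    request = urllib.request.Request(
        "http://localhost:1234/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(request) as resp:
        reply = json.load(resp)

    print(reply["choices"][0]["message"]["content"])

Nothing in this exchange leaves your Mac: the request goes to localhost, and the answer comes back from the model you downloaded.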

LM Studio primarily supports models in the GGUF format, a format optimized to run efficiently on consumer hardware.

A quick guide to understanding LLMs for 120 AI Chat

Most Popular Local LLMs

Mistral (7B): A fast, balanced 4GB model that excels at general chat and quick answers, working well on most modern laptops but with limited coding capabilities.

LLaMA 3 (8B): A smart and accurate 4.5GB general-purpose assistant that delivers high-quality responses but requires 8GB+ RAM and preferably GPU acceleration.

Mixtral (8x7B): A highly intelligent 12GB+ model designed for deep reasoning and long context tasks, but only suitable for high-end systems due to its heavy resource requirements.

Gemma (2B/7B): Google's efficient 1.5-4GB lightweight model that runs anywhere but sacrifices some intelligence compared to larger alternatives.

Phi-3 (3.8B): A compact 2.5GB model that's surprisingly capable for its size, perfect for quick answers on low-end systems but limited in deep reasoning tasks.

Nous Hermes/OpenHermes: A community-tuned 4-7GB model optimized for helpful chat responses and writing, built on existing model foundations with similar system requirements to Mistral/LLaMA.


Minimum System Requirements

For basic models (2B–4B parameters), you'll need 4–8 GB RAM and a modern CPU from 2015 or later.

Mid-range models (7B–8B parameters) require 8–16 GB RAM with an SSD recommended for optimal performance.

Heavy models like Mixtral need 24 GB+ RAM or a dedicated GPU to run effectively.

For the best experience overall, use Apple M1/M2/M3 chips or a GPU with 6GB+ VRAM.

Tip: If your machine feels slow, choose a smaller model such as Phi-3, Gemma, or a quantized build of Mistral.
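
As a rough rule of thumb, a model's memory footprint is its parameter count times the bits used per weight, plus some headroom for the context cache. The sketch below is a back-of-envelope estimate, not an exact requirement:

    # Back-of-envelope RAM estimate: parameters * bits-per-weight, plus ~1 GB headroom
    # for the context (KV) cache and runtime overhead. Approximate by design.
    def estimated_ram_gb(params_billions: float, bits_per_weight: float, overhead_gb: float = 1.0) -> float:
        weights_gb = params_billions * bits_per_weight / 8  # billions of params * bits / 8 = GB of weights
        return weights_gb + overhead_gb

    print(f"7B model @ 4-bit:  ~{estimated_ram_gb(7, 4):.1f} GB")   # ~4.5 GB, fits on an 8 GB machine
    print(f"7B model @ 16-bit: ~{estimated_ram_gb(7, 16):.1f} GB")  # ~15 GB, needs far more memory

This is why the quantized (4-bit) builds you find in LM Studio's model search run comfortably on an 8 GB Mac, while full-precision versions of the same models do not.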