Large language models (LLMs) like ChatGPT, Google Bard, and Bing Chat all run in the cloud, which basically means they run on somebody else's computer. They're also particularly costly to run, which is one reason ChatGPT has a Plus option that you can subscribe to. However, you can run many language models, such as Llama 2, locally, and with LM Studio you can run pretty much any LLM locally with ease.

Setting up LM Studio on Windows and Mac is ridiculously easy, and the process is the same for both platforms. It should also work on Linux, though we aren't using it for this tutorial.

LM Studio requirements

You'll need just a couple of things to run LM Studio:

Apple Silicon Mac (M1/M2/M3) with macOS 13.6 or newer

Windows / Linux PC with a processor that supports AVX2 (typically newer PCs)

16GB+ of RAM is recommended. For PCs, 6GB+ of VRAM is recommended

NVIDIA/AMD GPUs supported

An internet connection (ideally a fast one) to download models

If you have the above, then you're ready to go. I'm using an RTX 4080 with 16GB of VRAM, and since it's one of the best graphics cards, my text generation is quick.
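If you aren't sure whether your processor supports AVX2, a quick check along the following lines may help. This is only a minimal sketch: it assumes an x86 Linux machine, where the kernel lists CPU features on the "flags" line of /proc/cpuinfo, and the function names (has_avx2, check_this_machine) are my own for illustration. It isn't part of LM Studio, and on Windows or macOS you'd need to look the CPU up another way.

```python
from pathlib import Path
from typing import Optional


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line in cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Flags are space-separated tokens, so match the whole token
            # rather than a substring.
            return "avx2" in value.split()
    return False


def check_this_machine(path: str = "/proc/cpuinfo") -> Optional[bool]:
    """Check the current machine; return None if the file does not exist."""
    cpuinfo = Path(path)
    if not cpuinfo.exists():
        return None  # Not Linux, so this sketch cannot tell.
    return has_avx2(cpuinfo.read_text())


if __name__ == "__main__":
    result = check_this_machine()
    if result is None:
        print("No /proc/cpuinfo here; check your CPU's spec sheet instead.")
    elif result:
        print("AVX2 is listed in the CPU flags.")
    else:
        print("AVX2 is not listed in the CPU flags.")
```

Save it as a script and run it with Python 3 on the machine you plan to use.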

Step 1: Download and launch LM Studio

You'll first need to download LM Studio from its website for whatever platform you're on. The download is roughly 400MB, so it may take a bit of time depending on the speed of your internet connection. Once it's downloaded, launch it, and it should look like the screenshot above.

Step 2: Choose a model to download

Next, choose a model to download by clicking the magnifying glass and looking through the options available. Most of these models will be several gigabytes in size and may take a while to download. I'm using Zephyr-7B because it's small and easy to use as LLMs go, but there are a lot of different LLMs to choose from. Have a browse around, do some research, and see if any catch your eye. Zephyr is a model trained to be an assistant, so it can be useful once set up. Once you've chosen one, do the following:

Wait for it to finish downloading.

Click the Speech Bubble on the left.

At the top, select your model.

Wait for it to load.
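To get a feel for why these downloads run to several gigabytes, here's a rough back-of-the-envelope sketch. It assumes the file size is dominated by the weights, so size is roughly the parameter count multiplied by the bits stored per weight. The helper name approx_model_size_gb is made up for this example, and 7 billion is just the round number implied by the "7B" in a model's name. Real files tend to come out a little larger because of metadata and layers kept at higher precision.

```python
def approx_model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the weights alone, in GB (10^9 bytes)."""
    if n_params <= 0 or bits_per_weight <= 0:
        raise ValueError("n_params and bits_per_weight must be positive")
    # bits -> bytes (divide by 8) -> gigabytes (divide by 1e9)
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Compare a few common precisions for a 7-billion-parameter model.
    for bits in (16, 8, 4):
        size = approx_model_size_gb(7e9, bits)
        print(f"7B parameters at {bits} bits per weight: ~{size:.1f} GB")
```

By this estimate, a 7B model is around 14 GB at 16 bits per weight and around 3.5 GB at 4 bits, which is why smaller, more heavily compressed variants are the easier ones to download and run.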

Step 3: Converse!

It's seriously that simple, and you've already downloaded and set up an LLM locally to speak with. At this point, you can enable GPU acceleration on the right-hand side to speed up responses if you want, though it's not necessary. I run LM Studio on my RTX 4080 with 20 GPU layers, but you may need more or fewer.
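If you're wondering where to start with the GPU layers setting, the sketch below gives a rough first guess. It rests on a few loud assumptions: every layer of the model is treated as the same size, a fixed chunk of VRAM (reserve_gb, a guess on my part) is held back for the context and other buffers, and the layer count is something you look up for your model. The function layers_that_fit is hypothetical and has nothing to do with how LM Studio itself decides anything, so treat the result as a starting point and adjust up or down from there.

```python
def layers_that_fit(
    model_size_gb: float,
    n_layers: int,
    free_vram_gb: float,
    reserve_gb: float = 1.5,
) -> int:
    """Estimate how many layers fit in VRAM, assuming equal-sized layers."""
    if model_size_gb <= 0 or n_layers <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")
    per_layer_gb = model_size_gb / n_layers
    # Hold back some VRAM for the context and other buffers (assumed value).
    usable_gb = max(free_vram_gb - reserve_gb, 0.0)
    # Never suggest more layers than the model actually has.
    return min(n_layers, int(usable_gb // per_layer_gb))


if __name__ == "__main__":
    # Illustrative numbers only: a 4 GB model file with 32 layers.
    for vram in (4, 6, 16):
        n = layers_that_fit(model_size_gb=4.0, n_layers=32, free_vram_gb=vram)
        print(f"{vram} GB of free VRAM: try about {n} GPU layers")
```

If responses slow down or the model fails to load, lower the number; if you have VRAM to spare, raise it.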

Why use an LLM locally?

If you're wondering why you would want to use an LLM locally, there are a few reasons. The first, and the one that concerns most people, is privacy. LLMs are powerful tools that can be used for organizational and planning purposes, some of which may be sensitive in nature. Likewise, if you want to ask an LLM about private code (for example, if you're debugging it), then you should never use a cloud-based one.

These reasons only scratch the surface, too. Some local LLMs are tuned toward specific use cases that Bard, ChatGPT, and Bing Chat don't cover. As already mentioned, Zephyr is trained as an assistant, and that level of specificity isn't always there in general-purpose LLMs. Definitely give LM Studio a try if you're interested in trying one out, because it's never been easier to run your own LLM!