NLP: Install an LLM. Set up a large language model on your own PC.

Christian Bernecker
3 min readDec 8, 2023

Step-by-step instructions for loading 7B models such as Mistral, Zephyr Beta, and Llama on your local machine.

Discover how you can harness the power of a Large Language Model (LLM) with a seamless installation process that takes just a few minutes. Dive into the future of NLP with this user-friendly tutorial, designed to empower both novices and seasoned professionals to leverage the capabilities of local language models for enhanced efficiency and privacy.

Picture made with DALL-E (via Author)

The code snippets below showcase Mistral as the selected LLM, but the reader is encouraged to try other models such as Zephyr or Llama. The final output demonstrates the model’s ability to generate text in real time based on a given prompt.

Prerequisites

Step 1: Installation of Dependencies

First, install the essential dependencies by executing each of the following commands in a terminal.

pip install transformers
pip install huggingface-hub
# If you have no graphics card, use this:
pip install ctransformers
# If you have a graphics card, use this:
pip install ctransformers[cuda]

This quick step ensures your machine is ready to run sophisticated large language models.
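If you want to verify that everything is in place, the following quick check (a minimal sketch, assuming it runs in the same Python environment you just installed into) confirms the packages are importable:

# Sanity check: importing the packages confirms the installation succeeded.
import transformers
import huggingface_hub
import ctransformers

print("transformers:", transformers.__version__)
print("huggingface_hub:", huggingface_hub.__version__)
print("ctransformers imported successfully")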

Step 2: Choose and Download Your LLM Model

Execute one of the following commands in your terminal to download your favorite LLM.

# Mistral
huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.1-GGUF mistral-7b-instruct-v0.1.Q4_K_M.gguf --local-dir .\models\Mistral-7B-Instruct-v0.1-GGUF --local-dir-use-symlinks False

# Zephyr
huggingface-cli download TheBloke/zephyr-7B-beta-GGUF zephyr-7b-beta.Q4_K_M.gguf --local-dir .\models\zephyr-7B-beta-GGUF --local-dir-use-symlinks False

# Llama
huggingface-cli download TheBloke/Llama-2-7B-Chat-GGUF llama-2-7b-chat.Q4_K_M.gguf --local-dir .\models\Llama-2-7B-Chat-GGUF --local-dir-use-symlinks False

Here I used Mistral, but you can just as well use Zephyr, Llama 2, or any other model that is available in the GGUF format. For more models, check out this awesome creator on the Hugging Face Hub: https://huggingface.co/TheBloke
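If you prefer to stay in Python instead of the CLI, the same file can be fetched with the huggingface_hub library. A short sketch for the Mistral file (the repository, filename, and folder mirror the CLI command above):

from huggingface_hub import hf_hub_download

# Downloads the quantized GGUF file into the local models folder and
# returns the path to the downloaded file.
model_path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    local_dir="./models/Mistral-7B-Instruct-v0.1-GGUF",
)
print("Model stored at:", model_path)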

Step 3: Load the Model

Empower your local setup by loading the downloaded model with ctransformers. I decided to go for Mistral:

from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to the GPU.
# Set it to 0 if no GPU acceleration is available on your system (default = 0).

# Mistral
llm = AutoModelForCausalLM.from_pretrained("./models/Mistral-7B-Instruct-v0.1-GGUF", model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf", model_type="mistral", gpu_layers=0)
# Llama
#llm = AutoModelForCausalLM.from_pretrained("./models/Llama-2-7B-Chat-GGUF", model_file="llama-2-7b-chat.Q4_K_M.gguf", model_type="llama", gpu_layers=20)
# Zephyr (a fine-tuned Mistral model, so model_type stays "mistral")
#llm = AutoModelForCausalLM.from_pretrained("./models/zephyr-7B-beta-GGUF", model_file="zephyr-7b-beta.Q4_K_M.gguf", model_type="mistral", gpu_layers=20)
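As an alternative (a sketch, not part of the original instructions), from_pretrained also accepts the path to the .gguf file directly, which avoids the separate model_file argument:

# Point directly at the downloaded GGUF file; raise gpu_layers (e.g. 20-35)
# only if you installed ctransformers[cuda] and have enough VRAM.
llm = AutoModelForCausalLM.from_pretrained(
    "./models/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    model_type="mistral",
    gpu_layers=0,
)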

Step 4: Execute the Model with Your Prompt

By utilizing streaming, the output becomes visible in real time, so you don’t have to wait for the complete response.

prompt = "AI is going to"
for text in llm(prompt, stream=True):
    print(text, end="", flush=True)

# Output:
# revolutionize the way we live, work and play. It’s already changing
# the world in ways that many people don’t even realize. From self-driving
# cars to virtual assistants, AI is making .....
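The call also accepts generation parameters such as max_new_tokens and temperature. The values below are illustrative assumptions rather than tuned settings:

# Non-streaming call: returns the full completion as a single string.
response = llm(
    "Explain in one sentence what a large language model is.",
    max_new_tokens=128,   # cap the length of the generated text
    temperature=0.7,      # lower = more deterministic, higher = more creative
    top_p=0.95,           # nucleus sampling cutoff
)
print(response)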

What have you learned?

You have learned how to install and use a local Large Language Model (LLM): installing the necessary dependencies, choosing a model (such as Mistral, Zephyr, or Llama 2), loading it, and running it with a prompt.

What’s next?

I’d suggest writing your own private AI assistant.
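As a starting point, here is a minimal sketch of such an assistant built on the Mistral model loaded above. The [INST] ... [/INST] tags follow the Mistral-7B-Instruct prompt format; the loop itself is an illustrative assumption, not part of the original tutorial:

# Simple interactive loop: type a question, get a streamed answer,
# type "exit" or "quit" to stop.
while True:
    question = input("You: ")
    if question.strip().lower() in ("exit", "quit"):
        break
    prompt = f"[INST] {question} [/INST]"
    print("Assistant: ", end="", flush=True)
    for text in llm(prompt, stream=True, max_new_tokens=256):
        print(text, end="", flush=True)
    print()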

Leave a comment if you have any questions or recommendations, or if something is not clear, and I’ll try to answer as soon as possible.


Christian Bernecker

IT Architect | Data Scientist | Software Developer | Data Driven Investor