Install an LLM: Set up a large language model on your own PC.
Step-by-step instructions on loading 7B models such as Mistral, Zephyr Beta, and Llama 2 on your local machine via Hugging Face.
Discover how you can harness the power of a Large Language Model (LLM) with a seamless installation process that takes just a few minutes. Dive into the future of NLP with this user-friendly tutorial, designed to empower both novices and seasoned professionals to leverage local language models for enhanced efficiency and privacy.
The code snippets below use Mistral as the selected LLM, but you are encouraged to choose other models such as Zephyr or Llama 2. The final output demonstrates the model's ability to generate text in real time from a given prompt.
Prerequisites
Step 1: Installation of Dependencies
First, install the essential dependencies by executing each of the following commands in a terminal.
pip install transformers
pip install huggingface-hub
# If you have no graphics card, use this:
pip install ctransformers
# If you have a graphics card, use this:
pip install ctransformers[cuda]
This quick step ensures your machine is ready to run sophisticated large language models locally.
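If you want to make sure everything installed correctly, a quick optional sanity check from a Python shell looks like this:
# Optional sanity check: print the installed package versions.
from importlib.metadata import version

print("transformers:", version("transformers"))
print("ctransformers:", version("ctransformers"))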
Step 2: Choose Your LLM and Download the Model
Execute one of the following commands in your terminal to download your favorite LLM.
# Mistral
huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.1-GGUF mistral-7b-instruct-v0.1.Q4_K_M.gguf --local-dir .\models\Mistral-7B-Instruct-v0.1-GGUF --local-dir-use-symlinks False
# Zephyr
huggingface-cli download TheBloke/zephyr-7B-beta-GGUF zephyr-7b-beta.Q4_K_M.gguf --local-dir .\models\zephyr-7B-beta-GGUF --local-dir-use-symlinks False
# Llama 2
huggingface-cli download TheBloke/Llama-2-7B-Chat-GGUF llama-2-7b-chat.Q4_K_M.gguf --local-dir .\models\Llama-2-7B-Chat-GGUF --local-dir-use-symlinks False
Here I used Mistral, but you can also use Zephyr, Llama 2, or any other model in the GGUF format. For more models, check out this awesome creator on the Hugging Face Hub: https://huggingface.co/TheBloke
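If you prefer to stay inside Python, the same download can be done with the huggingface_hub library. This is just a sketch for the Mistral file; the local_dir matches the path used in the next step:
# Sketch: download the Mistral GGUF file from the Hugging Face Hub in Python.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    local_dir="./models/Mistral-7B-Instruct-v0.1-GGUF",
)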
Step 3: Load the Model
Empower your local setup by loading the downloaded GGUF model with the ctransformers library. I decided to go for Mistral:
from ctransformers import AutoModelForCausalLM
# Set gpu_layers to the number of layers to offload to the GPU.
# Set it to 0 if no GPU acceleration is available on your system (default = 0).
# Mistral
llm = AutoModelForCausalLM.from_pretrained("./models/Mistral-7B-Instruct-v0.1-GGUF", model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf", model_type="mistral", gpu_layers=0)
# Llama 2
#llm = AutoModelForCausalLM.from_pretrained("./models/Llama-2-7B-Chat-GGUF", model_file="llama-2-7b-chat.Q4_K_M.gguf", model_type="llama", gpu_layers=20)
# Zephyr (a Mistral fine-tune, so it uses the "mistral" model type)
#llm = AutoModelForCausalLM.from_pretrained("./models/zephyr-7B-beta-GGUF", model_file="zephyr-7b-beta.Q4_K_M.gguf", model_type="mistral", gpu_layers=20)
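If you want more control over generation, ctransformers also accepts optional configuration settings when loading the model. The values below are only illustrative and can be tuned to your hardware and use case:
# Sketch: loading Mistral with optional generation settings.
llm = AutoModelForCausalLM.from_pretrained(
    "./models/Mistral-7B-Instruct-v0.1-GGUF",
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    model_type="mistral",
    gpu_layers=0,          # layers offloaded to the GPU; 0 = CPU only
    context_length=2048,   # maximum context window
    max_new_tokens=256,    # cap on generated tokens per call
    temperature=0.7,       # higher = more creative, lower = more deterministic
)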
Step 4: Execute the Model with Your Prompt
By utilizing streaming, the output becomes visible in real time, so you don't have to wait for the complete response to be returned.
prompt = "AI is going to"
for text in llm(prompt, stream=True):
    print(text, end="", flush=True)
# Output:
# revolutionize the way we live, work and play. It’s already changing
# the world in ways that many people don’t even realize. From self-driving
# cars to virtual assistants, AI is making .....
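Mistral-Instruct models are fine-tuned to expect the [INST] ... [/INST] prompt format, so wrapping your prompt this way usually gives better instruction following. Here is a sketch of a non-streaming call; the generation parameters are illustrative:
# Sketch: non-streaming call using Mistral's instruction format.
prompt = "[INST] Explain in two sentences what the GGUF format is. [/INST]"
response = llm(prompt, max_new_tokens=128, temperature=0.7)
print(response)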
What you have learned:
You have learned how to install and use a local Large Language Model (LLM): installing the necessary dependencies, choosing an LLM (such as Mistral, Zephyr, or Llama 2), loading the selected model, and running it with a prompt.
Want to Connect?
Thanks for reading. I hope you enjoyed it and got something out of it.
- If you enjoyed the article, give me a few claps below 👏👏👏…
- Follow me to learn more about AI 🤖🤖🤖 …
- Find me on LinkedIn
As always, if you have any questions, ideas, or recommendations, don't hesitate to ask in the comments.