
NLP SIMILARITY 2: Use Vector Databases and LLM Word Embeddings for Semantic Similarity Search

Learn to use a Large Language Model (LLM) to create word embeddings and store them in a vector database for semantic search on your own data.

Christian Bernecker
7 min read · Nov 3, 2023

In this article, you will learn how to use a sentence transformer from Hugging Face to create word embeddings from your own data, how to store them in the vector database Chroma, and how to implement semantic search with the help of that vector database.

Embeddings

I won’t spend too much time describing what embeddings are, because I already covered that in another article. In short, word embeddings are numerical representations of words that capture their semantic meaning, typically in the form of dense vectors in a multi-dimensional space.
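To make the idea of a dense vector concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are hand-picked toy values (an assumption for illustration, not output of any real model), and `cosine_similarity` is a helper written for this example. Real embeddings are learned by a model and usually have hundreds of dimensions.

```python
import math

# Toy 3-dimensional vectors, hand-picked for illustration only.
# Real embeddings are learned by a model and are much longer.
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_king_queen = cosine_similarity(toy_embeddings["king"], toy_embeddings["queen"])
sim_king_apple = cosine_similarity(toy_embeddings["king"], toy_embeddings["apple"])

print(f"king vs queen: {sim_king_queen:.3f}")
print(f"king vs apple: {sim_king_apple:.3f}")
```

With these toy values, "king" and "queen" point in almost the same direction, while "king" and "apple" do not, which is the property that makes embeddings useful for measuring similarity.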

Word Embeddings

These vectors are learned by a model from text, and the distance between two vectors can be used to measure how similar the corresponding words are. For more details about embeddings, see my earlier article on the topic.

In essence, embeddings place words and documents in a semantic vector space, and semantic search uses the distances in that space to find results that match the meaning of a query rather than its exact wording. This combination of embeddings and semantic search is especially valuable in natural language understanding and information retrieval tasks.
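The difference between keyword matching and semantic search can be sketched with a small example. Everything here is an assumption for illustration: the two-dimensional `word_vectors` are made-up toy values, and `embed` simply averages word vectors as a crude stand-in for a real sentence transformer.

```python
import math

# Hypothetical toy word vectors. Real embedding dimensions are learned
# and not interpretable like this; the values are for illustration only.
word_vectors = {
    "car": [0.9, 0.1], "automobile": [0.95, 0.05],
    "fix": [0.7, 0.2], "repair": [0.75, 0.15],
    "engine": [0.8, 0.1],
    "pasta": [0.05, 0.9], "recipe": [0.1, 0.8], "tasty": [0.1, 0.85],
}


def embed(text):
    """Average the vectors of known words (a crude stand-in for a sentence model)."""
    vectors = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vectors:
        raise ValueError(f"no known words in: {text!r}")
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def keyword_overlap(query, doc):
    """Number of words shared verbatim between query and document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


documents = ["fix car engine", "tasty pasta recipe"]
query = "automobile repair"

# Keyword search finds nothing, because no word is shared verbatim.
keyword_hits = [keyword_overlap(query, d) for d in documents]

# Semantic search ranks documents by similarity of their embeddings.
query_vector = embed(query)
ranked = sorted(
    documents,
    key=lambda d: cosine_similarity(query_vector, embed(d)),
    reverse=True,
)

print("keyword overlap:", keyword_hits)
print("semantic ranking:", ranked)
```

The query shares no word with either document, so a keyword match cannot tell them apart, while the embedding-based ranking puts the car-related document first.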

Vector Databases

Let’s define what a vector database is. A vector database is a specialized type of database designed to efficiently store, index, and retrieve high-dimensional vector data. Unlike traditional relational databases, which store structured data in tables, vector databases are optimized for handling data represented as vectors. Vectors (embeddings) are mathematical objects that can represent a wide range of data types, such as numerical features, text embeddings, images, or other structured or…
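The store-and-retrieve idea behind a vector database can be sketched as a tiny in-memory class. This is not Chroma's API: `ToyVectorStore`, its `add` and `query` methods, and the example vectors are all hypothetical names and values made up for this illustration. It compares the query against every stored vector, whereas real vector databases typically add persistence and approximate nearest-neighbor indexes so that search stays fast on large collections.

```python
import math


class ToyVectorStore:
    """Hypothetical in-memory store: keeps ids, vectors, and documents,
    and retrieves the most similar entries by cosine similarity."""

    def __init__(self):
        self._ids = []
        self._vectors = []
        self._documents = []

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("cannot normalize a zero vector")
        return [x / norm for x in vector]

    def add(self, item_id, vector, document):
        # Store unit-length vectors so a dot product equals cosine similarity.
        self._ids.append(item_id)
        self._vectors.append(self._normalize(vector))
        self._documents.append(document)

    def query(self, vector, n_results=2):
        """Brute-force search: score every stored vector, return the top matches."""
        q = self._normalize(vector)
        scored = [
            (sum(a * b for a, b in zip(q, v)), item_id, doc)
            for v, item_id, doc in zip(self._vectors, self._ids, self._documents)
        ]
        scored.sort(key=lambda entry: entry[0], reverse=True)
        return scored[:n_results]


store = ToyVectorStore()
# Toy vectors, made up for illustration; a real pipeline would get them from a model.
store.add("doc1", [0.9, 0.1, 0.0], "Cars need regular maintenance.")
store.add("doc2", [0.1, 0.9, 0.0], "Pasta is cooked in boiling water.")
store.add("doc3", [0.8, 0.2, 0.1], "Electric vehicles use batteries.")

results = store.query([1.0, 0.0, 0.0], n_results=2)
for score, item_id, doc in results:
    print(f"{item_id} ({score:.3f}): {doc}")
```

The query vector points in the same general direction as the two vehicle-related entries, so those are returned first and the unrelated entry is left out.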
