Hey,
each sentence is represented by a single embedding. Of course that embedding is influenced by the words, but in that example we get one embedding per sentence, and the similarity is calculated on the sentence-embedding level, not on the word level.
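For reference, here is a minimal sketch of that sentence-level flow (the model name "all-MiniLM-L6-v2" and the example sentences are just placeholders, any SentenceTransformer model works the same way):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model

sentences1 = ["The cat sits outside"]
sentences2 = ["The dog plays in the garden"]

# one embedding (a single vector) per sentence
embeddings1 = model.encode(sentences1, convert_to_tensor=True)
embeddings2 = model.encode(sentences2, convert_to_tensor=True)

# similarity is computed between whole-sentence vectors,
# result shape: [len(sentences1), len(sentences2)]
cosine_scores = util.cos_sim(embeddings1, embeddings2)
```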
If you are interested in word embeddings, you can easily adapt the encode call with the argument output_value="token_embeddings", like:
model.encode(sentences1, convert_to_tensor=True, output_value="token_embeddings")
but be aware that cosine_scores = util.cos_sim(embeddings1, embeddings2) will not work directly in that case, because each sentence now yields a matrix of token embeddings (one vector per token) instead of a single sentence vector.
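As a rough sketch of what you could do instead (again with a placeholder model; the exact return structure can vary slightly between sentence-transformers versions, but indexing with [0] gives the token matrix of the first sentence either way), you can compare the tokens of one sentence against the tokens of another:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model

# one tensor of token embeddings per sentence, shape [num_tokens, dim]
token_embs1 = model.encode(["The cat sits outside"], convert_to_tensor=True,
                           output_value="token_embeddings")
token_embs2 = model.encode(["The dog plays in the garden"], convert_to_tensor=True,
                           output_value="token_embeddings")

# sentence-level cos_sim no longer applies, but token-vs-token similarity works:
token_sim = util.cos_sim(token_embs1[0], token_embs2[0])
print(token_sim.shape)  # [num_tokens_in_sentence_1, num_tokens_in_sentence_2]
```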
I hope that answers your question!