Hey,
each sentence is represented by a single embedding. Of course that embedding is influenced by the words, but in that example we get one embedding per sentence, and the similarity is calculated on the sentence-embedding level, not on the word level.
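For reference, here is a minimal sketch of that sentence-level flow (the model name "all-MiniLM-L6-v2" and the example sentences are just placeholders, any SentenceTransformer model works the same way):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model

sentences1 = ["The cat sits outside"]
sentences2 = ["The dog plays in the garden"]

# one embedding (a single vector) per sentence
embeddings1 = model.encode(sentences1, convert_to_tensor=True)
embeddings2 = model.encode(sentences2, convert_to_tensor=True)

# similarity is computed between whole-sentence vectors,
# result shape: [len(sentences1), len(sentences2)]
cosine_scores = util.cos_sim(embeddings1, embeddings2)
```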
If you are interested in word embeddings, you can easily adapt the encode call with the argument output_value="token_embeddings", like:
model.encode(sentences1, convert_to_tensor=True, output_value="token_embeddings")
but be aware that cosine_scores = util.cos_sim(embeddings1, embeddings2) will not work directly in that case, because each sentence now yields a matrix of token embeddings (one vector per token) instead of a single sentence vector.
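As a rough sketch of what you could do instead (again with a placeholder model; the exact return structure can vary slightly between sentence-transformers versions, but indexing with [0] gives the token matrix of the first sentence either way), you can compare the tokens of one sentence against the tokens of another:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model

# one tensor of token embeddings per sentence, shape [num_tokens, dim]
token_embs1 = model.encode(["The cat sits outside"], convert_to_tensor=True,
                           output_value="token_embeddings")
token_embs2 = model.encode(["The dog plays in the garden"], convert_to_tensor=True,
                           output_value="token_embeddings")

# sentence-level cos_sim no longer applies, but token-vs-token similarity works:
token_sim = util.cos_sim(token_embs1[0], token_embs2[0])
print(token_sim.shape)  # [num_tokens_in_sentence_1, num_tokens_in_sentence_2]
```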
I hope that answers your question!