Embedding Model
EmbeddingModel
Wrapper around SentenceTransformer with lazy loading and caching.
Provides thread-safe lazy loading of embedding models and optional caching of computed embeddings for improved performance. Supports both single text encoding and batch processing.
Attributes:
| Name | Type | Description |
|---|---|---|
| model_name | | Name of the sentence transformer model to use. |
| device | | Device to run the model on ("cuda", "cpu", or None for auto). |
| cache_embeddings | | Whether to cache computed embeddings using LRU cache. |
Initialize the embedding model wrapper.
Source code in at_scorer/ml/embeddings.py
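The lazy-loading and caching behavior described above can be illustrated with a minimal sketch. It is not the code in at_scorer/ml/embeddings.py: `LazyCachedEncoder`, `loader`, `cache_size`, and `fake_loader` are hypothetical names, the model is replaced by a plain callable, and the LRU cache is hand-rolled with `OrderedDict` purely for illustration.

```python
import threading
from collections import OrderedDict
from typing import Callable


class LazyCachedEncoder:
    """Hypothetical sketch: load the model on first use, cache results (LRU)."""

    def __init__(self, loader: Callable[[], Callable[[str], tuple]], cache_size: int = 1024):
        self._loader = loader            # builds the expensive model object
        self._model = None               # nothing is loaded at construction time
        self._lock = threading.Lock()    # guards model loading and the cache
        self._cache: OrderedDict = OrderedDict()
        self._cache_size = cache_size

    def _get_model(self):
        # Double-checked locking: only the first caller pays the load cost.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model

    def encode(self, text: str) -> tuple:
        with self._lock:
            if text in self._cache:
                self._cache.move_to_end(text)    # mark as most recently used
                return self._cache[text]
        vector = self._get_model()(text)         # compute outside the lock
        with self._lock:
            self._cache[text] = vector
            self._cache.move_to_end(text)
            if len(self._cache) > self._cache_size:
                self._cache.popitem(last=False)  # evict least recently used
        return vector


load_calls = []  # records every time the stand-in model is "loaded"


def fake_loader():
    load_calls.append(1)
    return lambda text: (float(len(text)),)  # toy one-dimensional "embedding"


encoder = LazyCachedEncoder(fake_loader, cache_size=2)
first = encoder.encode("hello")    # triggers the one-time model load
second = encoder.encode("hello")   # served from the cache
```

In this sketch, constructing the wrapper is cheap because the loader only runs on the first `encode` call, and repeated inputs skip the model entirely.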
Functions
encode
Encode a single text string into an embedding vector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The text to encode. | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Normalized embedding vector as numpy array. Returns zero vector for empty text. |
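The return contract (unit-length vector, zero vector for empty input) can be sketched without loading a real model. The function below is a stand-in, not the library's `encode`: `encode_sketch` and `dim` are hypothetical, and the character-sum "embedding" exists only to produce some non-trivial numbers.

```python
import numpy as np


def encode_sketch(text: str, dim: int = 4) -> np.ndarray:
    """Toy stand-in for encode(): builds a vector, then L2-normalizes it."""
    if not text:
        # Empty input: return a zero vector of the embedding dimension.
        return np.zeros(dim, dtype=np.float32)
    raw = np.zeros(dim, dtype=np.float32)
    for i, ch in enumerate(text):
        raw[i % dim] += ord(ch)       # fold character codes into dim slots
    norm = np.linalg.norm(raw)
    if norm == 0:
        return raw                    # avoid dividing by zero
    return raw / norm                 # unit length, so dot product = cosine


vec = encode_sketch("hello")
empty = encode_sketch("")
```

Because the non-empty result has unit length, a plain dot product between two such vectors already equals their cosine similarity.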
encode_batch
Encode multiple texts efficiently in a batch.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| texts | Iterable[str] | Iterable of text strings to encode. | required |
Returns:
| Type | Description |
|---|---|
| list[ndarray] | List of normalized embedding vectors as numpy arrays. |
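A rough sketch of the batch path follows, assuming the underlying model exposes some function that maps a list of strings to a 2-D array. `encode_batch_sketch`, `batch_fn`, and `toy_batch_fn` are hypothetical names, and how the real method treats empty strings inside a batch is not stated in the source; here all-zero rows are simply left as zeros.

```python
from typing import Callable, Iterable

import numpy as np


def encode_batch_sketch(
    texts: Iterable[str],
    batch_fn: Callable[[list], np.ndarray],
) -> list:
    """Encode all texts in one call, then normalize each row."""
    items = list(texts)   # a generator can only be consumed once, so materialize it
    if not items:
        return []
    matrix = np.asarray(batch_fn(items), dtype=np.float32)  # shape (n, dim)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0   # keep all-zero rows as zeros instead of dividing by zero
    return list(matrix / norms)   # one 1-D array per input text


def toy_batch_fn(batch):
    # Stand-in for a real model: two hand-made features per text.
    return np.array([[len(t), t.count("a")] for t in batch], dtype=np.float32)


vectors = encode_batch_sketch((t for t in ["banana", "kiwi", ""]), toy_batch_fn)
```

Passing the whole list to the model in one call is what makes batching cheaper than looping over single-text encodes; the normalization is then a single vectorized operation.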
similarity
Calculate cosine similarity between two embedding vectors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | ndarray | First embedding vector. | required |
| b | ndarray | Second embedding vector. | required |
Returns:
| Type | Description |
|---|---|
| float | Cosine similarity score between -1 and 1 (typically 0-1 for normalized vectors). Returns 0.0 if either vector is empty. |
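As an illustration of the documented behavior, cosine similarity with the empty-vector guard can be written as below. This is a sketch rather than the library's implementation: the function name `cosine_similarity` is hypothetical, and returning 0.0 for zero-norm vectors is an added assumption that the source does not confirm.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b, with guards for degenerate input."""
    if a.size == 0 or b.size == 0:
        return 0.0                     # documented: empty vector gives 0.0
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0                     # assumption: zero-norm vectors also give 0.0
    return float(np.dot(a, b) / denom)


same = cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
orthogonal = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
opposite = cosine_similarity(np.array([1.0, 0.0]), np.array([-1.0, 0.0]))
empty = cosine_similarity(np.array([]), np.array([1.0, 0.0]))
```

The `opposite` case shows why the documented range extends to -1: normalization fixes vector length, not direction, so negative scores remain possible even if they are uncommon in practice.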