Semantic Matcher
SemanticMatcher
Semantic similarity helper built on top of EmbeddingModel.
Provides convenient methods for calculating semantic similarity between texts and matching items across lists. Uses cosine similarity of embeddings with an optional threshold filter.
Attributes:
| Name | Type | Description |
|---|---|---|
| embedding_model | EmbeddingModel | The embedding model to use for encoding texts. |
| threshold | float | Minimum similarity score (0-1) to consider a match. Default: 0.7. |
Example
from at_scorer.ml import EmbeddingModel
from at_scorer.ml.semantic_matcher import SemanticMatcher

model = EmbeddingModel()
matcher = SemanticMatcher(model, threshold=0.75)

# Calculate similarity
score = matcher.similarity("Python developer", "Software engineer")

# Find top matches
candidates = ["Java", "Python", "JavaScript", "C++"]
matches = matcher.top_matches("Python developer", candidates, top_k=2)
# Example result (scores are illustrative): [("Python", 0.95)]
# A candidate scoring 0.72, such as "JavaScript", falls below the 0.75 threshold and is omitted.
Initialize the semantic matcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| embedding_model | EmbeddingModel | The embedding model to use for encoding. | required |
| threshold | float | Minimum similarity score to consider a match (0-1). | 0.7 |
Source code in at_scorer/ml/semantic_matcher.py
Functions
similarity
Calculate semantic similarity between two texts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text_a | str | First text to compare. | required |
| text_b | str | Second text to compare. | required |
Returns:
| Type | Description |
|---|---|
| float | Cosine similarity score between 0 and 1 (higher = more similar). |
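The cosine similarity that this method reports can be illustrated with a minimal, standalone sketch. It is not the library's implementation: `cosine_similarity` is a hypothetical helper, and the short lists stand in for real embedding vectors. Raw cosine similarity ranges from -1 to 1; how SemanticMatcher arrives at the documented 0 to 1 range is not stated on this page.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # exactly 0
```

Vectors that point in the same direction score close to 1 regardless of their length, which is why cosine similarity is a common choice for comparing embeddings.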
top_matches
Find top-k most similar candidates to a query text.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | The query text to match against. | required |
| candidates | Iterable[str] | Iterable of candidate texts to search. | required |
| top_k | int | Maximum number of matches to return. | 5 |
Returns:
| Type | Description |
|---|---|
| list[tuple[str, float]] | List of (candidate, score) tuples, sorted by score descending. Only includes matches above the threshold. |
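The score, filter, sort, and truncate behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `top_matches_sketch` and `cosine` are hypothetical names, the 2-D vectors are hand-made rather than real embeddings, and treating a score equal to the threshold as a match (`>=`) is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_matches_sketch(query_vec, candidate_vecs, threshold=0.7, top_k=5):
    """Score every candidate, drop low scores, and keep the best top_k."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in candidate_vecs.items()]
    # Assumption: a score equal to the threshold still counts as a match.
    kept = [(name, score) for name, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hand-made 2-D vectors, not real embeddings.
toy_vectors = {
    "Python": [1.0, 0.0],
    "JavaScript": [0.8, 0.6],
    "C++": [0.0, 1.0],
}
result = top_matches_sketch([1.0, 0.0], toy_vectors, threshold=0.7, top_k=2)
# "C++" scores 0 against the query and is filtered out before ranking.
```

Because filtering happens before truncation, the result can contain fewer than top_k entries, or none at all, when few candidates clear the threshold.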
match_lists
Match items from source list to target list using semantic similarity.
For each item in the source list, finds the top-k most similar items in the target list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | Iterable[str] | Source list of texts to match. | required |
| target | Iterable[str] | Target list of texts to match against. | required |
| top_k | int | Maximum number of matches per source item. | 3 |
Returns:
| Type | Description |
|---|---|
| list[tuple[str, str, float]] | List of (source_item, target_item, score) tuples for all matches above the threshold. |
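One plausible way to read this method is as a per-item top-k search whose results are flattened into a single list of triples. The sketch below follows that reading with hypothetical names (`match_lists_sketch`, `cosine`) and hand-made vectors; the order of the returned triples and the `>=` comparison are assumptions, since this page does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_lists_sketch(source_vecs, target_vecs, threshold=0.7, top_k=3):
    """For each source item, keep its top_k targets that clear the threshold."""
    matches = []
    for src_name, src_vec in source_vecs.items():
        scored = [(tgt, cosine(src_vec, vec)) for tgt, vec in target_vecs.items()]
        # Assumption: scores equal to the threshold are kept.
        kept = sorted(
            (pair for pair in scored if pair[1] >= threshold),
            key=lambda pair: pair[1],
            reverse=True,
        )[:top_k]
        matches.extend((src_name, tgt, score) for tgt, score in kept)
    return matches


# Hand-made 2-D vectors, not real embeddings.
source = {"backend": [1.0, 0.0], "frontend": [0.0, 1.0]}
target = {"Python": [1.0, 0.0], "JavaScript": [0.6, 0.8], "CSS": [0.0, 1.0]}
pairs = match_lists_sketch(source, target, threshold=0.7, top_k=1)
```

With the documented API, the equivalent call would pass plain strings, for example `matcher.match_lists(source, target, top_k=1)`. A source item with no target above the threshold contributes no triples to the result.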