ML Configuration

MLConfig dataclass

MLConfig(
    embedding_model="all-MiniLM-L6-v2",
    spacy_model="en_core_web_sm",
    similarity_threshold=0.7,
    use_gpu=True,
    cache_embeddings=True,
    batch_size=32,
)

Configuration for machine learning models and parameters.

Controls the behavior of ML components in ProScorer, including embedding models, text analysis, and similarity thresholds.

Attributes:

embedding_model (str): Name of the sentence transformer model to use for embeddings. Default: "all-MiniLM-L6-v2" (fast, good quality). Other options: "all-mpnet-base-v2" (better quality, slower), "all-MiniLM-L12-v2" (balanced).

spacy_model (str): Name of the spaCy language model for text analysis. Default: "en_core_web_sm". Must be installed separately.

similarity_threshold (float): Minimum cosine similarity score (0-1) to consider two texts semantically similar. Default: 0.7.

use_gpu (bool): Whether to use GPU acceleration if available. Default: True.

cache_embeddings (bool): Whether to cache computed embeddings for reuse. Improves performance when scoring multiple resumes. Default: True.

batch_size (int): Batch size for processing embeddings. Default: 32. Increase for better GPU utilization, decrease for lower memory usage.
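
To make similarity_threshold and batch_size more concrete, here is a minimal, dependency-free sketch. It is not ProScorer's implementation: the helper names cosine_similarity, is_similar, and batched are hypothetical, the toy vectors stand in for real embeddings, and treating the threshold as inclusive (>=) is an assumption.

```python
import math
from typing import Iterator, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_similar(a: Sequence[float], b: Sequence[float], threshold: float = 0.7) -> bool:
    """Assumed rule: two texts count as similar when the score reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


def batched(items: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Toy 2-D "embeddings": the angle between them is 45 degrees (score ~0.707).
vec_a = [1.0, 0.0]
vec_b = [1.0, 1.0]
passes_default = is_similar(vec_a, vec_b, threshold=0.7)    # just above 0.7
passes_stricter = is_similar(vec_a, vec_b, threshold=0.75)  # below 0.75

# 70 texts with batch_size=32 are split into chunks of 32, 32, and 6.
chunk_sizes = [len(chunk) for chunk in batched(["text"] * 70, batch_size=32)]
```

The same pair of vectors passes at 0.7 but fails at 0.75, which illustrates how a small change in the threshold can flip borderline matches. Smaller batches mean more chunks but less memory held at once.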

Example
# Default config
config = MLConfig()

# Custom config for better quality
config = MLConfig(
    embedding_model="all-mpnet-base-v2",
    similarity_threshold=0.75,
    use_gpu=True
)

scorer = ProScorer(ml_config=config)
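
Because MLConfig is described as a dataclass, variants of a configuration can probably be derived with the standard library's dataclasses.replace. The sketch below uses a hypothetical stand-in class, MLConfigSketch, that only mirrors the documented fields and defaults, so it runs without ProScorer installed; whether the real class accepts replace in the same way is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass
class MLConfigSketch:
    """Hypothetical stand-in mirroring the documented MLConfig fields and defaults."""
    embedding_model: str = "all-MiniLM-L6-v2"
    spacy_model: str = "en_core_web_sm"
    similarity_threshold: float = 0.7
    use_gpu: bool = True
    cache_embeddings: bool = True
    batch_size: int = 32


# Quality-oriented settings, matching the custom config in the example above.
quality = MLConfigSketch(
    embedding_model="all-mpnet-base-v2",
    similarity_threshold=0.75,
)

# Derive a CPU-only, lower-memory variant; replace() returns a new object
# and leaves the original configuration unchanged.
low_memory = replace(quality, use_gpu=False, batch_size=8)
```

Deriving variants this way keeps the shared settings in one place, so only the fields that differ between environments need to be spelled out.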