BitonicAI Documentation

ML Configuration

MLConfig `dataclass`

MLConfig(
    embedding_model="all-MiniLM-L6-v2",
    spacy_model="en_core_web_sm",
    similarity_threshold=0.7,
    use_gpu=True,
    cache_embeddings=True,
    batch_size=32,
)

Configuration for machine learning models and parameters.

Controls the behavior of ML components in ProScorer, including embedding models, text analysis, and similarity thresholds.

Attributes:

Name	Type	Description
`embedding_model`	`str`	Name of the sentence transformer model to use for embeddings. Default: "all-MiniLM-L6-v2" (fast, good quality). Other options: "all-mpnet-base-v2" (better quality, slower), "all-MiniLM-L12-v2" (balanced).
`spacy_model`	`str`	Name of the spaCy language model for text analysis. Default: "en_core_web_sm". The model is not bundled and must be installed separately (for example, with `python -m spacy download en_core_web_sm`).
`similarity_threshold`	`float`	Minimum cosine similarity score (0-1) to consider two texts semantically similar. Default: 0.7.
`use_gpu`	`bool`	Whether to use GPU acceleration if available. Default: True.
`cache_embeddings`	`bool`	Whether to cache computed embeddings for reuse. Improves performance when scoring multiple resumes. Default: True.
`batch_size`	`int`	Batch size for processing embeddings. Default: 32. Increase for better GPU utilization, decrease for lower memory usage.
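
To make the `similarity_threshold` semantics concrete, here is a minimal sketch. It is not ProScorer's actual implementation: it assumes the threshold is compared inclusively against the cosine similarity of two embedding vectors, and the helper names `cosine_similarity` and `is_similar`, as well as the toy 3-dimensional vectors, are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_similar(a: Sequence[float], b: Sequence[float], threshold: float = 0.7) -> bool:
    # Assumption: a score equal to the threshold counts as similar.
    return cosine_similarity(a, b) >= threshold


# Toy vectors standing in for real sentence embeddings (illustrative only).
resume_vec = [0.9, 0.1, 0.3]
job_vec = [0.8, 0.2, 0.4]
unrelated_vec = [0.0, 1.0, 0.0]

print(is_similar(resume_vec, job_vec))        # vectors point in nearly the same direction
print(is_similar(resume_vec, unrelated_vec))  # vectors are close to orthogonal
```

Under this reading, raising the threshold (as in the custom config with 0.75) makes matching stricter: fewer pairs of texts are treated as semantically similar.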

Example

# Assumes MLConfig and ProScorer have already been imported.

# Default config
config = MLConfig()

# Custom config for better quality
config = MLConfig(
    embedding_model="all-mpnet-base-v2",
    similarity_threshold=0.75,
    use_gpu=True
)

scorer = ProScorer(ml_config=config)
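
As a rough illustration of what `batch_size` and `cache_embeddings` control, the sketch below splits texts into fixed-size batches and skips any text whose embedding is already cached. It is not ProScorer's code: `embed_batch` is a hypothetical stand-in that returns fake one-dimensional vectors instead of calling a real embedding model, and `embed_with_cache` is likewise an illustrative name.

```python
from typing import Dict, List


def embed_batch(texts: List[str]) -> List[List[float]]:
    # Hypothetical stand-in for a model call: one fake 1-d vector per text.
    return [[float(len(t))] for t in texts]


def embed_with_cache(
    texts: List[str],
    cache: Dict[str, List[float]],
    batch_size: int = 32,
) -> int:
    """Embed texts missing from `cache`, `batch_size` at a time.

    Returns the number of batches that were actually computed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # De-duplicate while preserving order, then drop already-cached texts.
    missing = [t for t in dict.fromkeys(texts) if t not in cache]
    batches = 0
    for start in range(0, len(missing), batch_size):
        batch = missing[start:start + batch_size]
        for text, vector in zip(batch, embed_batch(batch)):
            cache[text] = vector
        batches += 1
    return batches


cache: Dict[str, List[float]] = {}
texts = [f"resume {i}" for i in range(70)]

first = embed_with_cache(texts, cache, batch_size=32)   # 70 texts: batches of 32, 32, 6
second = embed_with_cache(texts, cache, batch_size=32)  # everything is cached already
print(first, second, len(cache))  # prints: 3 0 70
```

In this sketch a larger `batch_size` means fewer, bigger calls to the model (better throughput, more memory per call), while the cache removes repeated work when the same texts are scored again.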