infineac.topic_extractor.bert_advanced
- infineac.topic_extractor.bert_advanced(docs: list[str], representation_model: any, embedding_model: any | None = None, umap_model: any | None = None, vectorizer_model: any | None = None, nr_topics: any | None = None, predefined_topics: bool | list[list[str]] | None = None) → tuple[bertopic._bertopic.BERTopic, list[int], numpy.ndarray]
Extracts topics from a list of documents using BERTopic.
- Parameters:
docs (list[str]) – List of documents.
representation_model (any) – Representation model to use.
embedding_model (any, default: None) – Embedding model to use. If None, the default embedding model is used.
umap_model (any, default: None) – UMAP model to use. If None, the default UMAP model is used.
vectorizer_model (any, default: None) – Vectorizer model to use. If None, the default vectorizer model is used.
nr_topics (any, default: None) – Number of topics to extract. If None, the number of topics is determined automatically.
predefined_topics (bool | list[list[str]], default: None) – Whether to use predefined topics. If True, the topics in constants.TOPICS are used.
- Returns:
Tuple containing the BERTopic model, the topics and the probabilities.
- Return type:
tuple[BERTopic, list[int], ndarray]
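The returned topics list can be lined up with the input documents to see which documents fall under which topic. The sketch below is illustrative only: it assumes that `topics[i]` is the topic assigned to `docs[i]` and that `-1` marks outliers, as is BERTopic's convention. The helper `group_docs_by_topic` and the sample data are hypothetical and not part of infineac; the hand-written `topics` list stands in for the second element of the tuple returned by `bert_advanced`.

```python
from collections import defaultdict


def group_docs_by_topic(docs, topics):
    """Map each topic id to the documents assigned to it (hypothetical helper)."""
    if len(docs) != len(topics):
        raise ValueError("docs and topics must have the same length")
    grouped = defaultdict(list)
    for doc, topic_id in zip(docs, topics):
        grouped[int(topic_id)].append(doc)
    return dict(grouped)


# Stand-in data. In real use the assignments would come from something like:
#   model, topics, probabilities = bert_advanced(docs, representation_model)
docs = [
    "rates are rising",
    "inflation outlook",
    "new product launch",
    "thanks everyone",
]
topics = [0, 0, 1, -1]  # assumed: -1 is the outlier label, as in BERTopic

grouped = group_docs_by_topic(docs, topics)
outliers = grouped.pop(-1, [])  # separate outliers from the real topics

print(grouped)
print(outliers)
```

Separating the `-1` group first keeps outlier documents from being counted as a topic of their own in any later summary.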