infineac.topic_extractor.bert_advanced

infineac.topic_extractor.bert_advanced(docs: list[str], representation_model: any, embedding_model: any | None = None, umap_model: any | None = None, vectorizer_model: any | None = None, nr_topics: any | None = None, predefined_topics: bool | list[list[str]] | None = None) → tuple[bertopic._bertopic.BERTopic, list[int], numpy.ndarray]

Extracts topics from a list of documents using BERTopic.

Parameters:
  • docs (list[str]) – List of documents.

  • representation_model (any) – Representation model to use.

  • embedding_model (any, default: None) – Embedding model to use. If None, the default embedding model is used.

  • umap_model (any, default: None) – UMAP model to use. If None, the default UMAP model is used.

  • vectorizer_model (any, default: None) – Vectorizer model to use. If None, the default vectorizer model is used.

  • nr_topics (any, default: None) – Number of topics to extract. If None, the number of topics is determined automatically.

  • predefined_topics (bool | list[list[str]], default: None) – Whether to use predefined topics. If True, constants.TOPICS is used. Alternatively, a list of topics, each given as a list of strings, can be passed.
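The three kinds of value accepted by predefined_topics can be illustrated with a minimal sketch. The helper resolve_predefined_topics and its default_topics argument are hypothetical and are not part of infineac; default_topics stands in for constants.TOPICS, and treating False like None is an assumption, since the documentation does not state it.

```python
def resolve_predefined_topics(predefined_topics, default_topics):
    """Hypothetical helper: turn the documented argument into a topic list or None.

    default_topics stands in for constants.TOPICS (an assumption for this sketch).
    """
    if predefined_topics is True:
        # True selects the library-wide default topics.
        return default_topics
    if not predefined_topics:
        # None is the documented default; False and [] are treated the same
        # way here, which is an assumption rather than documented behaviour.
        return None
    # Otherwise the caller supplied its own list of keyword lists.
    return predefined_topics


# Placeholder keyword lists, used only to exercise the helper.
defaults = [["inflation", "prices"], ["supply", "chain"]]
custom = [["energy", "gas"]]

print(resolve_predefined_topics(True, defaults))
print(resolve_predefined_topics(custom, defaults))
print(resolve_predefined_topics(None, defaults))
```
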

Returns:

Tuple containing the BERTopic model, the topics and the probabilities.

Return type:

tuple[BERTopic, list[int], ndarray]
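
As a rough usage sketch, the returned tuple can be unpacked and the topic ids paired with the input documents. The helpers group_docs_by_topic and extract_and_group are hypothetical names introduced for illustration. The sketch assumes the returned list holds one topic id per document, in input order, and that the ids follow BERTopic's convention of -1 for outlier documents; neither is stated in this reference.

```python
from collections import defaultdict


def group_docs_by_topic(docs, topics):
    """Hypothetical helper: map each topic id to the documents assigned to it."""
    if len(docs) != len(topics):
        raise ValueError("docs and topics must have the same length")
    grouped = defaultdict(list)
    for doc, topic_id in zip(docs, topics):
        grouped[topic_id].append(doc)
    return dict(grouped)


def extract_and_group(docs, representation_model):
    """Hypothetical wrapper: run bert_advanced and group the documents by topic."""
    # Imported lazily so the helper above can be used without infineac installed.
    from infineac.topic_extractor import bert_advanced

    # Unpack in the documented order: model, topic ids, probabilities.
    model, topics, probabilities = bert_advanced(docs, representation_model)
    return model, group_docs_by_topic(docs, topics), probabilities


# The grouping step on its own, with hand-written topic ids:
example_docs = ["gas prices rose", "new factory opened", "gas supply fell"]
example_topics = [0, -1, 0]  # -1 is BERTopic's id for outlier documents
print(group_docs_by_topic(example_docs, example_topics))
```

In practice, extract_and_group would be called with a real document list and a representation model; the other parameters keep their None defaults in this sketch.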