infineac.process_text.remove_sentences_under_threshold

infineac.process_text.remove_sentences_under_threshold(corpus: list[str], threshold: int = 1) → list[int]

Removes sentences from a corpus that contain threshold words or fewer. Returns the transformed corpus as well as a list of indices that indicate the original positions of the sentences in the corpus.
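
The behaviour described above can be illustrated with a minimal standalone sketch. It is not the library's implementation: the function name drop_short_sentences is hypothetical, words are assumed to be whitespace-separated tokens, and the returned indices are assumed to refer to the sentences that are kept.

```python
def drop_short_sentences(
    corpus: list[str], threshold: int = 1
) -> tuple[list[str], list[int]]:
    """Hypothetical stand-in for the documented behaviour.

    Keeps only sentences with more than `threshold` words and records
    the position each kept sentence had in the input corpus.
    """
    kept: list[str] = []
    indices: list[int] = []
    for position, sentence in enumerate(corpus):
        # Assumption: a "word" is any whitespace-separated token.
        if len(sentence.split()) > threshold:
            kept.append(sentence)
            indices.append(position)
    return kept, indices


corpus = [
    "Yes.",
    "Revenue grew strongly this quarter.",
    "",
    "Thank you.",
    "We expect margins to hold.",
]
kept, indices = drop_short_sentences(corpus, threshold=2)
print(kept)     # ['Revenue grew strongly this quarter.', 'We expect margins to hold.']
print(indices)  # [1, 4]
```

With threshold=2, the sentences with zero, one, or two words are dropped, and the index list makes it possible to map each remaining sentence back to its position in the original corpus.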