infineac.process_text.extract_keyword_sentences_window

infineac.process_text.extract_keyword_sentences_window(text: str, keywords: list[str] | dict[str, int], nlp_model, modifier_words: list[str] = ['disregarding', 'except', 'excluding', 'ignoring', 'leaving out', 'not including', 'omitting'], context_window_sentence: tuple[int, int] | int = 0, join_adjacent_sentences: bool = True, return_type: str = 'list') -> str | list[str]

Extracts the sentences in a text that contain specific keywords, together with the context surrounding each of those sentences.

Parameters

text : str

The text to extract the sentences from.

keywords : list[str] | dict[str, int]

Keywords to search for in the text; the sentences containing them are extracted. If keywords is a dictionary, its keys are used as the keywords.

nlp_model : spacy.lang

The spaCy NLP model.

modifier_words : list[str], default: MODIFIER_WORDS

List of modifier words that must not precede the keyword for it to count as a match.

context_window_sentence : tuple[int, int] | int, default: 0

The context window of the sentences to be extracted. Either an integer or a tuple of length 2. The first element of the tuple is the number of sentences to extract before the sentence in which the keyword was found; the second element is the number of sentences to extract after it. If only an integer is provided, the same number of sentences is extracted before and after the keyword sentence. If an element is -1, all sentences before or after the keyword sentence, respectively, are extracted. Passing the integer -1 therefore extracts all sentences before and after it, e.g. the entire paragraph.

join_adjacent_sentences : bool, default: True

Whether to join adjacent sentences or leave them as individual sentences. If context_window_sentence > 0, this parameter is automatically set to True.

return_type : str, default: “list”

The return type of the function: either “str” or “list”.
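One way to read the modifier_words rule above is that a keyword directly preceded by a modifier phrase (for example "except Russia") does not count as a hit. The following is a minimal sketch of that reading using a regular expression. `has_modified_keyword` and the shortened `MODIFIERS` list are hypothetical and not part of infineac; the library itself may work on spaCy tokens and apply a different notion of "precede".

```python
import re

# Shortened, illustrative subset of the default modifier words.
MODIFIERS = ["except", "excluding", "not including"]


def has_modified_keyword(sentence, keyword, modifiers=MODIFIERS):
    """Return True if `keyword` appears directly after one of `modifiers`.

    Assumption: "precede" means "immediately before, separated only by
    whitespace". This is a sketch, not the library's implementation.
    """
    lowered = sentence.lower()
    for modifier in modifiers:
        pattern = (
            r"\b" + re.escape(modifier.lower())
            + r"\s+" + re.escape(keyword.lower()) + r"\b"
        )
        if re.search(pattern, lowered):
            return True
    return False
```

Under this reading, a sentence such as "Sales grew everywhere except Russia." would be skipped for the keyword "russia", while "Sales in Russia grew." would be kept.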

returns:

The extracted sentences as a concatenated string or list of passages (defined by return_type).

rtype:

str | list[str]
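The interaction between join_adjacent_sentences and return_type can be sketched as follows. This is an illustration under assumptions, not the library's code: `join_passages` is a hypothetical helper, sentences are plain strings, and a single space is assumed as the separator.

```python
def join_passages(sentences, indices, join_adjacent=True, return_type="list"):
    """Collect the sentences at `indices` into passages.

    With join_adjacent=True, runs of consecutive indices are merged into
    one passage. With return_type="str", all passages are concatenated.
    Hypothetical helper; separator and ordering are assumptions.
    """
    passages = []
    previous = None
    for i in sorted(set(indices)):
        if join_adjacent and previous is not None and i == previous + 1:
            # Directly follows the previous hit: extend the current passage.
            passages[-1] += " " + sentences[i]
        else:
            # Gap (or joining disabled): start a new passage.
            passages.append(sentences[i])
        previous = i
    if return_type == "str":
        return " ".join(passages)
    return passages
```

For example, with sentences `["A.", "B.", "C.", "D."]` and hits at indices 0, 1 and 3, the sketch yields two passages when joining is enabled and three when it is not.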

raises ValueError:
  • If context_window_sentence is neither an integer nor a tuple of length 2.
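The context_window_sentence semantics, including the ValueError case, can be sketched in plain Python. `normalize_window` and `window_bounds` are hypothetical names used only for illustration; the sketch assumes each element is either -1 or non-negative and that only a tuple (not a list) is accepted as the two-element form.

```python
def normalize_window(window):
    """Turn an int or a 2-tuple into a (before, after) pair."""
    if isinstance(window, int):
        return (window, window)
    if isinstance(window, tuple) and len(window) == 2:
        return window
    raise ValueError(
        "context_window_sentence must be an integer or a tuple of length 2"
    )


def window_bounds(index, n_sentences, window):
    """Return the half-open range [start, end) of sentences to extract.

    `index` is the position of the sentence containing the keyword.
    A value of -1 means "everything on that side".
    """
    before, after = normalize_window(window)
    start = 0 if before == -1 else max(0, index - before)
    end = n_sentences if after == -1 else min(n_sentences, index + after + 1)
    return start, end
```

With five sentences and a keyword in the third one (index 2), a window of 1 selects indices 1 through 3, `(0, -1)` selects from the keyword sentence to the end, and -1 selects everything.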