infineac.file_loader
Imports and structures earnings call data from XML files.
Examples
>>> from pathlib import Path
>>> import infineac.file_loader as file_loader
>>> import spacy_stanza
>>> nlp_stanza = spacy_stanza.load_pipeline("en", processors="tokenize, lemma")
>>> nlp_stanza.add_pipe('sentencizer')
>>> PATH_DIR = "data/transcripts/"
>>> files = list(Path(PATH_DIR).rglob("*.xml"))
>>> events = file_loader.load_files_from_xml(files)
Notes
The earnings calls are stored in XML files. load_files_from_xml() is
the main function of the module: it loads the XML files, extracts the
relevant information, and stores it in a list of dictionaries.
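The load, extract, and store flow described above can be pictured with a minimal standard-library sketch. It is not the module's implementation: the XML layout, the tag names (eventTitle, companyName, Body), the dictionary keys, and the helpers parse_event() and parse_events() are all assumptions made for illustration, and the real transcript schema may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real transcript schema may differ.
SAMPLE_XML = """<Event Id="123">
  <eventTitle>Q1 2022 Example Corp Earnings Call</eventTitle>
  <companyName>Example Corp</companyName>
  <Body>Presentation and questions text</Body>
</Event>"""


def parse_event(xml_text):
    """Turn one XML document into a flat dictionary (illustrative keys only)."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("Id"),
        "title": root.findtext("eventTitle"),
        "company": root.findtext("companyName"),
        "body": root.findtext("Body"),
    }


def parse_events(xml_texts):
    """Parse several XML documents into a list of dictionaries."""
    return [parse_event(text) for text in xml_texts]


events = parse_events([SAMPLE_XML])
```

Each call yields one dictionary per document, so downstream code can treat the result as a plain list of records.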
Functions
Adds information to a given event based on an element of an XML file.
Creates a blank event with the keys that are expected in the final output.
Extracts information from the body of a conference call.
Extracts information from an earnings call part.
Extracts information from an already structured earnings call.
Returns the position of the participant based on the lists of corporate and conference call participants.
Parses the XML files and extracts the information from the earnings calls.
Collapses the participants list into a list of strings.
Splits the participants string into a list of participants.
Separates and structures the earnings call into its parts and returns them as a dictionary.
Transforms unlisted participants so that they can be identified among the listed ones.
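The participant-related summaries above (splitting a participants string and looking up a participant's position) could work roughly as in the following sketch. Everything in it is hypothetical: the helper names split_participants() and participant_position(), the ";" separator, and the labels "corporate", "conference", and "unknown" are assumptions for illustration, not the module's actual functions or return values.

```python
def split_participants(text, sep=";"):
    """Split a delimited participants string into a list of clean names.

    The ';' separator is an assumption; empty entries are discarded.
    """
    return [name.strip() for name in text.split(sep) if name.strip()]


def participant_position(name, corporate, conference):
    """Classify a speaker by the list in which the name appears.

    The returned labels are illustrative, not the module's values.
    """
    if name in corporate:
        return "corporate"
    if name in conference:
        return "conference"
    return "unknown"


corporate = split_participants("Jane Doe; John Smith")
conference = split_participants("Alex Analyst;  ")
position = participant_position("John Smith", corporate, conference)
```

A speaker who appears in neither list falls through to "unknown", which is the case the last summary (identifying unlisted participants) would have to handle.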