infineac.file_loader

Imports and structures earnings call data from XML files.

Examples

>>> from pathlib import Path
>>> import infineac.file_loader as file_loader
>>> import spacy_stanza
>>> nlp_stanza = spacy_stanza.load_pipeline("en", processors="tokenize, lemma")
>>> nlp_stanza.add_pipe('sentencizer')
>>> PATH_DIR = "data/transcripts/"
>>> files = list(Path(PATH_DIR).rglob("*.xml"))
>>> events = file_loader.load_files_from_xml(files)

Notes

The earnings calls are stored in XML files. `load_files_from_xml()` is the module's main function: it loads the XML files, extracts the relevant information, and stores it in a list of dictionaries.
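To illustrate the general pattern described above (parse each XML document, pull out selected fields, collect one dictionary per call), here is a minimal sketch using only the standard library. It is not the infineac implementation: the element names (`title`, `companyName`, `body`), the `id` attribute, and the helper names `create_blank_record` and `load_records_from_xml` are hypothetical and chosen for illustration; the real transcript schema and output keys may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout, for illustration only; the real schema may differ.
SAMPLE_XML = """<Event id="1">
  <title>Q1 2022 Earnings Call</title>
  <companyName>Example Corp</companyName>
  <body>Operator: Welcome to the call.</body>
</Event>"""


def create_blank_record():
    """Return a record with every expected key preset to an empty value."""
    return {"id": None, "title": "", "company": "", "body": ""}


def load_records_from_xml(xml_strings):
    """Parse each XML document and collect one dictionary per document."""
    records = []
    for xml_string in xml_strings:
        root = ET.fromstring(xml_string)
        record = create_blank_record()
        # Attributes come from the root element, text from its direct children.
        record["id"] = root.get("id")
        record["title"] = root.findtext("title", default="")
        record["company"] = root.findtext("companyName", default="")
        record["body"] = root.findtext("body", default="")
        records.append(record)
    return records


records = load_records_from_xml([SAMPLE_XML])
```

Starting from a blank record with fixed keys means every dictionary in the resulting list has the same shape, even when a document lacks one of the elements.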

Functions

`add_info_to_event`(event, element)	Adds information to given event based on the element of an xml file.
`create_blank_event`()	Creates a blank event with the keys that are expected in the final output.
`extract_info_from_earnings_call_body`(body)	Extracts information from the body of a conference call.
`extract_info_from_earnings_call_part`(part, ...)	Extracts information from an earnings call part.
`extract_info_from_earnings_call_structured`(...)	Extracts information from an already structured earnings call.
`get_participants_position`(participant, ...)	Returns the position of the participant based on the lists of corporate and conference call participants.
`load_files_from_xml`(files)	Parses the xml files and extracts the information from the earnings calls.
`participants_list_collapsed`(participants_list)	Collapses the participants list into a list of strings.
`participants_string_to_list`(participants)	Splits the participants string into a list of participants.
`structure_earnings_call`(string)	Separates and structures the earnings call into its parts and returns them as a dictionary.
`transform_unlisted_participants`(participant, ...)	Transforms unlisted participants so that they can be identified among the listed ones.