infineac.file_loader

Imports earnings call data from XML files and structures it.

Examples

>>> import infineac.file_loader as file_loader
>>> from pathlib import Path
>>> import spacy_stanza
>>> nlp_stanza = spacy_stanza.load_pipeline("en", processors="tokenize, lemma")
>>> nlp_stanza.add_pipe('sentencizer')
>>> PATH_DIR = "data/transcripts/"
>>> files = list(Path(PATH_DIR).rglob("*.xml"))
>>> events = file_loader.load_files_from_xml(files)

Notes

The earnings calls are stored in XML files. load_files_from_xml() is the main function of the module: it loads the XML files, extracts the relevant information and returns it as a list of dictionaries.
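As a rough illustration of that flow, the sketch below parses XML with the standard library's xml.etree.ElementTree, starts each event from a blank dictionary and fills it element by element, playing the roles of create_blank_event(), add_info_to_event() and load_files_from_xml(). It is a hypothetical sketch, not the infineac implementation: the function names, the tag names (companyName, startDate, Body) and the event keys are all assumptions, because the actual XML schema is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical event keys; the real module's keys may differ.
EVENT_KEYS = ("company_name", "date", "body")

# Hypothetical mapping from XML tag names to event keys.
TAG_TO_KEY = {"companyName": "company_name", "startDate": "date", "Body": "body"}


def make_blank_event() -> dict:
    """Return an event with every expected key set to None."""
    return {key: None for key in EVENT_KEYS}


def fill_event(event: dict, element: ET.Element) -> None:
    """Store the element's stripped text in the event if its tag is known."""
    key = TAG_TO_KEY.get(element.tag)
    if key is not None and element.text is not None:
        event[key] = element.text.strip()


def event_from_root(root: ET.Element) -> dict:
    """Build one event from a parsed XML tree."""
    event = make_blank_event()
    # Element.iter() visits the root itself and all of its descendants.
    for element in root.iter():
        fill_event(event, element)
    return event


def load_events(files) -> list:
    """Parse each XML file (path or file object) into one event dict."""
    return [event_from_root(ET.parse(file).getroot()) for file in files]
```

Keeping the tag-to-key mapping in a single dictionary means that supporting an extra field only needs a new entry, not a new branch.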

Functions

add_info_to_event(event, element)

Adds information to the given event based on an element of an XML file.

create_blank_event()

Creates a blank event with the keys that are expected in the final output.

extract_info_from_earnings_call_body(body)

Extracts information from the body of a conference call.

extract_info_from_earnings_call_part(part, ...)

Extracts information from an earnings call part.

extract_info_from_earnings_call_structured(...)

Extracts information from an already structured earnings call.

get_participants_position(participant, ...)

Returns the position of the participant based on the lists of corporate and conference call participants.

load_files_from_xml(files)

Parses the XML files and extracts the information from the earnings calls.

participants_list_collapsed(participants_list)

Collapses the participants list into a list of strings.

participants_string_to_list(participants)

Splits the participants string into a list of participants.

structure_earnings_call(string)

Splits the earnings call into its parts and returns them as a dictionary.

transform_unlisted_participants(participant, ...)

Transforms unlisted participants so that they can be identified among the listed ones.
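The two participant helpers, get_participants_position() and transform_unlisted_participants(), can be approximated as follows. This is a hypothetical sketch: it assumes participants are plain name strings, the position labels ("corporate", "conference_call", "unknown") are invented for illustration, and fuzzy matching with difflib.get_close_matches is a swapped-in technique, not necessarily how infineac identifies unlisted speakers.

```python
import difflib


def participant_position(participant: str, corporate: list, conference: list) -> str:
    """Classify a speaker by membership in the two participant lists."""
    if participant in corporate:
        return "corporate"
    if participant in conference:
        return "conference_call"
    return "unknown"


def match_unlisted(participant: str, corporate: list, conference: list,
                   cutoff: float = 0.8) -> str:
    """Map an unlisted speaker name to the closest listed name, if any.

    Uses difflib's similarity ratio; names whose best match scores
    below `cutoff` are returned unchanged.
    """
    listed = corporate + conference
    if participant in listed:
        return participant
    matches = difflib.get_close_matches(participant, listed, n=1, cutoff=cutoff)
    return matches[0] if matches else participant
```

For example, with corporate = ["Jane Doe", "John Smith"] and conference = ["Alex Analyst"], match_unlisted("Jane  Doe", corporate, conference) (note the double space) returns "Jane Doe", which participant_position() then classifies as "corporate". Names that score below the cutoff are passed through unchanged and end up as "unknown".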
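In the same spirit, structure_earnings_call() and participants_string_to_list() can be pictured as splitting a transcript on its section headings and then turning a participants section into a list. The sketch assumes that each heading sits alone on its own line and that participants are listed one per line with a leading "*"; the heading names and both helper names (split_into_parts, participants_to_list) are hypothetical, since the real transcript layout is not documented here.

```python
import re

# Hypothetical section headings; the real transcript format may differ.
SECTION_HEADINGS = (
    "Corporate Participants",
    "Conference Call Participants",
    "Presentation",
    "Questions and Answers",
)

# Matches a line consisting solely of one heading (spaces/tabs allowed around it).
_HEADING_RE = re.compile(
    r"^[ \t]*(" + "|".join(re.escape(h) for h in SECTION_HEADINGS) + r")[ \t]*$",
    re.MULTILINE,
)


def split_into_parts(text: str) -> dict:
    """Split a transcript into {heading: section text} using heading lines."""
    parts = {}
    matches = list(_HEADING_RE.finditer(text))
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the start of the next one.
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parts[match.group(1)] = text[start:end].strip()
    return parts


def participants_to_list(section: str) -> list:
    """Return one name per line that starts with '*' (assumed bullet format)."""
    return [
        line.lstrip("* ").strip()
        for line in section.splitlines()
        if line.strip().startswith("*")
    ]
```

Anchoring headings to whole lines keeps a word such as "Presentation" inside ordinary speech from being mistaken for a section break.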