There are a number of algorithms available for both objective and subjective extraction, and depending on the task at hand, one or more of the approaches below will be utilised on client projects.
These include algorithms to extract text from documents, split it into sentences, and more.
For simple objective extractions, e.g. attributes that appear in consistent formats, rules are the most efficient approach. Rules can be defined explicitly, and users can be trained to configure them; they range from keyword matching and regular expressions to linguistic pattern-based matching.
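As a minimal sketch of the rule-based approach, the snippet below extracts two attributes from a consistently formatted passage with regular expressions. The field names and patterns are illustrative assumptions, not Orbit's actual rules.

```python
import re

# Illustrative rules only: one pattern per named attribute.
# An ISIN-like identifier (2 letters, 9 alphanumerics, 1 check digit)
# and a percentage coupon.
RULES = {
    "isin": re.compile(r"\b[A-Z]{2}[A-Z0-9]{9}\d\b"),
    "coupon": re.compile(r"(\d+(?:\.\d+)?)\s*%"),
}

def extract_attributes(text: str) -> dict:
    """Apply each named rule and keep the first match, if any."""
    out = {}
    for name, pattern in RULES.items():
        m = pattern.search(text)
        if m:
            out[name] = m.group(1) if m.groups() else m.group(0)
    return out

example = "The notes (ISIN XS0123456789) pay a coupon of 4.25% annually."
print(extract_attributes(example))
```

Because the formats are consistent, a small set of such rules can be configured by trained users without any model training.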
Template based extraction
For lengthy documents, e.g. prospectuses and annual reports, where extraction relies on document structure, a template is the best method.
For more straightforward documents, e.g. invoices and short contracts, which arrive in high volumes with inconsistent formats, it can be easier to convert the documents to images and run Optical Character Recognition (OCR) to extract the text.
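The image-then-OCR pipeline can be sketched as below. `render_pages` and the OCR engine are stand-ins: a real deployment would rasterise the PDF with a PDF library and call an engine such as Tesseract, neither of which is shown here.

```python
from typing import Callable, Iterable

def process_document(pages: Iterable[bytes],
                     run_ocr: Callable[[bytes], str]) -> list[str]:
    """OCR every page image and return the recognised text per page."""
    return [run_ocr(image) for image in pages]

# Toy OCR stand-in so the sketch is self-contained: pretend each "image"
# is just encoded text that the engine recognises perfectly.
fake_ocr = lambda image: image.decode("utf-8")

pages = [b"Invoice No. 1042", b"Total due: 199.00 EUR"]
print(process_document(pages, fake_ocr))
```

Keeping the OCR engine behind a simple callable makes it easy to swap engines per document type; the recognised text is then passed to the rule-based extraction described above.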
Model based extraction
For tasks where patterns or rules are hard to define, the only feasible approach is to develop bespoke models.
ESG classification model
Orbit provides an off-the-shelf, sentence-level ESG topic classification model to help locate relevant content more efficiently.
Question answering framework
Orbit has developed a bespoke question answering framework that takes a natural language question and searches pre-defined content for candidate answers, in order to locate the relevant information.
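The Orbit framework itself is not public; as a hedged illustration of the underlying idea, the sketch below ranks candidate sentences by length-normalised token overlap with the question, which is one simple way to surface answer candidates.

```python
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_candidates(question: str, sentences: list[str]) -> list[tuple[float, str]]:
    """Score each sentence by token overlap with the question,
    normalised by sentence length, highest first."""
    q = Counter(tokens(question))
    scored = []
    for s in sentences:
        s_tokens = tokens(s)
        overlap = sum((q & Counter(s_tokens)).values())
        scored.append((overlap / (len(s_tokens) or 1), s))
    return sorted(scored, reverse=True)

question = "What is the coupon rate of the notes?"
candidates = [
    "The issuer is incorporated in Luxembourg.",
    "The notes pay a coupon rate of 4.25 per cent per annum.",
    "Interest is payable annually in arrear.",
]
best = rank_candidates(question, candidates)[0][1]
print(best)
```

A production system would replace the overlap score with a learned relevance model, but the question-in, ranked-candidates-out shape stays the same.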
Sentence level event detection
A deep learning model that runs on public documents, for example news, to detect entities and pre-defined events at the sentence level. A typical use case is scanning news for ESG controversies with a high confidence level.
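The production system is described as a deep learning model; the toy stand-in below uses trigger-word lists only to illustrate the input/output shape, i.e. (entity, event, sentence) hits at the sentence level. Entity and event names are invented for the example.

```python
import re

# Illustrative watchlists; a real system would learn these signals.
ENTITIES = ["Acme Corp", "Globex"]
EVENT_TRIGGERS = {
    "environmental_fine": ["fined", "penalty", "pollution"],
    "labour_dispute": ["strike", "walkout"],
}

def detect_events(article: str) -> list[dict]:
    """Scan each sentence for a watched entity co-occurring with an
    event trigger, and emit one hit per (entity, event, sentence)."""
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", article):
        for entity in ENTITIES:
            if entity not in sentence:
                continue
            for event, triggers in EVENT_TRIGGERS.items():
                if any(t in sentence.lower() for t in triggers):
                    hits.append({"entity": entity, "event": event,
                                 "sentence": sentence})
    return hits

news = ("Acme Corp was fined for river pollution. "
        "Globex announced record profits.")
print(detect_events(news))
```

Working at the sentence level keeps each detection tied to a short, auditable span of text, which is what makes high-confidence controversy scanning practical.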
Summarisation
Generates summaries from news, research reports, and more.
Sentiment analysis
A multi-lingual sentiment model that operates at both the article and sentence levels.