There is no doubt that the financial industry is increasingly aware of the value of alternative data. The number of alternative data platforms and data vendors is rapidly increasing. More and more companies are setting up a ‘Chief of Data’ role and data science teams to explore use cases for alternative data. Investment is also increasing: for example, Nasdaq acquired Quandl to advance the use of alternative data.
Complexities of Converting Alternative Data
There is a rising focus on alternative data; however, the industry is not yet fully aware of its nature and opportunities, nor does it appreciate the complexities of processing it.
Alternative data is diverse. By definition, any data apart from static or market data is classed as alternative data. It can be structured or unstructured, in the form of text, speech or images, multilingual, and sometimes even in hard copy. Most alternative data vendors specialise in a niche domain, e.g. credit card transactions, property transactions, satellite images or news sentiment. It is difficult for end users to know what data is available. Luckily, the gap is being filled by a rising number of intermediary platforms that match vendors with users.
Requirements are diverse. Take sentiment as an example: it is no longer a new concept for the financial industry, yet it is still the best-known way of utilising text data such as news and social media. Purchasing sentiment from a data vendor has its limitations: the sentiment score is calculated by a black-box algorithm; too much information is lost when text is converted into a numerical value; and alpha can be diluted if too many users consume the same data. Text data (news, filings, research reports) contains vast information that has proved to be forward-looking and predictive. It can be used not only by quant funds but broadly across the buy and sell side for stock selection, thematic trading, portfolio management, risk management, investment research, etc. These diverse requirements and applications bring huge challenges to vendors, as it is difficult to pre-process everything and sell it as off-the-shelf products.
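To illustrate how much information a single sentiment number can discard, here is a minimal sketch in Python. The lexicon, its weights and the headlines are hypothetical and chosen purely for illustration; real vendor algorithms are not public and are likely far more sophisticated. The point is only that texts with very different content can collapse to the same score.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration, not a vendor's word list).
LEXICON = {"beat": 1, "strong": 1, "growth": 1, "miss": -1, "weak": -1, "lawsuit": -1}


def sentiment_score(text: str) -> float:
    """Average lexicon weight over matched words; 0.0 when nothing matches."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


# Invented headlines with different content (earnings plus legal risk vs. regional performance).
headline_a = "Earnings beat estimates but the company faces a lawsuit"
headline_b = "Strong growth in Asia offset by a weak quarter and a miss in Europe"

score_a = sentiment_score(headline_a)
score_b = sentiment_score(headline_b)

# Both headlines receive the same neutral score, although they describe different events.
print(score_a, score_b)
```

A consumer who only receives the score cannot tell a legal risk from a regional slowdown, which is why access to the underlying text and control over its processing matter.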
Too much focus on data, with little focus on processing. Alternative data is a source of alpha, but the data itself is not alpha unless it is processed properly. For example, there are vendors sourcing and distributing research reports, which are a valuable source of alternative data, especially under the MiFID II regulations. At a reasonable price, it is not a sizeable challenge for a fund to purchase a huge volume of reports. What to do with that data is the key. However, due to a lack of understanding of available technologies and long development cycles, most users only extract shallow information or run keyword searches.
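The gap between keyword search and even slightly deeper processing can be sketched as follows. The sentences, company names and the rating-change pattern are all hypothetical; this is a toy rule-based extractor under those assumptions, not a description of any vendor's or fund's actual pipeline.

```python
import re

# Hypothetical sentences standing in for research report text.
SENTENCES = [
    "We upgrade ACME to Buy from Hold.",
    "We see no reason to upgrade ACME at this stage.",
    "We downgrade Globex to Sell from Hold.",
]


def keyword_hits(sentences, keyword):
    """Shallow approach: return every sentence that contains the keyword."""
    return [s for s in sentences if keyword in s.lower()]


# Assumed phrasing of a rating change; real reports vary far more than this.
RATING_CHANGE = re.compile(
    r"\bWe (?P<action>upgrade|downgrade) (?P<company>[A-Z][A-Za-z]+) "
    r"to (?P<new>Buy|Hold|Sell) from (?P<old>Buy|Hold|Sell)\b"
)


def extract_rating_changes(sentences):
    """Structured approach: return one record per actual rating change."""
    records = []
    for sentence in sentences:
        match = RATING_CHANGE.search(sentence)
        if match:
            records.append(match.groupdict())
    return records


print(keyword_hits(SENTENCES, "upgrade"))
print(extract_rating_changes(SENTENCES))
```

The keyword search flags the second sentence even though no upgrade took place, while the structured extraction returns only the two real rating changes, together with the company and the old and new ratings.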
Lack of solution vendors. There are large solution vendors for static and market data who provide off-the-shelf tools and frameworks for data cleansing, storage and calculation. But it is a different story in the alternative data domain. Funds and investment banks are very protective when it comes to processing alternative data and tend to spend too many resources on building technical capabilities before starting to build business cases. The growth of solution vendors will be a sign of maturity in the use of alternative data, because they focus on building standard processing functionality. This will be cost-effective and will accelerate clients’ development cycles. By purchasing a technical solution and support, funds and investment banks can focus more on business-related calculation logic, which is the true business secret to be protected from competitors.
Best Practices for Utilising Alternative Data
Business takes ownership. Instead of looking at an alternative data vendor and extracting available market data, the business should first define its requirements and potential use cases, and then select the appropriate data and solution vendors.
Focus/invest more on processing data. Traditionally, end users rely on data vendors to process alternative data to a level where it can be used directly as a signal. Instead, they should have control over how raw data is processed. The more end users are involved in the process, the better they can control what information is extracted.
A systematic approach. In many cases, alternative data is big data, characterised by the 3Vs: high volume, high velocity and high variety. Generating historical data and running an offline POC is acceptable as a first step to validate ideas and requirements; however, a robust IT system and support are required to run use cases in production.