SESAMm has a large data lake of more than 20 billion articles (growing by 5–10 million a day), covering 14 years of data in 100 languages. But size alone is not what makes it valuable; what sets it apart is a refined process for finding the exact data you need.
Here’s an example to help explain the point. We’re sometimes asked for help researching data to forecast and monitor the commodities market, even by large companies with their own commodities desk of traders and quant researchers. Why would they seek help from outside their firm?
Simply put, traders want an edge. They want information advantages that others are likely to miss, so they look to alternative data from a variety of sources: anything that adds value and offers a different angle. And, as it turns out, commodities are a more challenging segment to analyze with alternative text data. Compared with text about companies, text about commodities is scarcer and takes more domain knowledge to interpret. A simple sentiment analysis doesn’t provide enough relevant information.
For a more in-depth view, join us as we discuss NLP-derived alternative data, its benefits, the challenges it poses for researchers, and why bigger isn’t always better in the world of data.
Read the full article on Quant Finance.