NLP | Alternative Data | Big Data

What Investors Ought to Know About Natural Language Processing: A Quick Guide

July 13, 2022

In this issue of the "what investors ought to know about…" series, we'll cover natural language processing (NLP), a tool that draws from the computer science and computational linguistics disciplines. In the last topic, we discussed knowledge graphs as the core of text analysis. And if knowledge graphs are the core of the data’s context, NLP is the transition to understanding the data.

What is natural language processing?

Natural language processing is an artificial intelligence (AI) technology that automates the analysis of unstructured text. It includes natural language understanding, which lets a machine interpret text, and natural language generation, which simulates a human's ability to create language. NLP combines computational linguistics with machine learning and deep learning models, applying linguistic analysis through algorithms so a machine can "read" text.

Where is natural language processing used?

Today, NLP powers tools across many industries, from email filters and virtual assistants to search engines and chatbots. Here's a list of common ways natural language processing is used:

Chatbots: Chatbots are computer programs that use NLP. They simulate human conversation by identifying a sentence's intent, determining suitable topics, keywords, and emotions, and calculating the best response based on the data's interpretation.
Email filters: Email filters apply machine learning using many data samples to sort emails into the right inbox.
Machine translation: Translation software like Google Translate or Microsoft Translator uses NLP to translate text from one language to another, such as English to French.
Natural language generation (NLG): NLG, a subfield of NLP, builds applications or computer systems that can automatically produce natural language texts of various types by using a semantic representation as input. Applications of NLG include question answering and text summarization.
Predicting and autocorrecting text: Predictive text and autocorrect use NLP to recognize and recall commonly used words and names to make text suggestions and correct common errors.
Search engines: Search engines like Google Search use NLP and machine learning to interpret a searcher's intent and provide relevant results. They can even suggest subjects and topics related to the query that the searcher might be interested in.
Virtual and voice assistants: Virtual assistants like Apple's Siri or Amazon's Alexa use NLP technology to understand and respond to voice requests. Speech-to-text can dictate messages and notes, and speech recognition can control everything from smartphone apps and smart speakers to thermostats and home security systems.
Web sentiment analysis: Sentiment analysis automates classifying opinions in a text as positive, negative, or neutral. It's a method companies like SESAMm use to monitor sentiment, such as how a brand is perceived on the web and social media.
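To make the sentiment analysis item above concrete, here is a minimal sketch of a lexicon-based classifier in Python. It is a toy illustration, not SESAMm's method: the word lists and the `classify_sentiment` function are hypothetical, and production systems typically rely on trained models rather than fixed word lists.

```python
import re

# Hypothetical, tiny word lists chosen only for illustration.
POSITIVE = {"good", "great", "strong", "growth", "profit"}
NEGATIVE = {"bad", "weak", "loss", "lawsuit", "scandal"}


def classify_sentiment(text: str) -> str:
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Each positive word adds one point; each negative word subtracts one.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Calling `classify_sentiment("The lawsuit caused a loss")` returns `"negative"` because two words hit the negative list and none hit the positive list.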

Why natural language processing is important to uncover financial-related alternative data

NLP is important because it helps resolve human language ambiguity in big datasets (big data). Languages are complex and diverse: people communicate in hundreds of languages and dialects, each with its own grammar and syntax rules, slang, and terminology. In written form, all of this variation ends up as unstructured text. But with NLP, we can transform unstructured data into structured data and make sense of it.

Because of NLP's power, investors can research and analyze unstructured data from the web to gain insights into financial and ESG data. You can use this wealth of information to focus on systematic data processing, risk management, and alpha discovery through contexts, such as:

Major global indices sentiment
Euronext exchange sentiment
Private company sentiment
ESG risks for public and private companies worldwide

A quick overview of how natural language processing works at SESAMm

At SESAMm, we use named entity recognition (NER), which extracts the names of people, places, and other entities from text, and then named entity disambiguation (NED), which identifies which real-world entity each name refers to based on its context and usage. For example, text referencing "Elon" could refer indirectly to Tesla through its CEO or to a university in North Carolina. NED considers the context when classifying entities for an accurate match. This makes NED superior to simple pattern matching, which limits the number of possible matches, requires frequent manual adjustments, and can't distinguish homonyms (words that are spelled the same but mean different things).
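The "Elon" example can be sketched in a few lines of Python. This is a simplified illustration of context-based disambiguation, not SESAMm's implementation: the candidate list, the context words, and the `disambiguate` function are all hypothetical, and real NED systems learn these associations from data.

```python
import re

# Hypothetical candidate entities for the ambiguous mention "Elon",
# each described by a handful of context words.
CANDIDATES = {
    "Elon Musk (Tesla CEO)": {"tesla", "ceo", "electric", "cars", "spacex"},
    "Elon University": {"university", "campus", "students", "carolina", "college"},
}


def disambiguate(sentence: str) -> str:
    """Return the candidate whose context words overlap most with the sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    # Score each candidate by the number of shared context words.
    return max(CANDIDATES, key=lambda name: len(CANDIDATES[name] & tokens))
```

With this sketch, `disambiguate("Elon said Tesla will build more electric cars")` picks the Tesla CEO, while `disambiguate("Students at Elon in North Carolina returned to campus")` picks the university. A pattern match on the string "Elon" alone could not tell the two apart.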

*Process representation for NER and NED.*

When identifying entities and creating actionable insights, SESAMm uses three other NLP tools: lemmatization and stemming, embeddings, and similarity. Lemmatization normalizes a word into its base form (morphology) to help identify and aggregate entities. Embedding assigns the entity a numerical representation (a vector) to help analyze how words change meaning depending on context and to capture the subtle differences between words that refer to the same concept. Similarity measures whether two words, sentences, or objects are close to one another in meaning.
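Here is a rough Python sketch of how these three tools fit together. Everything in it is an assumption made for illustration: the lemma table and the three-dimensional vectors are hand-made, whereas real systems use linguistic resources and learned embeddings with hundreds of dimensions. Cosine similarity is one common way to measure closeness between vectors.

```python
import math

# Hypothetical lemma table: maps inflected forms to a base form.
LEMMAS = {"acquired": "acquire", "acquires": "acquire", "acquiring": "acquire"}

# Hypothetical 3-dimensional embeddings, invented for this example.
EMBEDDINGS = {
    "profit": [0.9, 0.1, 0.0],
    "earnings": [0.8, 0.2, 0.1],
    "flood": [0.0, 0.1, 0.9],
}


def lemmatize(word: str) -> str:
    """Return the base form of a word, or the lowercased word if unknown."""
    return LEMMAS.get(word.lower(), word.lower())


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors; closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

In this toy setup, "profit" and "earnings" score much closer to each other than "profit" and "flood" do, which is the kind of signal used to group words that refer to the same concept.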

*Representation of nodes in a knowledge graph.*

Of course, NLP couldn't function without the core of the text analytics process: knowledge graphs. A knowledge graph is a digital representation of a network of real-world entities, the foundation of a search engine or question-answering service. This structured data model puts the schema in context through semantic metadata and linking, providing a framework for analytics, data integration, sharing, and unification. In other words, it's like a map and legend, with the legend labeling the concepts, entities, and events and the map connecting and identifying their relationships. These details are stored in a graph database and visualized as a graph representation, hence the term knowledge graph.
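As a small illustration of the map-and-legend idea, a knowledge graph can be sketched in Python as a set of (subject, relation, object) triples. This is a toy model under stated assumptions: the triples, relation names, and helper functions are hypothetical, and a production system would store them in a graph database.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples for illustration.
TRIPLES = [
    ("Elon Musk", "is_ceo_of", "Tesla"),
    ("Tesla", "operates_in", "Automotive"),
    ("Tesla", "listed_on", "NASDAQ"),
]


def build_graph(triples):
    """Index triples by subject so each entity maps to its outgoing edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def neighbors(graph, entity):
    """Return the entities directly linked from the given entity."""
    return [obj for _, obj in graph.get(entity, [])]
```

Here the relation names play the role of the legend and the edges play the role of the map: `neighbors(build_graph(TRIPLES), "Elon Musk")` leads to Tesla, which is how a mention of the CEO can be connected to the company.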

SESAMm's natural language processing platform for investment research and analysis

SESAMm is a leading provider of natural language processing and machine learning solutions and analytics for investment firms and corporations.

Our AI and NLP platform, TextReveal®:

Analyzes text in billions of web-based articles and messages
Generates investment insights and ESG analysis used in systematic trading, fundamental research, risk management, and sustainability analysis
Enables a more quantitative approach to leveraging the value of web data that's less prone to human bias
Addresses a growing need in public and private investment sectors for robust, timely, and granular sentiment and ESG data

For a personal demo, contact us today.

Related Blogs

ESG | AI | Risk Management

5 Telltale Signs It's Time to Use AI to Mitigate Risk on Your Portfolios

October 13, 2023

Artificial Intelligence (AI) is a hot topic, with new breakthroughs and possible applications popping up every day. The question is no longer simply “can AI help me with that?” but rather “how can I use AI to help with that?” For Environmental, Social, and Governance (ESG) controversy and risk monitoring, AI is used to sift through enormous data sets at unparalleled speeds, bringing critical insights to the forefront faster and more efficiently than humanly possible.

When there are hundreds of companies to monitor, for example in a large investment portfolio or a group of suppliers, the advantages of AI are obvious. But what about smaller portfolios? How do you know it’s time to start using AI? Based on our experience working with private equity firms, asset managers, and commercial banks, we’ve pulled together five signs that it’s time to consider AI.

1. Overwhelmed by Data: There's Too Much Noise

One of the biggest challenges in risk monitoring is sifting through mountains of irrelevant data. Whether you're using search engines, financial news platforms, or even specialized in-house analytics, you end up with too much noise. Scrolling through to page 12 of Google is not only time-consuming but also leaves you with the nagging feeling that you could still be missing something.

An AI-powered tool filters out the noise, even in situations where seemingly only humans would be able to do it, giving you the peace of mind that there’s no controversy lurking in the dark corners of the web. All of the key information is gathered in one place, ready for you to evaluate and decide the best course of action.

2. Difficulty Finding Critical Information: The Black Hole of Private Companies

On the flip side, sometimes instead of finding too much data, you can’t find any data at all. For private companies, information can be scarce, especially for smaller companies based overseas, where the only news coverage is local and in the local language. In this case, ESG ratings agencies often aren’t able to fill the gap either. There are millions of firms worldwide, and fewer than 50,000 of them are covered by rating agencies (source).

AI, on the other hand, enables systematic coverage and statistically relevant results without human intervention, analyzing millions of websites and providing coverage on millions of public and private companies. If you are struggling to find information on a company, AI might be the answer.

3. Can't Accurately Analyze an Event: The Context is Missing

Beyond the actual controversy or event itself, understanding the context and history around it is essential for risk assessment. Is this a one-off concern or part of a recurring pattern? To get the full picture, you need to take a closer look not only at the company in question but also at its key players, such as key executives, and at the industry as a whole to understand whether the event is within the norms.

AI has an important role to play here too. By simply expanding the search, AI can provide you with a full picture of the controversy, including a quick summary and a benchmark against competitors, in just a matter of minutes.

4. Missed Critical Window for Action: The Cost of Inefficiency

Markets can change quickly, and the pace is only increasing as information spreads faster and more widely. The more time it takes you to gather and analyze information, the less time you have to react. This can be a challenge whether you are monitoring 30 companies or hundreds. If you find yourself trapped in a cycle of reacting to news rather than acting proactively, AI can help. Because AI scans and analyzes information in seconds, the alerts to potential controversies arrive in near real time, allowing you as much time as possible to take action.

5. Missing ESG Expertise: The Knowledge Gap

To top it all off, ESG is complex and constantly evolving. Understanding what data is relevant and how to evaluate it requires real expertise. ESG rating agencies provide some guidance, but they typically leverage self-reported data, which is naturally biased. Take greenwashing, for example, where a company misleads its stakeholders, investors, and consumers about its environmental practices by communicating positive environmental performance contrary to its actual, less positive execution. It’s difficult to identify greenwashing using self-reported data.

Because AI relies on external sources, such as online forums or news outlets, rather than on what a company says about itself, it offers a less biased take on a company’s ESG performance. Additionally, by choosing an AI with ESG expertise built in, you benefit from an expert analysis without increasing the burden on your team.

As the speed and amount of information available continue to grow, AI offers a scalable way to monitor your partners, suppliers, and portfolio. To learn more and find out if AI is a good fit for your company, contact our experts at SESAMm.

ESG

November’s Fundamental Human Rights Controversies: A Market Analysis

December 16, 2025

Human rights concerns took center stage in November, with rising scrutiny on how digital platforms and luxury brands safeguard vulnerable users and workers. Across the market, allegations of child exploitation, extremist activity, and labor abuses exposed significant governance and oversight gaps. The month’s top three most controversial companies were Roblox Corporation, Snap Inc. (Snapchat), and Tod’s, each facing escalating legal and regulatory pressure.

#1: Roblox Corporation: Intensifying Allegations of Child Exploitation

Roblox, the video game developer, experienced a surge of human rights–related controversies, driven by lawsuits and criminal cases involving child exploitation and online extremism. Multiple families in the United States filed suits alleging that predators used the game to groom and coerce minors, in some cases leading to severe psychological harm.

Regulators also increased pressure. The Texas Attorney General sued the company, accusing it of violating safety laws and misleading parents about the risks associated with young users. Additional criminal cases surfaced in the US, Ireland, and Argentina, where adults were convicted of grooming minors through Roblox. The company faced further backlash after its CEO referred to the child predator crisis as an “opportunity,” prompting criticism even as Roblox highlighted new age-verification tools aimed at improving safety.

#2: Snap Inc.: Social Messaging Platforms Under Renewed Scrutiny

Snap Inc., known for its messaging app Snapchat, emerged as the second-most controversial company in November following several serious incidents involving minors. In the United States, a missing 13-year-old girl was found in a Pennsylvania basement after meeting a man on Snapchat, who has since been charged with human trafficking and sexual assault. Another investigation led to the arrest of a New York man after Snapchat flagged suspected child sexual abuse material on his account.

At a broader level, new research from the Canadian Centre for Child Protection revealed widespread online sexual violence among youth, with Snapchat cited as one of the primary platforms involved. The company was also named in a major lawsuit filed by US school districts against Meta, Google, Snapchat, and TikTok, alleging that platforms suppressed internal research on youth harm and failed to implement meaningful protections.

#3: Tod’s: Supply Chain Labor Abuses Trigger Legal Action

In the luxury sector, Tod’s faced heightened scrutiny following new developments in an ongoing investigation into labor exploitation at its supplier factories. Italian prosecutors expanded their probe into three company executives, citing evidence of serious labor violations involving 53 workers employed by subcontractors. Issues raised included long working hours, low wages, inadequate safety standards, and poor living conditions.

Authorities also highlighted potential negligence and omissions by management, arguing that Tod’s failed to act on inspection findings that documented the abuses. Prosecutors have requested a six-month advertising ban and previously sought judicial administration over the company’s supply chain controls.

Conclusion

November’s top controversies underscore increasing pressure on companies to ensure robust human rights protections, both online and across global supply chains. As regulators, law enforcement, and civil society intensify oversight, firms in technology and consumer markets face rising expectations to demonstrate stronger safety systems, transparent governance, and proactive risk management.

Reach out to SESAMm

TextReveal’s web data analysis of over five million public and private companies is essential for keeping tabs on ESG investment risks. To learn more about how you can analyze web data or to request a demo, reach out to one of our representatives.

Case Study | Text Analysis | Risk Management

Case Study: Tokio Marine Uses NLP to Predict Stock Price Movements

October 27, 2022

Tokio Marine & Nichido Fire Insurance Co., Ltd. (TMNF) tapped SESAMm for a joint research venture to predict future stock price movements. SESAMm provided various NLP indicators, such as digital sentiment calculated for single stocks or indices (seen as an entity), as well as its experience in machine learning to work on this task.

These studies concluded with two key findings:

Relationships exist between NLP data from news and social networking sites and investor behavior under specific circumstances. Researchers and investors can use the “digital sentiment” as an indicator of investor sentiment to anticipate price changes. They can then use this anticipation for a specific company or, more generally, any entity that can be isolated in a text (like an index).
By focusing on more stressed situations, like the 2015 market sell-off, the U.S.-China trade war, the coronavirus pandemic, and the start of the Ukrainian crisis, we could show that digital sentiment is beneficial in times of significant stress in the market. Digital sentiment more accurately reflects the stress level in these complicated situations. It therefore helps to predict stock price movements more accurately in these stressed cases, providing a tail hedge. For instance, it isn't biased by the excess of confidence linked to the “central bank put.”

Providing safety and security since 1879

Tokio Marine Insurance Company was first established in 1879. Over the years, it has added products and services, acquired other businesses, and merged with other companies to eventually become Tokio Marine & Nichido Fire Insurance Co., Ltd. Commonly called Tokio Marine Nichido today, the company is a property and casualty insurance subsidiary of Tokio Marine Holdings, the largest non-mutual private insurance group in Japan. Its products and services provide safety and security to its clients and partners, contributing to more fulfilling lifestyles and business development.

One of the company’s philosophies is to be a good corporate citizen and fulfill its social responsibilities, including protecting the global environment, promoting human rights, creating a responsible working environment, and contributing to society and individual local communities. Recently, the Emperor of Japan awarded Tokio Marine Holdings, Inc. the Medal with Dark Blue Ribbon for donating to the Japan Student Services Organization to support students who face financial difficulty during the COVID-19 pandemic. Individuals, corporations, or organizations are awarded the Medal with Dark Blue Ribbon for their outstanding contributions to the public.

Transforming and accepting the challenge to grow

According to TMNF, “The business environment surrounding the insurance industry is changing at a faster pace than ever due to changes in demographics, advances in technologies, such as autonomous driving and AI, and longer-term trends, such as the intensification and frequent occurrence of natural disasters, as well as further progress in digitalization due to the COVID-19 pandemic.”

“While these changes in the business environment pose a threat, we consider them to be excellent opportunities for transformation and the creation of new value.” So the company has adopted the concept, “Transformation (“X”) and Challenge to Growth 2023: Aiming to be the company most chosen for quality and its passion.” Ultimately, it strives to support customers and local communities in times of need while contributing to social responsibility. Five social issues that it will prioritize are:

Global climate change and the increase in natural disasters
The increased burden of long-term care and healthcare due to the aging of society and advances in medical technology
Technological innovation and its effects on the environment
Symbiotic society and responding to the novel coronavirus
Industrial infrastructure and how it supports economic growth and innovation

Leveraging a partner with the right technology

To secure and protect its clients’ assets while elevating social issues, Tokio Marine Nichido sought out an edge in the stock market. Under these circumstances, it was fortunate that TMNF discovered SESAMm in 2020 through the Plug and Play Japan program, a platform with an event that connects Japan to markets abroad. SESAMm presented its NLP alternative data solution, TextReveal®, and TMNF considered the platform as a source of alternative data and sought to collaborate with the SESAMm team on a research project.

“SESAMm has the technology to extract sentiment from news data with a neural network.”
– Tokio Marine & Nichido Fire Insurance Co. Ltd representative

Extracting relations between NLP data and the financial market

In 2021, Tokio Marine Nichido Insurance began collaborating with SESAMm to develop an AI analytics model for alternative data. It models the effect of news and social networking data on investor behavior for stock and bond markets. In other words, it structures text information into knowledge usable by TMNF.

Monitor risks and topics

NLP data can improve the understanding of the market’s behavior by exhibiting the most important topics over time, with a direct indication of the importance of the topics through the text volume (Figure 1).

*Figure 1: Automatic detection of the main topics in the U.S. market since 2015, thanks to topic modeling.*

Researchers can also use it to focus on a specific topic or a certain period. For instance, a short analysis of the most frequent keywords in the press, which preceded the market fall during the COVID-19 pandemic, showed the significant predominance of pandemic-related terms (Figure 2).

*Figure 2: Word cloud of the most frequent keywords in English S&P 500-related articles between 17 Jan. 2020 and 19 Feb. 2020.*
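A keyword-frequency analysis like the one behind Figure 2 can be approximated with a simple word count. The Python sketch below is only an illustration: the stopword list and the `top_keywords` function are hypothetical, and it does not reproduce the study's actual pipeline or data.

```python
import re
from collections import Counter

# Hypothetical stopword list; real pipelines use much larger ones.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "as", "on"}


def top_keywords(articles, n=3):
    """Count non-stopword tokens across articles and return the n most frequent."""
    counts = Counter()
    for text in articles:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)
```

Run over a set of article texts, the function returns (keyword, count) pairs sorted by frequency, which is the raw material a word cloud is drawn from.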

Focusing on the equity market

NLP tools provide specific data, like sentiment, to get more detailed information at the company level and for many underlyings. Indicators for equity indices, for instance, can be calculated and provide a clean sentiment to monitor markets.

In many situations of stress over recent years, such sentiment proved to be an early indicator of the market’s future degradation. For example, there was a time lag of as long as a month between the time COVID-19 became the main news focus and the time it affected the U.S. stock market. By using SESAMm’s technology to analyze news data during this period, the team found that the U.S. digital sentiment had already deteriorated sharply before stock prices reacted (Figure 3).

*Figure 3: In 2020, U.S. news sentiment falls ahead of the stock market in response to COVID-19 concerns (U.S. news sentiment compared with the Hang Seng Index and the S&P 500 Index).*

This sentiment deterioration occurred because of fear about how the spread of the coronavirus would affect the global economy (see Figure 2). With the S&P 500 at an all-time high, U.S. investors didn’t initially consider this risk. In comparison, HSI companies were closer to the coronavirus spread risk. As a result, HSI investors reacted ahead of their U.S. counterparts. In other words, by using natural language data, it was possible to capture a risk that U.S. investors overlooked but that was reported in publicly available texts, and to take action ahead of the market deleveraging.
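One simple way to check whether a sentiment series leads a price series is to correlate sentiment on one day with returns a few days later. The Python sketch below illustrates that idea only; it is not the model used in the study, the function names are hypothetical, and the two series are synthetic numbers constructed so that returns copy sentiment with a two-day delay.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(sentiment, returns, lag):
    """Correlate sentiment on day t with returns on day t + lag."""
    if len(sentiment) != len(returns):
        raise ValueError("series must have the same length")
    if not 0 <= lag < len(sentiment) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    return pearson(sentiment[: len(sentiment) - lag], returns[lag:])


# Synthetic data for illustration only: returns repeat sentiment two days later.
SENTIMENT = [0.5, 0.4, -0.2, -0.6, -0.5, 0.1, 0.3, 0.4]
RETURNS = [0.0, 0.1, 0.5, 0.4, -0.2, -0.6, -0.5, 0.1]
```

On this synthetic data, the correlation at a lag of two days is far higher than the same-day correlation, which is the pattern you would look for when testing whether sentiment is an early indicator.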

Generalizing the results to the credit market

Tokio Marine Nichido also expanded the scope of the research to trading a U.S. high-yield bond index. In the credit market, high-yield bonds have a high beta, which makes their risk comparable to that of the equity market.

Research shows that, on a risk-adjusted basis, the NLP-data-built signal has a positive and consistent performance over the whole period compared to the U.S. high-yield total return (HY T.R.) index benchmark (Figure 4). Its performance has a low correlation with the index (Figure 5), so the sentiment signal is diversifying. It not only acts as a diversifier but also delivered higher returns than the benchmark when the U.S. high-yield market sold off (Figure 6). As such, the NLP signal diversifies, hedges, and protects against adverse periods. It provides a mechanical pick-up in risk-adjusted return when run alongside a traditional strategy.

*Figure 4: An NLP-informed signal has positive and consistent performance. The volatility level is the same for both curves.*

*Figure 5: The NLP signal and market daily performances are de-correlated.*

*Figure 6: The NLP signal delivers higher performance during adverse periods (major market sell-offs; backtests at equi-volatility level).*
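Figures 4 and 6 compare the signal and the benchmark at the same volatility level. One common way to do this, assumed here rather than confirmed by the study, is to rescale one return series so its standard deviation matches the other's. The `scale_to_volatility` function below is a hypothetical Python sketch of that step.

```python
from statistics import pstdev


def scale_to_volatility(returns, target_vol):
    """Rescale a return series so its standard deviation equals target_vol."""
    vol = pstdev(returns)
    if vol == 0:
        raise ValueError("cannot rescale a series with zero volatility")
    factor = target_vol / vol
    return [r * factor for r in returns]
```

After rescaling the signal's returns to the benchmark's volatility, any remaining gap in cumulative performance reflects return per unit of risk rather than simply taking more risk.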

The NLP signal outperforms the index in realistic backtest conditions, including long-only allocation, turnover constraints, and trading fees (Figure 7). The quantitative model integrates some macro indicators, but the NLP signal described above is the main source of outperformance and risk mitigation.

*Figure 7: An NLP-informed high-yield strategy outperforms the U.S. high-yield total return index.*

TMNF is also applying the research to estimate the Fed’s stance—hawkish or dovish—using natural language data, too. It hypothesizes that the market will be focused on the Fed’s stance on interest rate hikes in the next few years.

Summarizing the collaboration

In developing models, Tokio Marine Nichido believes it’s essential to decide carefully which data to use and to keep the model simple. TMNF upheld both tenets. The model developed in collaboration with SESAMm is simple in structure, yet it’s an orthodox and robust model that uses valid data as input, which is preferable to risking overfitting by adding complexity.

Get in touch with SESAMm

To learn more about Tokio Marine Nichido’s case study or to request a TextReveal demo, reach out to us.
