Forced labor is often assumed to be a problem of distant supply chains. The case of Packers Sanitation Services Inc. (PSSI) dismantles that assumption entirely.
PSSI was a leading U.S. industrial cleaning contractor, servicing major meatpacking plants and backed by a top-tier private equity firm. Yet between 2022 and 2024, it became the center of one of the most significant child labor scandals in the U.S., one that had been quietly signaling its risks for years. SESAMm's controversy monitoring platform captured those early signals long before regulators intervened.
The Scandal
In November 2022, the U.S. Department of Labor discovered that PSSI had employed minors as young as 13 in hazardous overnight roles across 13 locations in 8 states. A federal investigation confirmed 102 children had been illegally employed, many handling dangerous chemicals and machinery. Three years earlier, in 2019, PSSI had already been sued for wage violations. The signal was there. It went unheeded.
The Fallout
The consequences were swift. A $1.5 million DOL fine. Contract terminations by Cargill and JBS. A DHS trafficking investigation. A replaced CEO. By late 2024, PSSI had shut its corporate office entirely. Even the private equity owner, Blackstone, faced direct scrutiny from pension funds, a reminder that labor violations travel up the ownership chain.
The Lesson
Every warning sign in this case was publicly visible before the crisis broke out. Wage lawsuits, labor complaints, and media coverage are all available in the public domain. Real-time controversy monitoring can surface these signals early, giving companies and investors the chance to act before exposure becomes unavoidable.
Forced labor is not only a humanitarian crisis. It is a material risk that demands better data, earlier detection, and stronger accountability.
Download the full case study infographic to see the complete timeline of events and key takeaways
Imagine finding out you've run out of milk immediately after pouring a bowl of cereal. Or maybe realizing you don't have eggs while in the middle of baking a cake. We've all been there, and it's frustrating, to say the least. And this scene has been playing around the globe over the last couple of years for many foods and products. One day it's microchip shortages, and the next, it's baby formula.
Unfortunate as it is, it's one thing for consumers to cope with an empty car lot because of chip shortages. It's another to cope with a hungry infant because store shelves that once contained baby formula are now bare. For those parents and caretakers, their emotions are beyond feeling frustrated. They feel anger and panic, the sort of emotions that they share with their friends and colleagues on social media and forums. The kind of expression that can change the public's sentiment about a company, which in turn can move markets.
This Alternative Data Trends post will examine web data concerning the baby formula shortage. We'll analyze articles, social media, and forum conversations culminating in the U.S. crisis as the news reaches national exposure. We'll also highlight red flags investors could've seen had they monitored the situation with an AI-powered text analysis tool like SESAMm's TextReveal®.
Early warnings: When baby formula supplies began to run dry vs. when it became a national crisis
If we compare absolute and relative volumes—relative being mentions about the topic compared to our entire data lake—the term "formula milk market" yields parallel results. Mentions spike in May when the crisis reaches national coverage (see Figure 1).
Figure 1: Absolute and relative mention volumes for “formula milk market” match.
However, comparing absolute and relative volumes for the term "formula milk shortage," we find red flags as early as January 2022, four months before the crisis receives national attention (see Figure 2). Relative mentions spike on three occasions before absolute volumes register any significant noise. The fourth instance matches a ripple on the absolute chart.
Figure 2: Relative mention volumes for “formula milk shortage” show possible controversies.
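The difference between absolute and relative volume can be illustrated with a small sketch. The counts below are invented, and the spike rule (a value more than twice the average of all earlier periods) is an assumption chosen for illustration, not the method behind the figures. It shows one way a relative series can raise a flag while the absolute series looks flat.

```python
def relative_volume(topic_mentions, total_documents):
    """Share of all documents in each period that mention the topic."""
    return [t / n if n else 0.0 for t, n in zip(topic_mentions, total_documents)]


def spike_periods(series, factor=2.0):
    """Indices where a value exceeds `factor` times the mean of all earlier values."""
    spikes = []
    for i in range(1, len(series)):
        baseline = sum(series[:i]) / i
        if baseline > 0 and series[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Made-up monthly counts: topic mentions stay nearly flat in absolute terms,
# while the overall number of collected documents dips in the third month.
topic = [100, 105, 110, 108]
total = [1_000_000, 1_050_000, 400_000, 1_080_000]

relative = relative_volume(topic, total)
print(spike_periods(topic))     # [] -> nothing unusual in absolute counts
print(spike_periods(relative))  # [2] -> the third month stands out in relative terms
```

Dividing by the total volume normalizes for how much content was collected in each period, which is why the two views can disagree.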
These articles provide an example of the content published around the times of those rises in mentions:
Analyzing the sentiment and polarity of the formula milk market
In short, the e-reputation of the formula milk market has been negative since the beginning of 2022 (see Figure 3). Positive sentiment drops and reflects the opposing negative sentiment almost exactly until May, when the news about the crisis breaks. Likewise, polarity trends downward over the same period.
Note: Polarity represents a company's aggregate of positive and negative sentiment (opinions, reviews), ranging from -1 to 1. A zero score means that there is as much positive as negative sentiment. High e-reputation brands can have polarity scores of more than 0.5.
Figure 3: “Formula milk market” sentiment analysis and polarity moved negatively over time
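As a rough illustration of the polarity score described in the note above, the sketch below uses one common formulation: the difference between positive and negative mention counts divided by their sum. This formula is an assumption that matches the stated properties (a range of -1 to 1, and zero when positive and negative sentiment are equal). The article does not specify how TextReveal computes polarity.

```python
def polarity(positive, negative):
    """Net sentiment in [-1, 1]; 0 when positive and negative counts are equal."""
    total = positive + negative
    if total == 0:
        return 0.0  # no opinionated mentions: treated as neutral (an assumption)
    return (positive - negative) / total


print(polarity(50, 50))  # 0.0  -> as much positive as negative sentiment
print(polarity(80, 20))  # 0.6  -> above the 0.5 level the note associates with high e-reputation
print(polarity(10, 90))  # -0.8 -> strongly negative
```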
In the U.S., four brands produce the bulk of formula milk: Abbott, Mead Johnson, Nestlé, and Perrigo. Abbott and Nestlé hold the largest share of the formula milk market.
Figure 4: Abbott gains more than 75% of mention volume share in Q1 2022.
When we group these four brands' mentions from January 2021 to June 2022, we can see how their mention volumes compare (Figure 4). For example, at the beginning of the graph, we can see that Abbott and Nestlé have more mention volume relative to their market share. However, at the end of 2021, Mead Johnson and Abbott experience spikes in mentions due to lawsuits against their formulas. Then, in Q1 2022, Abbott mentions increase drastically after its formulas are recalled due to possible contamination, taking more than 75% of the mention volume.
The baby formula market in the U.S. has been volatile for many reasons, which we won't get into in this article. However, this volatility could be seen and planned for. In this case, here are some tactics you can take to minimize your investment risks:
Employ a tool like SESAMm’s TextReveal to evaluate web data for insights into your investments. With premier NLP technology, you can uncover sentiment and ESG insights about your industry, portfolio companies, or current investments.
Expand your research term for deeper insights. In this study, the term "formula milk market" had matching absolute and relative volumes. From this view, nothing looks out of place, and there aren't any red flags. However, when we expanded our research with the term "formula milk shortage," we found many controversies before the crisis gained national attention.
Dig into the controversies' causes. It's not enough to acknowledge a red flag; you also need to look into its potential cause. Is the controversy caused by external factors or internal ones? Maybe both? Is the issue a one-time occurrence, or is it a pattern? Answering these questions means avoiding black-box tools. Solutions such as TextReveal let you see beyond the signal and access the underlying articles triggering the red flags.
Stay in touch with SESAMm
Thanks for reading this issue of Alternative Data Trends. Be sure to catch the next issue by subscribing to our blog. And if you'd like a TextReveal demo, send us a message via the form.
“Big data” is a phrase that’s been thrown around for the last two or three decades, maybe too much in some cases. But it’s a short, catchy phrase. It sums up how we want to describe the amount of data we produce and have to deal with today.
To be clear, when we say “big data,” we mean big data analytics. It’s so much data that we can’t possibly grasp it in any human way, at least not reasonably. It’s coming from everywhere, growing exponentially, and coming at us faster and faster every day. In other words, the person-power it would take to process and analyze big data wouldn’t be feasible or affordable. So, we need help. We need data science. And we need a different type of intelligence: artificial intelligence. But more on that later.
Obviously, the use of big data comes with challenges. But big data initiatives are worth the cost and effort because what we can extract and analyze from it helps us understand the world and how it works at a macro-level. It also helps us dig into details and understand what’s happening at a micro-level. For example, businesses in the finance and insurance industry create lots of data, so extracting and analyzing it can provide insights for investors when making investment decisions.
What is big data in finance?
Big data in finance is the immense amounts of diverse and complex data that banks, financial institutions, and investors use to understand consumer behavior, gain insight into possible investments, and create investment strategies. In other words, this data is primarily used by and for the financial services sector.
How big is big data anyway?
How big big data is depends on how much data is being sourced. If we were to consider how much data volume the world produces, it’s “at least 2.5 quintillion bytes of data” daily, according to CloudTweaks. That’s 2,500,000,000,000,000,000 bytes.
We usually measure big data, whether structured or unstructured, in petabytes (PB) and terabytes (TB). A petabyte is 1024TB, or roughly a million gigabytes (GB). To put this amount of data into perspective, let’s use the newest iPhone as an example. Today’s iPhone can store up to 1TB of data. That means 1PB would equal the amount of data 1024 iPhones can store.
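The unit arithmetic above can be checked with a few lines of code. This is a minimal sketch assuming the binary convention used in the text (each unit is 1024 times the previous one). The daily figure is the 2.5 quintillion bytes quoted above, and the variable names are illustrative.

```python
STEP = 1024  # binary step between units, matching 1PB = 1024TB above

GB = STEP ** 3  # bytes in a gigabyte
TB = STEP ** 4  # bytes in a terabyte
PB = STEP ** 5  # bytes in a petabyte

print(PB // TB)  # 1024    -> terabytes (or 1TB iPhones) per petabyte
print(PB // GB)  # 1048576 -> gigabytes per petabyte, roughly a million

daily_bytes = 2_500_000_000_000_000_000  # 2.5 quintillion bytes per day
print(round(daily_bytes / PB))  # 2220 -> approximate petabytes produced per day
```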
Other big-data challenges
Managing big data’s size is an obvious challenge, but big data comes with even more challenges. For example, any origin that produces or stores data can be a big data source, including social media. Thus, we often gather data from disparate sources.
Big data is also ever-growing. So in dealing with an ever-growing amount of data, we must ensure proper data processing, data management, and data integrity. Our data scientists, for instance, spend a good chunk of their time curating and preparing the data to make sure it’s valuable and clean.
Finally, after we’ve ensured data quality, we need AI to help us make sense of the data we’ve curated. In our case, we use natural language processing (NLP) to read more than 20 billion articles, messages, and forums and make sense of the textual data. This supports multiple client use cases, including signals for investment strategies, due diligence on private companies, and ESG controversy monitoring, among others.
How big data is used in the finance industry
Big data is used in many sectors and industries, and in some cases, it’s changing financial business models. In the financial services industry, big data technology has been used in three key ways: to gain stock market insights, to detect and prevent fraud, and to analyze risk accurately.
For instance, through machine learning—using computer algorithms to find patterns in massive amounts of data—data scientists can conduct a deeper data analysis in the financial markets beyond stock market data like stock prices, considering factors such as social and political trends. In some cases, this big data analysis can be provided in real time.
Machine learning also helps with fraud detection. It helps mitigate security risks through monitoring and analyzing customer data like buying patterns around credit cards, for example.
Further, machine learning helps with risk management. Investors can rely on machine learning’s unbiased output from alternative and financial data for predictive analytics, helping identify potential risks or great investment opportunities. Banks use these strategies to analyze business borrowers’ potential defaults, for example.
Other areas where big data can provide a competitive advantage in the fintech industry:
Algorithmic trading
Chatbots and robotic process automation
Customer segmentation
Customer satisfaction
SESAMm leverages AI and big data for better investment decisions
SESAMm is a leading NLP technology company, and we serve global financial organizations, corporations, and investors, such as private equity firms, hedge funds, and other asset management firms. We provide datasets or NLP capabilities to enable our clients to generate their own alternative data for use cases, such as ESG and SDG, sentiment, private equity due diligence, corporation studies, and more. With access to SESAMm’s massive data lake, made up of more than 20 billion articles, forums, and messages, our clients can improve their decision-making process.
Request a TextReveal® demo to see how you can leverage big data for your investment decisions today.
Researching and analyzing investment opportunities can be challenging for asset management professionals (private equity and hedge fund portfolio managers, researchers, and analysts) because, of course, you want to make sure that you're a good steward of your clients' investments.
And when you find and source data, such as traditional or alternative data, you also want to make sure it's reliable and that the methods used to gather it are tried and true.
This article aims to give you an inside look into SESAMm's knowledge graph—one of the key reasons SESAMm's NLP-derived alternative data is reliable and trusted. We'll explain what a knowledge graph is, why it's important, how it works, and what makes SESAMm's knowledge graph unique.
What is a knowledge graph?
A knowledge graph is a digital representation of a network of real-world entities, the foundation of a search engine or question-answering service. This structured data model puts the schema in context through linking and semantic metadata, providing a framework for data integration, analytics, unification, and sharing. In other words, it's like a map and legend, with the legend labeling the concepts, entities, and events and the map connecting and identifying their relationships. These details are stored in a graph database and visualized as a graph representation, hence the term knowledge graph.
Fun fact: The expression "knowledge graph" gained popularity after Google used it in 2012 to name its semantic network.
Two types of knowledge graphs
There are two general types of knowledge graphs: open and private. Open knowledge graphs are open to the public. They're created and made available by organizations such as Wikidata, DBpedia, and Yago. Private knowledge graphs are often only used by organizations that create them, like Google, WolframAlpha, Facebook, and SESAMm (of course). Some offer them up for a fee or subscription, such as Crunchbase and OpenCorporates.
Why a knowledge graph is important
Knowledge graphs are important because they equip us with a model to see how everything relates from a big-picture view, creating new knowledge. Their benefits include:
Incorporating disparate data sources, avoiding data silos
From a data science and artificial intelligence (AI) perspective, knowledge graphs provide machine-readable details, adding context and depth to data-driven AI techniques such as machine learning. Using knowledge graphs and machine learning models together improves system accuracy and extends the range of machine learning capabilities for better explainability and trustworthiness.
How a knowledge graph works
The core of a knowledge graph is its knowledge model, a collection of interconnected descriptions of concepts, entities, events, and relationships known as an ontology. This model provides a framework for statements or taxonomy. Each statement consists of a subject, predicate, and object (Figure 1), known as a triple model, and each subject or object is represented only once in the context of the other subjects and their relationships. For example, in the simple sentence "The boy kicks the ball," "the boy" is the subject, "kicks" is the predicate, and "the ball" is the object.
Figure 1: Apple is the subject, chief executive officer is the predicate, and Tim Cook is the object.
Likewise, each statement consists of three components: nodes, edges, and labels. A node, or vertex, represents an entity, which can be anything existing in the real world, such as a person, company, or object. For instance, in this example (Figure 2), Barack Obama is the subject node, Malia and Sasha are object nodes, and the edges, or relationships, are labeled father (from Barack Obama to each daughter) and sibling (between Malia and Sasha).
Figure 2: How the relationships between nodes can be labeled.
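The subject, predicate, and object structure described above can be sketched in a few lines of code. This is a minimal illustration that stores triples in a plain list. The names `Triple`, `GRAPH`, `match`, and `neighbors` are hypothetical, and a production knowledge graph would normally sit in a graph database rather than in memory.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # node the statement is about
    predicate: str  # label on the edge
    obj: str        # node the edge points to


# Statements taken from the two figure examples in the text.
GRAPH = [
    Triple("Apple", "chief executive officer", "Tim Cook"),
    Triple("Barack Obama", "father", "Malia"),
    Triple("Barack Obama", "father", "Sasha"),
    Triple("Malia", "sibling", "Sasha"),
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return every triple matching the given fields; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def neighbors(graph, node):
    """Nodes connected to `node` by an edge in either direction."""
    linked = set()
    for t in graph:
        if t.subject == node:
            linked.add(t.obj)
        elif t.obj == node:
            linked.add(t.subject)
    return sorted(linked)


ceo = match(GRAPH, subject="Apple", predicate="chief executive officer")
print([t.obj for t in ceo])       # ['Tim Cook']
print(neighbors(GRAPH, "Sasha"))  # ['Barack Obama', 'Malia']
```

Because each entity appears as a single node, a question such as "who is connected to Sasha?" becomes a lookup over edges rather than a text search.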
What makes SESAMm's knowledge graph unique?
SESAMm uses open and private datasets with custom, curated information to create our proprietary knowledge graph. As a result, the knowledge graph is a vast map connecting and integrating over 70 million related entities and their keywords. It relates each organization to its brands, products, associated executives, names, nicknames, and, in the case of public companies, exchange identifiers, drawing on a growing data repository of more than 18 billion articles and messages.
The knowledge graph is updated regularly
Entities within the knowledge graph are updated weekly and tagged to ensure we correctly track their changes. For instance, the CEO of a company today might not be its CEO tomorrow. And brands might be bought and sold, changing the parent company with each sale. So, weekly updates within the knowledge graph ensure the system is aware of these changes.
NLP-driven accuracy
At SESAMm, named entity disambiguation (NED), a natural language processing (NLP) technique, identifies named entities based on their context and usage. Text referencing "Elon," for example, could refer indirectly to Tesla through its CEO or to a university in North Carolina. Only the context allows us to differentiate, and NED considers that context when classifying entities. This method is superior to simple pattern matching, which limits the number of possible matches, requires frequent manual adjustments, and can't distinguish homonyms.
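To make the idea of context-based disambiguation concrete, here is a deliberately simple sketch. It scores each candidate entity by how many of its hand-picked context words appear in the text. The candidate names, keyword sets, and scoring rule are all assumptions for illustration. This is not SESAMm's NED implementation, which the article does not describe at this level of detail.

```python
import re

# Hypothetical candidates for the surface form "Elon", each with hand-picked
# context words. A real system would learn such associations from data.
CANDIDATES = {
    "Tesla (via CEO Elon Musk)": {"tesla", "musk", "ceo", "electric", "cars", "shares"},
    "Elon University": {"university", "campus", "students", "carolina", "college"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(text, candidates=CANDIDATES):
    """Pick the candidate whose context words overlap most with the text.

    Returns None when no candidate shares any word with the text.
    """
    tokens = tokenize(text)
    scores = {name: len(tokens & words) for name, words in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("Elon unveiled new electric cars and Tesla shares rose"))
# Tesla (via CEO Elon Musk)
print(disambiguate("Elon students returned to campus in North Carolina"))
# Elon University
print(disambiguate("Elon"))
# None
```

Pure pattern matching on the string "Elon" would treat all three inputs identically. Only the surrounding words separate them, and with no context at all the sketch declines to guess.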
SESAMm uses three other NLP tools to identify entities and create actionable insights: lemmatization, embeddings, and similarity. The lemmatization process normalizes a word into its base form (morphology) to help identify and aggregate entities. Embedding assigns the entity a numerical value to help analyze how words change meaning depending on context and understand the subtle differences between words that refer to the same concept. Similarity measures whether two words, sentences, or objects are close to one another in meaning.
SESAMm tailored its knowledge graph to find, extract, and analyze data about public or private entities, which isn't readily available from the web or standard rating firms. This unique implementation of a knowledge graph provides insights to give you an edge when researching, analyzing, and submitting recommendations to the portfolio manager or clients.
SESAMm's premier platform, TextReveal®, allows you to leverage NLP-driven insights fully and receive high-quality results through data streams, modular API and dashboard visualization, and signals and alerts. It's perfect for many quantitative, quantamental, and ESG investment use cases.
Learn how SESAMm can support you in your investment decision-making and request a demo today.
Sylvain Forté, CEO and co-founder of SESAMm, presented the following at Finovate 2022. In the presentation, Sylvain explains who SESAMm is, what SESAMm does, including examples, and how it benefits our financial clients.
Below is an approximation of this video’s audio content. Watch the video for a better view of graphs, charts, graphics, images, and quotes to which the presenter might be referring in context.
Hi, everyone. Thank you very much for the opportunity to be with you today. I’m very glad to introduce you to SESAMm. I’m Sylvain, CEO and co-founder of SESAMm.
We’re an artificial intelligence company specializing in analytics for investment professionals and [corporations]. We basically extract billions of articles and messages from the web and transform them into actionable insights to make better decisions. We’re a team of close to 100 people now, and we generate insights from more than 20 billion articles and messages.
Immediate access to daily insights
Let me jump straight to the demo and give you a practical example of what we do. So imagine you’re, for example, a bank looking to compute environmental, social, and governance risks on your portfolio on your clients or on your suppliers. Right now, you may have access to ratings, which are updated once per quarter or once per year. We can give you access immediately to timely daily data on all of your companies in order for you to better assess risks and raise early warnings.
Wirecard use case
In this specific example (Figure 1), we look at Wirecard, a company that went bankrupt due to a 2 billion euro fraud scandal in Germany.
We extracted tens of thousands of articles and messages on the company, and we can immediately see that there is a huge anomaly in terms of governance risk. The company is basically exposed to fraud accusations, to lawsuits, and the like, things that you don’t really want to see in your clients or your own portfolio.
Furthermore, we can see on this chart that we can get that type of indicator every single day. And we can see that six months prior to the company’s bankruptcy, there were already huge alerts actually here in January 2020, indicating that the company was in a pretty bad situation from the perspective of web content and web data from news to social platforms, blogs, and forums.
We really have the ability to compute live insights for ESG risk, sustainability monitoring, credit, and similar topics. The advantage of the platform is that we can go very deep. You can see here (Figure 2) some of the underlying governance topics associated with Wirecard, such as fraud, embezzlement, and crime—the main accusation—but also things related to anti-competitive practices or corruption.
Figure 2: Underlying governance topics associated with Wirecard.
And furthermore, the platform enables full transparency. This is AI at scale, but the underlying content is actually text articles and messages that you can read in order to understand the situation and see why the company is in that risk position. So with our platform, with our text analysis engine (TextReveal®), you can immediately extract content on your portfolio, your clients, your suppliers, and generate, for example, ESG insights, competitive insights, sentiment insights, or credit warnings.
Trusted, reliable, and abundant insights
We are today trusted by major financial institutions, such as Nomura [Holdings] or Raiffeisen Bank in the banking sector, for example, or large private equity firms worldwide. The reason why they trust us is that we can provide data more quickly—so waiting one day instead of waiting three months—to get an indicator. In addition to that, we have better coverage. We’re the only company in the world that can provide information on five million different public and private companies, meaning all of your banking clients, for example, are covered. And finally, we have access to a large variety of sources, from social content to news and blogs.
Insights beyond companies
Another example that is very common (sadly, right now) is clients asking us to follow the Russia-Ukraine war and to understand the current situation, including by getting access to local content in local languages, in Ukrainian, in Polish, in Russian, to really understand the news and social media out there.
You can see here that beyond companies, we actually track sectors, infrastructure projects, and concepts.
Figure 3: A dashboard view into Nord Stream in the context of Ukraine.
Here (Figure 3), Nord Stream, for example, in the context of Ukraine specifically—so as to understand how these two topics are associated on the web—we can see an explosion in terms of volumes of data over time, the news associating this concept more and more, with more than 40,000 pieces of content. And we can see that sentiment over time, as displayed on this curve (Figure 4), decreases very rapidly, so we see the shock on e-reputation, and we can observe that immediately. And, for example, as a bank or as an asset manager, we can use that to assess the potential risk to clients or portfolio companies.
The interesting thing here is that, beyond the graphs and the raw contents, we can look at where the information comes from. Here (Figure 5), you see a lot of information in German, for example, which is not surprising. And you can even follow the Russian propaganda directly from the platform, looking at Russia Today or Sputnik straight from the engine, as these are also sources that we monitor.
Figure 5: The dashboard on Nord Stream shows sources from Germany and Russia.
And as you can see, these contents are highly customizable and can be used in very specific situations. So this is really a platform as a service (PaaS) that we offer. This is an engine that tracks four million different sources of information, and we can track millions of companies but also even fuzzy concepts, countries, or topics of interest.
Generate analytics from big data with API
One last thought. A lot of our clients integrate with our API; it’s a technical solution. We work a lot with data science teams, data engineering teams, risk teams, quantitative analysts, and heads of innovation. All of these teams are looking to generate analytics from big data and from web content at scale, with solutions that are currently used by dozens of clients worldwide and for which we provide very relevant analytics.
I’ll leave you with three final calls to action.
The first one is come see us at our booth. We would be very happy to present the solution in a bit more detail.
The second is, please request a demo. You understand that these indicators can be tailored to your needs in real time. So we’ll be very happy to show you a demo at SESAMm.com.
And finally, come see us for a free proof-of-concept (POC). We would be very happy to show you how we incorporate these solutions in actual banking tools and in risk management tools.
So the web is now readily available as a system that you can use and that you can rely on in order to generate valuable insights. We’re very happy to provide the solution to the market and to help inform better decisions and to help monitor risks.
Financial and ESG insights begin with big data coupled with data science.
At SESAMm, our artificial intelligence (AI) and natural language processing (NLP) platform analyzes text in billions of web-based articles and messages. It generates investment insights and ESG analysis used in systematic trading, fundamental research, risk management, and sustainability analysis.
This technology enables a more quantitative approach to leveraging the value of web data that is less prone to human bias. It addresses a growing need in public and private investment sectors for robust, timely, and granular sentiment and environmental, social, and governance (ESG) data. This article will outline how the data is derived and illustrate its effectiveness and predictive value.
Content coverage and ESG data collection
The genesis of SESAMm’s process is the high-quality content that comprises its data lake, the source from which it draws its insights. SESAMm scans over four million data sources rigorously selected and curated to maximize coverage of both public and private companies. Three guiding criteria—quality, quantity, and frequency—ensure a consistently high input value.
Every day the system adds millions of articles to the 16 billion already in the data lake, going back to 2008. The coverage is global, with 40% of the sources in English (the U.S. and international) and 60% in multiple languages. The data lake, expanding every month, comprises over 4 million sources, including professional news sites, blogs, social media, and discussion forums.
The following tables illustrate SESAMm’s data lake distribution (Q1 2022):
Respect for personal privacy figures highly in the data gathering process. We don’t capture personal data, like personally identifiable information (PII), and respect all website terms of service and global data handling and privacy laws. SESAMm’s data also doesn’t contain any material non-public information (MNPI).
Deriving financial signals and ESG performance indicators
SESAMm’s new TextReveal® Streams platform applies NLP and AI expertise to process the premium quality content gathered in its data lake. This complex process involves named entity recognition (NER) and disambiguation (NED)—the process of identifying entities and distinguishing like-named entities using contextual analysis—and mapping the complex interrelationships between tens of thousands of public and private entities, connecting companies, products, and brands by supply chain, location, or competitive relationship.
Process representation for NER and NED
Using SESAMm’s TextReveal Streams, this wealth of information is filtered to focus on four crucial contexts for systematic data processing, risk management, and alpha discovery:
Sentiment covering major global indices: world equities (and Small Caps, Emerging), U.S. 3000, Europe 600, KOSPI 50, Japan 500, Japan 225
Sentiment covering all assets and derivatives traded on the Euronext exchange
Private company sentiment on more than 25,000 private companies
ESG risks covering 90 major environmental, social, and governance risk categories for the entire company universe, which includes more than 10,000 public and more than 25,000 private companies with worldwide coverage
TextReveal Streams data sets and assessments are used by financial institutions, rating agencies, and the financial services sector, such as hedge funds (quantitative and fundamental) and asset managers, to optimize trade timing and identify new sustainable investment opportunities. Private equity deal and credit teams also use the data for deal sourcing and due diligence. Private equity ESG teams use it to manage initiatives like portfolio company environmental, social, and governance risk and reporting.
Methodology and technology for processing unstructured data
NLP workflow, from data extraction to granular insight aggregation
Data is continually extracted from an expanding universe of over four million sources daily. As it enters the system, it is time-stamped, tagged, indexed, and stored in our data lake to update a point-in-time history extending from 2008 to the present. The source material is then transformed from raw, unstructured text data into conformed, interconnected, machine-readable data with a precise topic.
NLP workflow for TextReveal Streams
Mapping relationships between entities with the Knowledge Graph
At the heart of the text analytics process is SESAMm’s proprietary Knowledge Graph, a vast map connecting and integrating over 70 million related entities and their keywords. It’s essentially a cross-referenced dictionary of keywords, relating each organization to its brands, products, associated executives, names, nicknames, and their exchange identifiers in the case of public companies.
Entities within the Knowledge Graph are updated weekly and tagged to ensure changes are correctly tracked. The CEO of a company today, for example, may not be the CEO tomorrow, and brands may be bought and sold, changing the parent company with each sale. Weekly updates within the Knowledge Graph ensure the system is aware of these changes.
Named entity disambiguation (named entity recognition plus entity linking) is one of the NLP techniques used to identify named entities in text sources using the entities mapped within the Knowledge Graph universe.
At SESAMm, NED identifies named entities based on their context and usage. Text referencing “Elon,” for example, could refer indirectly to Tesla through its CEO or to a university in North Carolina. Only the context allows us to differentiate, and NED considers that context when classifying entities. This method is superior to simple pattern matching, which limits the number of possible matches, requires frequent manual adjustments, and cannot distinguish homonyms.
SESAMm uses three other NLP tools to identify entities and create actionable insights. These are lemmatization, embeddings, and similarity. Each is explained in more detail below.
Analyzing the morphology of words with lemmatization
News articles, blog posts, and social media discussions reference organizations and associated entities in various forms and functions. Lemmatization seeks to standardize these references so the system knows they mean the same thing.
For example, “Tesla,” “his firm,” “the company,” and “it” are all noun phrases that can appear in a single article and refer to a single entity. Even where the reference is apparent, it can take different forms. For example, “Tesla” and “Teslas” both refer to the same entity but have slightly different meanings (semantics) and shapes (morphology).
The lemmatization process standardizes reference shape (morphology) to facilitate identification and aggregation. Lemmatization is a more sophisticated process than stemming, which truncates words to their stem and sometimes deletes information.
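The difference between stemming and lemmatization can be sketched with a toy example. The suffix list and the four-word lexicon below are hypothetical stand-ins; a production lemmatizer relies on a full morphological dictionary and part-of-speech information.

```python
# Toy contrast between stemming (blind suffix stripping) and
# lemmatization (lookup of a word's dictionary form).

SUFFIXES = ("ies", "es", "s", "ing", "ed")

# Hypothetical mini-lexicon used only for this illustration.
LEMMAS = {"teslas": "tesla", "companies": "company", "was": "be", "shares": "share"}


def naive_stem(word):
    """Strip the first matching suffix, whether or not the result is a real word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Map a word to its dictionary form, leaving unknown words unchanged."""
    word = word.lower()
    return LEMMAS.get(word, word)


for w in ["Teslas", "companies", "was"]:
    print(w, "->", naive_stem(w), "|", lemmatize(w))
```

The stemmer truncates "companies" to a non-word and leaves "was" untouched, while the lookup returns "company" and "be", which is the kind of information loss the paragraph above refers to.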
Encoding context and meaning with word embedding
In NLP, embedding is a numerical representation of a word that enables its manifold contextual meanings to be calculated relationally. Embeddings are typically real-valued vectors with hundreds of dimensions that encode the contexts in which words appear and, thus, also encode their meanings. Because they are vectors in a predefined vector space, they can be compared, scaled, added, and subtracted. An example of how this works is that the vector representations of king and queen bear the same relation to each other as the representations of man and woman once you subtract the vector that represents royal.
Vectorized representation of embeddings
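The king/queen relationship can be reproduced with hand-made vectors. The two-dimensional values below are invented purely for illustration (one axis for "royalty," one for "gender"); learned embeddings have hundreds of dimensions and only approximate this arithmetic.

```python
import math

# Hand-made 2-D vectors (axes: royalty, gender). Illustrative values only.
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "royal": (1.0, 0.0),
}


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def nearest(vector, exclude=()):
    """Return the vocabulary word whose vector is most similar to `vector`."""
    words = [w for w in VECTORS if w not in exclude]
    return max(words, key=lambda w: cosine(vector, VECTORS[w]))


# king - man + woman lands on queen in this toy space.
target = add(sub(VECTORS["king"], VECTORS["man"]), VECTORS["woman"])
print(nearest(target, exclude={"king", "man", "woman"}))

# Subtracting "royal" leaves the same relation as man/woman.
print(sub(VECTORS["king"], VECTORS["royal"]) == VECTORS["man"])
```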
Using embedding is key to analyzing how words change meaning depending on context and understanding the subtle differences between words that refer to the same concept: synonyms. For example, the words business, company, enterprise, and firm can all refer to the same thing if the context is “organizations.” But they represent different things and even different parts of speech if the context changes.
In the phrase, “[Tesla] will be by far the largest firm by market value ever to join the S&P,” for example, one could replace the word firm with company or enterprise without affecting the meaning significantly. Contrast that with “a firm handshake,” where a similar substitution would render the phrase meaningless.
Also, words referring to the same concept can emphasize slightly different aspects of the concept or imply specific qualities. For example, an enterprise might be assumed to be larger or to have more components than a firm. Embeddings enable machines to make these subtle distinctions.
One advantage of using embedding is that it’s practical because it’s empirically testable. In other words, we can look at actual usage to determine what a word means.
Another advantage is that embeddings are computationally tractable. This understanding of a word’s definition allows us to transform words into computation objects to programmatically examine the contexts in which they appear and, thus, derive their meaning.
Just as lemmatization improves on stemming, embeddings improve on techniques such as one-hot encoding, which is close to the common conception of a definition as a single entry in a dictionary.
SESAMm uses the global vectors for word representation (GloVe) algorithm to generate embeddings. It’s an unsupervised learning algorithm that begins by examining how frequently each word in a text corpus co-occurs with other words in the same corpus. The result is an embedding that encapsulates the word and its context together, allowing SESAMm to identify specific words in a list and different forms of the listed words and unlisted synonyms.
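The first step described above, counting how often words co-occur, can be sketched as follows. The window size, the whitespace tokenizer, and the two-sentence corpus are assumptions made for the example; GloVe itself goes on to fit vectors to these statistics, which is not shown here.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count how often each ordered word pair appears within `window` tokens."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


# Tiny made-up corpus for illustration.
corpus = [
    "the firm reported record profits",
    "the company reported record losses",
]
counts = cooccurrence_counts(corpus)
print(counts[("firm", "reported")], counts[("company", "reported")])
print(counts[("reported", "record")])
```

Because "firm" and "company" appear next to the same neighbors, their co-occurrence rows look alike, which is what lets an embedding place them close together.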
GloVe is an extension of recent approaches to vector representation, combining the global statistics of matrix factorization techniques like latent semantic analysis (LSA) with the local context-based learning of word2vec. The result is an unsupervised algorithm that performs well at capturing meaning and demonstrating it on tasks like calculating analogies and identifying synonyms.
BERT is another algorithm used by SESAMm to generate embeddings. BERT produces word representations that are dynamically informed by the words around them. Developed by Google, it is a transformer-based machine learning technique, which means it doesn’t process an input sequence token by token but instead takes the entire sequence as input in one go. This is a significant improvement over sequential recurrent neural network (RNN) based models because the computation can be parallelized and accelerated by graphics processing units (GPUs).
SESAMm uses BERT for multilingual NLP of its extensive foreign-language text because it has been pretrained on an extensive library of unlabeled data extracted from Wikipedia in more than 100 languages. BERT was trained on two tasks: predicting words from their context, and next-sentence prediction, in which it predicts whether a candidate following sentence is probable given the first sentence. As a result of this training process, BERT learned contextual embeddings for words. Due to this comprehensive pre-training, BERT can be fine-tuned with fewer resources on smaller datasets to optimize its performance on specific tasks.
Linking words, sentences, and topics with cosine similarity
Cosine similarity computed on mean-centered vectors is identical to the Pearson correlation coefficient, which highlights another element of the computational tractability of the embeddings approach: it makes it easy to compare words and contexts for similarity.
Converting words to vector representations means we can quickly and easily compare word similarity by comparing the angle between two vectors. This angle is a function of the projection of one vector onto another. It can identify similar, opposite, or wholly unrelated vectors, which allows us to compute the similarity of the underlying word that the vector represents.
Two vectors aligned in the same orientation will have a similarity measurement of 1, while two orthogonal vectors have a similarity of 0. If two vectors are diametrically opposed, the similarity measurement is -1. In practice, negative similarities are rare, so we clip negative values to 0.
Vectorized representation of cosine similarities
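The similarity values described above follow directly from the definition of cosine similarity. The sketch below uses plain Python lists as stand-in vectors and also checks numerically that the cosine of mean-centered vectors matches the Pearson correlation; the function names are ours, chosen for the example.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def clipped_similarity(a, b):
    """Cosine similarity with negative values clipped to 0."""
    return max(0.0, cosine_similarity(a, b))


def center(v):
    """Subtract the mean of the vector from each component."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def pearson(a, b):
    """Pearson correlation computed from its textbook definition."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


print(cosine_similarity([1, 0], [2, 0]))    # same orientation
print(cosine_similarity([1, 0], [0, 3]))    # orthogonal
print(cosine_similarity([1, 0], [-1, 0]))   # diametrically opposed
print(clipped_similarity([1, 0], [-1, 0]))  # negative value clipped

u, v = [1, 2, 3, 5], [2, 1, 4, 6]
print(abs(cosine_similarity(center(u), center(v)) - pearson(u, v)) < 1e-12)
```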
Cosine similarity measures whether two words, sentences, or corpora are close to one another in vector space or “about” the same thing in semantic space. To answer the question, “Is this sentence referencing company X?” we embed the sentence using the process described above and compute the cosine similarity between the sentence and the embedded company profile. Analogously, we compute similarities between sentences and the ESG topics SESAMm monitors by taking the maximum similarity between a sentence and each embedded keyword associated with an ESG topic.
These similarities allow us to identify whether a sentence references fraud, tax avoidance, pollution, or any other ESG risk topic among the more than 90 that SESAMm tracks across the web.
Similarities within ESG topics combine with word counts to resolve the recall and precision problem. Word counts are precise because if a word is identified within a context, then that context, by construction, references the topic.
The virtue of using these NLP techniques is that even if a given keyword list does not include every possible combination of words that a person might use to discuss a topic, relevant entities missed by the word-count process will be identified through vector similarity.
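As a rough sketch of how exact keyword counts and vector similarity can complement each other, consider the example below. The three-dimensional word vectors, the single-keyword topic list, the token-level (rather than sentence-level) embedding, and the 0.9 threshold are all assumptions made for illustration, not SESAMm's actual configuration.

```python
import math

# Hypothetical 3-D word vectors; a production system would use learned embeddings.
EMBEDDINGS = {
    "fraud": (0.9, 0.1, 0.0),
    "embezzlement": (0.85, 0.2, 0.05),
    "pollution": (0.0, 0.1, 0.95),
    "picnic": (0.1, 0.9, 0.1),
}
TOPIC_KEYWORDS = {"fraud"}  # hypothetical keyword list for one ESG topic


def cosine(a, b):
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def topic_match(tokens, keywords=TOPIC_KEYWORDS, threshold=0.9):
    """Return (keyword_count, max_similarity, is_match) for a tokenized sentence."""
    keyword_count = sum(1 for t in tokens if t in keywords)
    sims = [
        cosine(EMBEDDINGS[t], EMBEDDINGS[k])
        for t in tokens if t in EMBEDDINGS
        for k in keywords if k in EMBEDDINGS
    ]
    max_sim = max(sims, default=0.0)
    return keyword_count, max_sim, keyword_count > 0 or max_sim >= threshold


print(topic_match(["the", "cfo", "admitted", "embezzlement"]))
print(topic_match(["a", "picnic", "today"]))
```

The first sentence never uses the listed keyword, so the count is zero, yet the similarity to "fraud" is high enough to flag it; the second sentence matches on neither measure.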
This is the power of SESAMm’s NLP expertise. We can scan many lifetimes’ worth of data in seconds to find the concepts you explicitly ask for and the concepts relevant to your search but that you did not think of yourself.
Sentiment analysis with deep learning and neural networks
Once we’ve identified the concepts and contexts of interest in all the forms they appear, we analyze the context to determine the speakers’ attitudes.
We use sentiment classification models to score a sentence with three possible outcomes: negative, neutral, or positive. The current classification models are based on deep learning AI technologies. Specifically, we stack convolutional neural networks with word embeddings and Bayesian-optimized hyperparameters (parameters that are set before training rather than learned during it). This architecture improves accuracy and enables fast shipping of production-ready models for a given language. We also produce state-of-the-art frameworks with architecture variations enabling multilingual capabilities, such as transformers and universal sentence encoders.
Condensing information and extracting insights with daily aggregation
Similarities, embedded word counts, and sentiment are state-of-the-art tools for processing unstructured text data. The same tools are effective cross-linguistically.
Once the information has been extracted from millions of data points, it’s aggregated and condensed into actionable insights.
Entities may be referenced directly or indirectly within an article. Sentence-level references are first aggregated to obtain an article-level perspective, and then all relevant articles are aggregated to gain an entity-level view for that day.
In this way, reams of data are compressed into several metrics to provide a daily aggregate view for each entity, highlighting trends at a sentence, article, and entity-level comparable over a multi-year history.
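The sentence-to-article-to-entity roll-up can be sketched in a few lines. The record layout, the entity name "ACME," and the use of a simple mean at each level are assumptions for the example; the actual aggregation functions are not specified in this article.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sentence-level records: (date, entity, article_id, sentiment score).
SENTENCES = [
    ("2020-06-01", "ACME", "a1", -1.0),
    ("2020-06-01", "ACME", "a1", 0.0),
    ("2020-06-01", "ACME", "a2", 1.0),
    ("2020-06-02", "ACME", "a3", -1.0),
]


def aggregate_daily(sentences):
    """Average sentences into article scores, then articles into entity-day scores."""
    by_article = defaultdict(list)
    for date, entity, article_id, score in sentences:
        by_article[(date, entity, article_id)].append(score)

    by_entity_day = defaultdict(list)
    for (date, entity, _article_id), scores in by_article.items():
        by_entity_day[(date, entity)].append(mean(scores))

    return {
        key: {"sentiment": mean(article_scores), "articles": len(article_scores)}
        for key, article_scores in by_entity_day.items()
    }


daily = aggregate_daily(SENTENCES)
for key, metrics in sorted(daily.items()):
    print(key, metrics)
```

Averaging per article first keeps one long article from outweighing several short ones in the daily figure, which is one plausible reason for aggregating in two stages.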
ESG analysis use cases
SESAMm’s TextReveal Streams is used in various investment domains, from asset selection to alpha generation and risk management. Systematic hedge funds track retail interest in real time to identify investment opportunities and protect their existing positions. In the private equity industry, equity and credit deal teams use the data in various ways, from monitoring consumer perspectives via forums and customer reviews when evaluating deal prospects to estimating due diligence risks, all to help make investment decisions. Dedicated teams use our data to monitor portfolio companies for ESG red flags that conventional ESG reporting might miss.
Below are two examples of how aggregated TextReveal Streams data can be used to help identify investment risk and opportunity.
LFIS Capital: ESG signals for equity trading
ESG controversies can significantly impact asset prices in the short term, and it’s now estimated that intangible assets, including a company’s ESG rating, account for 90% of its market value.
Working in partnership with LFIS Capital (LFIS), a quantitative asset manager and structured investment solutions provider, SESAMm developed machine learning and NLP algorithms that could analyze ESG keywords in articles, blogs, and social media to generate a daily ESG score specific to each stock, which is part of the core functionality of the TextReveal Streams platform.
The results were promising when these scores were incorporated into a simulated strategy for trading stocks in the Stoxx600 ESG-X index.
A simulated long-only strategy running between 2015 and 2020, using the signals, delivered a 7.9% annualized return, 2.9 percentage points higher than the benchmark, for similar annualized volatility (17.3% vs. 17.1%). The information ratio of the strategy was greater than 1, with a tracking error of 2.8%. Results for the most recent three years were compelling, reflecting the growing interest and news flow around ESG themes.
Researchers also backtested a hypothetical long-short strategy for all stocks in the Stoxx600 ESG-X index with a market cap of over $7.5bn. This investment strategy delivered a Sharpe ratio of approximately 1 with annualized returns and volatility of 6.1% and 5.9%, respectively, between 2015 and 2020. Like the long-only strategy, returns were particularly robust over the three years up to 2020: +6.0% in 2018, +7.3% in 2019, and +11.3% in 2020.
Finally, a simulated “130/30” ESG strategy that combined 100% of the long-only ESG strategy and 30% of the long-short ESG strategy delivered a 10.8% annualized return, 5.8 percentage points higher than that of the Stoxx600 ESG-X index. Annualized volatility was similar at 16.9% vs. 17.1%. The strategy experienced a tracking error of 3.8% and an information ratio of over 1.5, with a consistent outperformance each year.
Disclaimer: Past performance is not an indicator of future results. Theoretical calculations are provided for illustrative purposes only. The investment theme illustrations presented herein do not represent transactions currently implemented in any fund or product managed by LFIS.
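The ratios quoted in this section can be cross-checked from the reported figures using the standard definitions: information ratio as excess return over tracking error, and Sharpe ratio as return over volatility. The zero risk-free rate in the Sharpe calculation is our simplifying assumption; the source does not state which rate was used.

```python
def information_ratio(active_return, tracking_error):
    """Excess return over the benchmark divided by tracking error."""
    return active_return / tracking_error


def sharpe_ratio(annual_return, annual_volatility, risk_free_rate=0.0):
    """Return in excess of the risk-free rate per unit of volatility."""
    return (annual_return - risk_free_rate) / annual_volatility


# All inputs are the percentages quoted in the text.
long_only_ir = information_ratio(2.9, 2.8)   # long-only strategy
long_short_sharpe = sharpe_ratio(6.1, 5.9)   # long-short strategy, risk-free rate assumed 0
ir_130_30 = information_ratio(5.8, 3.8)      # 130/30 strategy

print(round(long_only_ir, 2), round(long_short_sharpe, 2), round(ir_130_30, 2))
```

The results are consistent with the text: the long-only information ratio comes out slightly above 1, the long-short Sharpe ratio is approximately 1, and the 130/30 information ratio is just over 1.5.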
Wirecard: ESG sentiment and volume as predictive indicators
The Wirecard scandal broke on June 21, 2020, when newswires carried the story that the major German payment processor had filed for bankruptcy after admitting that €1.9 billion ($2.3 billion) of purported escrow deposits did not exist.
Could SESAMm’s TextReveal Streams platform have provided investors with an early warning that the scandal was about to break?
The following chart derived from the platform shows how key ESG metrics, including ESG scores (volumes) and ESG scores (sentiment), reacted to the news.
An analysis of the charts pinpoints a shallow rise in the ESG scores (volumes) time series in the early part of June before the eruption on June 21.
The ESG scores (sentiment) metric also shows a steady increase in negative sentiment for governance, the most relevant of the three ESG factors regarding the scandal.
How key ESG metrics, including ESG scores (volumes) and ESG scores (sentiment), reacted to the Wirecard scandal news.
Additionally, before the crash, governance was the most negative of the three ESG factors most of the time. This was especially the case from late March to early April, and then before the scandal in early June, negative governance sentiment diverged higher from the other two.
The rate-of-change of negative governance sentiment as it rose and peaked in early June before the scandal broke was also extremely high, perhaps providing the basis for an early warning signal.
Portfolio managers who had been keeping an eye on the reputational slide in Governance for Wirecard may have decided the company was at high risk of a negative controversy emerging, giving them cause to drop the stock before the event.
In this way, while they do not provide a hard-and-fast early warning signal, SESAMm’s ESG scores can nevertheless be used as the basis for developing a data-driven, rules-based portfolio management approach that can help investors avoid high-risk candidates like Wirecard.
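One way such a rule could look is a threshold on the rate of change of negative governance sentiment. The sketch below is only an illustration: the daily scores are made up, and the one-day lag and 50% threshold are arbitrary assumptions rather than calibrated parameters.

```python
def rate_of_change(series, lag=1):
    """Relative change versus the value `lag` steps earlier (None where undefined)."""
    changes = []
    for i, value in enumerate(series):
        if i < lag or series[i - lag] == 0:
            changes.append(None)
        else:
            changes.append((value - series[i - lag]) / abs(series[i - lag]))
    return changes


def warning_days(series, lag=1, threshold=0.5):
    """Indices where the rate of change meets or exceeds the threshold."""
    return [
        i
        for i, change in enumerate(rate_of_change(series, lag))
        if change is not None and change >= threshold
    ]


# Made-up daily negative-governance-sentiment scores, for illustration only.
negative_governance = [0.10, 0.11, 0.10, 0.12, 0.20, 0.35, 0.36]
print(warning_days(negative_governance))
```

In practice the threshold would need to be tuned against history to balance false alarms against missed events.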
SESAMm takes on ESG data challenges
SESAMm’s NLP and AI tools analyze over four million data sources daily to identify thousands of public and private companies and their related products, brands, identifiers, and nicknames, turning reams of unstructured text into structured and actionable data.
SESAMm’s TextReveal Streams platform can be used in many quantitative, quantamental, and ESG investment use cases. TextReveal is a solution that allows you to fully leverage NLP-driven insights and receive high-quality results through data streams, modular API and dashboard visualization, and signals and alerts.
Learn how SESAMm can support you in your investment decision-making and request a demo today.
To request a demo or to access the full SESAMm Wirecard or LFIS reports, contact us.
It's a word that most of us in the U.S. despise, almost as much as the word taxes. It's probably because, like taxes, we can't escape its wallet-draining effect when it increases. Maybe the way we feel about it is because the last time the U.S. economy deflated—giving us relief from it—was in the 1930s, when "Prices dropped an average of nearly 7% every year between the years of 1930 and 1933," according to Investopedia. But I digress.
We won’t go into how inflation works, but how the government calculates it—and how its categories affect it—has always been consistent. At least it was until the COVID-19 pandemic hit, that is.
What NLP text mining reveals about the U.S. economy inflation-rate factors and the online conversations about them
To ensure we're on the same page about how we came to the forthcoming information in this use case, let's cover a couple of basics on NLP text mining and inflation rate indexes.
What are NLP and text mining?
Natural language processing (NLP), an AI technology, automates the analysis of mined, unstructured textual data. It includes natural language understanding and natural language generation, which together simulate a human’s ability to interpret and produce language, and it’s the component of text mining that uses deep learning algorithms to perform linguistic analysis so a machine can “read” text. Apps like Grammarly or Wordtune analyze text to improve writing, for example, and chatbots use this technology to interact with customers.
Text mining, or text analytics, is the process of examining large document collections. It’s a computer science discipline that converts unstructured text in documents and databases into normalized, structured datasets for analysis by machine learning models. Deep learning algorithms then analyze this data, including its semantics and grammatical structures, to gain new insight from human language or to aid research. Together, NLP and text mining are like a search engine on steroids.
The Consumer Price Index (CPI)
According to this Forbes Advisor article, "The two most frequently cited indexes that calculate the inflation rate in the U.S. are the Consumer Price Index (CPI) and the Personal Consumption Expenditures Price Index (PCE)." For this article, however, we'll only use the Bureau of Labor Statistics (BLS) method of CPI inflation calculation as a reference. CPI observes a specific group of commonly purchased goods and services to gauge how prices fluctuate. These goods and services include:
Apparel: Women's and men's clothes, jewelry, etc.
Alcoholic beverages: Beers, wine, liquor, etc.
Energy and commodities: Gasoline, natural gas, electricity, etc.
Food: Items bought by the average consumer, such as breakfast cereal, milk, meat, fruits, vegetables, etc.
Housing and shelter: Rent, housing insurance, bedroom furniture, hotel or motel accommodation costs, etc.
Medical care services: Physicians' services, prescription drugs, medical supplies, etc.
New and used vehicles: Trucks, vans, sedans, SUVs, etc.
Tobacco and smoking products: Tobacco-related items, such as cigarettes, cigars, bidis, kreteks, loose tobacco, etc.
Transportation services: Airline fares, vehicle insurance, etc.
NLP text-mining process: web mentions matched to CPI categories
Using SESAMm's web text analysis engine TextReveal®, we analyzed textual data relating to the inflation topic within the U.S. from 2017 until now. For this analysis, we defined co-mentions as the articles and social media posts that mention "inflation" and at least one of the CPI categories. Note: Although we can analyze more than 100 languages, we focused on English in this case. Also, we didn’t conduct a sentiment analysis on the extracted information.
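The co-mention definition above lends itself to a short sketch. The keyword lists, the sample documents, and the simple substring matching are all hypothetical simplifications for illustration; TextReveal's actual matching rules are not described in this article.

```python
from collections import Counter

# Hypothetical keyword lists for a few CPI categories.
CATEGORY_KEYWORDS = {
    "Food": {"food", "groceries", "milk"},
    "Energy and commodities": {"gasoline", "electricity"},
    "New and used vehicles": {"used car", "new car", "suv"},
}


def co_mention_share(documents, anchor="inflation"):
    """Return each category's percentage share of all co-mentions of `anchor`."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        if anchor not in text:
            continue  # not a co-mention: the anchor term is missing
        for category, keywords in CATEGORY_KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                counts[category] += 1
    total = sum(counts.values())
    return {cat: 100 * n / total for cat, n in counts.items()} if total else {}


# Made-up documents for illustration.
docs = [
    "Inflation pushes used car prices higher",
    "Gasoline and food costs drive inflation",
    "Used cars are selling fast",  # no mention of inflation, so it is skipped
    "Inflation hits groceries",
]
shares = co_mention_share(docs)
print(shares)
```

Substring matching is the crudest possible choice (it would also match "seafood" for "food"), so a real pipeline would match on tokens or entities instead.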
Figure 1: Inflation co-mentions by category and percentage.
From 2017 to 2019, inflation co-mentions within the U.S. are relatively stable (see Figure 1). That changes with a first shift in 2020, followed by rapid growth that peaks at the end of 2021 as inflation surges to record levels.
What was one of the main drivers of the inflation surge? Used cars.
3 used-car and inflation trends uncovered through NLP Text Mining
According to the U.S. Bureau of Labor Statistics, the cost of used vehicles was one of the main drivers of the inflation spike. How did used cars contribute to inflation? The chain of events occurred like so: The increased used-car demand was fueled by a new-vehicle supply shortage caused by a chip shortage generated by supply-chain interruptions due to the COVID-19 pandemic.
As the pandemic-induced supply-chain interruption unfolded, used-car trends developed. Here are three we found in our data mining research:
Trend 1: Co-mentions percentage for used vehicles more than doubled
Figure 2: Used vehicles co-mentions increase percentage-wise.
Based on the percentage of co-mentions compared to other topics, the used-car topic moves from the number eight spot to the number four spot in 2021 (see Figure 2).
Figure 3: Used-car co-mentions begin in early 2021 and exceed those for new cars.
Before 2020, mentions were relatively steady. However, we observe an increase in used-vehicles mentions caused by disruptions in supply chains leading to chip shortages (see Figure 3) as early as January 2020. These shortages led to a decrease in new vehicle inventory. The Statista report, indicating an increase of the used vehicle value index by 49 points compared to the price index recorded in 2020, supports our findings.
Trend 2: Used vehicle prices rose with used-car co-mentions
Figure 4: In 2020, inventory spikes as production and sales plummet, affecting inflation.
Because of the pandemic, car production nearly stopped along with car sales, which created two situations: 1. a high inventory-to-sales ratio and 2. historically low car production (see Figure 4). Vehicle sales picked up later, but car production was still suffering because of supply-chain disruption. That meant the inventory-to-sales ratio dropped to virtually zero.
So consumers with few or no options for new vehicles turned to used cars, increasing demand for them and therefore their prices. Increasing mentions within the used-vehicles topic, coinciding with a decrease in inventory volume, support this hypothesis. All in all, used-vehicle prices rose 40.5%.
Trend 3: The COVID-19 pandemic and new vehicle inventory shortage increased demand
A smaller new-vehicle inventory wasn't the only reason consumers sought out used vehicles. They also wanted used cars because of the pandemic.
Figure 5: The pandemic and new-vehicle supply shortage became bigger reasons for consumers to seek out used cars over cost.
For 2020, rising co-mentions between pandemic-related terms and demand for secondhand vehicles suggest that consumers were avoiding public transportation (see Figure 5).
Used-car and inflation trends summary
We can summarize the used-car and inflation trends with one phrase: It's a used-car seller's market. Online retailers like Carvana, for example, have leveraged these factors to grow significantly. In contrast, due mainly to significant supply-chain disruptions, automakers have experienced the opposite effect, with the automotive industry projected to lose $210 billion. Judging by the number of mentions in public web forums and social media, the chip shortage and used-car boom affected General Motors, Ford, and Toyota the most (see Figure 6).
Figure 6: General Motors, Ford, and Toyota suffered pandemic-related shortages the most based on co-mentions.
About SESAMm and TextReveal’s® NLP Text-mining Capabilities
SESAMm is a leading company in alternative data and artificial intelligence, delivering descriptive, prescriptive, or predictive investment analytics to global investment firms and corporations worldwide. TextReveal is SESAMm's premier NLP text-mining product, a solution that allows you to fully leverage NLP-driven insights and receive high-quality results through data streams, modular API and dashboard visualization, and signals and alerts. In other words, we organize, categorize, and capture relevant information from raw data for you.
If I told you that I had a crystal ball and could predict the future, you’d probably laugh in my face. But what if I told you that this crystal ball could give you seemingly invisible data indicating what the future is likely to be, helping you make better investment decisions? Did your ears perk up? I bet they did.
Alternative data, specifically natural language processing (NLP)-generated alternative data, is like a crystal ball. It can help portfolio managers, analysts, and public equity investment managers make better decisions by identifying controversies about a company or potential investment before mainstream data providers and ESG rating firms can. That means you can take data-informed actions before a possible change in your investment value occurs.
That was a lot, so before we go further, let’s cover a few basics as a refresher.
What is alternative data?
Alternative data is non-traditional information extracted from non-traditional data sources, such as internet social media communities and deeper-level article data. This subset of big data is often nonfinancial and unstructured.
Why use alternative data for finance?
In financial services, alternative data sets give investors insight into the investment process and guide their investment strategies. For example, quant hedge fund managers, asset managers, and private equity firms use alternative data to augment conventional data like those that come from quarterly financial statements and SEC filings. This unconventional data can reveal insights such as metrics on environmental, social, and corporate governance (ESG) information, sentiment analysis, and consumer behavior.
Where does alternative data come from?
Firms such as data vendors or alternative data providers gather raw data from various sources, depending on the details you need. For instance, they can pull transaction data, like credit card transactions, and text data from social media platforms and obscure media publishers. They can also extract information from technologies like satellite imagery and geolocation data, IoT sensors, web traffic, app usage, and new data sources yet to exist. In short, alternative-data sources are found anywhere unconventional, valuable data live.
How does NLP-generated alternative data differ?
NLP-generated alternative data is more than raw data collection and presentation. Instead, it reveals the hard-to-see data and interprets it so you can make better decisions. At SESAMm, for example, we generate alternative data from text using NLP algorithms on a massive, ready-to-use data lake to identify noteworthy trends. Our developers and data scientists then use their machine learning technology to analyze these trends and build investment strategies for our clients.
How can alternative data identify controversies before mainstream providers and ESG rating firms?
There are two main ways alternative data identifies controversies before mainstream providers and ESG rating firms:
First, NLP-generated alternative data’s inherent quality is that it can reveal trends that mainstream providers and ESG firms can’t. And because of this quality—the ability to identify and analyze trends—you can use it to see warnings before a major controversy hits the mainstream.
Second, rating providers can be inconsistent and inaccurate, according to Andrew McLaughlin, a contributor to The Globe and Mail. He states that many ESG rating providers, for instance, are “popping up like dandelions,” and “each uses its own methodologies to rank and score publicly traded companies based on their purported environmental, social and governance risk and performance.” Further, “[their] reports produced are at times rife with inaccuracies,” McLaughlin says. While we at SESAMm might not agree with McLaughlin completely, we believe that alternative data helps bridge the gap between possible shortcomings and a more comprehensive view of an investment’s risks and opportunities.
2 NLP-generated alternative data use cases as examples:
Ericsson (ERIC) analysis
Event: On February 16, 2022, Ericsson investigates an in-house bribery scandal tied to ISIS. According to FIERCE Wireless, “investors reacted to reports that Ericsson may have made payments to the ISIS terror organization to gain access to certain transport routes in Iraq.”
Results: Ericsson’s share value dropped by at least 15% that day as news broke and investors reacted. “It was its biggest share drop in a day since July 2017,” per FIERCE Wireless.
What did NLP-generated alternative data see?
In Ericsson’s case, we analyzed three areas from January 2016 to the event on February 16, 2022:
Name-mention volume
Sentiment polarity
ESG Initiatives Score
Figure 1: Volume over time chart for Ericsson
In Figure 1, we chart data volumes over time; spikes help detect significant positive or negative events. For instance, the payment scandal affected mention volume much as a controversy in 2020 did. Mentions related to the more recent events continue to increase, making it potentially Ericsson’s most controversial issue so far.
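A common way to flag volume spikes like these is a trailing z-score. The sketch below is a generic illustration, not SESAMm's detection method: the mention counts are made up, and the five-day window and threshold of 3 are arbitrary assumptions.

```python
from statistics import mean, pstdev


def volume_spikes(volumes, window=5, z_threshold=3.0):
    """Indices whose volume exceeds the trailing mean by `z_threshold` standard deviations."""
    spikes = []
    for i in range(window, len(volumes)):
        history = volumes[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma > 0 and (volumes[i] - mu) / sigma >= z_threshold:
            spikes.append(i)
    return spikes


# Made-up daily mention counts, for illustration only.
daily_mentions = [100, 110, 95, 105, 100, 102, 400, 120]
print(volume_spikes(daily_mentions))
```

Only the jump to 400 is flagged; the day after is not, because the spike itself has already inflated the trailing mean and standard deviation.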
Figure 2: Polarity over time chart for Ericsson
In Figure 2, we analyze Ericsson’s polarity over time. Polarity represents the aggregate of positive and negative sentiment (opinions, reviews) on a company. It can range from -1 to 1. A 0 score means that as much positive as negative sentiment is expressed. High e-reputation brands can have polarity scores over 0.7, based on SESAMm’s research and findings.
Ericsson’s overall polarity sits in the average range for the most part. However, we found that Ericsson’s sentiment suffered significant negative drops caused by controversial news. In other words, the company’s reputation has been affected several times over the years, with the most recent controversies going viral and perceived as very negative.
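A score with the properties described above (bounded between -1 and 1, and 0 when positive and negative sentiment balance) can be obtained from a simple net-sentiment ratio. The formula below is an assumption chosen to match those properties; the article does not publish SESAMm's exact polarity calculation.

```python
def polarity(positive, negative):
    """Net sentiment in [-1, 1]; 0 when positive and negative mentions balance."""
    total = positive + negative
    if total == 0:
        return 0.0  # no sentiment expressed at all
    return (positive - negative) / total


print(polarity(50, 50))   # balanced sentiment
print(polarity(85, 15))   # strongly positive
print(polarity(0, 10))    # entirely negative
```

Under this definition, a polarity above 0.7 would require more than 85% of sentiment-bearing mentions to be positive.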
Figure 3: ESG Score over time for Ericsson
In Figure 3, SESAMm used the analyzed areas and comparisons to compute an ESG Score based on proprietary ESG initiatives data. The scale ranges from 0 to 1, with zero indicating a low, undesirable value and one indicating a high, desirable value. We score Ericsson in the 0.05–0.10 range, which we think is relatively low for this company. Despite Ericsson increasing its ESG initiatives over the past year, recent controversies have affected its score negatively.
Figure 4: Ericsson’s ESG risks over time compared to its stock price
Figure 4 charts Ericsson’s ESG risk, which is based on SESAMm’s web data. The range varies from 0 to 1, zero indicating the lowest risk and one as the highest. Ericsson’s score from its latest scandal is a 1. Compared to Ericsson’s stock prices, several spikes in ESG risk anticipated market movements.
Orpea SA (ORP:FP) analysis
Event: On January 24, 2022, Le Monde published an article about the book “Les Fossoyeurs”. According to Le Monde, the book concentrates most of its attacks on Orpea, a leading nursing home and clinic operator with “65,000 employees in 1,100 establishments across the planet; 220 nursing homes in France alone.” The book’s author attacks the “Orpea system” and reports elderly abuse and deaths possibly caused by that abuse or by negligence.
The media begins to question the limits of ESG rating because of Orpea’s scandal.
Results: Two things occurred after the news broke. One, Orpea’s stock price sustained a 44-point drop. Two, the media began to question the limits of ESG ratings, given Orpea’s rating at the time.
What did NLP-generated alternative data see?
In Orpea’s case, we analyzed three areas from January 2016 to the event on January 24, 2022:
Name-mention volume
Sentiment polarity
ESG Initiatives Score
Figure 5: Volume over time chart for Orpea
In Figure 5, we analyzed volumes of data and compared them with significant events detected. Volume spikes detect clear, negative events in Orpea’s case. For instance, on January 24, 2022, the breaking news had the highest effect since 2016. It’s worth noting that an upward mention trend becomes visible before the scandal emerges, with volumes reaching levels higher than average.
ESG scores, which range from 0 to 1, are relatively low for Orpea on average. Its controversies have strongly affected its scores in 2018 and 2022 in particular. But the trend to see in the chart is that Orpea’s ESG score had been trending downward for several months before Le Monde’s breaking story.
Figure 8: Orpea’s ESG risks over time compared to its stock price
Figure 8 charts Orpea’s ESG risk, which is based on SESAMm’s web data. The range varies from 0 to 1, with zero indicating the lowest risk and one the highest. Orpea’s score from its latest scandal is a 1. Compared to Orpea’s stock prices, several spikes in ESG risk anticipated market movements. The current controversy, while very viral, represents a risk equivalent to the 2018 revelations.
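One simple way to check whether risk spikes precede price moves is to measure the price return over a fixed horizon after each day the risk score first crosses a threshold. The sketch below is only an illustration of that idea. The function name, the 0.8 threshold, the horizon, and the sample series are assumptions, not SESAMm's data or methodology.

```python
def forward_returns_after_spikes(risk, prices, threshold=0.8, horizon=5):
    """For each day the risk score first crosses `threshold`, return
    (day index, price return over the next `horizon` days).

    Hypothetical helper; `risk` and `prices` are aligned daily series.
    """
    results = []
    for i, score in enumerate(risk):
        # Count only the first day of a crossing, not every day spent above it.
        crossed = score >= threshold and (i == 0 or risk[i - 1] < threshold)
        if crossed and i + horizon < len(prices):
            results.append((i, prices[i + horizon] / prices[i] - 1.0))
    return results

# Made-up series, for illustration only.
risk_scores = [0.2, 0.3, 0.9, 1.0, 0.4, 0.3, 0.2]
close_prices = [100, 101, 100, 90, 80, 75, 70]
for day, ret in forward_returns_after_spikes(risk_scores, close_prices, horizon=2):
    print(f"risk spike on day {day}: {ret:.1%} over the next 2 days")
```

A negative forward return after a spike is consistent with the risk signal leading the market, though a real study would need many events and a benchmark to rule out coincidence.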
Summarizing SESAMm’s Ericsson and Orpea findings
NLP-generated alternative data was able to see trends and events that mainstream ESG rating firms didn’t in the Ericsson and Orpea cases. In both cases, SESAMm would’ve flagged controversies in at least three key areas: name-mention volume, sentiment polarity, and ESG Initiatives Score. And these three areas, with additional proprietary analysis from SESAMm, would’ve provided much-needed insight to investors before their respective market-moving events occurred.
How SESAMm’s NLP-generated alternative data can help you
Whether for fundamental, quantitative, or quantamental investment use cases, to monitor your corporate risks, or to conduct advanced due diligence on private companies for investment opportunities, explore limitless possibilities using SESAMm’s industry-leading data lake. Our data lake consists of nearly 20 billion articles today, and it’s growing by 20% every year. And if our data lake is our crystal ball, then TextReveal® is what fuels its magic. The data, in conjunction with TextReveal’s NLP algorithms, can reveal alternative data, such as emotion and sentiment data and ESG and risk metrics, on more than 70 million entities like:
Assets
Brands
Product reviews
C-level people
And more
And you can easily access valuable alerts and predictive insights—from live daily or historical data—through dashboards, APIs, or flat files delivered in usable formats. Are you ready to uncover the invisible data about your investments? Request a demo today.
Sésame, ouvre-toi, or in English, open sesame, is the famous magical phrase that inspired us to name SESAMm 8 years ago today. And true to its name, since its inception, SESAMm has been opening doors to a new world of advanced analytics powered by natural language processing.
TRIVIA QUESTION: Why the unusual spelling of SESAMm? (Read until the end for the answer.)
Our heritage
Unlike the phrase’s magical nature in the “Ali Baba and the Forty Thieves” story, SESAMm relies on technology to open doors and uncover hidden treasures. And that has been our goal since we started the company in April 2014. Pierre Rinaldi, Florian Aubry, and I saw the vast amount of textual information available on the web, from news websites to NGO reports and social media. We set out to find a way to translate all that information into powerful, digestible, and actionable insights. In eight years, we’ve created the most extensive data lake in the industry that relies not only on social media but also on forums, review sites, and premium data. Today, the data lake comprises nearly 20 billion articles and grows by 20% year over year.
As we alluded to earlier, the real key to the treasure trove is the technology that uncovers and synthesizes all that data: artificial intelligence, particularly natural language processing (NLP). Our highly talented technical team developed advanced algorithms to accurately “read” web articles and distill them into only the most relevant data for our users, received as signals and alerts.
From left to right: Co-founders, CTO Florian Aubry, CEO Sylvain Forté, and COO Pierre Rinaldi pictured.
In these eight years, we’ve been able to serve and work with some of the brightest minds in the industry who have trusted us with multiple challenges. Asset managers, private equity firms, and corporations leverage SESAMm’s products for investment strategies, deal sourcing, due diligence, portfolio monitoring, and ESG and positive impact indicators.
In particular, we’re using our technology to transform the ESG industry. For example, we help track controversies and monitor the positive impact for companies that no one else covers in the entire world.
Our team and values
As we proudly approach the 100-employee mark, this is a good moment for us to pause and reflect on where we are and where we want to go. Our mission, to become the world’s reference for textual web data analysis, hasn’t changed. We’re more convinced than ever that we are on the right path to achieving that goal.
Our team collaborates across six sites in five countries, many offices, and various cultures. As a deep-tech company, 70% of the group comprises PhDs, engineers, and developers. Moreover, they’re an amazing team that follows horizontal management and servant-leadership approaches, part of the culture we value and insist on.
To close SESAMm's first eight years on a high note, Forbes included me on their 30 under 30 list only a few weeks ago. In my eyes, that is a big recognition of the company and the work the team has done over the years.
Our future
More ESG. As we mentioned before, we want to transform the ESG industry. Currently, we cover a total of close to five million public and private firms. We aim to bring more transparency to the market and align with new regulatory frameworks in a fast-moving environment. By better analyzing companies, we believe we can help investors push for change. For example, to help monitor for positive impact and align with UN sustainable development goals (SDG), we’re launching a new product to systematically generate these types of alerts.
Of course, we want to bring these technologies to new clients, like:
Private equity firms
Quantitative asset managers
High-yield portfolio managers
Corporations to fuel their CSR strategy
From CSR teams looking to evaluate their clients and suppliers from an ESG perspective to central data and analytics teams wishing to generate custom NLP analytics at scale, SESAMm aims to become a central solution.
More importantly, we want to democratize NLP web data. This battle for good technology is our ultimate goal because every large company will need to address this topic at one point or another. So when it’s your turn, we want to be there to make it easier for you to achieve tangible results.
And last but not least, as a fintech company, we set our goals and ambitions higher whenever we complete a funding round. Our Series B with major private equity firm The Carlyle Group (CG) and New Alpha, a Paris-based fintech VC, was a significant step up. And the more we scale, the bigger we see the potential to apply our tools within existing or new fields, industries, use cases, and countries. This step-up naturally inspires us to plan for new ways to grow, whether with new services or reflecting on the potential of an upcoming funding round.
Our appreciation
Thank you. Without you, we wouldn’t be here. Special thanks to the SESAMm team. To our investors, The Carlyle Group, New Alpha, Havenrock, Caisse d’Epargne, AngelSquare, and more. To our partners and all who have supported us along this journey. And most of all, thank you, our clients. Because of you all, we have grown from a small-city-of-Metz team into an international company.
Cheers to you, us, and our future. Happy 8th anniversary, SESAMm!🥂
Oh, right! The trivia question! Here’s the answer. SESAMm is an acronym for:
Stock
Exchange
Statistical
Analysis
Mechanism
The “Mm” in SESAMm hints at the French pronunciation of sésame. But mostly, we used the small m from the word Mechanism instead of an e to guarantee that the URL would be available.
Over the past decade, many organizations have improved their carbon footprints, from adopting recyclable and biodegradable packaging and cutting single-use plastic to planting trees and reducing their greenhouse gas emissions. However, some companies looking to boost their eco-friendly image without committing to serious changes or addressing environmental issues have been associated with false green marketing. We call this "Greenwashing."
What is Greenwashing?
Greenpeace and the Environmental Protection Agency define greenwashing as making false or misleading claims about the environmental benefits of a product, service, technology, or company practice. Greenwashing typically involves companies spending more money on advertising and marketing than on implementing sustainable business practices that minimize environmental impact. These false green claims can deceive consumers into believing that a product or company is more environmentally friendly than it is, leading to increased sales and profits. As a result, false advertising, misleading initiatives, and groundless claims have increased green investors' exposure to risks from potential lawsuits by activist groups, reputational damage, and heavy losses on invested assets.
Why is Spotting Greenwashing Important?
Greenwashing is a growing concern for investors as they look to make sustainable and responsible investments. Therefore, spotting greenwashing practices is important for these firms. Here's why.
The deceptive practices used by greenwashers can have significant implications for the integrity of investments made in what investors believe to be sustainably operated companies or sustainable funds. In other words, greenwashing makes it difficult for investors to distinguish between companies genuinely committed to sustainability and those merely making false claims about their environmental practices. As a result, investors may unknowingly invest in companies that are not as sustainable as they claim to be, which can harm their financial returns and the environment. Therefore, it's essential for investors to be aware of greenwashing tactics and to carefully research companies before investing in them to ensure that their investments align with their values and contribute to a more sustainable future.
What Are the Challenges to Detecting Greenwashing?
It's challenging to produce an accurate assessment of environmental, social, and governance (ESG) factors, which gives companies the opportunity to cover or hide ineffective and fake green initiatives. According to Regtank, some of the main challenges to detecting greenwashing practices are the following:
Lack of reporting standards: some investors believe that we haven’t universally agreed upon a set of standards to determine whether a product is ESG compliant.
Lack of transparency: greenwashing companies don’t disclose the specificities of their “green campaigns,” which makes it difficult for investors and consumers to fact-check and evaluate their sustainability claims.
Limited consumer awareness: false marketing strategies could be based on a combination of the consumer’s eco-consciousness and brand loyalty. As a result, consumers become less aware of the misleading strategies greenwashing companies use to sell their products.
Ultimately, these factors may contribute to the inaccuracy and limitations of ESG data and scores, which makes it easier for greenwashers to get away with their false marketing campaigns. Consequently, detecting greenwashing requires scrutiny of environmental claims made by companies and an understanding of the complex supply chains and manufacturing processes involved in producing products and services.
How Does Artificial Intelligence Detect Greenwashing?
As greenwashing practices increase, activist investors, experts, journalists, and even the general public are spreading awareness of the issue using social media, news outlets, forums, and blogs, among other means. Recently, artificial intelligence (AI), particularly natural language processing (NLP), has proven to be effective in the early detection of greenwashing by analyzing vast amounts of qualitative data publicly available on the web. At SESAMm, for example, we apply our NLP capabilities to identify companies likely to engage in greenwashing practices by analyzing text in billions of web-based articles. Our data lake contains over 25 billion web-sourced articles, drawn from four million sources, including news sites, blogs, social media, and forums, and covering five million public and private companies in more than 100 languages. We run these articles through our AI platform, TextReveal®, and systematically craft reliable, timely, and comprehensive insights to detect greenwashing, generate ESG alerts, and identify related risks.
The Rise of Greenwashing
Greenwashing, the deceptive practice where companies claim to be more environmentally friendly than they actually are, has become a growing concern in recent years. By analyzing the frequency of web mentions of greenwashing over time, we can observe important trends and understand the factors contributing to this phenomenon.
Recent analyses indicate a significant increase in greenwashing mentions since late 2019. This rise aligns with a growing public awareness of the climate emergency and the increase in media outlets and social media accounts dedicated to exposing greenwashing. The number of mentions escalated from fewer than 200 to over 23,000 in the last quarter of 2023, highlighting the increasing scrutiny of corporate environmental claims.
A noteworthy pattern is the recurring spike in greenwashing mentions during the third quarter of each of the past three years. This timing corresponds with the "pre-COP" periods leading up to critical international climate change conferences. These periods see heightened discussions around sustainability, with increased attention on companies' environmental practices.
Figure 1: Greenwashing mentions over time.
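As a rough illustration of how a mentions-over-time series like this could be built, the sketch below counts articles that contain a keyword and groups them by calendar quarter. It is a simplified stand-in for illustration, not TextReveal's pipeline: the function names, the plain substring match, and the sample articles are all assumptions.

```python
from collections import Counter
from datetime import date

def quarter_label(d):
    """Map a date to a label such as '2023-Q3'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

def mentions_per_quarter(articles, keyword="greenwashing"):
    """Count articles mentioning `keyword`, grouped by quarter.

    `articles` is an iterable of (publication date, text) pairs.
    Matching is a case-insensitive substring test, a deliberate simplification.
    """
    counts = Counter()
    for published, text in articles:
        if keyword in text.lower():
            counts[quarter_label(published)] += 1
    return dict(sorted(counts.items()))

# Made-up articles, for illustration only.
sample = [
    (date(2023, 8, 14), "Activists accuse the firm of greenwashing ahead of the summit."),
    (date(2023, 9, 2), "New Greenwashing complaint filed with the regulator."),
    (date(2023, 11, 20), "Quarterly earnings beat expectations."),
    (date(2023, 12, 5), "Report questions greenwashing in fund labels."),
]
print(mentions_per_quarter(sample))  # prints {'2023-Q3': 2, '2023-Q4': 1}
```

Grouping by quarter is what makes seasonal patterns, such as the third-quarter spikes noted above, easy to spot.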
Greenwashing in the Energy Sector
The energy sector, particularly the oil industry, has faced significant scrutiny regarding greenwashing. In this context, companies like Shell and ENI have been prominent due to the frequency of greenwashing mentions associated with them.
Figure 2: Examples of greenwashing mentions in the energy sector over time.
For Shell and ENI, the volume of greenwashing mentions has fluctuated, with notable increases in specific quarters. For example, Shell saw spikes in mentions during the second quarter of both 2021 and 2022 while experiencing a drop in the third quarter of 2022. ENI has faced similar fluctuations, often linked to legal actions and publicized environmental issues.
Shell's Greenwashing Mentions, ESG Risks, and Initiatives
Shell, a British multinational and prominent player in this sector, has faced considerable scrutiny for such practices. The company has experienced notable spikes in greenwashing mentions and has been involved in several ESG-related risks.
Figure 3: Shell greenwashing and ESG mentions over time.
Greenwashing Mentions
We can see an increase in greenwashing mentions in the first half of 2023. Around that period, Shell faced allegations and lawsuits concerning its environmental claims. The company was criticized for misleading U.S. authorities and investors about its energy transition efforts. Additionally, Shell faced public backlash for labeling fossil gas as 'renewable' while reporting record profits. A notable incident involved a shareholder suing Shell's executives over climate risks.
ESG Risks
Shell has faced several ESG-related risks, including legal challenges and pollution issues. In 2021, the company was sued by New York City over climate change-related advertising and filed an arbitration claim against Nigeria concerning a spill dispute. In March 2023, Shell faced another oil spill, this time in another region in Nigeria, Rivers State, and also saw institutional investors backing a lawsuit against its board over climate risks. The mid-2023 period saw Shell agreeing to pay $10 million for air pollution violations at a Pennsylvania petrochemical plant. Despite its net-zero pledge, the company announced plans to increase fossil fuel production.
ESG Initiatives
Despite its challenges, Shell has also engaged in various sustainability initiatives. In late 2021, the company announced plans to purchase power from the world's largest offshore wind farm. Mid-2022 saw a leadership change with the company's CEO stepping down as Shell aimed to align with its climate goals. The company also planned to deploy 10,000 EV chargers across India as part of its global strategy. In mid-2023, Shell committed to investing $10–15 billion in developing low-carbon energy solutions. Although the company abandoned its lower oil production target, it maintained its commitment to reducing emissions.
Shell's journey underscores the challenges of aligning environmental claims with real actions, emphasizing the importance of transparency and genuine sustainability efforts.
ENI's Greenwashing Mentions, ESG Risks, and Initiatives
ENI, an Italian multinational oil and gas company, has faced scrutiny for such practices. The company has experienced fluctuations in greenwashing mentions and has been involved in a number of ESG-related risks.
Figure 4: ENI greenwashing and ESG mentions over time.
Greenwashing Mentions
ENI's greenwashing mentions are fairly low. However, the company has been featured in discussions about greenwashing, especially with recent developments. In early 2022, the company faced criticism for inconsistencies in emissions data and greenwashing activities, as highlighted by the Sereno Regis Study Center. Greenpeace also criticized ENI for using the Sanremo Music Festival as a platform for greenwashing. In May 2023, ENI faced a lawsuit for allegedly lobbying and greenwashing to promote fossil fuels despite being aware of their environmental risks. Greenpeace sued the company, accusing it of knowingly contributing to climate change.
ESG Risks
Over the past four years, the oil giant's ESG risks have been few but not nonexistent. ENI has encountered several risks, including legal challenges and pollution issues. In 2022, ENI's environmental strategy was deemed a failure, and concerns arose about a pipeline spill into the East Irish Sea. The company also faced legal actions in 2021, including an appeal against a court ruling in an illegal waste case and warnings from the Legality Network to reduce greenhouse gas emissions or face prosecution.
The company faced a lawsuit in early 2023 for allegedly having prior knowledge of the climate crisis. In another incident, a report found that ENI and Shell were responsible for significant pollution in Bayelsa, requiring a $12 billion cleanup.
Shell and ENI both face the challenge of balancing economic interests with environmental responsibility. Despite allegations of greenwashing and environmental risks, both companies have taken steps towards sustainability, such as investing in low-carbon solutions and renewable energy projects. Their experiences highlight the importance of transparency, genuine commitment to environmental responsibility, and the role of public scrutiny in holding companies accountable.
Greenwashing and ESG Investing
In sum, certain companies advertise their sustainability and green initiatives, while in reality, they are making false claims and practicing greenwashing, as evidenced by our analysis using SESAMm's AI and ESG reports. We use AI through TextReveal to generate alternative data for use cases, such as ESG and SDG, sentiment, private equity due diligence, corporate studies, and more. Our technologies can reliably ensure the credibility of ecological initiatives and serve global investment firms, corporations, and investors, such as private equity firms, hedge funds, and other asset management firms, to enhance their investment strategies.
Conclusion
In conclusion, greenwashing represents a substantial obstacle in the journey towards genuine environmental sustainability, misleading consumers and investors and diluting the efforts of genuinely sustainable enterprises. Nevertheless, the emergence of advanced technologies such as Artificial Intelligence (AI) and Natural Language Processing (NLP) signals a new era of accountability. Innovators like SESAMm are at the forefront, deploying these technologies to effectively unravel and counteract greenwashing practices. This empowers investors and asset and portfolio managers to discern legitimately sustainable entities and align their resources with them. The call to action is clear: a collective demand for transparency and responsibility is crucial.
Reach out to SESAMm
TextReveal’s web data analysis of over five million public and private companies is essential for keeping tabs on ESG investment risks. To learn more about how you can analyze web data or to request a demo, reach out to one of our representatives.