Bigger is Not Always Better: Forecasting Commodities with NLP Data
November 8, 2022
SESAMm has a large data lake of more than 20 billion articles (growing by 5–10 million a day) and 14 years of data in 100 languages. But its size alone is not what makes it good; it’s a refined process to find the exact data you want that makes it better.
Here’s an example to help explain the point. We’re sometimes asked for help researching data to forecast and monitor the commodities market, even by large companies with their own commodities desk of traders and quant researchers. Why would they seek help from outside their firm?
Simply put, traders want an edge. They want information advantages that others are likely to miss, so they look to alternative data from various sources: anything that adds value and offers a different angle. And, as it turns out, commodities are a more challenging segment to analyze when it comes to alternative text data. Compared with texts about companies, texts about commodities are scarcer and need more domain knowledge to unravel their implications. A simple sentiment analysis doesn’t bring enough relevant information.
For a more in-depth view, join us as we discuss NLP-derived alternative data, its benefits, challenges for researchers, and why bigger isn’t always better in the world of data.
Identifying environmental, social, and governance (ESG) controversies is a complex challenge. The sheer volume of data added to the web every day is hard to analyze, and important insights stay hidden among irrelevant information. Traditional risk identification methods struggle with this volume, so critical issues that could impact investments can go undetected.
This article explores the intricacies of ESG data trends. As businesses worldwide strive to adopt more sustainable and ethical practices, the importance of ESG metrics has risen to the forefront of strategic planning and public discourse.
Identifying Controversies with AI
Traditional controversy detection methods often struggle to uncover hidden risks buried within unstructured sources like social media, local news, and niche industry reports. This section explores the advantages of using AI tools, such as natural language processing and machine learning, to detect these risks more accurately and efficiently. By leveraging AI, firms can gain deeper insights and respond proactively to emerging ESG issues, ensuring more robust risk management and informed investment decisions.
Key Challenges in Identifying ESG Controversies
In the finance world, identifying ESG controversies presents significant challenges, especially when dealing with small companies, some of which are private. These companies often lack extensive public records, and the data that is available can be sparse, fragmented, or hidden within vast amounts of irrelevant information. Traditional methods of risk identification struggle to navigate this sea of digital noise, making it difficult for private equity firms to uncover critical issues that could impact their investments.
One of the primary hurdles is the lack of valuable, structured data on smaller firms. Unlike large corporations, which are often required to disclose detailed financial and operational information, small private companies might operate with minimal public visibility. This opacity complicates the identification of potential ESG risks, as relevant data is often buried in unstructured sources like social media, local news, or niche industry reports. The challenge is not just about finding information but also about extracting meaningful insights from a diverse array of sources that may not adhere to standardized reporting practices.
Additionally, the diversity in language and terminology used by smaller firms further complicates the identification of ESG controversies. Risks are often discussed in context-specific ways, using industry jargon or localized expressions that do not easily translate into a standard risk assessment framework. This linguistic variation can lead to misunderstandings or even the complete overlooking of critical ESG issues. Therefore, private equity firms require advanced tools capable of interpreting and standardizing this information to ensure comprehensive risk identification.
Artificial Intelligence vs. Traditional Methods
Artificial Intelligence (AI) has emerged as a game-changing tool for identifying ESG controversies, offering significant advantages over traditional methods. While conventional approaches rely heavily on structured data from formal reports and disclosures, AI technologies, such as natural language processing (NLP) and machine learning, can analyze vast amounts of unstructured data from diverse sources. This capability is particularly crucial for private equity firms focused on small companies, where relevant information may be scattered across social media posts, obscure local news articles, and other non-traditional outlets.
Traditional methods often fall short in dealing with the unstructured and fragmented nature of data related to smaller firms. These methods might miss emerging controversies discussed informally in niche blogs or industry-specific forums. In contrast, AI-powered tools can continuously monitor these sources in real time, identifying potential ESG risks before they escalate. This proactive approach allows firms to address issues early, providing a more comprehensive and nuanced understanding of the risks associated with their investments.
Moreover, AI's ability to process and analyze diverse languages and terminology offers a significant edge. By decoding industry-specific jargon and translating localized expressions into a standardized risk framework, AI helps private equity firms overcome the linguistic barriers that traditional methods struggle with. This capability ensures that no critical ESG controversy is overlooked due to language differences, thereby enhancing the accuracy and effectiveness of risk assessments.
To sum it up, while traditional methods have their place, AI technologies provide a more robust, dynamic, and precise approach to identifying ESG controversies. By leveraging AI, private equity firms can better navigate the complexities of data sourcing, interpretation, and risk management, ultimately leading to more secure and informed investment decisions.
Streamlining ESG Controversy Detection with AI
Detecting ESG controversies with AI involves several crucial steps, each contributing to the precise identification of potential risks. The attached diagram illustrates a generalized AI-driven approach to detecting ESG controversies.
Step 1: Data Collection
The first step in this AI process is collecting vast amounts of web-based information to create a comprehensive data lake. This data lake acts as a repository, storing raw data in its original format. AI systems thrive on large datasets to enhance accuracy, and the data lake ensures that this requirement is met by allowing real-time data ingestion. By preserving historical information, the system can perform trend analyses that are crucial for identifying emerging controversies.
Step 2: Organizing & Cleaning the Data
Once collected, the data undergoes an essential organization and cleaning process. This step involves standardizing and categorizing the data to make it more accessible for analysis. By filtering out irrelevant information and tagging essential data points, the system can quickly and efficiently process large datasets. This organization allows for faster analysis and ensures that only the most relevant information is considered, eliminating the noise that can obscure critical insights.
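To make the cleaning step concrete, here is a minimal sketch of deduplication, noise filtering, and tagging. It is an illustration only, not SESAMm's pipeline: the Document type, the clean function, the keyword list, and the min_words cutoff are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical raw record pulled from the data lake."""
    url: str
    text: str


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase the text."""
    return re.sub(r"\s+", " ", text).strip().lower()


def clean(docs, keywords, min_words=5):
    """Drop duplicates and very short texts, then tag what remains.

    Tagging uses plain substring matching, so "fine" also matches "fined".
    Documents that match no keyword are treated as noise and dropped.
    """
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc.text)
        if norm in seen or len(norm.split()) < min_words:
            continue
        seen.add(norm)
        tags = sorted(k for k in keywords if k in norm)
        if tags:
            kept.append((doc, tags))
    return kept
```

A real system would likely use near-duplicate detection and a trained relevance classifier rather than exact matches and keywords; the sketch only shows where filtering and tagging sit in the flow.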
Step 3: Connecting the Dots
With the data organized, the AI system creates a Knowledge Graph (KG) that maps the relationships between key entities, topics, and themes. This step is crucial for understanding how different companies, products, and brands are interconnected. The Knowledge Graph is continuously updated to reflect new data, ensuring that the system remains accurate and relevant in its analysis.
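As a rough illustration of how such a graph can link companies, products, and brands, the sketch below stores labeled edges and follows them transitively. The KnowledgeGraph class, the relation names, and the example entities are assumptions made for this example and do not describe SESAMm's actual graph.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy directed graph of (subject, relation, object) triples."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        """Record one labeled edge, e.g. ("Acme Group", "owns", "Acme Mining")."""
        self._edges[subject].add((relation, obj))

    def neighbors(self, subject, relation=None):
        """Direct neighbors of subject, optionally limited to one relation."""
        edges = self._edges.get(subject, ())
        return sorted(o for r, o in edges if relation is None or r == relation)

    def reachable(self, start, relation):
        """All entities reachable from start by following `relation` edges."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in self.neighbors(node, relation):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return sorted(seen)
```

With edges like these, an article that mentions a subsidiary can be attributed to its parent by walking the "owns" relation upward or downward, which is the kind of connection the step describes.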
Step 4: Adding Contextual Understanding
The AI system then moves on to interpret the text, employing various techniques such as Named Entity Recognition (NER) and lemmatization. These tools help the system identify and classify key elements within the data, allowing it to grasp the context and main points of the information. This step is vital for accurately understanding the specific topics and issues related to each company, enabling the system to group related articles and monitor the evolution of controversies.
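The two techniques named above can be shown with a deliberately simplified stand-in: a dictionary lookup for entities and a lookup table for lemmas. Production NER and lemmatization rely on trained language models; the ENTITIES and LEMMAS tables and both function names below are hypothetical.

```python
import re

# Hypothetical gazetteer: known entity names and their types.
ENTITIES = {"acme mining": "ORG", "green river": "LOC"}

# Hypothetical lemma table: inflected form -> base form.
LEMMAS = {
    "fined": "fine",
    "fines": "fine",
    "polluted": "pollute",
    "polluting": "pollute",
    "spills": "spill",
}


def find_entities(text):
    """Return (surface form, label, start offset) for each known entity.

    Offsets assume lowercasing does not change string length, which
    holds for ASCII text.
    """
    lowered = text.lower()
    found = []
    for name, label in ENTITIES.items():
        for m in re.finditer(re.escape(name), lowered):
            found.append((text[m.start():m.end()], label, m.start()))
    return sorted(found, key=lambda e: e[2])


def lemmatize(text):
    """Lowercase, split into alphabetic tokens, and map each to its lemma."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens]
```

Reducing "fined" and "fines" to the same lemma is what lets later steps group articles about the same issue even when the wording differs.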
Step 5: Analyzing with Algorithms
In this step, the AI applies sophisticated algorithms to the organized and contextualized data. These algorithms focus on producing outputs such as sentiment scores, detected ESG controversies, and impacts on the Sustainable Development Goals (SDGs). The system continuously refines these algorithms to maintain high levels of accuracy and performance, ensuring that the analysis remains relevant as new data becomes available.
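For intuition about one of these outputs, sentiment, here is a minimal lexicon-based scorer. It is a swapped-in, much simpler technique than the models the article refers to, and the POSITIVE and NEGATIVE word lists are invented for the example.

```python
import re

# Hypothetical word lists; a real lexicon would be far larger and domain-tuned.
POSITIVE = {"award", "improve", "improved", "settled"}
NEGATIVE = {"fine", "fined", "lawsuit", "spill", "fraud", "pollution"}


def sentiment_score(text):
    """Return (pos - neg) / (pos + neg), a value in [-1, 1].

    Returns 0.0 when the text contains no lexicon word at all.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total
```

A word-count approach like this ignores negation and context, which is one reason the article argues that simple sentiment alone is not enough.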
Step 6: Turning Analysis into Actionable Insights
Finally, the AI system transforms the analysis into actionable insights. By delivering these insights in a fast and easy-to-understand format, the system empowers users to make informed decisions quickly. For example, a controversy intensity score might be used to prioritize which issues require immediate attention, allowing users to focus on the most significant risks in their portfolios.
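The article does not define how a controversy intensity score is computed, so the sketch below assumes one plausible form: negative mentions summed per company, with older mentions decayed exponentially. The Mention type, the 30-day half-life, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """Hypothetical scored article mention."""
    company: str
    negativity: float  # 0.0 (neutral) to 1.0 (strongly negative)
    age_days: int      # days since publication


def intensity(mentions, half_life_days=30.0):
    """Sum negativity per company, halving a mention's weight every half-life."""
    scores = {}
    for m in mentions:
        weight = 0.5 ** (m.age_days / half_life_days)
        scores[m.company] = scores.get(m.company, 0.0) + m.negativity * weight
    return scores


def prioritize(mentions, top_n=3):
    """Return the top_n (company, score) pairs, highest intensity first."""
    scores = intensity(mentions)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Ranking by a single score is what allows a user to look at the few companies at the top of the list first instead of reading every flagged article.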
This AI-driven process, depicted in the attached diagram, showcases the streamlined approach to detecting ESG controversies, providing private equity firms with the tools they need to manage risks effectively and maintain a competitive edge in the market. For more detailed information on how SESAMm identifies insights with AI, please refer to this document.
Conclusion
To sum up, identifying ESG controversies, particularly in smaller, less visible companies, presents significant challenges for traditional risk assessment methods. However, integrating artificial intelligence offers a transformative solution. AI tools can effectively analyze vast amounts of unstructured data, revealing hidden risks and enabling informed investment decisions. As the demand for sustainable and ethical practices grows, leveraging AI will enhance risk management and foster responsible investment approaches, allowing firms to navigate the complexities of ESG data more effectively.
Reach out to SESAMm
TextReveal’s web data analysis of over five million public and private companies is essential for keeping tabs on ESG investment risks. To learn more about how you can analyze web data or to request a demo, reach out to one of our representatives.
Controversial business involvement screening is moving beyond its origins as a compliance exercise.
Under frameworks like SFDR and the EU Taxonomy, investors must prove that their portfolios not only promote sustainability but also exclude activities fundamentally at odds with environmental, social, or ethical principles. This marks a shift from static disclosure toward dynamic accountability, and it has broadened both the scope and ambition of ESG screening.
Historically, exclusions focused on a narrow range of activities - weapons, tobacco, or fossil fuels - and primarily applied to public equities. Today, that universe has expanded dramatically. Private markets, secondaries portfolios, and private credit exposures are now expected to undergo the same scrutiny as listed assets. This reflects not only regulatory alignment but also diversifying investor expectations, as institutions incorporate reputational, cultural, and mission-based constraints into their investment frameworks.
Modern exclusion policies increasingly include areas not yet covered by regulation but relevant to ethics, faith, or social impact. Examples range from pork-related activities in Sharia-compliant portfolios to emerging debates over cryptocurrency mining and trading, and even biotechnology topics such as human cloning or genetic manipulation that raise profound ethical questions. These additions illustrate how business involvement screening is evolving from a rule-based checklist into a reflection of each investor’s worldview and stakeholder commitments.
This evolution, however, brings complexity. Private assets and novel sectors often lack standardized data or public disclosures. ESG, compliance, and deal teams must process incomplete information, document decisions, and adapt quickly to new mandates - all without expanding headcount. The result is a growing need for automation that can adapt to human nuance.
SESAMm’s AI-powered business involvement screening meets that need. By allowing investors to screen based on their own exclusion categories and thresholds, it translates varied mandates - from regulatory to reputational - into a single, automated process.
Automating Controversial Business Involvement Screening in Public and Private Assets
SESAMm’s platform uses a new AI agent approach that scans and analyzes vast amounts of information. Below, we provide an overview of SESAMm’s business involvement screening capabilities and how they address investors’ needs for automation, thresholding, and flexible outputs.
Comprehensive Coverage through Big Data
SESAMm utilizes its AI engine to monitor a corpus of over 30 billion articles, with 10 million new documents added daily from various sources, including news sites and NGOs. This extensive data collection spans multiple languages and local outlets, enabling it to detect obscure references to companies and raise alerts for issues such as misconduct. SESAMm's coverage encompasses millions of public and private companies, so users can conduct thorough screenings of any entity, including private companies and subsidiaries.
Customizable Exclusion Frameworks
SESAMm’s business involvement screening gives investors control over what to screen and how to classify it. Users can request customization of exclusion categories to mirror their own policy, whether based on regulation (e.g., SFDR, EU Taxonomy) or internal mandates (e.g., faith-based or reputational constraints). In addition to standard ESG categories like fossil fuels or weapons, investors can add custom topics. This flexibility allows ESG, compliance, and secondaries teams to tailor the tool to their precise needs.
Threshold-Based Classification
SESAMm’s business involvement screening module is built around the concept of threshold-based flags. The AI utilizes structured data and unstructured signals to determine involvement levels. The output for each company is a clear classification for each category: No Involvement, Limited Involvement, or Significant Involvement. These classifications correspond to thresholds: Limited might mean some involvement that falls below the exclusion threshold, while Significant means involvement above the threshold or that the activity is a core business. By encoding the thresholds in the system, SESAMm ensures consistency with the investor’s policy. This is crucial for automation: rather than an analyst manually checking revenue percentages and news, the system does it automatically and provides clear justification.
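The three-level classification can be sketched as a small function. This is an assumption-laden illustration, not SESAMm's logic: it uses revenue share as the only involvement measure, treats a share exactly at the threshold as Significant, and the names Involvement and classify are hypothetical.

```python
from enum import Enum


class Involvement(Enum):
    NONE = "No Involvement"
    LIMITED = "Limited Involvement"
    SIGNIFICANT = "Significant Involvement"


def classify(revenue_share, threshold, core_business=False):
    """Map a revenue share (0.0 to 1.0) to an involvement level.

    Assumptions: a core business is always Significant; zero share is
    No Involvement; a share at or above the threshold is Significant;
    anything in between is Limited.
    """
    if core_business:
        return Involvement.SIGNIFICANT
    if revenue_share <= 0:
        return Involvement.NONE
    if revenue_share >= threshold:
        return Involvement.SIGNIFICANT
    return Involvement.LIMITED
```

Checking for zero share before comparing against the threshold matters for zero-tolerance policies: with a threshold of 0.0, a company with no involvement must still come out as No Involvement, while any positive share becomes Significant.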
Rapid Portfolio Screening Process
The system is designed for fast, self-contained screening. A user simply uploads a list or portfolio, and within hours receives a complete file summarizing involvement across all exclusion categories. The output includes company-level classifications, summaries of supporting evidence, and references to sources. This enables investors to integrate the results directly into due diligence workflows, risk committees, or regulatory reporting, with no ongoing manual data maintenance required.
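The shape of such an output file can be illustrated with a short sketch that turns a company list into CSV rows, one per company and category. The FINDINGS table stands in for the results of the actual analysis, and every name, category, and evidence string in it is made up for the example.

```python
import csv
import io

# Hypothetical pre-computed findings:
# (company, category) -> (classification, evidence summary, source reference)
FINDINGS = {
    ("Acme Mining", "fossil fuels"): (
        "Significant Involvement",
        "Coal extraction is the core business",
        "annual-report",
    ),
    ("Beta Foods", "tobacco"): (
        "Limited Involvement",
        "Minor distribution revenue",
        "news-article",
    ),
}
CATEGORIES = ["fossil fuels", "tobacco"]


def screen_portfolio(companies):
    """Return CSV text with one row per (company, exclusion category)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["company", "category", "classification", "evidence", "source"])
    for company in companies:
        for category in CATEGORIES:
            # Pairs without a finding default to No Involvement with no evidence.
            cls, evidence, source = FINDINGS.get(
                (company, category), ("No Involvement", "", "")
            )
            writer.writerow([company, category, cls, evidence, source])
    return buffer.getvalue()
```

Keeping classification, evidence, and source in the same row is what lets the file drop straight into a due diligence workflow or a regulatory report without a separate lookup.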
Cost and Resource Efficiency
Automating this process saves substantial analyst time, particularly for rating agencies and secondaries investors managing high volumes of entities. Rating agencies can use the pre-classified results as a baseline input for their own ESG or credit assessments, reducing the manual data-gathering burden. LPs and GPs can run large private company universes in-house without additional research teams. In secondaries, where a full portfolio review can take days of analyst effort, SESAMm’s workflow compresses that timeline to just a few hours, enabling ESG validation to fit seamlessly into transaction schedules.
Auditability and Verification
Each classification is fully transparent. Analysts can drill down into the evidence behind a flag, including links to original articles, filings, or corporate statements, and verify the AI’s reasoning. Automatic translation ensures accessibility across languages. This transparency builds trust in the results and provides auditable documentation for LP reporting or regulator reviews.
As ESG investing matures, the leaders will be those who can implement exclusions transparently, efficiently, and in alignment with evolving norms. The next frontier is no longer just regulatory compliance - it is the ability to anticipate what clients and society will expect tomorrow, and to operationalize those expectations across all asset classes. SESAMm’s technology makes that possible: a platform that keeps pace with both policy evolution and moral expectations, bringing consistency and clarity to an increasingly complex ESG landscape.
As scrutiny of corporate supply chains intensifies, investors are demanding more than policy statements and third-party audits. In this webinar, SESAMm and Inrate explore two powerful lenses for evaluating risks and sustainability impacts across global supplier networks: SESAMm’s real-time controversy detection and Inrate’s impact-driven sustainability data and ratings. Together, these approaches cover both public and private companies, go beyond self-disclosures, and enable assessments across a wide range of suppliers.
Watch this instant replay to dive into:
Emerging trends shaping how investors assess ESG risks and impacts across supply chains
The expanding role of AI in identifying hidden exposures and mapping sustainability outcomes
Proven strategies for combining controversy signals, ESG ratings, and emissions data to drive more informed decisions