Generative AI and The Future of Finance: A Deep Dive with Sylvain Forté
July 19, 2023
•
5 mins read
Hello, and welcome to our ongoing series on Generative AI in Finance. I’m Sylvain Forté, CEO and co-founder of SESAMm, and in our first article of the series, we’ll explore how generative AI is reshaping the financial industry. At SESAMm, we’ve been fortunate to be at the forefront of this revolution, witnessing the transformational power of large language models like ChatGPT and its iterations.
Unprecedented Evolution in Generative AI
Let's begin with an overview of the current generative AI landscape. Over the last few years, we've seen an explosion in the capabilities of generative AI, particularly in text processing. From BERT to GPT4, large language models (LLMs) have demonstrated increasingly impressive capabilities. These models, performing at a human level for many tasks, are rapidly evolving, making the past six months feel like an exponential leap in the AI domain. Generative AI is no longer a speculative idea but rather an early adoption phase of a powerful technology. At SESAMm, we've leveraged our partnerships with OpenAI and other organizations to gain high-level access to these AI models, allowing us to harness this potential and democratize access to intelligence. It’s an exciting shift that, while reshuffling business models and job roles, promises enormous productivity gains and increased overall value. The key, of course, is ensuring that these benefits extend to society as a whole.
The Disruption in Financial Sector
The finance sector, with its vast array of text-based tasks, stands to gain enormously from generative AI. Any repetitive yet intelligence-heavy tasks — think verification, document generation, or client communication — are ripe for automation. Finance, despite being a highly intelligent sector, often sees that intelligence is misspent on routine tasks. Generative AI can realign this balance, reducing costs, enhancing service quality, and building trust. Whether private equity, asset management, or commercial banking, AI can streamline processes, delivering an efficiency boost that significantly enhances customer satisfaction. In private equity, the automation possibilities could reshape the sector, bringing it closer to the public markets. In asset management and banking, cost reduction and service enhancement could lead to a dramatic rise in customer satisfaction.
Concrete Use Cases of Generative AI in Finance
So, how does this look in practice? Generative AI can automate numerous finance tasks, including creating reports, verifying information, summarizing news or earnings calls, and even making internal data searchable. Imagine a system that can help asset managers match various types of datasets based on a user query in natural language, thereby making data access and interpretation simpler. This could vastly improve the user experience with business software, reducing effort and time spent. While some applications, like a fully automated financial advisor or AI-led trading and hedging, might present more significant challenges, their potential benefits could revolutionize these sectors.
Overcoming Roadblocks
Of course, every transformation comes with challenges. The key is discerning which use cases are suited for full automation and which require human oversight. Data privacy concerns will also influence decisions about whether to use proprietary or open-source models. There will inevitably be resistance to change within organizations, but the 'Google test' can help navigate data privacy issues: if an employee would conduct that search or share that data on Google, it's likely safe to share with a proprietary Generative AI solution.
Generative AI and Risk Mitigation
Risk mitigation strategies can greatly benefit from generative AI. From detecting and preventing fraud to managing market risks, generative AI can verify identities, cross-reference databases, and analyze vast amounts of data. For instance, at SESAMm, we're developing an ESG controversy detection solution that can be an invaluable tool for risk mitigation.
Improving Investment Decision-Making
Generative AI’s ability to process and analyze massive amounts of data accurately and quickly makes it a formidable tool for investment decision-making. By identifying patterns, trends, and correlations that humans might miss, generative AI can provide a more comprehensive, data-driven perspective, aiding portfolio optimization, asset allocation, and investment risk management.
Streamlining Operations for Efficiency
Generative AI's efficiency and accuracy promise to transform financial institutions. By automating time-consuming tasks like report generation and client communication, AI can free employees to focus on strategic tasks that require critical thinking. Imagine being able to ask complex questions to your banking app in natural language and getting immediate, accurate responses. Such high-quality service was unthinkable a few years ago, but with generative AI, it's within our reach. The future of the financial sector is undeniably tied to the successful implementation of generative AI solutions. The potential is vast, the challenges are surmountable, and the rewards are great. In my view, generative AI is the key to a more efficient, cost-effective, and customer-centric financial sector.
To learn more about SESAMm’s innovative solutions and how we’re pushing the boundaries with generative AI, read the second part of this series here.
Reach out to SESAMm
TextReveal's web data analysis of over five million public and private companies is essential for keeping tabs on ESG investment risks. To learn more about how you can analyze web data or request a demo, contact one of our representatives.
Financial and ESG insights begin with big data coupled with data science.
At SESAMm, our artificial intelligence (AI) and natural language processing (NLP) platform analyzes text in billions of web-based articles and messages. It generates investment insights and ESG analysis used in systematic trading, fundamental research, risk management, and sustainability analysis.
This technology enables a more quantitative approach to leveraging the value of web data that is less prone to human bias. It addresses a growing need in public and private investment sectors for robust, timely, and granular sentiment and environment, social, and governance (ESG) data. This article will outline how the data is derived and illustrate its effectiveness and predictive value.
Content coverage and ESG data collection
The genesis of SESAMm’s process is the high-quality content that comprises its data lake, the source from which it draws its insights. SESAMm scans over four million data sources rigorously selected and curated to maximize coverage of both public and private companies. Three guiding criteria—quality, quantity, and frequency—ensure a consistently high input value.
Every day the system adds millions of articles to the 16 billion already in the data lake, going back to 2008. The coverage is global, with 40% of the sources in English (the U.S. and international) and 60% in multiple languages. The data lake, expanding every month, comprises over 4 million sources, including professional news sites, blogs, social media, and discussion forums.
The following tables illustrate SESAMm’s data lake distribution (Q1 2022):
Respect for personal privacy figures highly in the data gathering process. We don’t capture personal data, like personally identifiable information (PII), and respect all website terms of service and global data handling and privacy laws. SESAMm’s data also doesn’t contain any material non-public information (MNPI).
Deriving financial signals and ESG performance indicators
SESAMm’s new TextReveal® Streams platform applies NLP and AI expertise to process the premium quality content gathered in its data lake. This complex process involves named entity recognition (NER) and disambiguation (NED)—the process of identifying entities and distinguishing like-named entities using contextual analysis—and mapping the complex interrelationships between tens of thousands of public and private entities, connecting companies, products, and brands by supply chain, location, or competitive relationship.
Process representation for NER and NED
Using SESAMm’s TextReveal Streams, this wealth of information is filtered to focus on four crucial contexts for systematic data processing, risk management, and alpha discovery:
Sentiment covering major global indices: world equities (and Small Caps, Emerging), U.S. 3000, Europe 600, KOSPI 50, Japan 500, Japan 225
Sentiment covering all assets and derivatives traded on the Euronext exchange
Private company sentiment on more than 25,000 private companies
ESG risks covering 90 major environmental, social, and governance risk categories for the entire company universe, which includes more than 10,000 public and more than 25,000 private companies with worldwide coverage
TextReveal Streams data sets and assessments are used by financial institutions, rating agencies, and the financial services sector, such as hedge funds (quantitative and fundamental) and asset managers, to optimize trade timing and identify new sustainable investment opportunities. Private equity deal and credit teams also use the data for deal sourcing and due diligence. Private equity ESG teams use it to manage initiatives like portfolio company environmental, social, and governance risk and reporting.
Methodology and technology for processing unstructured data
NLP workflow, from data extraction to granular insight aggregation
Data is continually extracted from an expanding universe of over four million sources daily. As it enters the system, it is time-stamped, tagged, indexed, and stored in our data lake to update a point-in-time history extending from 2008 to the present. The source material is then transformed from raw, unstructured text data into conformed, interconnected, machine-readable data with a precise topic.
NLP workflow for TextReveal Streams
Mapping relationships between entities with the Knowledge Graph
At the heart of the text analytics process is SESAMm’s proprietary Knowledge Graph, a vast map connecting and integrating over 70 million related entities and their keywords. It’s essentially a cross-referenced dictionary of keywords, relating each organization to its brands, products, associated executives, names, nicknames, and their exchange identifiers in the case of public companies.
Entities within the Knowledge Graph are updated weekly and tagged to ensure changes are correctly tracked. The CEO of a company today, for example, may not be the CEO tomorrow, and brands may be bought and sold, changing the parent company with each sale. Weekly updates within the Knowledge Graph ensure the system is aware of these changes.
Named entity disambiguation (named entity recognition plus entity linking) is one of the NLP techniques used to identify named entities in text sources using the entities mapped within the Knowledge Graph universe.
At SESAMm, NED identifies named entities based on their context and usage. Text referencing “Elon,” for example, could refer indirectly to Tesla through its CEO or to a university in North Carolina. Only the context allows us to differentiate, and NED considers that context when classifying entities. This method is superior to simple pattern matching, limiting the number of possible matches, requiring frequent manual adjustments, and cannot distinguish homophones.
SESAMm uses three other NLP tools to identify entities and create actionable insights. These are lemmatization, embeddings, and similarity. Each is explained in more detail below.
Analyzing the morphology of words with lemmatization
News articles, blog posts, and social media discussions reference organizations and associated entities in various forms and functions. Lemmatization seeks to standardize these references so the system knows they mean the same thing.
For example, “Tesla,” “his firm,” “the company,” and “it” are all noun phrases that can appear in a single article and refer to a single entity. Even where the reference is apparent, it can take different forms. For example, “Tesla” and “Teslas” both refer to the same entity but have slightly different meanings (semantics) and shapes (morphology).
The lemmatization process standardizes reference shape (morphology) to facilitate identification and aggregation. Lemmatization is a more sophisticated process than stemming, which truncates words to their stem and sometimes deletes information.
Encoding context and meaning with word embedding
In NLP, embedding is a numerical representation of a word that enables its manifold contextual meanings to be calculated relationally. Embeddings are typically real-valued vectors with hundreds of dimensions that encode the contexts in which words appear and, thus, also encode their meanings. Because they are vectors in a predefined vector space, they can be compared, scaled, added, and subtracted. An example of how this works is that the vector representations of king and queen bear the same relation to each other as the representations of man and woman once you subtract the vector that represents royal.
Vectorized representation of embeddings
Using embedding is key to analyzing how words change meaning depending on context and understanding the subtle differences between words that refer to the same concept: synonyms. For example, the words business, company, enterprise, and firm can all refer to the same thing if the context is “organizations.” But they represent different things and even different parts of speech if the context changes.
In the phrase, “[Tesla] will be by far the largest firm by market value ever to join the S&P,” for example, one could replace the word firm with company or enterprise without affecting the meaning significantly. Contrast that with “a firm handshake,” where a similar substitution would render the phrase meaningless.
Also, words referring to the same concept can emphasize slightly different aspects of the concept or imply specific qualities. For example, an enterprise might be assumed to be larger or to have more components than a firm. Embeddings enable machines to make these subtle distinctions.
One advantage of using embedding is that it’s practical because it’s empirically testable. In other words, we can look at actual usage to determine what a word means.
Another advantage is that embeddings are computationally tractable. This understanding of a word’s definition allows us to transform words into computation objects to programmatically examine the contexts in which they appear and, thus, derive their meaning.
As lemmatization is an improvement on stemming, embeddings improve techniques such as one-hot encoding, which is close to the common conception of a definition as a single entry in a dictionary.
SESAMm uses the global vectors for word representation (GloVe) algorithm to generate embeddings. It’s an unsupervised learning algorithm that begins by examining how frequently each word in a text corpus co-occurs with other words in the same corpus. The result is an embedding that encapsulates the word and its context together, allowing SESAMm to identify specific words in a list and different forms of the listed words and unlisted synonyms.
GloVe is an extension of recent approaches to vector representation, combining the global statistics of matrix factorization techniques like latent semantic analysis (LSA) with the local context-based learning of word2vec. The result is an unsupervised algorithm that performs well at capturing meaning and demonstrating it on tasks like calculating analogies and identifying synonyms.
BERT is another algorithm used by SESAMm to generate embeddings. BERT produces word representations that are dynamically informed by the words around them. Google developed the technique, and it’s what’s known as a transformer-based machine learning technique, which means it doesn’t process an input sequence token by token but instead takes the entire sequence as input in one go. This technique is a significant improvement over sequential recurrent neural network (RNN) based models because it can be accelerated by graphics processing units (GPUs).
SESAMm uses BERT for multilingual NLP of its extensive foreign language text because it has been retained using an extensive library of unlabeled data extracted from Wikipedia in over 102 languages. BERT model was trained to predict words from context and next sentence prediction where it was trained to predict if a chosen following sentence was probable or not given the first sentence. As a result of this training process, BERT learned contextual embeddings for words. Due to this comprehensive pre-training, BERT can be finetuned with fewer resources on smaller datasets to optimize its performance on specific tasks.
Linking words, sentences, and topics with cosine similarity
Cosine similarity with centered means it’s identical to the correlation coefficient, which highlights another element of the computational tractability of the embeddings approach. It makes it easy to compare words and contexts for similarity.
Converting words to vector representations means we can quickly and easily compare word similarity by comparing the angle between two vectors. This angle is a function of the projection of one vector onto another. It can identify similar, opposite, or wholly unrelated vectors, which allows us to compute the similarity of the underlying word that the vector represents.
Two vectors aligned in the same orientation will have a similarity measurement of 1, while two orthogonal vectors have a similarity of 0. If two vectors are diametrically opposed, the similarity measurement is -1. In practice, negative similarities are rare, so we clip negative values to 0.
Vectorized representation of cosine similarities
Cosine similarity measures whether two words, sentences, or corpora are close to one another in vector space or “about” the same thing in semantic space. To answer the question, “Is this sentence referencing company X?” we embed the sentence using the process described above and compute the cosine similarity between the sentence and the embedded company profile. Analogously, we compute similarities between sentences and the ESG topics SESAMm monitors by taking the maximum similarity between a sentence and each embedded keyword associated with an ESG topic.
These similarities allow us to identify whether a sentence references fraud, tax avoidance, pollution, or any other ESG risk topic among the more than 90 that SESAMm tracks across the web.
Similarities within ESG topics combine with word counts to resolve the recall and precision problem. Word counts are precise because if a word is identified within a context, then that context, by construction, references the topic.
The virtue of using these NLP techniques is that even if a given keyword list does not include every possible combination of words that a person might use to discuss a topic, relevant entities missed by the word-count process will be identified through vector similarity.
This is the power of SESAMm’s NLP expertise. We can scan many lifetimes’ worth of data in seconds to find the concepts you explicitly ask for and the concepts relevant to your search but that you did not think of yourself.
Sentiment analysis with deep learning and neural networks
Once we’ve identified the concepts and contexts of interest in all the forms they appear, we analyze the context to determine the speakers’ attitudes.
We use sentiment classification models to score a sentence with three possible outcomes: negative, neutral, or positive. The current classification models are based on deep learning AI technologies. Specifically, we stack convolutional neural networks with word embeddings and bayesian optimized hyperparameters—parameters not learned during training. This architecture improves the accuracy and enables fast shipping of production-ready models for a given language. We also produce state-of-the-art frameworks with architecture variations enabling multilingual capabilities, such as transformers and universal sentence encoders.
Condensing information and extracting insights with daily aggregation
Similarities, embedded word counts, and sentiment are state-of-the-art tools for processing unstructured text data. The same tools are effective cross-linguistically.
Once the information has been extracted from millions of data points, it’s aggregated and condensed into actionable insights.
All entities are referenced directly or indirectly within an article. Then, sentence-level references are aggregated to obtain an article-level perspective, and finally, all relevant articles are aggregated to gain an entity-level view of that day.
In this way, reams of data are compressed into several metrics to provide a daily aggregate view for each entity, highlighting trends at a sentence, article, and entity-level comparable over a multi-year history.
ESG analysis use cases
SESAMm’s TextReveal Streams is used in various investment domains, from asset selection to alpha generation and risk management. Systematic hedge funds track retail interest in real time to identify investment opportunities and protect their existing positions. In the Private Equity industry, equity and credit-deal teams use the data in various ways, from monitoring consumer perspectives via forums and customer reviews for evaluating deal prospects to estimating due diligence risks, all to help make investment decisions. Dedicated teams use our data for monitoring portfolio companies for ESG red flags that conventional ESG reporting might miss.
Below are two examples of how aggregated TextReveal Streams data can be used to help identify investment risk and opportunity.
LFIS CapitalL: ESG signals for equity trading
ESG controversies can significantly impact asset prices in the short term, and it’s now estimated that intangible assets, including a company’s ESG rating, account for 90% of its market value.
Working in partnership with LFIS Capital (LFIS), a quantitative asset manager and structured investment solutions provider, SESAMm developed machine learning and NLP algorithms that could analyze ESG keywords in articles, blogs, and social media, to generate a daily ESG score specific to each stock, which is part of the TextReveal Streams’ platform’s core functionality.
The results were promising when these scores were incorporated into a simulated strategy for trading stocks in the Stoxx600 ESG-X index.
A simulated long-only strategy running between 2015 and 2020, using the signals, delivered a 7.9% annualized return, 2.9% higher than the benchmark for similar annualized volatility (17.3% vs. 17.1%). The information ratio of the strategy was greater than 1, with a tracking error of 2.8%. Results for the previous three years were compelling, reflecting the growing interest and news flow around ESG themes.
Researchers also backtested a hypothetical long-short strategy for all stocks in the Stoxx600 ESG-X index with a market cap of over $7.5bn. This investment strategy delivered a Sharpe ratio of approximately 1 with annualized returns and volatility of 6.1% and 5.9%, respectively, between 2015 and 2020. Like the long-only strategy, returns were particularly robust over the three years up to 2020: +6.0% in 2018, +7.3% in 2019, and +11.3% in 2020.
Finally, a simulated “130/30” ESG strategy that combined 100% of the long-only ESG strategy and 30% of the long-short ESG strategy delivered a 10.8% annualized return, 5.8% higher than that of the Stoxx600 ESG-X index. Annualized volatility was similar at 16.9% vs. 17.1%. The strategy experienced a tracking error of 3.8% and an information ratio of over 1.5, with a consistent outperformance each year.
Disclaimer: Past performance is not an indicator of future results. Theoretical calculations are provided for illustrative purposes only. The investment theme illustrations presented herein do not represent transactions currently implemented in any fund or product managed by LFIS.
Wirecard: ESG sentiment and volume as predictive indicators
The Wirecard scandal broke on June 21, 2020, when newswires carried the story that the major German payment processor had filed for bankruptcy after admitting that €1.9 billion ($2.3 billion) of purported escrow deposits did not exist.
Could SESAMm’s TextReveal Streams platform have provided investors with an early warning that the scandal was about to break?
The following chart derived from the platform shows how key ESG metrics, including ESG scores (volumes) and ESG scores (sentiment), reacted to the news.
An analysis of the charts pinpoints a shallow rise in the ESG scores (volumes) time series in the early part of June before the eruption on June 21.
The ESG scores (sentiment) metric also shows a steady increase in negative sentiment for governance, the most relevant of the three ESG factors regarding the scandal.
How key ESG metrics, including ESG scores (volumes) and ESG scores (sentiment), reacted to the Wirecard scandal news.
Additionally, before the crash, governance was the most negative of the three ESG factors most of the time. This was especially the case from late March to early April, and then before the scandal in early June, negative governance sentiment diverged higher from the other two.
The rate-of-change of negative governance sentiment as it rose and peaked in early June before the scandal broke was also extremely high, perhaps providing the basis for an early warning signal.
Portfolio managers who had been keeping an eye on the reputational slide in Governance for Wirecard may have decided the company was at high risk of a negative controversy emerging, giving them cause to drop the stock before the event.
In this way, it can be seen how while not providing a hard and fast early warning signal, SESAMm’s ESG scores can, nevertheless, be used as the basis for developing a data-driven, rules-based portfolio management approach that can help investors avoid high-risk candidates like Wirecard.
SESAMm takes on ESG data challenges
SESAMm’s NLP and AI tools analyze over four million data sources daily to identify thousands of public and private companies and their related products, brands, identifiers, and nicknames, turning reams of unstructured text into structured and actionable data.
SESAMm’s TextReveal Streams platform can be used in many quantitative, quantamental, and ESG investment use cases. TextReveal is a solution that allows you to fully leverage NLP-driven insights and receive high-quality results through data streams, modular API and dashboard visualization, and signals and alerts.
Learn how SESAMm can support you in your investment decision-making and request a demo today.
To request a demo or for access to the full SESAMm Wirecard or LFIS reports, contact us here:
Paris, January 18, 2024 – SESAMm, a leader in AI-powered text analysis, is proud to announce its recognition in the Forrester report "ESG Data And Analytics Providers Landscape, Q1 2024."
Sylvain Forte, CEO of SESAMm, commented, “We are thrilled to be recognized by Forrester. Our AI-driven approach, particularly our flagship product, TextReveal®, positions us uniquely in the market. TextReveal® leverages a vast data lake and advanced natural language processing to provide comprehensive insights on public and private companies, a capability not commonly found in traditional ESG analytics.”
Alexandre Tiesset, Head of ESG at SESAMm, added, “Our technology's ability to process massive volumes of data, provide nuanced ESG insights and stakeholder feedback is a game changer. It significantly enhances our clients' ability to assess non-financial risks and monitor ESG-related events, offering them a more accurate and effective solution for their needs.”
SESAMm's AI technology sets it apart as a leader in ESG analytics, enabling more precise and timely insights into ESG controversies and positive events. This technology empowers private equity firms, asset managers, corporations, and financial institutions to monitor and identify company events and controversies, providing an edge in the rapidly evolving landscape of sustainable investment.
For more information about SESAMm and its innovative solutions, please visit www.sesamm.com.
SESAMm is an innovative leader in AI and big data analytics for investment. The company specializes in using advanced natural language processing and machine learning techniques to provide comprehensive ESG and thematic insights, aiding clients in identifying potential risks and opportunities in their investment portfolios.
SESAMm’s AI Technology Reveals ESG Insights
Discover unparalleled insights into ESG controversies, risks, and opportunities across industries. Learn more about how SESAMm can help you analyze millions of private and public companies using AI-powered text analysis tools.
At SESAMm, we’re committed to helping our clients navigate the complexities of ESG risk management and compliance. Our Partner Network, set to launch in 2025, is a new initiative designed to provide clients with seamless access to expert remediation services—something that is especially important in light of new regulations like CSDDD. Whether conducting on-site audits, providing grievance mechanisms, or offering cybersecurity due diligence, our partners are carefully selected to address diverse client needs.
Why the Partner Network?
Identifying risks is only part of the battle—effectively mitigating them is just as crucial. The SESAMm Partner Network bridges this gap by connecting clients with top-tier partners offering important mitigation services.
Key Features of the Partner Network
Remediation Options: Comprehensive services for CS3D compliance and financial regulations.
Streamlined Processes: Pre-negotiated agreements ensure fast access to expert teams.
Seamless Integration: Automated recommendations and real-time results tracking via SESAMm’s platform.
The Partner Network
Discover our partners below - and keep an eye out for new additions; the Partner Network is growing fast.
Proactive M&A Cyber Due Diligence with CybelAngel
CybelAngel offers a cutting-edge external risk protection platform that proactively identifies and mitigates external threats before they can be exploited. Recognized for its excellence by Gartner, CybelAngel has become a trusted partner for organizations globally.
CybelAngel's M&A Cyber Due Diligence reports provide a comprehensive external assessment of a third party's cybersecurity posture. By combining advanced threat intelligence with expert human analysis, these reports offer crucial context and actionable insights into potential cyber exposures. This approach enables organizations to make informed decisions about mergers, acquisitions, or partnerships by revealing vulnerabilities that may not be apparent through traditional due diligence methods.
The reports include:
Exposure scores and detailed metrics
Benchmark of cyber exposure to industry standards
Detailed analysis, technical findings, and data samples
ESG & Sustainability Assessments, Due Diligence, and more with Indefi
Indefi are strategy consultants dedicated to serving investment managers and financial investors globally. They partner with business leaders on growth strategy, market entry, product development and enhancement, sustainability, M&A and transaction support.
Through this collaboration, SESAMm clients can now access Indefi’s industry-leading services, including ESG and sustainability assessments, buy- and sell-side commercial due diligence, sustainable investment strategies, and sustainability reporting (SFDR, Article 29, and CSRD). This partnership ensures an efficient and effective response to identified risks while supporting clients in meeting regulatory and sustainability goals.
The SESAMm Partner Network will officially launch in Q1 2025. Stay tuned for updates as we welcome new partners to the network and continue to develop innovative solutions for risk and ESG management.
SESAMm’s AI Technology Reveals ESG Insights
Discover unparalleled insights into ESG controversies, risks, and opportunities across industries. Learn more about how SESAMm can help you analyze millions of private and public companies using AI-powered text analysis tools.
Stay ahead with the latest in ESG and AI intelligence
Join our mailing list to receive new reports, event invites, and updates from SESAMm directly to your inbox.