Cointime


The Trust Dilemma: Overcoming LLM Hallucinations in Financial Services


From Chainlink, by Laurence Moroney

This is a guest post from Laurence Moroney, Chainlink Advisor and former AI Lead at Google.

In recent years, large language models (LLMs) have become synonymous with artificial intelligence (AI), spurring massive investment and interest. However, the impressive capabilities of LLMs come with a severe caveat: their tendency to ‘hallucinate’, that is, to generate false or misleading information. This poses significant trust challenges in high-stakes domains like financial services, where accuracy and reliability are paramount, so techniques that overcome it present a massive opportunity. The intersection of AI and technologies like blockchain, where trust and integrity are baked into the platform, could be the solution.

First, let’s examine the problem of hallucinations and then explore why this new industry initiative with Chainlink, Euroclear, Swift, and six financial institutions is so transformative.

Understanding LLM Hallucinations

A large language model is a predictor of tokens, the fundamental units of text or data. Trained on massive amounts of text using a transformer architecture that learns sequence-to-sequence patterns, LLMs like OpenAI’s GPT, Google’s Gemini, and Anthropic’s Claude have proven to be excellent models for artificially understanding and generating text. But, given their artificial nature, they don’t truly understand their outputs; instead, they predict the statistically most likely next token.

Consider the phrase from a popular children’s song: “If you are happy and you know it.”

Your brain has learned what comes next: most likely the words “clap,” “your,” and “hands.” The transformer architecture mimics this.

In some cultures, however, the next word is not “clap” but “you,” and people sing, “If you are happy and you know it, you clap your hands.” So if one predicts the next token from a training corpus where most instances don’t use “you” but some do, the model would assign a high likelihood to “clap,” a lower likelihood to “you,” and very low likelihoods to all other words.
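To make the counting argument concrete, here is a minimal sketch of next-token prediction by frequency. It assumes a tiny made-up corpus (the sentence counts are invented for illustration, not taken from any real training set) and estimates the probability of each word that follows the prompt. Real LLMs learn these probabilities with a transformer rather than by literal counting, but the result has the same shape: a distribution over candidate next tokens.

```python
from collections import Counter

PROMPT = "if you are happy and you know it"

# Hypothetical toy corpus: most versions continue with "clap", a few with "you".
CORPUS = (
    ["if you are happy and you know it clap your hands"] * 8
    + ["if you are happy and you know it you clap your hands"] * 2
)


def next_token_distribution(prompt: str, corpus: list[str]) -> dict[str, float]:
    """Estimate P(next word | prompt) from how often each word follows the prompt."""
    prefix = prompt.split()
    counts: Counter = Counter()
    for sentence in corpus:
        words = sentence.split()
        # Only sentences that start with the prompt and continue past it count.
        if words[: len(prefix)] == prefix and len(words) > len(prefix):
            counts[words[len(prefix)]] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.most_common()}


dist = next_token_distribution(PROMPT, CORPUS)
print(dist)  # {'clap': 0.8, 'you': 0.2}
```

Picking the highest-probability entry yields “clap”; the minority variant “you” is still possible but less likely, and every unseen word gets probability zero in this toy version.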

And this is for a well-known phrase. Now consider what happens when a model trained on text like this is asked to predict the next token for something it has never seen before, such as a news story or a corporate action that has only just been written: “Company X today announced a stock split of…” How would the LLM predict the next token? Its corpus likely contains very similar phrases many times over, but with many different continuations, such as “twenty to one,” “ten to one,” or “one to ten.”

The LLM would pick the next token according to what was most common in its training set and output that (just as it chose “clap” instead of “you” for the children’s song). For example, it might output “Company X today announced a stock split of ten to one.”

If the reality is that Company X is factually doing a six-to-one split, we now have a hallucination!
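The stock-split case can be sketched the same way. Assume, hypothetically, that the training text continued “…announced a stock split of” with the invented frequencies below. For simplicity the sketch treats each whole phrase as a single unit, whereas a real model decodes token by token. Greedy decoding then returns the most common continuation, which has nothing to do with what Company X actually announced.

```python
from collections import Counter

# Hypothetical, invented counts of how "...announced a stock split of" was
# continued in a training corpus. Real models do not store a table like this.
SEEN_CONTINUATIONS = Counter({
    "ten to one": 40,
    "twenty to one": 25,
    "one to ten": 10,
    "six to one": 5,
})


def greedy_continuation(continuations: Counter) -> str:
    """Return the most frequent continuation, mimicking greedy decoding."""
    best, _count = continuations.most_common(1)[0]
    return best


actual_ratio = "six to one"  # what the hypothetical announcement really says
predicted = greedy_continuation(SEEN_CONTINUATIONS)
print(f"Company X today announced a stock split of {predicted}.")
print("hallucination" if predicted != actual_ratio else "correct")  # hallucination
```

Nothing in the decoding step looks at the actual announcement, which is exactly why the output can be fluent and still wrong.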

In our scenario, however, the core use of the LLM is not to generate content like this but to parse existing content, such as a PDF of the corporate action that mentions the stock split. The LLM artificially understands the contents on our behalf so that we can question it. Importantly, the underlying hallucination issue *still* applies. The PDF might say that the split is six-to-one, but the LLM could hallucinate ten-to-one based on its statistical next-token analysis, because the answer it gives when you ask about the PDF is still generated from its best guesses for each subsequent token.

The Peril of Hallucinations in Financial Services

Trusting an LLM blindly is a big mistake for the reasons demonstrated above. For financial services, the consequences of this could be:

  • Misinformed Decision Making: Inaccurate data could lead to flawed risk assessments, suboptimal investment strategies, and inefficient capital allocation
  • Regulatory and Reporting Issues: False information could lead to unintentional violations of regulatory and reporting requirements
  • Erosion of Trust: Clients or stakeholders discovering that an institution relies on unreliable, AI-generated information could severely damage its reputation and the trust placed in it
  • Financial Losses: Hallucinated data leading to bad advice or forecasting could cause significant monetary losses

Thus, fully embracing LLMs for financial operations is fraught with risk. Because financial data and advice must be accurate and reliable, LLM technology in its current state is difficult to integrate safely into many core processes.

Blockchain: A Path to Trust and Verifiability

While LLMs present challenges in accuracy and reliability, blockchain technology, with its core attributes of trust and verifiability, may be the key to a solution. Blockchain’s decentralized and immutable ledger system provides a framework for recording and verifying information that could be leveraged to help mitigate the risks associated with LLM hallucinations. Let’s explore how that might work, beginning with the idea of consensus.

Consensus: A Method for Trust

The scientific process begins with a theory, which is then supported with experimental evidence. Trusted peers review that evidence and work toward a consensus. Opinions may vary, but when most peers agree that the experimental evidence underpinning the theory is valid, the discovery is validated and becomes the current ground truth.

Inspired by this process, Chainlink implemented a novel technique to overcome the risks of hallucination. 

Chainlink used several LLMs to artificially understand the contents of a corporate action and output them in machine-readable JSON format. Instead of trusting a single prompt to a single LLM, the idea was to use a swarm of LLM-prompt combinations, each producing its own result.

The degree of consensus among those results could then be measured. If they all produced the same result, we could begin to trust it, and it could be placed on the blockchain as a unified golden record: a verifiable, persistent, updatable, and interoperable data container that is synchronized across blockchains.
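One way the consensus could be measured is field by field: for each key in the JSON outputs, compute the share of LLM-prompt combinations that agree with the most common value. The sketch below assumes each combination has already returned a JSON string; the field names (`event_type`, `ratio`, `effective_date`) and the sample outputs are hypothetical and are not taken from Chainlink’s schema.

```python
import json
from collections import Counter


def field_agreement(outputs: list[str]) -> dict[str, tuple[object, float]]:
    """For each field, return (most common value, fraction of outputs agreeing with it)."""
    records = [json.loads(raw) for raw in outputs]
    fields = sorted({key for record in records for key in record})
    report = {}
    for field in fields:
        # Serialize values so nested lists/dicts can be counted; a missing field counts as null.
        values = Counter(json.dumps(record.get(field), sort_keys=True) for record in records)
        top_value, votes = values.most_common(1)[0]
        report[field] = (json.loads(top_value), votes / len(records))
    return report


# Hypothetical outputs from three LLM-prompt combinations (key order differs on purpose).
outputs = [
    '{"event_type": "stock_split", "ratio": "6:1", "effective_date": "2024-07-01"}',
    '{"ratio": "6:1", "event_type": "stock_split", "effective_date": "2024-07-01"}',
    '{"event_type": "stock_split", "ratio": "10:1", "effective_date": "2024-07-01"}',
]
for field, (value, share) in field_agreement(outputs).items():
    print(f"{field}: {value!r} agreed by {share:.0%}")
```

Here `effective_date` and `event_type` are unanimous, while `ratio` is agreed by only 67%, pointing a reviewer at the exact field in dispute.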

Of course, if consensus is not attained, a manual process could be used to establish the ground truth and then publish it as a unified golden record.
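The decision rule described above (publish a golden record only when every output agrees, otherwise fall back to a manual process) could be sketched as follows. Outputs are canonicalized first so that differences in key order or whitespace do not count as disagreement. The `GoldenRecord` and `ConsensusResult` types and the `needs_manual_review` flag are hypothetical names for illustration; the actual on-chain publication step is out of scope here.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GoldenRecord:
    """Hypothetical container for a record that reached unanimous consensus."""
    data: dict


@dataclass(frozen=True)
class ConsensusResult:
    golden_record: Optional[GoldenRecord]
    needs_manual_review: bool
    distinct_answers: int


def canonicalize(raw: str) -> str:
    """Parse JSON and re-serialize it with sorted keys and no extra whitespace."""
    return json.dumps(json.loads(raw), sort_keys=True, separators=(",", ":"))


def reach_consensus(outputs: list[str]) -> ConsensusResult:
    """Unanimous agreement yields a golden record; anything else goes to a human."""
    if not outputs:
        return ConsensusResult(None, True, 0)
    distinct = {canonicalize(raw) for raw in outputs}
    if len(distinct) == 1:
        return ConsensusResult(GoldenRecord(json.loads(distinct.pop())), False, 1)
    return ConsensusResult(None, True, len(distinct))


agreeing = [
    '{"event_type": "stock_split", "ratio": "6:1"}',
    '{"ratio":"6:1","event_type":"stock_split"}',
]
disagreeing = agreeing + ['{"event_type": "stock_split", "ratio": "10:1"}']

print(reach_consensus(agreeing))
print(reach_consensus(disagreeing).needs_manual_review)  # True
```

Requiring unanimity is the strictest possible rule and mirrors the “if they all produced the same result” condition; any looser threshold would be a design choice beyond what is described here.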

This process greatly lowered the risk of hallucination and increased trust in automating the process, which in turn reduces costs. Because the findings are published on-chain, all parties can trust the data going forward.

Thus, an end-to-end system for converting unstructured data to highly trusted unified golden records is attainable. Much of this system could be automated, increasing trust and reducing the costs and risks associated with using LLMs in financial services.

Chainlink used this process in an industry initiative conducted alongside Euroclear, Swift, and six major financial institutions. The project demonstrated how to automate the path from unstructured financial data to on-chain golden records, using LLMs to artificially understand the data while mitigating the risks of LLM hallucination.

Because reporting processes for corporate actions lack standardization, significant human effort is needed to read diverse document types and understand the data for these events. Seventy-five percent of firms have to revalidate this data manually, and these inefficient processes cost businesses many millions of dollars.

Transforming Asset Servicing With AI, Oracles, and Blockchains

Chainlink’s approach to solving this problem is described in Transforming Asset Servicing With AI, Oracles, and Blockchains, which shows very encouraging results at the prototype stage:

  • Data Extraction and Structuring: It establishes a novel data extraction and structuring process that takes unstructured data from public company sources and turns it into structured data that adheres to regulatory frameworks such as SMPG
  • Consensus Framework: It successfully demonstrated an LLM consensus framework for financial data comparing the outputs of multiple LLMs, greatly enhancing the reliability of their outputs and mitigating the hallucination risks
  • Near Real-Time Data Distribution: Once the consensus data was established, Chainlink’s industry initiative propagated it across multiple blockchain ecosystems and stored it as unified golden records in smart contracts. This makes the data accessible to market participants and gives them a foundation on which to build new applications.
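The extraction and structuring step implies that each LLM’s JSON must fit a fixed structure before its values can be compared or published. Below is a minimal validation sketch, assuming a hypothetical stock-split record with just two fields; the real schema used in the initiative is far richer and is not reproduced here. Note that validation only checks shape: a well-formed “10:1” that should have been “6:1” still passes, which is why the consensus step is needed as well.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical ratio format "N:M", e.g. "6:1" for a six-to-one split.
RATIO_PATTERN = re.compile(r"(\d+):(\d+)")


@dataclass(frozen=True)
class StockSplit:
    """Hypothetical structured record for a stock-split corporate action."""
    new_shares: int
    old_shares: int


def parse_stock_split(raw: str) -> StockSplit:
    """Validate an LLM's JSON output and convert it into a typed record."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("event_type") != "stock_split":
        raise ValueError(f"unexpected event_type: {data.get('event_type')!r}")
    match = RATIO_PATTERN.fullmatch(str(data.get("ratio", "")))
    if match is None:
        raise ValueError(f"ratio must look like 'N:M', got {data.get('ratio')!r}")
    new_shares, old_shares = (int(group) for group in match.groups())
    if new_shares <= 0 or old_shares <= 0:
        raise ValueError("ratio terms must be positive")
    return StockSplit(new_shares, old_shares)


print(parse_stock_split('{"event_type": "stock_split", "ratio": "6:1"}'))
# StockSplit(new_shares=6, old_shares=1)

# A hallucinated but well-formed value is still accepted:
print(parse_stock_split('{"event_type": "stock_split", "ratio": "10:1"}'))
# StockSplit(new_shares=10, old_shares=1)
```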

Conclusion

We are only at the beginning of the AI revolution. It can be compared to the Internet at the dial-up stage. As novel solutions to existing problems arise, the opportunity to build more and better solutions becomes clearer. 

In this study, the power of AI and LLMs, held back by the risk of hallucination, could be unleashed by a novel combination: extracting data through LLM consensus and publishing it on a trusted, verifiable blockchain. As LLMs evolve and hopefully improve, the underlying technique of driving consensus and publishing the established consensus on-chain as a golden record will continue to show value.

Chainlink’s industry initiative is a very early prototype of what could be a powerful solution that opens many new opportunities for AI, blockchain, and financial services to build better, together.
