Cointime

Download App
iOS & Android

How Effective Is GPT for Auditing Smart Contracts?

Introduction

Recently, ChatGPT has gained a great deal of popularity, impressing its users with its capacity to enhance traditional text, work efficiency, and provide concise overviews. Following closely behind is CodeGPT, a GPT-based plugin that further enhances coding efficiency. With the recent release of GPT-4, can it be applied to auditing blockchain and Solidity smart contracts? Based on this question, we conducted various feasibility tests.’

Testing Environment and Methodology

The comparison models used in this test are: GPT-3.5(Web),GPT-3.5-turbo-0301,GPT-4(Web).

Prompt used in the test: Help me discover vulnerabilities in this Solidity smart contract.

Comparison of Vulnerability Code Snippet Detectio

We performed three rounds of testing. In tests 1 and 2, we utilized historical vulnerability codes commonly encountered in the past as test cases to evaluate the model’s ability to detect fundamental vulnerabilities. In Test 3, we introduced moderately challenging vulnerability codes as the primary test cases.

Test 1:

Example: “Intro to Smart Contract Audit Series: Phishing With tx.orgin”

Vulnerability Code:

Sent to GPT:

GPT-3.5(Web) Response:

GPT-3.5-turbo-0301 Response:

GPT-4(Web) Response:

As you can see from the results, all three models identified critical issues related to tx.origin.

Test 2:

Example: “Intro to Smart Contract Security Audits | Overflow”

Sent to GPT:

GPT-3.5(Web) Response:

GPT-3.5-turbo-0301 Response:

GPT-4(Web) Response:

It is worth noting that both GPT-3.5 (Web) and gpt-3.5-turbo-0301 were able to identify a critical overflow vulnerability, whereas surprisingly, GPT-4 (Web) did not provide any relevant prompt.

Test 3:

Example: “Empty-handed with a White Wolf — Analysis of the Popsicle Hack”

Sent to GPT:

GPT-3.5(Web) Response:

GPT-3.5-turbo-0301 Response:

Looking at the results,, we can see that none of the three versions detected any of the critical vulnerability points.

Summary of Code Snippet Detection

While the GPT models displayed adequate detection capabilities for simple vulnerability code snippets, it falls short when it comes to identifying more complex ones. Throughout the tests, GPT-4 (Web) showcased exceptional readability and a clear output format. However, its ability to audit code does not appear to surpass that of GPT-3.5 (Web) or GPT-3.5-turbo-0301. In some cases, due to the inherent uncertainties in the transformer output, GPT-4 (Web) managed to overlook certain critical issues.

Comparative Detection of Known Vulnerabilities in Full Contracts

To better accommodate the practical requirements of projects during contract audits, we raised the difficulty level by importing contracts with an extensive codebase. This allowed us to comprehensively test the GPT-4 model’s auditing capabilities, as opposed to GPT-3 which has a smaller contextual character limit and thus was not evaluated in this context.

For this instance, we used previous case studies as a test template to simulate real-world scenarios:

Example: “Detailed analysis of the $31 Million MonoX Protocol Hack”.

To initiate the audit, we inputted the complete contract in batches and submitted a vulnerability detection request towards the end of the dialogue.

The following prompt was utilized for this test:

“Here is a Solidity smart contract”

Insert Contract Code

“The above is the complete code,help me discover vulnerabilities in this smart contract.”

As demonstrated, despite GPT-4 having the highest single input character limit, according to the information published by OpenAI, it still encountered contextual challenges due to text overflow during the final vulnerability detection request. Consequently, the model can only identify a portion of the content, rendering it incapable of conducting a thorough contextual audit for large-scale contracts.

Batched Auditing: Unpacked Contracts through Incremental Input and Detection:

Prompt 1:

“Help me discover vulnerabilities in this Solidity smart contract.”

Batch 1 of the contract code.

Prompt 2:

“Help me discover vulnerabilities in this Solidity smart contract.”

Batch 2 of the contract code.

Prompt 3:

“Help me discover vulnerabilities in this Solidity smart contract.”

Batch 3 of the contract code.

It is worth mentioning that GPT-4 failed to identify any critical vulnerability points.

Summary: While the current state of GPT’s capabilities may not be entirely suitable for contract analysis, the potential of AI in this domain remains impressive.

Advantages:

While GPT’s detection capabilities for complex vulnerabilities in contract code may be limited, it has shown impressive partial detection capabilities for basic and simple vulnerabilities. Additionally, once a vulnerability is identified, the model provides an explanation in an easily understandable and user-readable format. This unique feature is especially beneficial for novice contract auditors who require quick guidance and straightforward answers during their initial training phase.

Challenges:

There is a certain amount of variation in GPT’s output for each dialogue, which can be adjusted through API interface parameters. However, the output is still not constant. Although such variability is beneficial for language dialogues and greatly enhances the authenticity of the conversation, it is not ideal for code analysis work. In order to cover multiple possible vulnerability answers that AI may provide, we had to make multiple requests for the same question and compare and filter the results. This inadvertently increases the workload, ultimately undermining the fundamental objective of AI in assisting humans to improve their efficiency.

For instance, we conducted an additional test by running Test 2 of the Comparison of Vulnerability Code Snippet Detection with a slight modification of the function name before generating again.

As we can see, its output results have added some additional content compared to the previous test.

There is still significant room for improvement in its vulnerability analysis capabilities.

It is worth noting that the current (as of March 16, 2024) training models of GPT are unable to accurately analyze and identify critical vulnerability points for slightly complex vulnerabilities.

Despite the current limitations of GPT’s analysis and mining capabilities for contract vulnerabilities, its ability to analyze and generate reports on simple code blocks for common vulnerabilities still sparks excitement among users. With continued training and development of GPT and other AI models, we firmly believe that assisted auditing of large and complex contracts will achieve faster, more intelligent, and more comprehensive outcomes in the foreseeable future. As technological development exponentially improves human efficiency, a transformative shift is imminent. We eagerly anticipate the benefits of AI in enhancing blockchain security and remain vigilant in monitoring the impact of emerging AI products on this vital field. In the visible future, we will inevitably integrate with AI to some extent. May AI and blockchain be with you.

Read more: https://slowmist.medium.com/how-effective-is-gpt-for-auditing-smart-contracts-cdeddfa76dbe

Comments

All Comments

Recommended for you

  • Putin: Russia "supports" Harris, calls her smile "contagious"

    According to foreign media such as TASS and Russia's Sputnik News, Jinse Finance reported that on the afternoon of September 5th local time, Russian President Putin said at the plenary session of the Eastern Economic Forum 2024 that Russia will "support" the US Democratic Party presidential candidate and vice president Harris as recommended by the US President Biden in the upcoming US presidential election. When asked how he viewed the 2024 US election, Putin said it was the choice of the American people. The new US president will be elected by the American people, and Russia will respect the choice of the American people. Putin also said that just as Biden suggested his supporters to support Harris, "we will do the same, we will support her." The report said that Putin also joked that Harris' laughter is "expressive and infectious," which shows that "she is doing everything well." He added that this may mean that she will avoid further sanctions against Russia.

  • An ETH whale repurchased 5,153 ETH with 12.23 million USDT 20 minutes ago

    A certain high-frequency trading ETH whale monitored by on-chain analyst Yu Jin bought 5,153 ETH with 12.23 million USDT 20 minutes ago.

  • CFTC: Uniswap Labs has actively cooperated with the investigation and only needs to pay a fine of US$175,000

    The CFTC has filed a lawsuit against Uniswap Labs and reached a settlement. It was found that Uniswap Labs illegally provided leveraged or margined retail commodity transactions of digital assets through a decentralized digital asset trading protocol. Uniswap Labs was required to pay a civil penalty of $175,000 and cease violations of the Commodity Exchange Act (CEA). The CFTC acknowledged that Uniswap Labs actively cooperated with law enforcement agencies in the investigation and reduced the civil penalty.

  • Federal Reserve Beige Book: Respondents generally expect economic activity to remain stable or improve

    The Federal Reserve's Beige Book pointed out that economic activity in three regions has slightly increased, while the number of regions reporting flat or declining economic activity has increased from five in the previous quarter to nine in this quarter. Overall employment levels remain stable, although some reports indicate that companies are only filling necessary positions, reducing working hours and shifts, or reducing overall employment levels through natural attrition. However, reports of layoffs are still rare. Generally speaking, wage growth is moderate, and the growth rate of labor input costs and sales prices ranges from slight to moderate. Consumer spending has declined in most regions, while in the previous reporting period, consumer spending remained stable overall.

  • Puffpaw Completes $6 Million Seed Round with Lemniscap Ventures as Participant

    Puffpaw has announced the completion of a $6 million seed round of financing, with participation from Lemniscap Ventures. The Puffpaw project plans to launch a blockchain-enabled electronic cigarette aimed at helping users reduce nicotine intake through token incentives. The project encourages users to quit smoking by recording their smoking habits and rewarding them with tokens. Puffpaw's token economics aims to cover 30% of the cost of users' first month of using their product and provide social rewards. The project also considers possible system abuse, but the issue of users potentially reporting smoking habits dishonestly is not yet clear.

  • Affected by Ethervista and others, Ethereum Gas temporarily rose to 33gwei

    According to Etherscan, due to the influence of contracts such as Ethervista, Ethereum Gas has temporarily risen to 33gwei, with the top three being EthervistaRouter, UniswapRouter, and BananaGun.

  • The probability of the Fed cutting interest rates by 25 basis points in September is 55%.

    The probability of the Federal Reserve cutting interest rates by 25 basis points in September is 55.0%, while the probability of a 50 basis point cut is 45.0%. The probability of the Federal Reserve cutting interest rates by a cumulative 50 basis points by November is 32.1%, by 75 basis points is 49.2%, and by 100 basis points is 18.8%.

  • Nvidia: No subpoena received from the US Department of Justice

    Nvidia (NVDA.O) stated that it has not received a subpoena from the US Department of Justice.

  • US SEC again postpones decision on environmentally friendly Bitcoin ETF listing application

    The US Securities and Exchange Commission (SEC) has once again postponed its final decision on the New York Stock Exchange (NYSE) Arca's application for a carbon offset Bitcoin ETF. According to a document dated September 4th, the decision has been extended to November 21st. The ETF aims to provide a Bitcoin investment exposure in an environmentally friendly way by offsetting carbon emissions, tracking an investment portfolio composed of 80% Bitcoin and 20% carbon credit futures. Tidal Investments submitted the fund registration application in December 2023, while NYSE Arca submitted the initial application in March. Concerns have been raised about the environmental impact of Bitcoin mining, with the International Monetary Fund (IMF) reporting that cryptocurrency mining accounts for 1% of global greenhouse gas emissions. The delay in this decision also includes the postponement of approval for the Nasdaq One-Stop Cryptocurrency Investment Portfolio ETF.

  • Nvidia delays next gen AI chip as investors issue ‘bubble’ warning

    After briefly breaking the $3 trillion market capitalization mark in June, things have taken a negative turn for the world’s most valuable chipmaker.