Max Tegmark and Colleagues Release Open-Source Tool Using AI to Check Code

October 30, 2024
by the Foundational Questions Institute (FQxI)

FQxI's Max Tegmark and colleagues have created DafnyBench, an open-source benchmark for training and evaluating AI systems that formally verify code is free of bugs.

From the team's new preprint:

We introduce DafnyBench, the largest benchmark of its kind for training and evaluating machine learning systems for formal software verification. We test the ability of LLMs such as GPT-4 and Claude 3 to auto-generate enough hints for the Dafny formal verification engine to successfully verify over 750 programs with about 53,000 lines of code. The best model and prompting scheme achieved 68% success rate, and we quantify how this rate improves when retrying with error message feedback and how it deteriorates with the amount of required code and hints. We hope that DafnyBench will enable rapid improvements from this baseline as LLMs and verification techniques grow in quality.
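To make the task concrete: the "hints" the LLMs must generate are verification annotations, such as loop invariants, that the Dafny engine needs in order to prove a program's stated specification. The snippet below is an illustrative sketch (not taken from the DafnyBench suite): without the two `invariant` lines, Dafny cannot prove the `ensures` postcondition; with them, verification succeeds.

```dafny
// Sums the integers 0, 1, ..., n-1 and proves the closed-form result.
method SumUpTo(n: nat) returns (s: nat)
  ensures s == n * (n - 1) / 2
{
  s := 0;
  var i := 0;
  while i < n
    invariant 0 <= i <= n           // hint: loop counter stays in range
    invariant s == i * (i - 1) / 2  // hint: partial sums match the formula
  {
    s := s + i;
    i := i + 1;
  }
}
```

Generating such invariants automatically is the benchmark's core challenge: they are not needed to run the code, only to convince the verifier that the code meets its specification.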

In a separate AI study, Tegmark and other colleagues report a surprising geometric structure in the concepts learned by LLMs. On X/Twitter, Tegmark notes that these concepts form brain-like "lobes" and "semantic crystals." Read more in their preprint, "The Geometry of Concepts."