Code analysis is hard even for experienced auditors, and regular devs without a security background have an even harder time knowing what to test for. This tool tries to bridge that knowledge and skill gap by using GPT as a "cognitive offload" tool: security knowledge is encoded into a "security copilot" that helps devs think of test cases and risk categories while writing code, and that generates Z3 code for a formal model of the code, which can automatically check via SMT solving whether any important invariants are violated.
The pipeline can be extended to use different backends, including open-source LLMs that can be fine-tuned on this domain and kept up to date with new vulnerabilities and ecosystem changes (e.g. new versions of the EVM, library updates).
Surprisingly, there is no Solidity AST parser module in Python that is both maintained and works with the latest compiler. We had to write a quick-and-dirty regex parser in Python, which will need a proper rewrite after the hackathon.
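To give a sense of what "quick and dirty regex parser" means in practice, here is a minimal sketch in that spirit (this is illustrative, not the project's actual parser; a real implementation should consume the compiler's AST output instead):

```python
import re

# Naive extraction of function signatures from Solidity source.
# Breaks on nested parentheses, comments containing "function", etc. -
# exactly the kind of fragility a proper AST parser would avoid.
FUNC_RE = re.compile(
    r"function\s+(?P<name>\w+)\s*\((?P<params>[^)]*)\)\s*(?P<mods>[^{;]*)"
)

def extract_functions(source: str) -> list[dict]:
    return [
        {
            "name": m.group("name"),
            "params": m.group("params").strip(),
            "modifiers": m.group("mods").split(),
        }
        for m in FUNC_RE.finditer(source)
    ]

sample = """
contract Token {
    function transfer(address to, uint256 amount) public returns (bool) {
        balances[msg.sender] -= amount;
        balances[to] += amount;
        return true;
    }
}
"""
print(extract_functions(sample))
# one entry: name "transfer", params "address to, uint256 amount"
```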
GPT is also not as good at programming as people think: it takes much more work than one might expect to guide it through finding vulnerabilities, especially anything non-trivial.
The context window is also a limitation - even with whitelisted GPT-4 access you get at most an 8K context window, which can still be too small. In the future, with a better Solidity parser, I'm planning to automatically compress the code that gets fed in: stripping out lines that are not relevant to the analysis at hand, and abstracting away functions whose details Z3/GPT doesn't need in order to perform the analysis.
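The planned compression pass could start with something as simple as the following sketch (a hypothetical first step, not the implemented pipeline): strip comments and blank lines before prompting, leaving the deeper work of abstracting irrelevant functions to the improved parser.

```python
import re

def compress_solidity(source: str) -> str:
    """Shrink Solidity source to fit more of it into a model's context window.

    Note: naive - a // or /* inside a string literal would be mangled.
    """
    # Remove /* ... */ block comments (non-greedy, across lines)
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    # Remove // line comments
    source = re.sub(r"//[^\n]*", "", source)
    # Drop blank lines and trailing whitespace
    lines = [ln.rstrip() for ln in source.splitlines() if ln.strip()]
    return "\n".join(lines)

before = """
// SPDX-License-Identifier: MIT
/* A toy token contract
   used only for this example */
contract Token {
    uint256 public total;   // current supply
}
"""
print(compress_solidity(before))
# Only the contract declaration and the uint256 field remain
```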
Another approach might be to use an open-source model like LED or Longformer to get a 16K context window - ideally something that can be dropped into LLaMA in the future to get GPT-3.5-ish capabilities with a larger window.