ChatGPT is amazing, but it is only a text model. What if it could make decisions and perform actions like an AGI?
In this hackathon we tried to solve this problem end-to-end for the browser. SAMARTH can take any instruction that can be carried out in a browser, given by the user through the CLI or by voice, and execute it automatically. For example, you can ask it to find someone on LinkedIn and send them a connection request with a message, and it will do this fully automatically for you.
You can also use it for more complex tasks, such as booking a cab from one location to another. It can automate many day-to-day activities in the browser and save you a lot of time.
We used Playwright to scrape and navigate websites and the GPT-3 API to process the page information and decide which actions to take. At present it is a CLI tool that can also take voice input.
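To illustrate how these pieces can fit together, here is a minimal sketch of an observe-decide-act loop. It assumes the model is prompted to reply with exactly one command per turn in a made-up grammar (`CLICK <id>`, `TYPE <id> "<text>"`, `GOTO <url>`, `DONE`). The grammar, `Action`, `parse_action`, `run_task` and the three callbacks are all hypothetical and are not SAMARTH's actual code; in a real setup `observe` and `execute` would wrap Playwright, and `ask_model` would call the GPT-3 API.

```python
import shlex
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    kind: str                        # "CLICK", "TYPE", "GOTO" or "DONE"
    element_id: Optional[int] = None
    text: Optional[str] = None       # text to type, or the URL for GOTO


def parse_action(reply: str) -> Action:
    """Parse the first line of a model reply into an Action.

    The command grammar is a hypothetical one the prompt would ask for.
    """
    lines = reply.strip().splitlines()
    if not lines:
        raise ValueError("empty model reply")
    tokens = shlex.split(lines[0])   # handles quoted text such as "Jane Doe"
    kind = tokens[0].upper()
    if kind == "DONE" and len(tokens) == 1:
        return Action("DONE")
    if kind == "CLICK" and len(tokens) == 2:
        return Action("CLICK", element_id=int(tokens[1]))
    if kind == "TYPE" and len(tokens) == 3:
        return Action("TYPE", element_id=int(tokens[1]), text=tokens[2])
    if kind == "GOTO" and len(tokens) == 2:
        return Action("GOTO", text=tokens[1])
    raise ValueError(f"unrecognised command: {lines[0]!r}")


def run_task(
    task: str,
    observe: Callable[[], str],         # hypothetical: returns a text view of the page
    ask_model: Callable[[str], str],    # hypothetical: sends a prompt, returns the reply
    execute: Callable[[Action], None],  # hypothetical: performs the action in the browser
    max_steps: int = 10,
) -> bool:
    """Loop until the model replies DONE; return False if max_steps runs out."""
    for _ in range(max_steps):
        prompt = f"Task: {task}\nPage:\n{observe()}\nNext command:"
        action = parse_action(ask_model(prompt))
        if action.kind == "DONE":
            return True
        execute(action)
    return False
```

Restricting the model to a tiny fixed grammar makes each reply easy to validate before anything happens in the browser, and the `max_steps` cap stops a confused model from looping forever.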
In our tests it showed near-AGI-like behavior on these tasks, and with a more powerful GPT model and better prompt engineering it could achieve much more.
The most challenging part was scraping each website and presenting it in a form GPT could understand, compact enough to be read in a single prompt. We spent a lot of time extracting the DOM from websites with Playwright by experimenting with the Chrome DevTools Protocol.
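The page-to-prompt step can be sketched roughly as follows. This is a simplified stand-in, not the approach described above: instead of the Chrome DevTools Protocol it runs Python's standard-library `html.parser` over a raw HTML string (for example the one returned by Playwright's `page.content()`), keeps only interactive elements, numbers them, and stops adding lines once a character budget is reached. The names `InteractiveElementExtractor` and `simplify_html`, the attribute list, and the use of characters as a crude proxy for tokens are all assumptions for illustration.

```python
from html.parser import HTMLParser

INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # elements that never have a closing tag or text content
KEEP_ATTRS = ("href", "type", "name", "placeholder", "aria-label", "value")


class InteractiveElementExtractor(HTMLParser):
    """Collects interactive elements and their visible text, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.elements = []  # each entry: (tag, attrs dict, list of text fragments)
        self._open = None   # index of the element currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        self.elements.append((tag, dict(attrs), []))
        if tag not in VOID_TAGS:
            # Simplification: a nested interactive element takes over text collection.
            self._open = len(self.elements) - 1

    def handle_endtag(self, tag):
        if self._open is not None and tag == self.elements[self._open][0]:
            self._open = None

    def handle_data(self, data):
        if self._open is not None and data.strip():
            self.elements[self._open][2].append(data.strip())


def simplify_html(html: str, max_chars: int = 4000) -> str:
    """Render interactive elements as numbered lines, e.g. '[2] <button> Connect'."""
    parser = InteractiveElementExtractor()
    parser.feed(html)
    parser.close()
    out, used = [], 0
    for i, (tag, attrs, text) in enumerate(parser.elements):
        shown = " ".join(f'{k}="{attrs[k]}"' for k in KEEP_ATTRS if attrs.get(k))
        line = f"[{i}] <{tag}{' ' + shown if shown else ''}> {' '.join(text)}".rstrip()
        if used + len(line) + 1 > max_chars:
            break  # keep whole lines only, so the prompt never ends mid-element
        out.append(line)
        used += len(line) + 1
    return "\n".join(out)
```

Numbering the elements lets the model refer to a control with a short number instead of reproducing a selector; mapping each number back to the live element on the page (for example by recording a selector while extracting) is left out of this sketch.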
We also ran into problems getting responses from the GPT model, because we were limited to a certain number of tokens per day, which made it difficult to test the tool in many different scenarios.
We also tried to make the script log in automatically on websites that require sign-in, but some of them have restrictions on bot activity that block automated logins.