Cerebrus - AI CyberShield

AI-Powered Malware Detection: Static, Dynamic, Hybrid Analysis, and Reporting

Created on 13th April 2025

The problem Cerebrus - AI CyberShield solves

In the ever-evolving cybersecurity landscape, traditional malware detection methods are struggling to keep pace with increasingly sophisticated threats. Malware is becoming more diverse, and the volume of new strains (including zero-day attacks) is overwhelming.

Key challenges include:

Zero-Day Attacks: Malware that hasn't been seen before, making signature-based detection ineffective.
Manual Analysis: Security analysts are bombarded with huge numbers of files (e.g., .exe, .dll, scripts, etc.) that need to be manually analyzed, an inefficient and error-prone process.
Lack of Transparency: Traditional tools flag potential malware but often fail to explain the reasoning behind their decisions, making it difficult to trust their output.

Cerebrus addresses these issues by offering an AI-powered malware detection solution that combines static, dynamic, and real-time analysis, all while providing explainable AI for transparency.

Key Features & Benefits:
AI-Driven Static Analysis:
How It Helps: Before executing any file, Cerebrus performs static analysis using a machine learning model (a RandomForestClassifier trained on PE file features) to detect known and potentially unknown malicious patterns. This pre-execution analysis reduces the risk of ever running a dangerous file; a rough sketch of the flow follows below.
Benefit: Developers and security analysts can identify harmful files early, minimizing the risk of infections and saving time.
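A minimal sketch of this pre-execution flow, assuming PE header features are pulled with pefile and scored by the trained RandomForestClassifier; the exact feature set, the extract_pe_features/classify_file helper names, and the rf_model.joblib path are illustrative assumptions, not the project's actual configuration.

# Minimal sketch of static (pre-execution) PE classification.
import joblib
import numpy as np
import pefile

def extract_pe_features(path):
    """Pull a small set of header/section features from a PE file."""
    pe = pefile.PE(path)
    entropies = [s.get_entropy() for s in pe.sections] or [0.0]
    return np.array([[
        pe.FILE_HEADER.NumberOfSections,
        pe.OPTIONAL_HEADER.SizeOfImage,
        pe.OPTIONAL_HEADER.AddressOfEntryPoint,
        float(np.mean(entropies)),   # mean section entropy; packed/obfuscated files tend to score high
    ]])

def classify_file(path, model_path="rf_model.joblib"):
    """Score a file with the trained RandomForestClassifier before it is ever executed."""
    model = joblib.load(model_path)   # assumes the model was trained with class 1 == malicious
    malicious_prob = model.predict_proba(extract_pe_features(path))[0, 1]
    return {"malicious_probability": float(malicious_prob),
            "verdict": "malicious" if malicious_prob >= 0.5 else "benign"}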

Dynamic Analysis:
How It Helps: Cerebrus also supports dynamic analysis, which runs the file in a controlled environment (sandbox) and observes its behavior in real time. This catches malware that evades static analysis through techniques such as obfuscation or polymorphism; a rough sketch of the monitoring idea follows below.
Benefit: Security analysts can identify threats that static analysis alone would miss, providing more comprehensive detection coverage.
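The behavioural-monitoring idea can be sketched roughly as follows, assuming the sample is launched inside an already-isolated sandbox VM; the observe_sample helper and the psutil-based signals collected here (child processes, opened files, remote connections) are illustrative assumptions rather than the project's actual instrumentation.

# Rough sketch of polling a running sample's behaviour inside a sandbox.
import subprocess
import time
import psutil

def observe_sample(sample_path, duration=30, interval=1.0):
    """Launch the sample and record its behaviour until it exits or the time budget runs out."""
    proc = subprocess.Popen([sample_path])             # only ever run inside a disposable sandbox
    parent = psutil.Process(proc.pid)
    events = {"child_processes": set(), "files_opened": set(), "remote_endpoints": set()}
    deadline = time.time() + duration
    while time.time() < deadline and proc.poll() is None:
        try:
            for child in parent.children(recursive=True):
                events["child_processes"].add(child.name())
            for f in parent.open_files():
                events["files_opened"].add(f.path)
            for conn in parent.connections(kind="inet"):
                if conn.raddr:
                    events["remote_endpoints"].add(f"{conn.raddr.ip}:{conn.raddr.port}")
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass                                        # a process vanished or was inaccessible between polls
        time.sleep(interval)
    if proc.poll() is None:
        proc.kill()                                     # stop the sample once observation is done
    return events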

Challenges we ran into

Implementing Explainable AI (XAI):
A significant challenge was ensuring that the AI model’s predictions were explainable. While the Random Forest model performed well in terms of classification accuracy, it is inherently a black box. We needed a way to make the model's decisions transparent to build trust with security analysts.

Key Issues:
Interpreting SHAP Values: The raw SHAP values provided by the model were complex to interpret, especially when analyzing feature interactions within the tree-based Random Forest model.
SHAP Library Compatibility: SHAP’s TreeExplainer is tailored for tree-based models like Random Forest, but there were nuances in handling the feature interactions properly.

How We Overcame It:
We developed custom functions in model_explainer.py to correctly calculate SHAP values for our specific Random Forest model.
We created a way to process the raw SHAP values and extract the top contributing features, both positive and negative, for each file.
We implemented visualization functions (generate_explanation_plot) to display these contributions as waterfall or bar plots, making the model's decision process clearer and easier to understand; a rough sketch of this pipeline follows below.
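As a rough sketch of what model_explainer.py does (the helper names and plotting details beyond generate_explanation_plot are assumptions), the SHAP values for the 'malicious' class are computed with TreeExplainer, reduced to the top positive and negative contributors per file, and rendered as a bar plot:

# Sketch of the explanation pipeline: SHAP values -> top contributing features -> plot.
import numpy as np
import shap
import matplotlib.pyplot as plt

def malicious_shap_values(model, X):
    """Per-sample SHAP contributions toward the 'malicious' class."""
    raw = shap.TreeExplainer(model).shap_values(X)
    # Older SHAP releases return a list (one array per class); newer ones return a 3-D array.
    return raw[1] if isinstance(raw, list) else raw[..., 1]

def top_contributions(shap_row, feature_names, k=5):
    """Top-k features by absolute contribution, positive (pushes toward malicious) or negative."""
    order = np.argsort(np.abs(shap_row))[::-1][:k]
    return [(feature_names[i], float(shap_row[i])) for i in order]

def generate_explanation_plot(contribs, out_path="explanation.png"):
    """Render one file's feature contributions as a horizontal bar plot."""
    names, values = zip(*contribs)
    colors = ["tab:red" if v > 0 else "tab:green" for v in values]
    plt.barh(names, values, color=colors)
    plt.xlabel("SHAP contribution toward 'malicious'")
    plt.tight_layout()
    plt.savefig(out_path)
    plt.close()

In practice these steps run in sequence for each analysed file: compute the SHAP matrix once, then extract and plot the top contributions for the file being reported.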
