Smart-Hire

Smart-Hire is an AI-powered resume screening system.

Built at KnowCode 3.0

Created on 25th January 2026

The problem Smart-Hire solves

In the modern recruitment landscape, HR teams are overwhelmed by two major issues: Volume and Authenticity.

How it makes tasks easier:
Massive Time Savings: Manually reading a resume takes 2-3 minutes. This system parses and scores a resume in less than 2 seconds, allowing recruiters to focus only on the top 5% of candidates.

Information Extraction: It automatically identifies and extracts contact info, education, and years of experience, saving the recruiter from "hunting" through different PDF layouts.

Skill Gap Analysis: It doesn't just look for keywords; it compares the Resume against the Job Description (JD) and explicitly lists Missing Skills, helping recruiters understand exactly what a candidate lacks.
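The simplest form of the resume-versus-JD comparison can be sketched as below. This is an illustration only: the function name missing_skills and the idea of passing the JD skills as a plain list are assumptions, and the project's actual matching is described as going beyond keyword lookup.

```python
import re


def missing_skills(resume_text, jd_skills):
    """Return the JD skills that never appear in the resume, in JD order."""
    text = resume_text.lower()
    missing = []
    for skill in jd_skills:
        # Guard both sides so that "java" does not match inside "javascript".
        pattern = r"(?<![a-z0-9])" + re.escape(skill.lower()) + r"(?![a-z0-9])"
        if not re.search(pattern, text):
            missing.append(skill)
    return missing


resume = "Built REST APIs in Python and Flask; front end in JavaScript."
print(missing_skills(resume, ["Python", "Flask", "Java", "SQL"]))
```

The lookaround guards are the main design choice here: a plain substring test would report "Java" as present because of "JavaScript".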

How it makes tasks safer:
AI Detection (The Integrity Check): With the rise of ChatGPT, many candidates generate "perfect" resumes that don't reflect their actual skills. My tool uses a machine-learning Voting Classifier to flag AI-generated writing patterns, helping companies hire genuine talent.

Objectivity: By using a mathematical Weighted Scoring Algorithm, it applies the same criteria to every resume, which takes human fatigue out of the initial screening phase and reduces the room for unconscious bias.
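A weighted score of this kind might look like the sketch below. The criteria names and the weights in WEIGHTS are hypothetical; the project's real criteria and weights are not stated here.

```python
# Hypothetical criteria and weights, chosen only for illustration.
WEIGHTS = {"skills": 0.5, "experience": 0.3, "education": 0.2}


def weighted_score(components, weights=WEIGHTS):
    """Combine per-criterion scores in [0, 1] into a single 0-100 score."""
    total_weight = sum(weights.values())
    total = 0.0
    for name, weight in weights.items():
        value = components.get(name, 0.0)   # a missing criterion scores 0
        value = min(max(value, 0.0), 1.0)   # clamp to [0, 1]
        total += weight * value
    return round(100 * total / total_weight, 1)


print(weighted_score({"skills": 0.8, "experience": 0.5, "education": 1.0}))
```

Because every resume goes through the same formula, two candidates with the same component scores always receive the same result.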

Challenges we ran into

Building an end-to-end AI system comes with unique technical hurdles. Here are the specific challenges I faced:

A. The "Date Parsing" Nightmare (Experience Calculation)
The Hurdle: Extracting "Years of Experience" is surprisingly difficult because everyone writes dates differently (e.g., "Jan 2020 - Present", "05/19 to 08/22", or just "2 years"). Simple keyword matching wasn't enough.

The Solution: I implemented regex-based logic combined with the python-dateutil library. I built a parser that recognizes multiple date formats, converts them into standardized datetime objects, and calculates the delta (difference) between them while treating "Present" as the current date.
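The approach described above can be sketched roughly as follows. To stay self-contained, this version swaps dateutil for a small standard-library month lookup, and the names DATE, RANGE, parse_date and years_of_experience are assumptions made for illustration. It handles only the three formats quoted above and does not merge overlapping date ranges.

```python
import re
from datetime import date

MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# One date token: "Jan 2020", "January 2020", "05/19", "05/2019", "Present".
DATE = (r"(?:\b(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?\s+\d{4}"
        r"|\d{1,2}/\d{2,4}|present|current)")
RANGE = re.compile(rf"({DATE})\s*(?:[-\u2013]|to)\s*({DATE})", re.IGNORECASE)
DURATION = re.compile(r"(\d+(?:\.\d+)?)\s*\+?\s*years?", re.IGNORECASE)


def parse_date(token, today):
    """Convert one date token to a date on the first day of its month."""
    token = token.strip().lower()
    if token in ("present", "current"):
        return today
    if "/" in token:
        month, year = token.split("/")
        year = int(year)
        if year < 100:
            year += 2000  # assumption: two-digit years mean 20xx
        return date(year, int(month), 1)
    name, year = token.replace(".", "").split()
    return date(int(year), MONTHS[name[:3]], 1)


def years_of_experience(text, today=None):
    """Sum all date ranges found in text; fall back to a bare 'N years'."""
    today = today or date.today()
    ranges = RANGE.findall(text)
    if ranges:
        months = 0
        for start, end in ranges:
            s, e = parse_date(start, today), parse_date(end, today)
            months += max(0, (e.year - s.year) * 12 + (e.month - s.month))
        return months / 12
    match = DURATION.search(text)
    return float(match.group(1)) if match else 0.0


print(years_of_experience("Jan 2020 - Present"))
print(years_of_experience("05/19 to 08/22"))
```

Passing today explicitly (for example years_of_experience(text, date(2026, 1, 25))) keeps the result reproducible in tests, since "Present" otherwise depends on the day the code runs.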

B. Feature Engineering for AI Detection
The Hurdle: Initially, my AI detector was giving too many "False Positives" (marking humans as AI). High-quality human writing often looks "too perfect," which confuses basic models.

The Solution: Instead of just looking at words, I shifted to Linguistic Features. I integrated textstat for readability statistics and added Perplexity (how predictable the text is) and Burstiness (variation in sentence length) as features. Humans tend to have "bursty" writing (mixing long and short sentences), while AI is more uniform. Adding these features to my Voting Classifier significantly improved accuracy.
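As an illustration of the burstiness feature, the sketch below uses one common definition: the coefficient of variation of sentence lengths (standard deviation divided by mean). The exact formula used in the project is not stated, so treat this as an assumption. Perplexity is not shown because it requires a language model to score the text.

```python
import re
from statistics import mean, pstdev


def burstiness(text):
    """Coefficient of variation of sentence lengths, measured in words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return pstdev(lengths) / mean(lengths)


uniform = "One two three. Four five six. Seven eight nine."
varied = "Yes. I rebuilt the entire ingestion pipeline over one weekend. Done."
print(burstiness(uniform), burstiness(varied))
```

A value near zero means every sentence has about the same length; larger values indicate the mix of short and long sentences that the paragraph above associates with human writing.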

C. PDF Layout Complexity
The Hurdle: Some resumes are single-column, others are double-column. Standard text extractors often mix up the text from the left and right columns, making the data nonsensical.

The Solution: I switched to PyMuPDF (fitz), which allows for better block-level text extraction. I also added a fallback mechanism to PyPDF2 and python-docx to ensure that even if one library fails to parse a specific format, the system remains functional.
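A fallback chain along these lines could look like the sketch below. The function names are hypothetical, and the third-party libraries are imported lazily inside each extractor so that a missing library simply counts as a failed attempt. The order of extractors and the "empty text means failure" rule are assumptions.

```python
def extract_pymupdf(path):
    import fitz  # PyMuPDF (third-party)
    parts = []
    with fitz.open(path) as doc:
        for page in doc:
            # Each block is (x0, y0, x1, y1, text, block_no, block_type).
            for block in page.get_text("blocks"):
                if block[6] == 0:  # 0 marks a text block, 1 an image block
                    parts.append(block[4])
    return "\n".join(parts)


def extract_pypdf2(path):
    from PyPDF2 import PdfReader  # third-party
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_docx(path):
    import docx  # python-docx (third-party)
    return "\n".join(p.text for p in docx.Document(path).paragraphs)


def extract_text(path, extractors=(extract_pymupdf, extract_pypdf2, extract_docx)):
    """Try each extractor in order and return the first non-empty result."""
    errors = []
    for extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # includes ImportError for a missing library
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if text and text.strip():
            return text
        errors.append(f"{extractor.__name__}: empty result")
    raise ValueError("no extractor could read " + path + "; " + "; ".join(errors))
```

Collecting the error from every attempt, rather than only the last one, makes it easier to see why a particular resume could not be parsed.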

D. Memory and Efficiency
The Hurdle: Loading large NLP resources (like NLTK corpora) every time a page refreshed was making the Flask app slow.

The Solution: I used an _ensure_nltk() function to check for resources only once at startup and optimized the database queries using SQLAlchemy ORM to ensure the app stays lightweight and fast.

Tracks Applied (1)

Ethereum Track

ETHIndia

Technologies used

HTML

CSS

JavaScript

Flask

Python
