JobFitAI: Complete Resume Analyzer Project

In today’s competitive job market, making your resume stand out is essential. JobFitAI is an innovative solution designed to help both job seekers and recruiters by analyzing resumes and offering actionable suggestions. Traditional keyword-based filtering methods can overlook important nuances in a candidate’s profile. To overcome these challenges, AI-powered systems can be used to analyze resumes, extract key skills, and match them effectively with job descriptions.

Learning Objectives

  • Install all required libraries and configure your environment with a DeepInfra API key.
  • Learn how to create an AI resume analyzer that processes both PDF and audio files.
  • Use DeepSeek-R1 via DeepInfra to extract relevant information from resumes.
  • Develop an interactive web app using Gradio for seamless user interaction.
  • Apply practical enhancements and troubleshoot common issues, adding significant value to your resume analyzer.

This article was published as a part of the Data Science Blogathon.

What is DeepSeek-R1?

DeepSeek-R1 is an advanced open-source AI model designed for natural language processing (NLP) tasks. It is a transformer-based large language model (LLM) trained to understand and generate human-like text. DeepSeek-R1 can perform tasks such as text summarization, question answering, language translation, and more. Because it is open-source, developers can integrate it into various applications, fine-tune it for specific needs, and run it on their own hardware without relying on proprietary systems. It is particularly useful for research, automation, and AI-driven applications.

Understanding Gradio

Gradio is a user-friendly Python library that helps developers create interactive web interfaces for machine learning models and other applications. With just a few lines of code, Gradio lets users build shareable applications with input components (such as text boxes, sliders, and image uploads) and output displays (such as text, images, or audio). It is widely used for AI model demonstrations, quick prototyping, and user-friendly interfaces for non-technical users. Gradio also supports easy model deployment, allowing developers to share their applications via public links without requiring advanced web development skills.

This guide presents JobFitAI, an end-to-end solution that extracts text, generates a detailed analysis, and provides feedback on how well the resume matches a given job description using the following technologies:

  • DeepSeek-R1: A powerful AI model that extracts key skills, experiences, education, and achievements from resume text.
  • DeepInfra: Provides an OpenAI-compatible API that lets us interact with AI models like DeepSeek-R1.
  • Gradio: A user-friendly framework that lets you build interactive web interfaces for machine learning applications quickly and easily.

Project Architecture

The JobFitAI project is built around a modular architecture, where each component plays a specific role in processing resumes. Below is an overview:

JobFitAI/
│── src/
│   ├── __pycache__/  (compiled Python files)
│   ├── analyzer.py
│   ├── audio_transcriber.py
│   ├── feedback_generator.py
│   ├── pdf_extractor.py
│   ├── resume_pipeline.py
│── .env  (environment variables)
│── .gitignore
│── app.py  (Gradio interface)
│── LICENSE
│── README.md
│── requirements.txt  (dependencies)

Setting Up the Environment

Before diving into the code, you need to set up your development environment.

Creating a Virtual Environment and Installing Dependencies

First, create a virtual environment in your project folder to manage your dependencies. Open your terminal and run:

python3 -m venv jobfitai
source jobfitai/bin/activate  # On macOS/Linux

python -m venv jobfitai
jobfitai\Scripts\activate  # On Windows (cmd)

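Next, create a file named requirements.txt and add the following libraries. Note that OpenAI’s Whisper is published on PyPI as openai-whisper; the package named whisper is an unrelated project.

requests
openai-whisper
PyPDF2
python-dotenv
openai
torch
torchvision
torchaudio
gradio

Install the dependencies by running:

pip install -r requirements.txt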

Setting Up Environment Variables

The project requires an API token to interact with the DeepInfra API. Create a .env file in your project’s root directory and add your API token:

DEEPINFRA_TOKEN="your_deepinfra_api_token_here"

Make sure to replace your_deepinfra_api_token_here with the actual token provided by DeepInfra.

Learn how to access the DeepInfra API key here.
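
A missing or placeholder token is easier to diagnose at startup than deep inside an API call. The following is a minimal sketch of such a check, assuming the variable name DEEPINFRA_TOKEN from the .env file above; get_deepinfra_token is a hypothetical helper and not part of the project code.

```python
import os


def get_deepinfra_token(env=None):
    """Return the DeepInfra token, or raise if it is missing or still the placeholder.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    This helper is illustrative and does not exist in the JobFitAI repository.
    """
    env = os.environ if env is None else env
    token = (env.get("DEEPINFRA_TOKEN") or "").strip()
    if not token or token == "your_deepinfra_api_token_here":
        raise RuntimeError("DEEPINFRA_TOKEN is not set; add it to your .env file.")
    return token
```

Calling this once after load_dotenv() would surface a configuration problem before any resume is processed.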

Project Walkthrough

The project is structured into several Python modules. In the following sections, we’ll look at the purpose of each file and its role in the project.

src/audio_transcriber.py

Resumes may not always be in text format. When you receive an audio resume, the AudioTranscriber class comes into play. This file uses OpenAI’s Whisper model to transcribe audio files into text. The transcription is then used by the analyzer to extract resume details.

import whisper

class AudioTranscriber:
    """Transcribe audio files using OpenAI Whisper."""
    def __init__(self, model_size: str = "base"):
        """
        Initializes the Whisper model for transcription.

        Args:
            model_size (str): The size of the Whisper model to load. Defaults to "base".
        """
        self.model_size = model_size
        self.model = whisper.load_model(self.model_size)

    def transcribe(self, audio_path: str) -> str:
        """
        Transcribes the given audio file and returns the text.

        Args:
            audio_path (str): The path to the audio file to be transcribed.

        Returns:
            str: The transcribed text, or an empty string if transcription fails.
        """
        try:
            result = self.model.transcribe(audio_path)
            return result["text"]
        except Exception as e:
            # Errors are logged and swallowed so the caller receives empty text.
            print(f"Error transcribing audio: {e}")
            return ""

src/pdf_extractor.py

Most resumes are available in PDF format. The PDFExtractor class is responsible for extracting text from PDF files using the PyPDF2 library. This module loops through all pages of a PDF document, extracts the text, and compiles it into a single string for further analysis.

import PyPDF2

class PDFExtractor:
    """Extract text from PDF files using PyPDF2."""

    def __init__(self):
        """Initialize the PDFExtractor."""
        pass

    def extract_text(self, pdf_path: str) -> str:
        """
        Extract text content from a given PDF file.

        Args:
            pdf_path (str): Path to the PDF file.

        Returns:
            str: Extracted text from the PDF, or an empty string if the
                file is missing or cannot be read.
        """
        text = ""
        try:
            with open(pdf_path, "rb") as file:
                reader = PyPDF2.PdfReader(file)
                for page in reader.pages:
                    page_text = page.extract_text()
                    if page_text:
                        text += page_text + "\n"
        except FileNotFoundError:
            print(f"Error: The file '{pdf_path}' was not found.")
        except Exception as e:
            print(f"An error occurred while extracting text: {e}")

        return text
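
Text pulled out of a PDF often carries irregular spacing and stray blank lines. As an optional illustration, here is a minimal sketch of a cleanup step that could run on the extracted string before it is sent to the model; normalize_resume_text is a hypothetical helper and is not part of the project.

```python
import re


def normalize_resume_text(text: str) -> str:
    """Collapse runs of spaces/tabs and drop blank lines (illustrative helper)."""
    # Squeeze horizontal whitespace inside each line and trim the ends.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    # Keep only non-empty lines, joined by single newlines.
    return "\n".join(line for line in lines if line)
```

Whether this helps depends on the PDFs you process; the original pipeline passes the extracted text through unchanged.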

src/resume_pipeline.py

The ResumePipeline module acts as the orchestrator for processing resumes. It integrates both the PDF extractor and the audio transcriber. Based on the file type provided by the user, it directs the resume to the correct processor and returns the extracted text. This modular design allows for easy expansion if more resume formats need to be supported in the future.

from src.pdf_extractor import PDFExtractor
from src.audio_transcriber import AudioTranscriber

class ResumePipeline:
    """
    Process resume files (PDF or audio) and return extracted text.
    """

    def __init__(self):
        """Initialize the ResumePipeline with PDFExtractor and AudioTranscriber."""
        self.pdf_extractor = PDFExtractor()
        self.audio_transcriber = AudioTranscriber()

    def process_resume(self, file_path: str, file_type: str) -> str:
        """
        Process a resume file and extract text based on its type.

        Args:
            file_path (str): Path to the resume file.
            file_type (str): Type of the file ('pdf' or 'audio').

        Returns:
            str: Extracted text from the resume, or an empty string if the
                file type is unsupported or processing fails (errors are
                printed, not re-raised).
        """
        try:
            file_type_lower = file_type.lower()
            if file_type_lower == "pdf":
                return self.pdf_extractor.extract_text(file_path)
            elif file_type_lower in ["audio", "wav", "mp3"]:
                return self.audio_transcriber.transcribe(file_path)
            else:
                raise ValueError("Unsupported file type. Use 'pdf' or 'audio'.")
        except FileNotFoundError:
            print(f"Error: The file '{file_path}' was not found.")
            return ""
        except ValueError as ve:
            print(f"Error: {ve}")
            return ""
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            return ""
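
process_resume expects the caller to say whether a file is a PDF or audio. One way to make that decision explicit is a lookup table keyed by file extension, sketched below. The names SUPPORTED_TYPES and detect_file_type are hypothetical and not part of the project; the extensions listed are the ones the pipeline code itself mentions.

```python
from pathlib import Path

# Map file extensions to the type strings that process_resume accepts.
# Supporting another format later would mean adding an entry here and a
# matching branch in the pipeline.
SUPPORTED_TYPES = {
    ".pdf": "pdf",
    ".wav": "audio",
    ".mp3": "audio",
}


def detect_file_type(file_path: str) -> str:
    """Return 'pdf' or 'audio' for a known extension, else raise ValueError."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported file type: {suffix or 'no extension'}")
    return SUPPORTED_TYPES[suffix]
```

With a check like this, an unexpected upload such as a DOCX file is rejected up front instead of being passed to a processor that cannot read it.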

src/analyzer.py

This module is the backbone of the resume analyzer. It initializes the connection to DeepInfra’s API using the DeepSeek-R1 model. The main function in this file is analyze_text, which takes resume text as input and returns an analysis summarizing key details from the resume. The prompt it sends instructs the model to act as a resume-matching assistant and to reply with structured JSON.

import os
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

class DeepInfraAnalyzer:
    """
    Calls DeepSeek-R1 model on DeepInfra using an OpenAI-compatible interface.
    This class processes resume text and extracts structured information using AI.
    """
    def __init__(
        self,
        api_key: str = os.getenv("DEEPINFRA_TOKEN"),
        model_name: str = "deepseek-ai/DeepSeek-R1"
    ):
        """
        Initializes the DeepInfraAnalyzer with API key and model name.

        :param api_key: API key for authentication
        :param model_name: The name of the model to use
        """
        try:
            self.openai_client = OpenAI(
                api_key=api_key,
                base_url="https://api.deepinfra.com/v1/openai",
            )
            self.model_name = model_name
        except Exception as e:
            raise RuntimeError(f"Failed to initialize OpenAI client: {e}")


    def analyze_text(self, text: str) -> str:
        """
        Processes the given resume text and extracts key information in JSON format.
        The response will contain structured details about key skills, experience, education, etc.

        :param text: The resume text to analyze
        :return: JSON string with structured resume analysis
        """
        prompt = (
            "You are an AI job resume matcher assistant. "
            "DO NOT show your chain of thought. "
            "Respond ONLY in English. "
            "Extract the key skills, experiences, education, achievements, etc. from the following resume text. "
            "Then produce the final output as a well-structured JSON with a top-level key called \"analysis\". "
            "Inside \"analysis\", you can have subkeys like \"key_skills\", \"experiences\", \"education\", etc. "
            "Return ONLY the final JSON, with no extra commentary.\n\n"
            f"Resume Text:\n{text}\n\n"
            "Required Format (example):\n"
            "```\n"
            "{\n"
            "  \"analysis\": {\n"
            "    \"key_skills\": [...],\n"
            "    \"experiences\": [...],\n"
            "    \"education\": [...],\n"
            "    \"achievements\": [...],\n"
            "    ...\n"
            "  }\n"
            "}\n"
            "```\n"
        )
        try:
            response = self.openai_client.chat.completions.create(
                model=self.model_name,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception as e:
            raise RuntimeError(f"Error processing resume text: {e}")
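
analyze_text returns the model reply as a raw string, so any code that wants to work with the fields has to parse it first. The following is a minimal sketch of that step, under two assumptions that may or may not hold for a given reply: that the JSON may be wrapped in a code fence (the prompt shows its example inside one), and that the reply may be preceded by a <think>...</think> reasoning block. parse_model_json is a hypothetical helper, not part of the project.

```python
import json
import re


def parse_model_json(raw: str) -> dict:
    """Extract and parse the JSON object from a model reply (illustrative helper)."""
    # Assumption: reasoning, if present, is wrapped in <think>...</think>.
    cleaned = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Take the span from the first '{' to the last '}', which skips code fences
    # and any text before or after the object.
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model reply")
    return json.loads(cleaned[start:end + 1])
```

json.loads raises json.JSONDecodeError (a subclass of ValueError) when the extracted span is not valid JSON, so a caller can handle both failure modes with a single except ValueError clause.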

src/feedback_generator.py

After extracting details from the resume, the next step is to compare the resume against a specific job description. The FeedbackGenerator module takes the resume text and the job description and asks the model for a match score along with recommendations for improvement. This module is useful for job seekers aiming to refine their resumes to better align with job descriptions, increasing their chances of passing through applicant tracking systems (ATS).

from src.analyzer import DeepInfraAnalyzer

class FeedbackGenerator:
    """
    Generates feedback for resume improvement based on a job description
    using the DeepInfraAnalyzer.
    """

    def __init__(self, analyzer: DeepInfraAnalyzer):
        """
        Initializes the FeedbackGenerator with an instance of DeepInfraAnalyzer.

        Args:
            analyzer (DeepInfraAnalyzer): An instance of the DeepInfraAnalyzer class.
        """
        self.analyzer = analyzer

    def generate_feedback(self, resume_text: str, job_description: str) -> str:
        """
        Generates feedback on how well a resume aligns with a job description.

        Args:
            resume_text (str): The extracted text from the resume.
            job_description (str): The job posting or job description.

        Returns:
            str: A JSON-formatted response containing:
                - "match_score" (int): A score from 0-100 indicating job match quality.
                - "job_alignment" (dict): Categorization of strong and weak matches.
                - "missing_skills" (list): Skills missing from the resume.
                - "recommendations" (list): Actionable suggestions for improvement.
                Returns "{}" if an error occurs during analysis.
        """
        try:
            prompt = (
                "You are an AI job resume matcher assistant. "
                "DO NOT show your chain of thought. "
                "Respond ONLY in English. "
                "Compare the following resume text with the job description. "
                "Calculate a match score (0-100) for how well the resume matches. "
                "Identify keywords from the job description that are missing in the resume. "
                "Provide bullet-point recommendations to improve the resume for better alignment.\n\n"
                f"Resume Text:\n{resume_text}\n\n"
                f"Job Description:\n{job_description}\n\n"
                "Return JSON ONLY in this format:\n"
                "{\n"
                "  \"job_match\": {\n"
                "    \"match_score\": <integer>,\n"
                "    \"job_alignment\": {\n"
                "      \"strong_match\": [...],\n"
                "      \"weak_match\": [...]\n"
                "    },\n"
                "    \"missing_skills\": [...],\n"
                "    \"recommendations\": [\n"
                "      \"<Actionable Recommendation 1>\",\n"
                "      \"<Actionable Recommendation 2>\",\n"
                "      ...\n"
                "    ]\n"
                "  }\n"
                "}"
            )
            # Note: analyze_text wraps this prompt inside its own extraction prompt.
            return self.analyzer.analyze_text(prompt)

        except Exception as e:
            print(f"Error in generating feedback: {e}")
            return "{}"  # Returning an empty JSON string in case of failure
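
Because generate_feedback returns a string, and falls back to "{}" on failure, downstream code should not assume the score is present. Here is a minimal sketch of reading the score defensively, assuming the reply is already plain JSON in the job_match format requested by the prompt; extract_match_score is a hypothetical helper, not part of the project.

```python
import json
from typing import Optional


def extract_match_score(feedback_json: str) -> Optional[int]:
    """Return the match score clamped to 0-100, or None if it cannot be read."""
    try:
        data = json.loads(feedback_json)
        score = int(data["job_match"]["match_score"])
    except (ValueError, KeyError, TypeError):
        # Covers invalid JSON, the "{}" fallback, and unexpected shapes.
        return None
    # The prompt asks for 0-100, but the model's output is not guaranteed to comply.
    return max(0, min(100, score))
```

Returning None rather than 0 keeps a failed analysis distinguishable from a genuinely poor match.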

app.py

The app.py file is the main entry point of the JobFitAI project. It integrates all the modules described above and builds an interactive web interface using Gradio. Users can upload a resume/CV file (PDF or audio) and enter a job description. The application then processes the resume, runs the analysis, generates feedback, and returns a structured JSON response with both the analysis and recommendations.

import os
from dotenv import load_dotenv
load_dotenv()

import gradio as gr
from src.resume_pipeline import ResumePipeline
from src.analyzer import DeepInfraAnalyzer
from src.feedback_generator import FeedbackGenerator

# Pipeline for PDF/audio
resume_pipeline = ResumePipeline()

# Initialize the DeepInfra analyzer
analyzer = DeepInfraAnalyzer()

# Feedback generator
feedback_generator = FeedbackGenerator(analyzer)

def analyze_resume(resume_path, job_desc):
    """
    Gradio callback function to analyze a resume against a job description.

    Args:
        resume_path (str): Path to the uploaded resume file (PDF or audio).
        job_desc (str): The job description text for comparison.

    Returns:
        dict: The analysis and recommendations, or an "error" message.
    """
    try:
        if not resume_path or not job_desc:
            return {"error": "Please upload a resume and enter a job description."}

        # Determine file type from extension (anything that is not a PDF is treated as audio)
        lower_name = resume_path.lower()
        file_type = "pdf" if lower_name.endswith(".pdf") else "audio"

        # Extract text from the resume
        resume_text = resume_pipeline.process_resume(resume_path, file_type)

        # Analyze extracted text
        analysis_result = analyzer.analyze_text(resume_text)

        # Generate feedback and recommendations
        feedback = feedback_generator.generate_feedback(resume_text, job_desc)

        # Return structured response
        return {
            "analysis": analysis_result,
            "recommendations": feedback
        }
    except ValueError as e:
        return {"error": f"Unsupported file type or processing error: {str(e)}"}
    except Exception as e:
        return {"error": f"An unexpected error occurred: {str(e)}"}

# Define Gradio interface
demo = gr.Interface(
    fn=analyze_resume,
    inputs=[
        gr.File(label="Resume (PDF/Audio)", type="filepath"),
        gr.Textbox(lines=5, label="Job Description"),
    ],
    outputs="json",
    title="JobFitAI: AI Resume Analyzer",
    description="""
Upload your resume/CV (PDF or audio) and paste the job description to get a match score,
missing keywords, and actionable recommendations.""",
)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=8000)

Running the Application with Gradio

After setting up your environment and reviewing all code components, you’re ready to run the application.

  • Start the Application: In your terminal, navigate to your project directory and run the command below:
python app.py

  • This command launches the Gradio interface locally. Open the provided URL in your browser to see the interactive resume analyzer.
  • Test JobFitAI:
    • Upload a Resume/CV: Select a PDF file or an audio file containing a recorded resume.
    • Enter a Job Description: Paste or type in a job description.
    • Review the Output: The system displays a JSON response that includes a detailed analysis of the resume, the match score, missing keywords, and recommendations for improvement.

You can find all the code files in the GitHub repo here.

Use Cases and Practical Applications

The JobFitAI resume analyzer can be used in various real-world scenarios:

Improving Resume Quality

  • Self-Assessment: Candidates can use the tool to assess their resumes before applying. By understanding the match score and the areas that need improvement, they can better tailor their resumes for specific roles.
  • Feedback Loop: The structured JSON feedback generated by the tool can be integrated into career counseling platforms, providing personalized resume improvement recommendations.

Educational and Training Applications

  • Career Workshops: Educational institutions and career coaching platforms can incorporate JobFitAI into their curriculum. It serves as a practical demonstration of how AI can be used to improve career readiness.
  • Coding and AI Projects: Aspiring data scientists and developers can learn how to integrate multiple AI services (such as transcription, PDF extraction, and natural language processing) into a cohesive project.

Troubleshooting and Extensions

Let us now explore troubleshooting and extensions.

Common Issues and Solutions

  • API Token Issues: If the DeepInfra API token is missing or incorrect, the analyzer module will fail. Always verify that your .env file contains the correct token and that the token is active.
  • Unsupported File Types: The application currently supports only PDF and audio formats. Because app.py treats every non-PDF upload as audio, a file of another type (such as DOCX) is passed to Whisper, where transcription fails and the pipeline returns empty text rather than a clear error. Future extensions can include support for additional formats.
  • Transcription Delays: Audio transcription can take a long time, especially for larger files. Consider using a higher-specification machine or a cloud-based solution if you plan on processing many audio resumes.

Ideas for Further Development

  • Support More File Formats: Extend the resume pipeline to support more file types such as DOCX or plain text.
  • Enhanced Feedback Mechanism: Integrate more sophisticated natural language processing models to provide richer, more nuanced feedback beyond the basic match score.
  • User Authentication: Implement user authentication to allow job seekers to save their analyses and track improvements over time.
  • Dashboard Integration: Build a dashboard where recruiters can manage and compare resume analyses across multiple candidates.
  • Performance Optimization: Optimize the audio transcription and PDF extraction processes for faster analysis on large-scale datasets.
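
As one illustration of the performance idea, repeated uploads of the same audio file do not need to be transcribed twice. The sketch below caches transcripts in memory, keyed by a SHA-256 hash of the file contents. TranscriptCache is a hypothetical class and not part of the project; it accepts any function that maps a file path to text, so it can wrap AudioTranscriber.transcribe without modifying it.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict


class TranscriptCache:
    """In-memory cache of transcripts keyed by file content (illustrative)."""

    def __init__(self, transcribe: Callable[[str], str]):
        self._transcribe = transcribe
        self._cache: Dict[str, str] = {}

    def get(self, audio_path: str) -> str:
        # Hash the bytes rather than the path so renamed copies still hit the cache.
        digest = hashlib.sha256(Path(audio_path).read_bytes()).hexdigest()
        if digest not in self._cache:
            self._cache[digest] = self._transcribe(audio_path)
        return self._cache[digest]
```

A possible use would be cache = TranscriptCache(AudioTranscriber().transcribe) followed by cache.get(path). The cache lives only for the lifetime of the process; persisting it across restarts would need additional storage.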

Conclusion

The JobFitAI resume analyzer is a multi-functional tool that uses AI models to bridge the gap between resumes and job descriptions. By integrating DeepSeek-R1 via DeepInfra, together with transcription and PDF extraction capabilities, you now have a complete solution to automatically analyze resumes and generate feedback for improved job alignment.

This guide provided a comprehensive walkthrough, from setting up the environment to understanding each module’s role and finally running the interactive Gradio interface. Whether you’re a developer looking to expand your portfolio, an HR professional wanting to streamline candidate screening, or a job seeker aiming to improve your resume, the JobFitAI project offers practical insights and a good starting point for further exploration.

Embrace the power of AI, experiment with new features, and continue refining the project to suit your needs. The future of job applications is here, and it’s smarter than ever!

Key Takeaways

  • JobFitAI uses DeepSeek-R1 and DeepInfra to extract skills, experiences, and achievements from resumes for better job matching.
  • The system supports both PDF and audio resumes, using PyPDF2 for text extraction and Whisper for audio transcription.
  • Gradio enables a seamless, user-friendly web interface for real-time resume analysis and feedback.
  • The project uses a modular architecture and an environment setup with API keys for smooth integration and scalability.
  • Developers can fine-tune DeepSeek-R1, troubleshoot issues, and expand functionality for more robust AI-driven resume screening.

Frequently Asked Questions

Q1: What types of resumes does JobFitAI support?

A: The current version supports resumes in PDF and audio formats. Future updates may include support for additional formats such as DOCX or plain text.

Q2: Is the DeepInfra API free?

A: No, accessing the DeepSeek-R1 model through the DeepInfra API requires a paid plan. For detailed pricing information, please visit DeepInfra’s official page.

Q3: Can I customize the feedback provided by the analyzer?

A: Yes! You can modify the prompt or integrate additional models to tailor the feedback to your specific requirements.

Q4: What should I do if I encounter issues with audio transcription?

A: Audio transcription may sometimes be slow, especially for larger files. Verify that your environment meets the necessary computational requirements, and consider optimizing the transcription process or using cloud-based resources if needed.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Hello! I’m a passionate AI and Machine Learning enthusiast currently exploring the exciting realms of Deep Learning, MLOps, and Generative AI. I enjoy diving into new projects and uncovering innovative techniques that push the boundaries of technology. I’ll be sharing guides, tutorials, and project insights based on my own experiences, so we can learn and grow together. Join me on this journey as we explore, experiment, and build amazing solutions in the world of AI and beyond!
