Build an AI PDF Analyzer - Day 2: Multi-PDF Comparison

This program lets users ask questions about multiple PDFs, then synthesizes and compares the data across them.

Apr 09, 2026


Projects in this week’s series:

This week, we build a smart PDF analyzer powered by LangChain and Google’s Gemini AI that lets you upload PDFs and ask questions about them.

Day 1: PDF Question Answering
Day 2: Multi-PDF Comparison (Today)
Day 3: PDF Analyzer Web App


Today’s Project

Yesterday we built a tool to ask questions about a single PDF. Today we’re expanding to multiple PDFs — upload several documents and ask comparative questions across all of them!

Compare resumes, analyze multiple research papers, find differences in contracts, or synthesize information from multiple sources!

Project Task

Create a multi-PDF comparison tool that:

Loads multiple PDF files from a directory
Extracts text from all documents
Creates context with all PDFs combined
Answers questions across all documents
Compares and contrasts information
Identifies which document contains specific information
Maintains conversation history
Works entirely from command line
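
The "maintains conversation history" requirement can be met with a plain list of role-tagged messages. Here is a minimal sketch, assuming the (role, text) tuple format that LangChain chat models accept as input. The `Conversation` class and its method names are hypothetical and are not taken from the solution script.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Keeps the system prompt plus every question/answer pair so far."""
    system_prompt: str
    turns: list[tuple[str, str]] = field(default_factory=list)

    def messages_for(self, question: str) -> list[tuple[str, str]]:
        # System prompt first, then earlier turns, then the new question.
        return [("system", self.system_prompt), *self.turns, ("human", question)]

    def record(self, question: str, answer: str) -> None:
        # Store both sides so follow-up questions can refer back to them.
        self.turns.append(("human", question))
        self.turns.append(("ai", answer))
```

Calling `record()` after each answer means a follow-up such as "And in the other file?" is sent together with the earlier exchange.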

This project gives you hands-on practice with batch file processing, multi-document analysis, comparative AI reasoning, and building practical document comparison tools — essential skills for real-world AI applications!

Expected Output

When the program starts, it asks the user for the directory that contains the PDF files (e.g., documents).
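
The directory prompt could look roughly like this. It is a sketch using only the standard library; the function names `find_pdfs` and `prompt_for_pdfs` and the prompt wording are assumptions, not the solution script's own.

```python
from pathlib import Path


def find_pdfs(directory: str) -> list[Path]:
    """Return the PDF files directly inside `directory`, sorted by path."""
    folder = Path(directory).expanduser()
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a directory: {folder}")
    # Match the extension case-insensitively so REPORT.PDF is found too.
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def prompt_for_pdfs(read=input) -> list[Path]:
    """Ask for a directory on the command line and list its PDFs."""
    return find_pdfs(read("Enter the directory containing your PDFs: ").strip())
```

Passing `read` as a parameter keeps the prompt easy to test without typing at a terminal.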

In the next step, the user asks a question about the information contained in the PDF files. In our example, the PDFs contain university student data and some students appear in both files, so we ask the program for the names of the students listed in both documents.

The program is aware of the PDF file names: it lists the student names shared by both documents and also flags some potential data-integrity issues in them, which is useful information to have.

Here are the PDF files we used for this example:

Report (7.95 KB, PDF file)
Report2 (8.06 KB, PDF file)

Setup Instructions

Install Required Packages:

pip install langchain-google-genai pypdf

Get Your Google API Key:

Go to Google AI Studio
Click “Create API Key”
Copy your key
Paste it in the script where it says YOUR_GOOGLE_API_KEY
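
If you would rather not paste the key into the source file, one common alternative is to read it from an environment variable. This is a minimal sketch; the `get_api_key` helper and the choice of `GOOGLE_API_KEY` as the variable name are assumptions, not part of the solution script.

```python
import os

PLACEHOLDER = "YOUR_GOOGLE_API_KEY"


def get_api_key(env=None) -> str:
    """Read the key from the GOOGLE_API_KEY environment variable."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    # Reject a missing key and the untouched placeholder alike.
    if not key or key == PLACEHOLDER:
        raise RuntimeError("Set GOOGLE_API_KEY to the key copied from Google AI Studio.")
    return key
```

Failing early with a clear message is friendlier than an authentication error on the first question.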

Prepare Your PDFs:

Create a directory with multiple PDFs:

./documents/
├── report.pdf
└── report2.pdf

Run the tool:

python solution.py

Enter the directory path and start comparing!

Understanding Multi-Document Analysis

How it works:

Directory → Load all PDFs → Extract text from each → Combine into context
                                                              ↓
                                                    Single system prompt with all docs
                                                              ↓
Your question → "Which document mentions X?" → AI compares all documents → Answer
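
The flow in the diagram can be sketched as follows. The prompt wording, the helper names, and the default model name `gemini-2.5-flash` are assumptions for illustration; the solution script may do this differently. The model call is passed in as a plain function so the prompt-building part runs without any package installed.

```python
from typing import Callable

Message = tuple[str, str]  # (role, text)


def build_messages(context: str, question: str) -> list[Message]:
    """Put every labeled document into one system prompt, then add the question."""
    system_prompt = (
        "You are a document analyst. Answer using only the documents below "
        "and name the file each fact comes from.\n"
        f"{context}"
    )
    return [("system", system_prompt), ("human", question)]


def ask_documents(context: str, question: str,
                  call_model: Callable[[list[Message]], str]) -> str:
    """Send the combined prompt through any callable that returns the answer text."""
    return call_model(build_messages(context, question))


def make_gemini_caller(api_key: str, model: str = "gemini-2.5-flash"):
    """Wrap a Gemini chat model as a `call_model` function (model name is an assumption)."""
    # Imported lazily so the helpers above work without the package installed.
    from langchain_google_genai import ChatGoogleGenerativeAI
    llm = ChatGoogleGenerativeAI(model=model, google_api_key=api_key)
    return lambda messages: llm.invoke(messages).content
```

With a real key, `ask_documents(context, "Which document mentions X?", make_gemini_caller(key))` would send one request containing all documents plus the question.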

Key technique: Document labeling

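# pdf_files: the PDF paths found in the directory (pathlib.Path objects)
# extract_text: a helper that returns the text of one PDF
context = ""
for pdf_file in pdf_files:
    content = extract_text(pdf_file)
    # Put the file name above each document's text
    context += f"\n\n=== {pdf_file.name} ===\n{content}\n"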

This labels each document so the AI knows which content comes from which file!
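
For a version of the labeling step you can run on its own, here is one possible sketch using pypdf. The helper names (`extract_text`, `build_context`, `load_directory`) are hypothetical, and the real solution may extract text differently.

```python
from pathlib import Path


def extract_text(pdf_path: Path) -> str:
    """Return the text of every page in one PDF, using pypdf."""
    from pypdf import PdfReader  # lazy import: only needed when reading real PDFs
    reader = PdfReader(pdf_path)
    # extract_text() can yield nothing for scanned or image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def build_context(documents: dict[str, str]) -> str:
    """Join already-extracted texts, each under a header naming its file."""
    context = ""
    for name, content in documents.items():
        context += f"\n\n=== {name} ===\n{content.strip()}\n"
    return context


def load_directory(pdf_files: list[Path]) -> str:
    """Extract and label every PDF in the list."""
    return build_context({p.name: extract_text(p) for p in pdf_files})
```

Keeping `build_context` separate from the PDF reading makes the labeling logic easy to check with plain strings.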

Multi-document capabilities:

Comparison - “Which resume is stronger?”
Contrast - “What are the differences between these papers?”
Synthesis - “What do all these contracts have in common?”
Attribution - “Which document mentions the refund policy?”
Ranking - “Order these candidates by experience level”

Use cases:

HR: Compare job applicant resumes
Research: Analyze multiple papers on the same topic
Legal: Compare contract versions or terms
Business: Analyze competitor reports
Academic: Compare different textbook chapters

Coming Tomorrow

Tomorrow we’re building a web app with Flask/Streamlit that lets anyone upload PDFs through a browser, ask questions, and get instant answers — no command line needed!

View Code Evolution

Compare today’s solution with earlier versions and see how we evolved from single-PDF analysis to multi-document comparison:
