PaperFetch is an automated research paper discovery and analysis tool that searches for recent academic papers, processes them with AI for intelligent summarization and relevance scoring, and delivers the results via email.
- 🔍 Multi-Source Paper Discovery: Searches CrossRef API and optionally Nature/Springer databases for papers published in the last week
- 🤖 AI-Powered Analysis: Uses an LLM to summarize papers and rate their relevance to your research interests
- 📧 Email Delivery: Sends beautifully formatted HTML email reports with paper summaries and ratings
- ⚡ Concurrent Processing: Efficiently processes multiple papers simultaneously
- 🛡️ Smart Rate Limiting: Configurable limits to prevent excessive API usage
- 🔄 Retry Logic: Robust error handling with automatic retries for API calls
- 🔎 Flexible Query Syntax: Supports both list-based and string-based queries with multi-term search
- Python 3.13 or higher
- UV package manager (recommended) or pip
- Access to an OpenAI-compatible API (OpenAI, local LLM server, etc.)
- Email account with SMTP access (Gmail, etc.)
- (Optional) Springer API key for Nature/Springer database access
- Clone the repository:

  ```
  git clone https://github.com/LouisFaure/paperfetch.git
  cd paperfetch
  ```

- Install dependencies:

  With UV (recommended):

  ```
  uv sync
  ```

  With pip:

  ```
  pip install -r requirements.txt
  ```

- Copy the example configuration:

  ```
  cp config_example.toml config.toml
  ```

- Edit `config.toml` with your settings:

```toml
# PaperFetch Configuration File
[search]
# The search query to use for finding papers
# Can be a list of terms or a single string (for backward compatibility)
# When using a list, terms are joined with spaces for CrossRef and with " AND " for Nature/Springer
query = ["single-cell", "tissue ecosystem"]
# Alternative single string format (deprecated but still supported):
# query = "single-cell tissue ecosystem"
# Optional: a short text describing the researcher's current interests.
# If provided, PaperFetch will include this text alongside the query when asking the LLM
# to rate relevance. Example: "causal inference, interpretability, healthcare"
researcher_interests = """
I am currently developing a VAE model for scRNASeq that enhances interpretability.
I am interested in causal inference, health data, and cancer research."""
# Maximum number of papers to process with LLM (set to 0 to disable LLM processing entirely)
max_papers_for_llm = 100
# Number of days to check for new papers (default is 7)
days_to_check = 7
[api]
# Email address for CrossRef API requests (required for polite usage)
mailto = "[email protected]"
# OpenAI API key or compatible API key
openai_api = "sk-your-api-key-here"
# API base URL (use OpenAI's URL or your local server)
openai_url = "https://api.openai.com/v1"
# Model name to use for processing
openai_model = "gpt-4o-mini"
# Number of attempts for LLM calls
max_attempts = 3
# Whether to verify SSL certificates (disable at your own risk!)
ssl_verify = true
# Nature/Springer API configuration (optional)
# Set enable_springer to true to also search Nature/Springer databases
enable_springer = false
springer_api_key = "your_springer_api_key_here"
[email]
# Email configuration for sending results
smtp_server = "smtp.gmail.com"
smtp_port = 587
sender_email = "[email protected]"
# For Gmail, use an App Password instead of your regular password
sender_password = "your_app_password_here"
recipient_email = "[email protected]"
subject_prefix = "PaperFetch Results"For Gmail users:
- Enable 2-factor authentication
- Generate an App Password: Google Account → Security → 2-Step Verification → App passwords
- Use the App Password in the `sender_password` field
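
For reference, sending an HTML report with these settings boils down to a few lines of Python's standard `smtplib` (an illustrative sketch only; PaperFetch's actual sending code lives in mail.py, and the values below simply mirror the config example):

```python
import smtplib
from email.mime.text import MIMEText

# Illustrative values mirroring the [email] section of config.toml.
smtp_server = "smtp.gmail.com"
smtp_port = 587
sender_email = "[email protected]"
sender_password = "your_app_password_here"  # Gmail App Password, not your account password
recipient_email = "[email protected]"

# Build an HTML message and send it over a TLS-upgraded SMTP connection.
msg = MIMEText("<h1>PaperFetch Results</h1>", "html")
msg["Subject"] = "PaperFetch Results"
msg["From"] = sender_email
msg["To"] = recipient_email

with smtplib.SMTP(smtp_server, smtp_port) as server:
    server.starttls()  # upgrade the connection to TLS on port 587
    server.login(sender_email, sender_password)
    server.send_message(msg)
```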
To enable searching Nature and Springer journals:
- Register for a Springer API key at the Springer Developer Portal
- Add your API key to `config.toml` under the `[api]` section:

  ```toml
  enable_springer = true
  springer_api_key = "your_springer_api_key_here"
  ```
- When enabled, PaperFetch will search both CrossRef and Nature/Springer databases and merge the results
Run with the query from your config file:
```
uv run main.py
```

Or with UV script syntax:
```
uv run --script main.py
```

Override the config query with command-line arguments:
Single search term:
uv run main.py "quantum computing"Multiple search terms:
uv run main.py "single-cell" "tissue ecosystem"When using multiple command-line arguments, each term is treated separately and joined appropriately for each API:
- CrossRef: Terms joined with spaces (e.g., `single-cell tissue ecosystem`)
- Nature/Springer: Terms joined with AND logic (e.g., `"single-cell" AND "tissue ecosystem"`)
- Paper Discovery:
- Searches CrossRef for papers published in the last 7 days matching your query
- If enabled, also searches Nature/Springer databases
- Merges results from both sources (duplicates by title are handled; see the sketch after this list)
- AI Analysis: For each paper (up to your configured limit):
- Generates 3-5 key bullet points summarizing the abstract
- Rates relevance on a scale of 0-10. If you provide `search.researcher_interests` in `config.toml`, the LLM will rate relevance using both the query and your described researcher interests (preferred when present).
- Email Report: Sends an HTML email with:
- Papers sorted by relevance rating
- Clickable titles linking to the papers
- Color-coded interest ratings
- Bullet-point summaries
- Backup: Saves results to `results.pkl` for debugging
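
The title-based de-duplication mentioned under Paper Discovery could look roughly like this (an illustrative sketch; `merge_papers` and the `"title"` key are assumptions, and the real merging happens in main.py):

```python
def merge_papers(crossref_papers: list[dict], springer_papers: list[dict]) -> list[dict]:
    """Merge two result lists, keeping one entry per (normalized) title.

    Illustrative only: assumes each paper dict carries a "title" key.
    """
    merged: dict[str, dict] = {}
    for paper in crossref_papers + springer_papers:
        key = paper["title"].strip().lower()  # normalize so near-identical titles collide
        merged.setdefault(key, paper)         # first occurrence wins
    return list(merged.values())
```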
If more papers are found than your `max_papers_for_llm` setting, the tool will:
- Skip AI processing to avoid excessive API costs
- Send an email with just the paper titles and links
- Suggest adjusting your search scope or increasing the limit
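
In code terms, the decision is a simple guard along these lines (an illustrative sketch; the names mirror the config, but the actual check lives in main.py):

```python
def should_run_llm(papers: list[dict], max_papers_for_llm: int) -> bool:
    """Return True only when LLM processing is enabled and within budget."""
    if max_papers_for_llm == 0:          # 0 disables LLM processing entirely
        return False
    return len(papers) <= max_papers_for_llm

# If this returns False, the report falls back to a titles-and-links email.
```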
When AI processing runs, the email report includes:
- Papers sorted by AI-generated relevance scores
- Color-coded ratings (red: 0-3, orange: 4-6, green: 7-10; sketched after this list)
- Bullet-point summaries of key findings
- Direct links to papers via DOI
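
The color coding corresponds to a straightforward mapping like this (an illustrative sketch; the actual HTML styling is done in mail.py):

```python
def rating_color(rating: int) -> str:
    """Map a 0-10 relevance rating to the report's traffic-light colors."""
    if rating <= 3:
        return "red"
    if rating <= 6:
        return "orange"
    return "green"
```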
When AI processing is skipped, the fallback email includes:
- List of paper titles with links
- Explanation of why AI processing was skipped
- Suggestions for configuration adjustments
```
PaperFetch/
├── main.py              # Main script and orchestration
├── crossref.py          # CrossRef API interaction
├── nature.py            # Nature/Springer API interaction
├── llm.py               # AI processing and summarization
├── mail.py              # Email formatting and sending
├── config_example.toml  # Configuration template
├── config.toml          # Your configuration (not in git)
├── pyproject.toml       # Project dependencies
└── README.md            # This file
```
This project was entirely vibe-coded using Claude.