A Python-based web scraper and summarization tool for extracting, filtering, and analyzing data from OpenShift release stream pages. The tool integrates with JIRA and GitHub, supporting both issue-ID-based and username-based data retrieval and summarization. It supports environment-based configuration and includes automated testing and CI workflows.
- Automated Web Scraping: Extracts metadata and release information from OpenShift release page, JIRA, and/or GitHub URLs, and correlates related release data under matching projects.
- Summary Generation: Creates structured summaries of release data using an LLM (local Mistral or Google Gemini).
- Configurable Data Sources: Supports multiple backends including GitHub (PRs, Commits) and JIRA (summary and description fields of all JIRA Artifacts). Fetches publicly available data only.
- Secure Environment Management: Credentials and configuration managed via `.env`.
- Test Coverage: Includes unit tests for core logic and controllers.
- CI/CD Integration: GitHub Actions used for continuous integration and scheduled execution.
- AI-Powered Issue Triage: Automatic issue labeling and assessment using AI to streamline issue management.
- Python 3.8+
- pip package manager
- Ollama (for local LLM) or Google API Key (for Gemini)
- Set up the required environment variables:

```sh
# Required for GitHub access
export GH_API_TOKEN=your_github_api_token_here

# Required only if using Gemini as the LLM
export GOOGLE_API_KEY=your_gemini_token_here
```

Get your GitHub API token here: https://docs.github.com/en/rest/authentication/authenticating-to-the-rest-api
Get your Gemini API key here: https://ai.google.dev/gemini-api/docs/api-key
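Before running the tool, you can sanity-check that these variables are set. A minimal sketch (the helper below is illustrative and not part of the tool itself; only the variable names come from this README):

```python
import os

def missing_env_vars(use_gemini: bool = False) -> list:
    """Return the names of required environment variables that are not set."""
    required = ["GH_API_TOKEN"]            # always needed for GitHub access
    if use_gemini:
        required.append("GOOGLE_API_KEY")  # only needed for the Gemini backend
    return [name for name in required if not os.environ.get(name)]

if __name__ == "__main__":
    missing = missing_env_vars(use_gemini=True)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```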
- Install dependencies and set up the environment:

```sh
sh setup.sh
```

- Start the LLM server (if using local Mistral):

```sh
sh start_llm.sh
```

For detailed LLM setup instructions, refer to LLM_SETUP.md.
The tool provides three main commands:
```sh
# Scrape from a URL
python main.py scrape --url <release_page_url>

# Scrape from JIRA
python main.py scrape --issue-ids "ISSUE-1,ISSUE-2" --jira-server "https://jira.example.com"

# Scrape from GitHub with authentication
python main.py scrape --url <release_page_url> --github-token <token> --github-server "https://github.com"

# Enable filtering while scraping
python main.py scrape --url <url> --filter-on

# Correlate previously scraped data
python main.py correlate

# Generate summaries from correlated data
python main.py summarize --url <url>

# Generate summaries with specific data sources
python main.py summarize --url <url> --issue-ids "ISSUE-1,ISSUE-2" --github-token <token>
```

Available options:

- `--filter-on`: Enable filtering of data based on configured rules
- `--url`: URL to scrape data from
- `--issue-ids`: Comma-separated list of JIRA issue IDs
- `--jira-usernames`: Comma-separated list of JIRA usernames to fetch data for
- `--jira-server`: JIRA server URL
- `--jira-username`: JIRA username (optional)
- `--jira-password`: JIRA password (optional)
- `--github-server`: GitHub server URL
- `--github-token`: GitHub API token
- `--github-username`: GitHub username (optional)
- `--github-password`: GitHub password (optional)
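As a sketch of how a comma-separated value such as `--issue-ids "ISSUE-1,ISSUE-2"` is typically consumed, the hypothetical helper below splits and trims the list (illustrative only; this is not the tool's actual parser):

```python
def parse_issue_ids(raw: str) -> list:
    """Split a comma-separated string of issue IDs into a clean list."""
    # Drop empty fragments and surrounding whitespace around each ID.
    return [part.strip() for part in raw.split(",") if part.strip()]

print(parse_issue_ids("ISSUE-1, ISSUE-2"))
```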
Generated summaries and data will be stored in the following locations:
- Scraped data: `data/`
- Summaries: `data/summaries/`

- To disable summary generation, set `SUMMARIZE_ENABLED=False` in your environment.
- For LLM configuration, refer to LLM_SETUP.md.
- Additional configuration options can be found in the `config/` directory.
The repository uses AI-powered issue triage to automatically categorize and label new issues:
- `bug`: Bug reports and software issues
- `enhancement`: Feature requests and improvements
- `question`: General questions and help requests
- `documentation`: Documentation-related issues
- `dependencies`: Package and dependency-related issues
- `security`: Security-related concerns
When a new issue is created:
- AI automatically analyzes the issue content
- Appropriate labels are applied based on the analysis
- A comment is added with the AI assessment
- Maintainers are notified for review
The AI assessment helps prioritize and route issues to the appropriate team members more efficiently.
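To make the label taxonomy concrete, here is a toy classifier that maps issue text to one of the labels listed above. Plain keyword matching stands in for the real AI analysis; the keyword sets and the fallback label are assumptions made for this sketch:

```python
# Keyword lookup standing in for the AI triage model (illustrative only).
LABEL_KEYWORDS = {
    "bug": ("crash", "error", "broken", "traceback"),
    "enhancement": ("feature request", "improve", "would be nice"),
    "documentation": ("readme", "docs", "typo"),
    "dependencies": ("upgrade", "requirements", "package version"),
    "security": ("vulnerability", "cve", "exploit"),
}

def suggest_label(issue_text: str) -> str:
    """Return the first matching label, or "question" when nothing matches."""
    lowered = issue_text.lower()
    for label, keywords in LABEL_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return "question"  # fallback bucket for general questions
```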