diff --git a/README.md b/README.md
index 3ca94e8..e7f33d9 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,10 @@
# RateMyProfessors API Client (Python)
+[PyPI package](https://pypi.org/project/ratemyprofessors-client/) [Documentation](https://amaanjaved1.github.io/Rate-My-Professors-API-Client/)
+
A typed, retrying, rate-limited **unofficial** client for [RateMyProfessors](https://www.ratemyprofessors.com).
-> **Disclaimer:** This library is unofficial and may break if RMP changes their internal API. Use responsibly and respect rate limits.
+> **Looking for TypeScript?** Check out the [TypeScript version](https://github.com/amaanjaved1/rate-my-professors-client-ts).
## Requirements
@@ -114,8 +116,8 @@ from rmp_client import (
)
```
-- `normalize_comment(text)` — Normalize text for deduplication (lowercase, collapse whitespace)
-- `is_valid_comment(text, min_len=10)` — Check if a comment is non-empty and meets a minimum length
+- `normalize_comment(text, *, strip_html=True, strip_punctuation=False)` — Normalize text for deduplication (trim, strip HTML, lowercase, collapse whitespace; optionally strip punctuation)
+- `is_valid_comment(text, *, min_len=10)` — Validate a comment and return a `ValidationResult` with diagnostics (empty, too short, all caps, excessive repeats, no alpha)
- `clean_course_label(raw)` — Clean scraped course labels (remove counts, normalize whitespace)
- `build_course_mapping(scraped, valid)` — Map scraped labels to known course codes
- `analyze_sentiment(text)` — Compute sentiment label from text (uses TextBlob)
diff --git a/docs/configuration.html b/docs/configuration.html
new file mode 100644
index 0000000..5fa92b8
--- /dev/null
+++ b/docs/configuration.html
@@ -0,0 +1,76 @@
+
+
+
+
+
+ Configuration — RMP Client
+
+
+
+
+
+
+
+
+
+
+
+ Configuration
+ The client is configured via RMPClientConfig. All fields have sensible defaults.
+ from rmp_client import RMPClientConfig, RMPClient
+
+config = RMPClientConfig(
+ base_url="https://www.ratemyprofessors.com/graphql",
+ timeout_seconds=10.0,
+ max_retries=3,
+ rate_limit_per_minute=60,
+)
+with RMPClient(config) as client:
+ ...
+
+ Available Options #
+
+ | Option | Type | Default | Description |
+ | base_url | str | https://...graphql | GraphQL endpoint URL |
+ | timeout_seconds | float | 10.0 | HTTP request timeout |
+ | max_retries | int | 3 | Number of retry attempts for failed requests |
+ | rate_limit_per_minute | int | 60 | Max requests per minute (token bucket) |
+ | user_agent | str | Firefox UA | User-Agent header |
+ | default_headers | Mapping[str, str] | UA + Accept-Language | Default headers for all requests |
+
+
+ Rate Limiting #
+ The client uses a token-bucket algorithm. Tokens replenish continuously at rate_limit_per_minute / 60 tokens per second, and each request consumes one token. If no tokens are available, the request blocks until one becomes available.
+ config = RMPClientConfig(rate_limit_per_minute=30)
+
+ Retries #
+ On 5xx errors or network failures, the client retries up to max_retries times. 4xx errors are not retried. After exhausting retries, a RetryError is raised containing the last underlying exception.
+ config = RMPClientConfig(max_retries=5)
+
+ Timeouts #
+ The timeout_seconds value applies to each individual HTTP request (connect + read).
+ config = RMPClientConfig(timeout_seconds=30.0)
+
+
+
+
+
+
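Review note on the Rate Limiting section of `docs/configuration.html`: the token-bucket behaviour described there can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the library's actual implementation. The `TokenBucket` class, its injectable `clock` and `sleep` parameters, and the choice of a bucket capacity equal to `rate_limit_per_minute` are all hypothetical; only the refill rate of `rate_limit_per_minute / 60` tokens per second and the blocking behaviour come from the docs.

```python
import time


class TokenBucket:
    """Illustrative token bucket (hypothetical, not the rmp_client internals)."""

    def __init__(self, rate_per_minute, clock=time.monotonic, sleep=time.sleep):
        if rate_per_minute <= 0:
            raise ValueError("rate_per_minute must be positive")
        # Assumption: the bucket holds at most one minute's worth of tokens.
        self.capacity = float(rate_per_minute)
        self.refill_per_second = rate_per_minute / 60.0
        self.tokens = self.capacity
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)

    def acquire(self):
        """Block until one token is available, then consume it."""
        self._refill()
        while self.tokens < 1.0:
            # Sleep just long enough for the missing fraction of a token to refill.
            self._sleep((1.0 - self.tokens) / self.refill_per_second)
            self._refill()
        self.tokens -= 1.0
```

Injecting `clock` and `sleep` is a design choice for the sketch only: it lets the blocking path be exercised in tests without real waiting.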
diff --git a/docs/configuration.md b/docs/configuration.md
deleted file mode 100644
index f215d2f..0000000
--- a/docs/configuration.md
+++ /dev/null
@@ -1,52 +0,0 @@
-### Configuration
-
-The client is configured via `RMPClientConfig`. All fields have sensible defaults.
-
-```python
-from rmp_client import RMPClientConfig, RMPClient
-
-config = RMPClientConfig(
- base_url="https://www.ratemyprofessors.com/graphql",
- timeout_seconds=10.0,
- max_retries=3,
- rate_limit_per_minute=60,
-)
-
-with RMPClient(config) as client:
- ...
-```
-
-#### Available options
-
-| Option | Type | Default | Description |
-|--------|------|---------|-------------|
-| `base_url` | `str` | `https://www.ratemyprofessors.com/graphql` | GraphQL endpoint URL |
-| `timeout_seconds` | `float` | `10.0` | HTTP request timeout |
-| `max_retries` | `int` | `3` | Number of retry attempts for failed requests |
-| `rate_limit_per_minute` | `int` | `60` | Max requests per minute (token bucket) |
-| `user_agent` | `str` | Firefox UA | User-Agent header sent with every request |
-| `default_headers` | `Mapping[str, str]` | UA + Accept-Language | Default headers for all requests |
-
-#### Rate limiting
-
-The client uses a token-bucket algorithm. Tokens replenish continuously at `rate_limit_per_minute / 60` tokens per second. Each request consumes one token. If no tokens are available, the request blocks until one becomes available.
-
-```python
-config = RMPClientConfig(rate_limit_per_minute=30) # half the default rate
-```
-
-#### Retries
-
-On 5xx errors or network failures, the client retries up to `max_retries` times. 4xx errors are **not** retried. After exhausting retries, a `RetryError` is raised containing the last underlying exception.
-
-```python
-config = RMPClientConfig(max_retries=5) # more retries for flaky networks
-```
-
-#### Timeouts
-
-The `timeout_seconds` value applies to each individual HTTP request (connect + read).
-
-```python
-config = RMPClientConfig(timeout_seconds=30.0) # generous timeout
-```
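Review note on the Retries section: the retry policy described in the docs (retry on 5xx or network failure, never on 4xx, raise `RetryError` wrapping the last exception) might look roughly like the sketch below. Everything here is a stand-in written for illustration: `HttpStatusError`, `call_with_retries`, and the local `RetryError` class are hypothetical names, and the reading of `max_retries` as "one initial attempt plus up to `max_retries` retries" is an assumption, since the docs do not spell out the total attempt count.

```python
class HttpStatusError(Exception):
    """Hypothetical error carrying an HTTP status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class RetryError(Exception):
    """Stand-in for the library's RetryError; keeps the last underlying exception."""

    def __init__(self, last_exc):
        super().__init__(f"retries exhausted: {last_exc!r}")
        self.last_exc = last_exc


def call_with_retries(send, max_retries=3):
    """Call send() and retry on 5xx or network errors; 4xx errors propagate at once."""
    last_exc = None
    # Assumption: one initial attempt plus up to max_retries retries.
    for _ in range(max_retries + 1):
        try:
            return send()
        except HttpStatusError as exc:
            if exc.status_code < 500:
                raise  # client errors are not retried
            last_exc = exc
        except ConnectionError as exc:
            last_exc = exc
    raise RetryError(last_exc)
```

A production version would normally wait between attempts (for example with exponential backoff); the sketch omits the delay to keep the control flow visible.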
diff --git a/docs/extras.html b/docs/extras.html
new file mode 100644
index 0000000..a70f51e
--- /dev/null
+++ b/docs/extras.html
@@ -0,0 +1,101 @@
+
+
+
+
+
+ Extras — RMP Client
+
+
+
+
+
+
+
+
+
+
+
+ Extras
+ Helpers for ingestion pipelines. Import them from rmp_client:
+ from rmp_client import (
+ analyze_sentiment, normalize_comment,
+ is_valid_comment, build_course_mapping,
+ clean_course_label,
+)
+
+ Sentiment #
+ Compute a sentiment score and label from comment text (uses TextBlob internally).
+ result = analyze_sentiment("Great prof, explains concepts clearly.")
+print(result.score, result.label) # e.g. 0.65 "positive"
+
+ Helpers #
+
+
+ normalize_comment(text: str, *, strip_html: bool = True, strip_punctuation: bool = False) -> str
+ Normalizes a comment for comparison or deduplication. Trims whitespace, strips HTML tags (on by default; pass strip_html=False to keep them), lowercases, and collapses runs of whitespace. Optionally strips punctuation for looser matching.
+
+ | Parameter | Type | Default | Description |
+ | text | str | — | Comment text |
+ | strip_html | bool | True | Remove HTML tags |
+ | strip_punctuation | bool | False | Remove all punctuation |
+
+ a = normalize_comment(" Great Professor! ")
+b = normalize_comment("great professor!")
+assert a == b # True
+
+normalize_comment("<b>Loved</b> this class") # "loved this class"
+normalize_comment("Hello, world!", strip_punctuation=True) # "hello world"
+
+
+ is_valid_comment(text: str, *, min_len: int = 10) -> ValidationResult
+ Validates a comment and returns detailed diagnostics. Checks for empty text, insufficient length, all-caps, excessive repeated characters, and absence of alphabetic characters.
+
+ | Parameter | Type | Default | Description |
+ | text | str | — | Comment text |
+ | min_len | int | 10 | Minimum character length |
+
+ Returns: ValidationResult with valid (bool) and issues (list of CommentIssue).
+ Each issue has a code ("empty", "too_short", "all_caps", "excessive_repeats", "no_alpha") and a human-readable message.
+ result = is_valid_comment("Good")
+# ValidationResult(valid=False, issues=[CommentIssue(code="too_short", ...)])
+
+result = is_valid_comment("Great class, learned a lot")
+# ValidationResult(valid=True, issues=[])
+
+result = is_valid_comment("WORST PROF EVER!!!")
+# ValidationResult(valid=False, issues=[CommentIssue(code="all_caps", ...)])
+
+ Course Code Helpers #
+ Map scraped RMP course labels to your course catalog.
+ scraped = ["ANAT 215 (12)", "phys115"]
+valid = ["ANAT 215", "PHYS 115"]
+
+mapping = build_course_mapping(scraped, valid)
+# {"ANAT 215 (12)": "ANAT 215", "phys115": "PHYS 115"}
+
+cleaned = clean_course_label("MATH 101 (5)")
+# "MATH 101"
+
+
+
+
+
+
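Review note on `is_valid_comment` in `docs/extras.html`: the five checks listed there could be implemented along the lines of the sketch below. It is an assumption-heavy illustration rather than the library's code. The `validate_comment` function and the dataclass versions of `ValidationResult` and `CommentIssue` are stand-ins, the issue messages are invented, and the "excessive repeats" threshold (the same character four or more times in a row) is a guess chosen so that the documented `"WORST PROF EVER!!!"` example reports only `all_caps`.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CommentIssue:
    code: str
    message: str


@dataclass
class ValidationResult:
    valid: bool
    issues: list = field(default_factory=list)


def validate_comment(text, *, min_len=10):
    """Hypothetical re-implementation of the documented checks."""
    issues = []
    stripped = text.strip()
    if not stripped:
        # Nothing else is worth checking on an empty comment.
        return ValidationResult(False, [CommentIssue("empty", "Comment is empty")])
    if len(stripped) < min_len:
        issues.append(CommentIssue("too_short", f"Shorter than {min_len} characters"))
    letters = [ch for ch in stripped if ch.isalpha()]
    if not letters:
        issues.append(CommentIssue("no_alpha", "No alphabetic characters"))
    elif all(ch.isupper() for ch in letters):
        issues.append(CommentIssue("all_caps", "All letters are upper case"))
    # Assumed threshold: one character repeated four or more times in a row.
    if re.search(r"(.)\1{3,}", stripped):
        issues.append(CommentIssue("excessive_repeats", "Repeated characters"))
    return ValidationResult(not issues, issues)
```

Because the result is an object rather than a bool, callers in this sketch should branch on `result.valid` rather than on the result itself.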
diff --git a/docs/extras.md b/docs/extras.md
deleted file mode 100644
index e09c426..0000000
--- a/docs/extras.md
+++ /dev/null
@@ -1,48 +0,0 @@
-### Helpers for Ingestion Pipelines
-
-These helpers are part of the main package. Import them from `rmp_client`:
-
-```python
-from rmp_client import (
- analyze_sentiment,
- normalize_comment,
- is_valid_comment,
- build_course_mapping,
- clean_course_label,
-)
-```
-
-#### Sentiment
-
-Compute a sentiment score and label from comment text (uses TextBlob internally).
-
-```python
-result = analyze_sentiment("Great prof, explains concepts clearly.")
-print(result.score, result.label) # e.g. 0.65 "positive"
-```
-
-#### Dedupe helpers
-
-Normalize comments for deduplication and filter out low-quality entries.
-
-```python
-raw = " This prof is AMAZING!!! "
-normalized = normalize_comment(raw) # "this prof is amazing!!!"
-if is_valid_comment(normalized, min_len=10):
- print("Valid comment")
-```
-
-#### Course code helpers
-
-Map scraped RMP course labels to your course catalog.
-
-```python
-scraped = ["ANAT 215 (12)", "phys115"]
-valid = ["ANAT 215", "PHYS 115"]
-
-mapping = build_course_mapping(scraped, valid)
-# {"ANAT 215 (12)": "ANAT 215", "phys115": "PHYS 115"}
-
-cleaned = clean_course_label("MATH 101 (5)")
-# "MATH 101"
-```
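Review note on the removed "Dedupe helpers" section: the deduplication workflow it described (normalize each comment, then compare the normalized forms) can be sketched as follows. The `normalize` function is a hypothetical stand-in that mimics the documented defaults of `normalize_comment` (trim, strip HTML tags, lowercase, collapse whitespace), and `dedupe_comments` is an illustrative helper that is not part of `rmp_client`. Replacing tags with a space rather than an empty string is an assumption of this sketch.

```python
import re


def normalize(text):
    """Hypothetical stand-in for normalize_comment with its default options."""
    without_tags = re.sub(r"<[^>]+>", " ", text)  # assumption: tags become spaces
    return re.sub(r"\s+", " ", without_tags).strip().lower()


def dedupe_comments(comments):
    """Keep the first occurrence of each comment, compared by normalized form."""
    seen = set()
    unique = []
    for comment in comments:
        key = normalize(comment)
        if key in seen:
            continue
        seen.add(key)
        unique.append(comment)  # keep the original text, not the normalized key
    return unique
```

In a real pipeline the stand-in would be swapped for `normalize_comment` from `rmp_client`, optionally with `strip_punctuation=True` for looser matching.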
diff --git a/docs/index.html b/docs/index.html
new file mode 100644
index 0000000..dadebd4
--- /dev/null
+++ b/docs/index.html
@@ -0,0 +1,92 @@
+
+
+
+
+
+ RateMyProfessors API Client (Python)
+
+
+
+
+
+
+
+
+
+
+
+ RateMyProfessors API Client
+
+
+
+
+
+ An unofficial, typed Python client for RateMyProfessors.
+ All data is fetched via RMP's GraphQL API — no HTML scraping or browser automation required.
+
+
+
+ Disclaimer: This library is unofficial and may break if RMP changes their internal API. Use responsibly and respect rate limits.
+
+
+ Features #
+
+ Strong typing via Pydantic models
+ Automatic retries with configurable max attempts
+ Token-bucket rate limiting (default 60 req/min)
+ In-memory caching for ratings pages
+ Cursor-based pagination for all list/search endpoints
+ Clear error hierarchy for precise exception handling
+ Built-in helpers for ingestion workflows (sentiment, comment validation, course codes)
+
+
+ Requirements #
+
+ Python 3.10 or later
+ Works with type checkers (Pydantic models, fully typed API)
+
+
+ Installation #
+ pip install ratemyprofessors-client
+
+ Quick Start #
+ from rmp_client import RMPClient
+
+with RMPClient() as client:
+ prof = client.get_professor("2823076")
+ print(prof.name, prof.overall_rating)
+
+ for rating in client.iter_professor_ratings(prof.id):
+ print(rating.date, rating.quality, rating.comment)
+
+ Documentation #
+
+ Usage — Quickstart examples for every endpoint
+ Configuration — Tuning retries, rate limits, timeouts, and headers
+ API Reference — Full method and type reference
+ Extras — Ingestion helpers (sentiment, comment validation, course mapping)
+
+
+
+
+
+
+
diff --git a/docs/index.md b/docs/index.md
deleted file mode 100644
index faae2f6..0000000
--- a/docs/index.md
+++ /dev/null
@@ -1,35 +0,0 @@
-### RateMyProfessors API Client
-
-An unofficial, typed Python client for [RateMyProfessors](https://www.ratemyprofessors.com).
-
-All data is fetched via RMP's GraphQL API -- no HTML scraping or browser automation required.
-
-**Features:**
-
-- Strong typing via Pydantic models
-- Automatic retries with configurable max attempts
-- Token-bucket rate limiting (default 60 req/min)
-- In-memory caching for ratings pages (pre-fetches all ratings on first load)
-- Cursor-based pagination for all list/search endpoints
-- Clear error hierarchy for precise exception handling
-- Built-in helpers for ingestion workflows (sentiment, dedupe, course codes)
-
-**Quick start:**
-
-```python
-from rmp_client import RMPClient
-
-with RMPClient() as client:
- prof = client.get_professor("2823076")
- print(prof.name, prof.overall_rating)
-
- for rating in client.iter_professor_ratings(prof.id):
- print(rating.date, rating.quality, rating.comment)
-```
-
-**Documentation:**
-
-- [Usage](usage.md) — Quickstart examples for every endpoint
-- [Configuration](configuration.md) — Tuning retries, rate limits, timeouts, and headers
-- [API Reference](reference.md) — Full method and type reference
-- [Extras](extras.md) — Ingestion helpers (sentiment, dedupe, course mapping)
diff --git a/docs/reference.html b/docs/reference.html
new file mode 100644
index 0000000..0a675f0
--- /dev/null
+++ b/docs/reference.html
@@ -0,0 +1,124 @@
+
+
+
+
+
+ API Reference — RMP Client
+
+
+
+
+
+
+
+
+
+
+
+ API Reference
+ The main entry point is RMPClient. Use it as a context manager, or call close() when you are done.
+ from rmp_client import RMPClient, RMPClientConfig
+
+with RMPClient(config=RMPClientConfig()) as client:
+ ...
+
+ School Methods #
+
+ | Method | Returns | Description |
+ | search_schools(query, *, page_size=20, cursor=None) | SchoolSearchResult | Search schools by name |
+ | get_school(school_id) | School | Fetch a single school with category ratings |
+ | get_compare_schools(school_id_1, school_id_2) | CompareSchoolsResult | Fetch two schools side by side |
+ | get_school_ratings_page(school_id, *, cursor=None, page_size=20) | SchoolRatingsPage | Get one page of school ratings (cached) |
+ | iter_school_ratings(school_id, *, page_size=20, since=None) | Iterator[SchoolRating] | Iterate all school ratings |
+
+
+ Professor Methods #
+
+ | Method | Returns | Description |
+ | search_professors(query, *, school_id=None, page_size=20, cursor=None) | ProfessorSearchResult | Search professors by name |
+ | list_professors_for_school(school_id, *, query=None, page_size=20, cursor=None) | ProfessorSearchResult | List professors at a school |
+ | iter_professors_for_school(school_id, *, query=None, page_size=20) | Iterator[Professor] | Iterate all professors at a school |
+ | get_professor(professor_id) | Professor | Fetch a single professor |
+ | get_professor_ratings_page(professor_id, *, cursor=None, page_size=20, course_filter=None) | ProfessorRatingsPage | Get one page of professor ratings (cached) |
+ | iter_professor_ratings(professor_id, *, page_size=20, since=None, course_filter=None) | Iterator[Rating] | Iterate all professor ratings |
+
+
+ Low-level #
+
+ | Method | Returns | Description |
+ | raw_query(payload) | dict | Send a raw GraphQL payload |
+ | close() | None | Close the HTTP client and clear caches |
+
+
+
+
+ Models #
+ All models are Pydantic BaseModel subclasses.
+
+ School
+ id, name, location, overall_quality, num_ratings, reputation, safety, happiness, facilities, social, location_rating, clubs, opportunities, internet, food
+
+ Professor
+ id, name, department, school (School), url, overall_rating, num_ratings, percent_take_again, level_of_difficulty, tags, rating_distribution
+
+ Rating
+ date, comment, quality, difficulty, tags, course_raw, details, thumbs_up, thumbs_down
+
+ SchoolRating
+ date, comment, overall, category_ratings (dict), thumbs_up, thumbs_down
+
+ ProfessorSearchResult / SchoolSearchResult
+ professors/schools, total, page_size, has_next_page, next_cursor
+
+ ProfessorRatingsPage / SchoolRatingsPage
+ professor/school, ratings, has_next_page, next_cursor
+
+ CompareSchoolsResult
+ school_1, school_2
+
+
+
+ Errors #
+ All errors extend RMPError.
+
+ | Error | Description |
+ | HttpError | Non-2xx HTTP response. Has status_code, url, body. |
+ | ParsingError | Could not parse the GraphQL response. |
+ | RateLimitError | Local rate limiter blocked the request. |
+ | RetryError | All retry attempts exhausted. |
+ | RMPAPIError | GraphQL API returned an errors array. |
+ | ConfigurationError | Invalid client configuration. |
+
+ from rmp_client import RMPClient, HttpError, ParsingError
+
+with RMPClient() as client:
+ try:
+ prof = client.get_professor("999999")
+ except ParsingError:
+ print("Professor not found")
+ except HttpError as e:
+ print(f"HTTP {e.status_code}")
+
+
+
+
+
+
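Review note on the paginated methods in `docs/reference.html`: the page models expose `ratings`, `has_next_page`, and `next_cursor`, so manual cursor-based pagination could be sketched as below. The `Page` dataclass and the `iter_all_ratings` helper are hypothetical stand-ins used so the example runs without network access; the library's own `iter_professor_ratings` and `iter_school_ratings` presumably do something similar, but that is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Page:
    """Stand-in for a ratings page (fields named as in the reference)."""
    ratings: list
    has_next_page: bool
    next_cursor: Optional[str]


def iter_all_ratings(fetch_page: Callable[[Optional[str]], Page]) -> Iterator:
    """Yield every rating by following next_cursor until the last page."""
    cursor = None  # None requests the first page
    while True:
        page = fetch_page(cursor)
        yield from page.ratings
        if not page.has_next_page or page.next_cursor is None:
            break
        cursor = page.next_cursor
```

With a real client, `fetch_page` could be a small wrapper such as `lambda c: client.get_professor_ratings_page(professor_id, cursor=c, page_size=20)`, using the signature listed in the reference table.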
diff --git a/docs/reference.md b/docs/reference.md
deleted file mode 100644
index bbe5356..0000000
--- a/docs/reference.md
+++ /dev/null
@@ -1,91 +0,0 @@
-### API Reference
-
-#### RMPClient
-
-The main entry point. Use as a context manager or call `close()` when done.
-
-```python
-from rmp_client import RMPClient, RMPClientConfig
-
-with RMPClient(config=RMPClientConfig()) as client:
- ...
-```
-
-**School methods:**
-
-| Method | Returns | Description |
-|--------|---------|-------------|
-| `search_schools(query, *, page_size=20, cursor=None)` | `SchoolSearchResult` | Search schools by name |
-| `get_school(school_id)` | `School` | Fetch a single school with category ratings |
-| `get_compare_schools(school_id_1, school_id_2)` | `CompareSchoolsResult` | Fetch two schools side by side |
-| `get_school_ratings_page(school_id, *, cursor=None, page_size=20)` | `SchoolRatingsPage` | Get one page of school ratings (cached) |
-| `iter_school_ratings(school_id, *, page_size=20, since=None)` | `Iterator[SchoolRating]` | Iterate all school ratings |
-
-**Professor methods:**
-
-| Method | Returns | Description |
-|--------|---------|-------------|
-| `search_professors(query, *, school_id=None, page_size=20, cursor=None)` | `ProfessorSearchResult` | Search professors by name |
-| `list_professors_for_school(school_id, *, query=None, page_size=20, cursor=None)` | `ProfessorSearchResult` | List professors at a school |
-| `iter_professors_for_school(school_id, *, query=None, page_size=20)` | `Iterator[Professor]` | Iterate all professors at a school |
-| `get_professor(professor_id)` | `Professor` | Fetch a single professor |
-| `get_professor_ratings_page(professor_id, *, cursor=None, page_size=20, course_filter=None)` | `ProfessorRatingsPage` | Get one page of professor ratings (cached) |
-| `iter_professor_ratings(professor_id, *, page_size=20, since=None, course_filter=None)` | `Iterator[Rating]` | Iterate all professor ratings |
-
-**Low-level:**
-
-| Method | Returns | Description |
-|--------|---------|-------------|
-| `raw_query(payload)` | `dict` | Send a raw GraphQL payload |
-| `close()` | `None` | Close the HTTP client and clear caches |
-
----
-
-#### Models
-
-All models are Pydantic `BaseModel` subclasses.
-
-**`School`** — `id`, `name`, `location`, `overall_quality`, `num_ratings`, `reputation`, `safety`, `happiness`, `facilities`, `social`, `location_rating`, `clubs`, `opportunities`, `internet`, `food`
-
-**`Professor`** — `id`, `name`, `department`, `school` (School), `url`, `overall_rating`, `num_ratings`, `percent_take_again`, `level_of_difficulty`, `tags`, `rating_distribution`
-
-**`Rating`** — `date`, `comment`, `quality`, `difficulty`, `tags`, `course_raw`, `details`, `thumbs_up`, `thumbs_down`
-
-**`SchoolRating`** — `date`, `comment`, `overall`, `category_ratings` (dict), `thumbs_up`, `thumbs_down`
-
-**`ProfessorSearchResult`** — `professors`, `total`, `page_size`, `has_next_page`, `next_cursor`
-
-**`SchoolSearchResult`** — `schools`, `total`, `page_size`, `has_next_page`, `next_cursor`
-
-**`ProfessorRatingsPage`** — `professor`, `ratings`, `has_next_page`, `next_cursor`
-
-**`SchoolRatingsPage`** — `school`, `ratings`, `has_next_page`, `next_cursor`
-
-**`CompareSchoolsResult`** — `school_1`, `school_2`
-
----
-
-#### Errors
-
-All errors extend `RMPError`.
-
-| Error | Description |
-|-------|-------------|
-| `HttpError` | Non-2xx HTTP response. Has `status_code`, `url`, `body`. |
-| `ParsingError` | Could not parse the GraphQL response (e.g. entity not found). |
-| `RateLimitError` | Local rate limiter blocked the request. |
-| `RetryError` | All retry attempts exhausted. Wraps the last exception. |
-| `RMPAPIError` | GraphQL API returned an `errors` array. Has `details`. |
-| `ConfigurationError` | Invalid client configuration. |
-
-```python
-from rmp_client import RMPClient, HttpError, ParsingError
-
-with RMPClient() as client:
- try:
- prof = client.get_professor("999999")
- except ParsingError:
- print("Professor not found")
- except HttpError as e:
- print(f"HTTP {e.status_code}")
-```
diff --git a/docs/script.js b/docs/script.js
new file mode 100644
index 0000000..738a89a
--- /dev/null
+++ b/docs/script.js
@@ -0,0 +1,78 @@
+(function () {
+ var MOON = ' ';
+ var SUN = ' ';
+
+ function isDark() {
+ return document.documentElement.classList.contains('dark');
+ }
+
+ function updateIcon() {
+ var btn = document.getElementById('theme-toggle');
+ if (btn) btn.innerHTML = isDark() ? SUN : MOON;
+ }
+
+ updateIcon();
+
+ document.addEventListener('DOMContentLoaded', function () {
+ updateIcon();
+
+ var toggle = document.getElementById('theme-toggle');
+ if (toggle) {
+ toggle.addEventListener('click', function () {
+ var dark = !isDark();
+ document.documentElement.classList.toggle('dark', dark);
+ localStorage.setItem('rmp-docs-theme', dark ? 'dark' : 'light');
+ updateIcon();
+ });
+ }
+
+ buildTOC();
+ if (typeof hljs !== 'undefined') hljs.highlightAll();
+ });
+
+ function buildTOC() {
+ var list = document.getElementById('toc-list');
+ var tocEl = document.querySelector('.toc');
+ if (!list || !tocEl) return;
+
+ var headings = document.querySelectorAll('main h2[id]');
+ if (headings.length === 0) {
+ tocEl.style.display = 'none';
+ var content = document.querySelector('.content'); if (content) content.style.marginRight = '0';
+ return;
+ }
+
+ headings.forEach(function (h) {
+ var li = document.createElement('li');
+ var a = document.createElement('a');
+ a.href = '#' + h.id;
+ var text = '';
+ for (var i = 0; i < h.childNodes.length; i++) {
+ var n = h.childNodes[i];
+ if (n.nodeType === 3) text += n.textContent;
+ else if (!n.classList || !n.classList.contains('anchor')) text += n.textContent;
+ }
+ a.textContent = text.trim();
+ a.addEventListener('click', function (e) {
+ e.preventDefault();
+ h.scrollIntoView({ behavior: 'smooth', block: 'start' });
+ history.replaceState(null, '', '#' + h.id);
+ });
+ li.appendChild(a);
+ list.appendChild(li);
+ });
+
+ var tocLinks = list.querySelectorAll('a');
+ var observer = new IntersectionObserver(function (entries) {
+ entries.forEach(function (entry) {
+ if (entry.isIntersecting) {
+ tocLinks.forEach(function (a) { a.classList.remove('active'); });
+ var link = list.querySelector('a[href="#' + entry.target.id + '"]');
+ if (link) link.classList.add('active');
+ }
+ });
+ }, { rootMargin: '-60px 0px -75% 0px' });
+
+ headings.forEach(function (h) { observer.observe(h); });
+ }
+})();
diff --git a/docs/style.css b/docs/style.css
new file mode 100644
index 0000000..9763133
--- /dev/null
+++ b/docs/style.css
@@ -0,0 +1,581 @@
+@import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&display=swap');
+
+*,
+*::before,
+*::after {
+ box-sizing: border-box;
+ margin: 0;
+ padding: 0;
+}
+
+:root {
+ --primary: #10b981;
+ --primary-hover: #059669;
+ --primary-subtle: #ecfdf5;
+ --bg: #ffffff;
+ --bg-surface: #f8fafc;
+ --bg-code: #1e1e2e;
+ --text: #1e293b;
+ --text-secondary: #64748b;
+ --text-code: #d4d4d8;
+ --border: #e2e8f0;
+ --border-subtle: #f1f5f9;
+ --shadow-sm: 0 1px 2px rgba(0, 0, 0, 0.05);
+ --shadow-md: 0 4px 6px -1px rgba(0, 0, 0, 0.07), 0 2px 4px -2px rgba(0, 0, 0, 0.05);
+ --shadow-lg: 0 10px 25px -5px rgba(0, 0, 0, 0.1), 0 8px 10px -6px rgba(0, 0, 0, 0.1);
+ --font-sans: 'Inter', -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
+ --font-mono: 'SF Mono', 'Cascadia Code', 'Fira Code', Consolas, monospace;
+ --topbar-height: 52px;
+ --sidebar-width: 240px;
+ --toc-width: 200px;
+ --radius: 8px;
+ --radius-sm: 6px;
+}
+
+html.dark {
+ --primary: #34d399;
+ --primary-hover: #6ee7b7;
+ --primary-subtle: rgba(16, 185, 129, 0.1);
+ --bg: #0a0a0a;
+ --bg-surface: #141414;
+ --bg-code: #161616;
+ --text: #e5e5e5;
+ --text-secondary: #a0a0a0;
+ --text-code: #d4d4d8;
+ --border: #262626;
+ --border-subtle: #1a1a1a;
+ --shadow-sm: 0 1px 2px rgba(0, 0, 0, 0.4);
+ --shadow-md: 0 4px 6px -1px rgba(0, 0, 0, 0.5);
+ --shadow-lg: 0 10px 25px -5px rgba(0, 0, 0, 0.6);
+}
+
+html {
+ font-size: 16px;
+ scroll-behavior: smooth;
+ -webkit-font-smoothing: antialiased;
+ -moz-osx-font-smoothing: grayscale;
+}
+
+body {
+ font-family: var(--font-sans);
+ color: var(--text);
+ background: var(--bg);
+ line-height: 1.7;
+}
+
+a {
+ color: var(--primary);
+ text-decoration: none;
+ transition: color 0.15s ease;
+}
+
+a:hover {
+ color: var(--primary-hover);
+}
+
+::selection {
+ background: var(--primary);
+ color: #fff;
+}
+
+/* ------------------------------------------------------------------ */
+/* Top Bar */
+/* ------------------------------------------------------------------ */
+
+.topbar {
+ position: fixed;
+ top: 0;
+ left: 0;
+ right: 0;
+ height: var(--topbar-height);
+ background: var(--bg);
+ border-bottom: 1px solid var(--border);
+ display: flex;
+ align-items: center;
+ justify-content: flex-end;
+ padding: 0 1.25rem;
+ z-index: 100;
+}
+
+.topbar-right {
+ display: flex;
+ align-items: center;
+ gap: 6px;
+}
+
+.topbar-btn {
+ display: inline-flex;
+ align-items: center;
+ justify-content: center;
+ gap: 6px;
+ padding: 6px 10px;
+ border-radius: var(--radius-sm);
+ border: 1px solid var(--border);
+ background: var(--bg-surface);
+ color: var(--text-secondary);
+ cursor: pointer;
+ font-family: var(--font-sans);
+ font-size: 0.8rem;
+ transition: all 0.15s ease;
+ text-decoration: none;
+ line-height: 1;
+}
+
+.topbar-btn:hover {
+ color: var(--text);
+ border-color: var(--primary);
+ text-decoration: none;
+}
+
+.topbar-btn svg {
+ flex-shrink: 0;
+}
+
+/* ------------------------------------------------------------------ */
+/* Sidebar */
+/* ------------------------------------------------------------------ */
+
+.sidebar {
+ width: var(--sidebar-width);
+ position: fixed;
+ top: var(--topbar-height);
+ left: 0;
+ bottom: 0;
+ background: var(--bg-surface);
+ border-right: 1px solid var(--border);
+ padding: 1.5rem 1rem;
+ overflow-y: auto;
+ display: flex;
+ flex-direction: column;
+ gap: 1.5rem;
+ z-index: 50;
+}
+
+.sidebar-brand {
+ font-size: 1.15rem;
+ font-weight: 700;
+ color: var(--text);
+ letter-spacing: -0.02em;
+ padding: 0 0.5rem;
+}
+
+.sidebar nav ul {
+ list-style: none;
+ display: flex;
+ flex-direction: column;
+ gap: 2px;
+}
+
+.sidebar nav a {
+ display: block;
+ padding: 0.4rem 0.75rem;
+ border-radius: var(--radius-sm);
+ font-size: 0.875rem;
+ font-weight: 500;
+ color: var(--text-secondary);
+ transition: all 0.15s ease;
+}
+
+.sidebar nav a:hover {
+ color: var(--text);
+ background: var(--border-subtle);
+ text-decoration: none;
+}
+
+.sidebar nav a.active {
+ color: var(--primary);
+ background: var(--primary-subtle);
+ font-weight: 600;
+}
+
+/* ------------------------------------------------------------------ */
+/* Table of Contents (right sidebar) */
+/* ------------------------------------------------------------------ */
+
+.toc {
+ width: var(--toc-width);
+ position: fixed;
+ top: var(--topbar-height);
+ right: 0;
+ bottom: 0;
+ padding: 1.5rem 1rem 1.5rem 0;
+ overflow-y: auto;
+ z-index: 50;
+}
+
+.toc-title {
+ font-size: 0.7rem;
+ font-weight: 600;
+ text-transform: uppercase;
+ letter-spacing: 0.06em;
+ color: var(--text-secondary);
+ padding: 0 0 0.6rem 0.75rem;
+}
+
+.toc ul {
+ list-style: none;
+ border-left: 1px solid var(--border);
+}
+
+.toc a {
+ display: block;
+ padding: 0.25rem 0.75rem;
+ font-size: 0.78rem;
+ color: var(--text-secondary);
+ border-left: 2px solid transparent;
+ margin-left: -1px;
+ transition: all 0.15s ease;
+ text-decoration: none;
+}
+
+.toc a:hover {
+ color: var(--text);
+}
+
+.toc a.active {
+ color: var(--primary);
+ border-left-color: var(--primary);
+}
+
+/* ------------------------------------------------------------------ */
+/* Main content */
+/* ------------------------------------------------------------------ */
+
+.content {
+ margin-left: var(--sidebar-width);
+ margin-right: var(--toc-width);
+ max-width: 54rem;
+ padding: calc(var(--topbar-height) + 2.5rem) 3rem 4rem;
+}
+
+/* ------------------------------------------------------------------ */
+/* Typography */
+/* ------------------------------------------------------------------ */
+
+h1 {
+ font-size: 2rem;
+ font-weight: 700;
+ letter-spacing: -0.025em;
+ margin-bottom: 0.5rem;
+ line-height: 1.3;
+}
+
+h1 + p {
+ font-size: 1.05rem;
+ color: var(--text-secondary);
+ margin-bottom: 2rem;
+}
+
+h2 {
+ font-size: 1.35rem;
+ font-weight: 600;
+ margin-top: 3rem;
+ margin-bottom: 0.75rem;
+ letter-spacing: -0.015em;
+ display: flex;
+ align-items: center;
+ gap: 0.5rem;
+ scroll-margin-top: calc(var(--topbar-height) + 1.5rem);
+}
+
+h2 .anchor {
+ color: var(--border);
+ font-weight: 400;
+ font-size: 1rem;
+ text-decoration: none;
+ opacity: 0;
+ transition: opacity 0.15s ease;
+}
+
+h2:hover .anchor {
+ opacity: 1;
+ color: var(--primary);
+}
+
+h3 {
+ font-size: 1.1rem;
+ font-weight: 600;
+ margin-top: 2rem;
+ margin-bottom: 0.5rem;
+ letter-spacing: -0.01em;
+ scroll-margin-top: calc(var(--topbar-height) + 1.5rem);
+}
+
+p {
+ margin-bottom: 0.85rem;
+}
+
+ul,
+ol {
+ margin-left: 1.5rem;
+ margin-bottom: 0.85rem;
+}
+
+li {
+ margin-bottom: 0.35rem;
+}
+
+strong {
+ font-weight: 600;
+}
+
+hr {
+ border: none;
+ border-top: 1px solid var(--border);
+ margin: 2.5rem 0;
+}
+
+/* ------------------------------------------------------------------ */
+/* Inline code */
+/* ------------------------------------------------------------------ */
+
+code {
+ font-family: var(--font-mono);
+ font-size: 0.85em;
+ background: var(--bg-surface);
+ border: 1px solid var(--border);
+ padding: 0.15em 0.4em;
+ border-radius: 4px;
+ font-weight: 500;
+}
+
+/* ------------------------------------------------------------------ */
+/* Code blocks */
+/* ------------------------------------------------------------------ */
+
+pre {
+ background: var(--bg-code);
+ border: 1px solid var(--border);
+ border-radius: var(--radius);
+ padding: 1.25rem 1.5rem;
+ overflow-x: auto;
+ margin-bottom: 1.25rem;
+ box-shadow: var(--shadow-md);
+ line-height: 1.6;
+}
+
+html.dark pre {
+ border-color: #2a2a2a;
+}
+
+pre code {
+ background: none;
+ border: none;
+ padding: 0;
+ font-size: 0.84rem;
+ color: var(--text-code);
+ font-weight: 400;
+}
+
+/* Styled scrollbar for code blocks */
+pre::-webkit-scrollbar {
+ height: 6px;
+}
+
+pre::-webkit-scrollbar-track {
+ background: transparent;
+}
+
+pre::-webkit-scrollbar-thumb {
+ background: #4a4a5a;
+ border-radius: 3px;
+}
+
+pre::-webkit-scrollbar-thumb:hover {
+ background: #5a5a6a;
+}
+
+.method-sig::-webkit-scrollbar {
+ height: 6px;
+}
+
+.method-sig::-webkit-scrollbar-track {
+ background: transparent;
+}
+
+.method-sig::-webkit-scrollbar-thumb {
+ background: #4a4a5a;
+ border-radius: 3px;
+}
+
+/* Firefox scrollbar */
+pre,
+.method-sig {
+ scrollbar-width: thin;
+ scrollbar-color: #4a4a5a transparent;
+}
+
+/* ------------------------------------------------------------------ */
+/* Highlight.js overrides */
+/* ------------------------------------------------------------------ */
+
+pre code.hljs {
+ background: transparent !important;
+ padding: 0 !important;
+}
+
+.hljs {
+ background: transparent !important;
+}
+
+/* ------------------------------------------------------------------ */
+/* Tables */
+/* ------------------------------------------------------------------ */
+
+table {
+ width: 100%;
+ border-collapse: collapse;
+ margin-bottom: 1.25rem;
+ font-size: 0.875rem;
+ border-radius: var(--radius);
+ overflow: hidden;
+ box-shadow: var(--shadow-sm);
+ border: 1px solid var(--border);
+}
+
+th,
+td {
+ padding: 0.65rem 1rem;
+ text-align: left;
+}
+
+th {
+ background: var(--bg-surface);
+ font-weight: 600;
+ color: var(--text);
+ border-bottom: 2px solid var(--border);
+ font-size: 0.8rem;
+ text-transform: uppercase;
+ letter-spacing: 0.04em;
+}
+
+td {
+ border-bottom: 1px solid var(--border-subtle);
+}
+
+tr:last-child td {
+ border-bottom: none;
+}
+
+tr:hover td {
+ background: var(--bg-surface);
+}
+
+/* ------------------------------------------------------------------ */
+/* Callout */
+/* ------------------------------------------------------------------ */
+
+.callout {
+ border-left: 3px solid var(--primary);
+ background: var(--primary-subtle);
+ padding: 0.85rem 1.15rem;
+ border-radius: 0 var(--radius-sm) var(--radius-sm) 0;
+ margin-bottom: 1.25rem;
+ font-size: 0.9rem;
+ line-height: 1.6;
+}
+
+/* ------------------------------------------------------------------ */
+/* Method signatures */
+/* ------------------------------------------------------------------ */
+
+.method-sig {
+ background: var(--bg-code);
+ color: var(--text-code);
+ padding: 0.7rem 1.1rem;
+ border: 1px solid var(--border);
+ border-radius: var(--radius-sm);
+ font-family: var(--font-mono);
+ font-size: 0.84rem;
+ margin-bottom: 0.75rem;
+ overflow-x: auto;
+ box-shadow: var(--shadow-sm);
+}
+
+html.dark .method-sig {
+ border-color: #2a2a2a;
+}
+
+/* ------------------------------------------------------------------ */
+/* Badges */
+/* ------------------------------------------------------------------ */
+
+.badges {
+ display: flex;
+ gap: 6px;
+ flex-wrap: wrap;
+ margin-bottom: 1.25rem;
+}
+
+.badges img {
+ height: 22px;
+}
+
+/* ------------------------------------------------------------------ */
+/* Responsive */
+/* ------------------------------------------------------------------ */
+
+@media (max-width: 1100px) {
+ .toc {
+ display: none;
+ }
+
+ .content {
+ margin-right: 0;
+ }
+}
+
+@media (max-width: 860px) {
+ .topbar {
+ padding-left: 1.25rem;
+ }
+
+ .sidebar {
+ position: static;
+ width: 100%;
+ border-right: none;
+ border-bottom: 1px solid var(--border);
+ padding: 1rem;
+ gap: 0.75rem;
+ }
+
+ .sidebar nav ul {
+ flex-direction: row;
+ flex-wrap: wrap;
+ gap: 4px;
+ }
+
+ .content {
+ margin-left: 0;
+ padding: calc(var(--topbar-height) + 1.5rem) 1.25rem 3rem;
+ }
+}
+
+@media (max-width: 480px) {
+ h1 {
+ font-size: 1.6rem;
+ }
+
+ h2 {
+ font-size: 1.2rem;
+ }
+
+ .content {
+ padding-left: 1rem;
+ padding-right: 1rem;
+ }
+
+ pre {
+ padding: 1rem;
+ border-radius: var(--radius-sm);
+ }
+
+ table {
+ font-size: 0.8rem;
+ }
+
+ th,
+ td {
+ padding: 0.5rem 0.6rem;
+ }
+}
diff --git a/docs/usage.html b/docs/usage.html
new file mode 100644
index 0000000..d6e5dfb
--- /dev/null
+++ b/docs/usage.html
@@ -0,0 +1,125 @@
+
+
+
+
+
+ Usage — RMP Client
+
+
+
+
+
+
+
+
+
+
+
+ Usage
+ All examples use the RMPClient context manager, which handles connection setup and teardown.
+
+ Search Schools #
+ from rmp_client import RMPClient
+
+with RMPClient() as client:
+    result = client.search_schools("queens")
+    for school in result.schools:
+        print(school.name, school.location, school.overall_quality)
+
+    if result.has_next_page:
+        page2 = client.search_schools("queens", cursor=result.next_cursor)
+
+ Get a School by ID #
+ with RMPClient() as client:
+    school = client.get_school("1466")
+    print(school.name, school.location, school.overall_quality)
+    print(f"Reputation: {school.reputation}, Safety: {school.safety}")
+
+ Compare Two Schools #
+ with RMPClient() as client:
+    result = client.get_compare_schools("1466", "1491")
+    print(result.school_1.name, "vs", result.school_2.name)
+
+ Search Professors #
+ with RMPClient() as client:
+    result = client.search_professors("Smith")
+    for prof in result.professors:
+        print(prof.name, prof.overall_rating, prof.school.name if prof.school else "")
+
+    result = client.search_professors("Smith", school_id="1530")
+
+ List Professors at a School #
+ with RMPClient() as client:
+    result = client.list_professors_for_school(1466, page_size=20)
+    for prof in result.professors:
+        print(prof.name, prof.department)
+
+ Iterate All Professors at a School #
+ with RMPClient() as client:
+    for prof in client.iter_professors_for_school(1466, page_size=50):
+        print(prof.name, prof.num_ratings)
+
+ Get a Professor by ID #
+ with RMPClient() as client:
+    prof = client.get_professor("2823076")
+    print(prof.name, prof.department, prof.overall_rating)
+    print(f"Difficulty: {prof.level_of_difficulty}")
+    print(f"Would take again: {prof.percent_take_again}%")
+
+ Professor Ratings (Paginated, Cached) #
+ with RMPClient() as client:
+    page = client.get_professor_ratings_page("2823076", page_size=10)
+    print(f"Professor: {page.professor.name}")
+    for rating in page.ratings:
+        print(rating.date, rating.quality, rating.comment[:50])
+
+    if page.has_next_page:
+        page2 = client.get_professor_ratings_page("2823076", cursor=page.next_cursor)
+
+ Iterate All Professor Ratings #
+ from datetime import date
+from rmp_client import RMPClient
+
+with RMPClient() as client:
+    for rating in client.iter_professor_ratings("2823076", since=date(2024, 1, 1)):
+        print(rating.date, rating.quality, rating.comment)
+
+ School Ratings (Paginated, Cached) #
+ with RMPClient() as client:
+    page = client.get_school_ratings_page("1466", page_size=10)
+    for rating in page.ratings:
+        print(rating.date, rating.overall, rating.category_ratings)
+
+ Iterate All School Ratings #
+ with RMPClient() as client:
+    for rating in client.iter_school_ratings("1466"):
+        print(rating.date, rating.overall, rating.comment[:50])
+
+ Raw GraphQL Query #
+ with RMPClient() as client:
+    data = client.raw_query({"query": "query { viewer { id } }", "variables": {}})
+    print(data)
+
+
+
+
+
+
diff --git a/docs/usage.md b/docs/usage.md
deleted file mode 100644
index cfecfa7..0000000
--- a/docs/usage.md
+++ /dev/null
@@ -1,127 +0,0 @@
-### Usage
-
-All examples use the `RMPClient` context manager, which handles connection setup and teardown.
-
-#### Search schools
-
-```python
-from rmp_client import RMPClient
-
-with RMPClient() as client:
- result = client.search_schools("queens")
- for school in result.schools:
- print(school.name, school.location, school.overall_quality)
-
- # Cursor pagination
- if result.has_next_page:
- page2 = client.search_schools("queens", cursor=result.next_cursor)
-```
-
-#### Get a school by ID
-
-```python
-with RMPClient() as client:
- school = client.get_school("1466")
- print(school.name, school.location, school.overall_quality)
- print(f"Reputation: {school.reputation}, Safety: {school.safety}")
-```
-
-#### Compare two schools
-
-```python
-with RMPClient() as client:
- result = client.get_compare_schools("1466", "1491")
- print(result.school_1.name, "vs", result.school_2.name)
-```
-
-#### Search professors
-
-```python
-with RMPClient() as client:
- result = client.search_professors("Smith")
- for prof in result.professors:
- print(prof.name, prof.overall_rating, prof.school.name if prof.school else "")
-
- # Filter by school
- result = client.search_professors("Smith", school_id="1530")
-```
-
-#### List professors at a school
-
-```python
-with RMPClient() as client:
- result = client.list_professors_for_school(1466, page_size=20)
- for prof in result.professors:
- print(prof.name, prof.department)
-```
-
-#### Iterate all professors at a school
-
-```python
-with RMPClient() as client:
- for prof in client.iter_professors_for_school(1466, page_size=50):
- print(prof.name, prof.num_ratings)
-```
-
-#### Get a professor by ID
-
-```python
-with RMPClient() as client:
- prof = client.get_professor("2823076")
- print(prof.name, prof.department, prof.overall_rating)
- print(f"Difficulty: {prof.level_of_difficulty}")
- print(f"Would take again: {prof.percent_take_again}%")
-```
-
-#### Fetch professor ratings (paginated, cached)
-
-```python
-with RMPClient() as client:
- page = client.get_professor_ratings_page("2823076", page_size=10)
- print(f"Professor: {page.professor.name}")
- for rating in page.ratings:
- print(rating.date, rating.quality, rating.comment[:50])
-
- # Load more (served from cache, no extra network request)
- if page.has_next_page:
- page2 = client.get_professor_ratings_page("2823076", cursor=page.next_cursor)
-```
-
-#### Iterate all professor ratings
-
-```python
-from datetime import date
-from rmp_client import RMPClient
-
-with RMPClient() as client:
- for rating in client.iter_professor_ratings("2823076", since=date(2024, 1, 1)):
- print(rating.date, rating.quality, rating.comment)
-```
-
-#### Fetch school ratings (paginated, cached)
-
-```python
-with RMPClient() as client:
- page = client.get_school_ratings_page("1466", page_size=10)
- for rating in page.ratings:
- print(rating.date, rating.overall, rating.category_ratings)
-```
-
-#### Iterate all school ratings
-
-```python
-with RMPClient() as client:
- for rating in client.iter_school_ratings("1466"):
- print(rating.date, rating.overall, rating.comment[:50])
-```
-
-#### Send a raw GraphQL query
-
-```python
-with RMPClient() as client:
- data = client.raw_query({
- "query": "query { viewer { id } }",
- "variables": {},
- })
- print(data)
-```
diff --git a/pyproject.toml b/pyproject.toml
index 0bc5526..1f6af5b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
[project]
name = "ratemyprofessors-client"
-version = "2.0.0"
+version = "2.1.0"
description = "Typed, retrying, rate-limited unofficial Python client for the RateMyProfessors GraphQL API."
readme = "README.md"
requires-python = ">=3.10"
diff --git a/src/rmp_client/__init__.py b/src/rmp_client/__init__.py
index 3174647..d1e638d 100644
--- a/src/rmp_client/__init__.py
+++ b/src/rmp_client/__init__.py
@@ -15,6 +15,8 @@
 from .extras import (
     SentimentResult,
     analyze_sentiment,
+    CommentIssue,
+    ValidationResult,
     is_valid_comment,
     normalize_comment,
     build_course_mapping,
@@ -34,6 +36,8 @@
     "TokenBucket",
     "SentimentResult",
     "analyze_sentiment",
+    "CommentIssue",
+    "ValidationResult",
     "is_valid_comment",
     "normalize_comment",
     "build_course_mapping",
diff --git a/src/rmp_client/extras/__init__.py b/src/rmp_client/extras/__init__.py
index 3abf6b9..0f89db8 100644
--- a/src/rmp_client/extras/__init__.py
+++ b/src/rmp_client/extras/__init__.py
@@ -1,13 +1,15 @@
-# Ingestion helpers: sentiment, dedupe, course_codes.
+# Ingestion helpers: sentiment, helpers, course_codes.
# Re-exported from rmp_client so you can: from rmp_client import analyze_sentiment, ...
from .sentiment import SentimentResult, analyze_sentiment
-from .dedupe import is_valid_comment, normalize_comment
+from .helpers import CommentIssue, ValidationResult, is_valid_comment, normalize_comment
from .course_codes import build_course_mapping, clean_course_label
__all__ = [
"SentimentResult",
"analyze_sentiment",
+ "CommentIssue",
+ "ValidationResult",
"is_valid_comment",
"normalize_comment",
"build_course_mapping",
diff --git a/src/rmp_client/extras/dedupe.py b/src/rmp_client/extras/dedupe.py
deleted file mode 100644
index 8f1e6e2..0000000
--- a/src/rmp_client/extras/dedupe.py
+++ /dev/null
@@ -1,14 +0,0 @@
-from __future__ import annotations
-
-import re
-
-
-def normalize_comment(text: str) -> str:
-    """Lowercase and collapse whitespace for comment comparison."""
-    return re.sub(r"\s+", " ", text.strip().lower())
-
-
-def is_valid_comment(text: str, *, min_len: int = 10) -> bool:
-    """Basic heuristic to filter out empty/very short comments."""
-    return bool(text and len(text.strip()) >= min_len)
-
diff --git a/src/rmp_client/extras/helpers.py b/src/rmp_client/extras/helpers.py
new file mode 100644
index 0000000..b27be42
--- /dev/null
+++ b/src/rmp_client/extras/helpers.py
@@ -0,0 +1,100 @@
+"""Helpers for normalizing and validating rating comments."""
+
+from __future__ import annotations
+
+import re
+from dataclasses import dataclass, field
+from typing import Literal
+
+
+def _strip_html(text: str) -> str:
+    """Strip HTML tags from text (RMP comments occasionally contain markup)."""
+    return re.sub(r"<[^>]*>", "", text)
+
+
+def normalize_comment(
+    text: str,
+    *,
+    strip_html: bool = True,
+    strip_punctuation: bool = False,
+) -> str:
+    """Normalize a comment for comparison or deduplication.
+
+    - Strips HTML tags (opt-out via *strip_html*)
+    - Lowercases
+    - Optionally strips punctuation for looser matching
+    - Collapses runs of whitespace to a single space
+    - Trims leading/trailing whitespace
+    """
+    out = _strip_html(text) if strip_html else text
+    out = out.lower()
+    if strip_punctuation:
+        out = re.sub(r"[^\w\s]", "", out)
+    # Collapse and trim last, so gaps left by removed tags or punctuation
+    # do not survive as stray or doubled spaces.
+    return re.sub(r"\s+", " ", out).strip()
+
+
+IssueCode = Literal[
+    "empty",
+    "too_short",
+    "all_caps",
+    "excessive_repeats",
+    "no_alpha",
+]
+
+
+@dataclass
+class CommentIssue:
+    code: IssueCode
+    message: str
+
+
+@dataclass
+class ValidationResult:
+    valid: bool
+    issues: list[CommentIssue] = field(default_factory=list)
+
+
+def is_valid_comment(text: str, *, min_len: int = 10) -> ValidationResult:
+    """Validate a comment and return detailed diagnostics.
+
+    Checks for:
+    - Empty or whitespace-only text
+    - Below minimum length (*min_len*, default 10)
+    - All uppercase (shouting)
+    - Excessive repeated characters (e.g. "aaaaaaa")
+    - No alphabetic characters at all
+    """
+    issues: list[CommentIssue] = []
+    trimmed = (text or "").strip()
+
+    if not trimmed:
+        issues.append(CommentIssue(code="empty", message="Comment is empty"))
+        return ValidationResult(valid=False, issues=issues)
+
+    if len(trimmed) < min_len:
+        issues.append(
+            CommentIssue(
+                code="too_short",
+                message=f"Comment is {len(trimmed)} chars (minimum {min_len})",
+            )
+        )
+
+    if len(trimmed) > 3 and trimmed == trimmed.upper() and re.search(r"[A-Z]", trimmed):
+        issues.append(CommentIssue(code="all_caps", message="Comment is all uppercase"))
+
+    if re.search(r"(.)\1{4,}", trimmed, re.IGNORECASE):
+        issues.append(
+            CommentIssue(
+                code="excessive_repeats",
+                message="Comment contains excessive repeated characters",
+            )
+        )
+
+    if not re.search(r"[a-zA-Z]", trimmed):
+        issues.append(
+            CommentIssue(code="no_alpha", message="Comment contains no alphabetic characters")
+        )
+
+    return ValidationResult(valid=len(issues) == 0, issues=issues)
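
Note on the `is_valid_comment` return-type change: the function returned a `bool` in `dedupe.py` and now returns a `ValidationResult` dataclass. A plain dataclass instance is always truthy, so a caller that still writes `if is_valid_comment(text):` would likely accept every comment. The sketch below illustrates this with stand-in classes that mirror the two dataclasses in `helpers.py`. They are redefined here only to keep the example self-contained and are an assumption, not an import from the package.

```python
from dataclasses import dataclass, field


@dataclass
class CommentIssue:
    # Stand-in mirroring the dataclass added in helpers.py.
    code: str
    message: str


@dataclass
class ValidationResult:
    # Stand-in mirroring the dataclass added in helpers.py.
    valid: bool
    issues: list[CommentIssue] = field(default_factory=list)


# A failing result, shaped like the one the new is_valid_comment
# builds for an empty comment.
rejected = ValidationResult(
    valid=False,
    issues=[CommentIssue(code="empty", message="Comment is empty")],
)

# Pitfall: the instance itself is truthy even though validation failed,
# so a bool-style check no longer filters anything.
old_style_check = bool(rejected)

# Read the `valid` field explicitly and use the issue codes for diagnostics.
new_style_check = rejected.valid
issue_codes = [issue.code for issue in rejected.issues]

print(old_style_check, new_style_check, issue_codes)
```

Under these assumptions, call sites written against 2.0.0 would need to switch to `is_valid_comment(text).valid`.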