diff --git a/README.md b/README.md index 3ca94e8..e7f33d9 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,10 @@ # RateMyProfessors API Client (Python) +[![PyPI](https://img.shields.io/pypi/v/ratemyprofessors-client?color=10b981)](https://pypi.org/project/ratemyprofessors-client/) [![downloads](https://img.shields.io/pepy/dt/ratemyprofessors-client)](https://pypi.org/project/ratemyprofessors-client/) [![docs](https://img.shields.io/badge/docs-website-10b981)](https://amaanjaved1.github.io/Rate-My-Professors-API-Client/) + A typed, retrying, rate-limited **unofficial** client for [RateMyProfessors](https://www.ratemyprofessors.com). -> **Disclaimer:** This library is unofficial and may break if RMP changes their internal API. Use responsibly and respect rate limits. +> **Looking for TypeScript?** Check out the [TypeScript version](https://github.com/amaanjaved1/rate-my-professors-client-ts). ## Requirements @@ -114,8 +116,8 @@ from rmp_client import ( ) ``` -- `normalize_comment(text)` — Normalize text for deduplication (lowercase, collapse whitespace) -- `is_valid_comment(text, min_len=10)` — Check if a comment is non-empty and meets a minimum length +- `normalize_comment(text, *, strip_html=True, strip_punctuation=False)` — Normalize text for deduplication (trim, strip HTML, lowercase, collapse whitespace; optionally strip punctuation) +- `is_valid_comment(text, *, min_len=10)` — Validate a comment and return a `ValidationResult` with diagnostics (empty, too short, all caps, excessive repeats, no alpha) - `clean_course_label(raw)` — Clean scraped course labels (remove counts, normalize whitespace) - `build_course_mapping(scraped, valid)` — Map scraped labels to known course codes - `analyze_sentiment(text)` — Compute sentiment label from text (uses TextBlob) diff --git a/docs/configuration.html b/docs/configuration.html new file mode 100644 index 0000000..5fa92b8 --- /dev/null +++ b/docs/configuration.html @@ -0,0 +1,76 @@ + + + + + + Configuration — RMP Client + 
+ + + + + +
+
+ + + + +
+
+ + + +
+

Configuration

+

The client is configured via RMPClientConfig. All fields have sensible defaults.

+
from rmp_client import RMPClientConfig, RMPClient
+
+config = RMPClientConfig(
+    base_url="https://www.ratemyprofessors.com/graphql",
+    timeout_seconds=10.0,
+    max_retries=3,
+    rate_limit_per_minute=60,
+)
+with RMPClient(config) as client:
+    ...
+ +

Available Options

+ + + + + + + + +
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `base_url` | `str` | `https://...graphql` | GraphQL endpoint URL |
| `timeout_seconds` | `float` | `10.0` | HTTP request timeout |
| `max_retries` | `int` | `3` | Number of retry attempts for failed requests |
| `rate_limit_per_minute` | `int` | `60` | Max requests per minute (token bucket) |
| `user_agent` | `str` | Firefox UA | User-Agent header |
| `default_headers` | `Mapping[str, str]` | UA + Accept-Language | Default headers for all requests |
+ +

Rate Limiting

+

The client uses a token-bucket algorithm. Tokens replenish continuously at rate_limit_per_minute / 60 tokens per second, and each request consumes one token. If no tokens are available, the request blocks until one becomes available.

+
config = RMPClientConfig(rate_limit_per_minute=30)
+ +
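To make the blocking behavior concrete, here is a minimal, self-contained token-bucket sketch. It is not the library's implementation: the `TokenBucket` class, the burst capacity (assumed equal to the per-minute rate), and the injectable `clock` and `sleep` hooks are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Illustrative token bucket (hypothetical; not rmp_client's own class)."""

    def __init__(self, rate_per_minute, clock=time.monotonic, sleep=time.sleep):
        # Assumption: the bucket can hold up to one minute's worth of tokens.
        self.capacity = float(rate_per_minute)
        self.refill_per_second = rate_per_minute / 60.0
        self.tokens = self.capacity
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self._last = now

    def acquire(self):
        # Block until one full token is available, then consume it.
        self._refill()
        while self.tokens < 1.0:
            self._sleep((1.0 - self.tokens) / self.refill_per_second)
            self._refill()
        self.tokens -= 1.0
```

With a rate of 60 per minute, the first 60 calls to `acquire()` return immediately under these assumptions, and each later call waits about one second for the next token.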

Retries

+

On 5xx errors or network failures, the client retries up to max_retries times. 4xx errors are not retried. After exhausting retries, a RetryError is raised.

+
config = RMPClientConfig(max_retries=5)
+ +
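The retry rule above can be sketched as a small loop. This is only an illustration under stated assumptions: `StubHttpError` and `StubRetryError` are hypothetical stand-ins rather than the library's exception classes, `call_with_retries` is an invented helper, the first attempt is assumed not to count against `max_retries`, and no backoff delay is modeled because the docs do not describe one.

```python
class StubHttpError(Exception):
    """Hypothetical stand-in for an HTTP error that carries a status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class StubRetryError(Exception):
    """Hypothetical stand-in raised once every attempt has failed."""


def call_with_retries(send, max_retries=3):
    """Call send(); retry on 5xx or network failure, never on 4xx."""
    last_exc = None
    # Assumption: one initial attempt plus up to max_retries retries.
    for _ in range(max_retries + 1):
        try:
            return send()
        except StubHttpError as exc:
            if exc.status_code < 500:
                raise  # 4xx: the request itself is wrong, so retrying cannot help
            last_exc = exc
        except (ConnectionError, TimeoutError) as exc:
            last_exc = exc  # network failure: try again
    raise StubRetryError(f"gave up after {max_retries} retries") from last_exc
```

Chaining the last failure with `from last_exc` keeps the original cause available on the raised error's `__cause__` attribute.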

Timeouts

+

The timeout_seconds value applies to each individual HTTP request (connect + read).

+
config = RMPClientConfig(timeout_seconds=30.0)
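
Because the timeout applies per request, the time a single call can spend waiting on the network grows with the retry count. The arithmetic below is a rough sketch: `worst_case_seconds` is a hypothetical helper, and it assumes one initial attempt plus `max_retries` retries, every attempt running into the full timeout, and no extra delay between attempts or from the rate limiter.

```python
def worst_case_seconds(timeout_seconds: float, max_retries: int) -> float:
    """Upper bound on network wait time for one logical call (assumptions above)."""
    attempts = 1 + max_retries
    return attempts * timeout_seconds


# With the documented defaults (10.0 s timeout, 3 retries):
print(worst_case_seconds(10.0, 3))  # 40.0
```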
+
+ + + + + diff --git a/docs/configuration.md b/docs/configuration.md deleted file mode 100644 index f215d2f..0000000 --- a/docs/configuration.md +++ /dev/null @@ -1,52 +0,0 @@ -### Configuration - -The client is configured via `RMPClientConfig`. All fields have sensible defaults. - -```python -from rmp_client import RMPClientConfig, RMPClient - -config = RMPClientConfig( - base_url="https://www.ratemyprofessors.com/graphql", - timeout_seconds=10.0, - max_retries=3, - rate_limit_per_minute=60, -) - -with RMPClient(config) as client: - ... -``` - -#### Available options - -| Option | Type | Default | Description | -|--------|------|---------|-------------| -| `base_url` | `str` | `https://www.ratemyprofessors.com/graphql` | GraphQL endpoint URL | -| `timeout_seconds` | `float` | `10.0` | HTTP request timeout | -| `max_retries` | `int` | `3` | Number of retry attempts for failed requests | -| `rate_limit_per_minute` | `int` | `60` | Max requests per minute (token bucket) | -| `user_agent` | `str` | Firefox UA | User-Agent header sent with every request | -| `default_headers` | `Mapping[str, str]` | UA + Accept-Language | Default headers for all requests | - -#### Rate limiting - -The client uses a token-bucket algorithm. Tokens replenish continuously at `rate_limit_per_minute / 60` tokens per second. Each request consumes one token. If no tokens are available, the request blocks until one becomes available. - -```python -config = RMPClientConfig(rate_limit_per_minute=30) # half the default rate -``` - -#### Retries - -On 5xx errors or network failures, the client retries up to `max_retries` times. 4xx errors are **not** retried. After exhausting retries, a `RetryError` is raised containing the last underlying exception. - -```python -config = RMPClientConfig(max_retries=5) # more retries for flaky networks -``` - -#### Timeouts - -The `timeout_seconds` value applies to each individual HTTP request (connect + read). 
- -```python -config = RMPClientConfig(timeout_seconds=30.0) # generous timeout -``` diff --git a/docs/extras.html b/docs/extras.html new file mode 100644 index 0000000..a70f51e --- /dev/null +++ b/docs/extras.html @@ -0,0 +1,101 @@ + + + + + + Extras — RMP Client + + + + + + +
+
+ + + + +
+
+ + + +
+

Extras

+

Helpers for ingestion pipelines. Import them from rmp_client:

+
from rmp_client import (
+    analyze_sentiment, normalize_comment,
+    is_valid_comment, build_course_mapping,
+    clean_course_label,
+)
+ +

Sentiment

+

Compute a sentiment score and label from comment text (uses TextBlob internally).

+
result = analyze_sentiment("Great prof, explains concepts clearly.")
+print(result.score, result.label)  # e.g. 0.65 "positive"
+ +

Helpers

+ +

normalize_comment

+
normalize_comment(text: str, *, strip_html: bool = True, strip_punctuation: bool = False) -> str
+

Normalizes a comment for comparison or deduplication. Trims whitespace, strips HTML tags (opt-out), lowercases, and collapses runs of whitespace. Optionally strips punctuation for looser matching.

+ + + + + +
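| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `text` | `str` | | Comment text |
| `strip_html` | `bool` | `True` | Remove HTML tags |
| `strip_punctuation` | `bool` | `False` | Remove all punctuation |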
+
a = normalize_comment("  Great  Professor!  ")
+b = normalize_comment("great professor!")
+assert a == b  # True
+
+normalize_comment("<b>Loved</b> this class")  # "loved this class"
+normalize_comment("Hello, world!", strip_punctuation=True)  # "hello world"
+ +
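For readers who want to see the steps spelled out, here is a rough re-implementation of the described pipeline. It is a sketch, not the library's code: `normalize_comment_sketch` is a hypothetical name, the HTML stripping is a naive regex, the order of steps is an assumption, and "punctuation" is assumed to mean ASCII `string.punctuation`.

```python
import re
import string


def normalize_comment_sketch(text, *, strip_html=True, strip_punctuation=False):
    """Approximate the documented steps; real behavior may differ in edge cases."""
    result = text.strip()
    if strip_html:
        # Naive tag removal; assumes simple, well-formed tags.
        result = re.sub(r"<[^>]+>", "", result)
    result = result.lower()
    if strip_punctuation:
        # Assumption: only ASCII punctuation is removed.
        result = result.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", result).strip()
```

On the three inputs shown above, this sketch yields the same strings as the documented examples.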

is_valid_comment

+
is_valid_comment(text: str, *, min_len: int = 10) -> ValidationResult
+

Validates a comment and returns detailed diagnostics. Checks for empty text, insufficient length, all-caps, excessive repeated characters, and absence of alphabetic characters.

+ + + + +
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `text` | `str` | | Comment text |
| `min_len` | `int` | `10` | Minimum character length |
+

Returns: ValidationResult with valid (bool) and issues (list of CommentIssue).

+

Each issue has a code ("empty", "too_short", "all_caps", "excessive_repeats", "no_alpha") and a human-readable message.

+
result = is_valid_comment("Good")
+# ValidationResult(valid=False, issues=[CommentIssue(code="too_short", ...)])
+
+result = is_valid_comment("Great class, learned a lot")
+# ValidationResult(valid=True, issues=[])
+
+result = is_valid_comment("WORST PROF EVER!!!")
+# ValidationResult(valid=False, issues=[CommentIssue(code="all_caps", ...)])
+ +
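The two helpers are typically combined when cleaning scraped comments: validate first, then deduplicate on the normalized form. The sketch below shows one way to do that. `dedupe_valid` is a hypothetical helper, not part of `rmp_client`; it takes the normalizer and validator as arguments and relies only on the `valid` attribute documented above.

```python
def dedupe_valid(comments, normalize, validate):
    """Keep the first occurrence of each valid comment, comparing normalized text."""
    seen = set()
    kept = []
    for raw in comments:
        if not validate(raw).valid:
            continue  # drop comments that fail validation
        key = normalize(raw)
        if key in seen:
            continue  # drop duplicates that differ only in case or spacing
        seen.add(key)
        kept.append(raw)
    return kept


# Intended use with the library's helpers (requires rmp_client to be installed):
# from rmp_client import normalize_comment, is_valid_comment
# unique = dedupe_valid(comments, normalize_comment, is_valid_comment)
```

Passing the helpers in as parameters keeps the function easy to test with stubs and lets callers adjust options, for example by wrapping `normalize_comment` with `strip_punctuation=True`.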

Course Code Helpers

+

Map scraped RMP course labels to your course catalog.

+
scraped = ["ANAT 215 (12)", "phys115"]
+valid = ["ANAT 215", "PHYS 115"]
+
+mapping = build_course_mapping(scraped, valid)
+# {"ANAT 215 (12)": "ANAT 215", "phys115": "PHYS 115"}
+
+cleaned = clean_course_label("MATH 101 (5)")
+# "MATH 101"
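
The matching rule is not spelled out here, so the following is only one plausible way to reproduce the outputs shown above. Everything in it is an assumption: the function names are hypothetical, labels are compared after removing a trailing "(N)" count, dropping whitespace, and upper-casing, and labels with no match are simply left out of the mapping.

```python
import re


def clean_label_sketch(raw):
    """Remove a trailing "(N)" count and collapse whitespace."""
    without_count = re.sub(r"\s*\(\d+\)\s*$", "", raw)
    return re.sub(r"\s+", " ", without_count).strip()


def _match_key(label):
    # Comparison key: no whitespace, upper case ("phys 115" becomes "PHYS115").
    return re.sub(r"\s+", "", label).upper()


def build_mapping_sketch(scraped, valid):
    """Map each scraped label to a known course code when the keys agree."""
    by_key = {_match_key(code): code for code in valid}
    mapping = {}
    for label in scraped:
        match = by_key.get(_match_key(clean_label_sketch(label)))
        if match is not None:
            mapping[label] = match  # assumption: unmatched labels are skipped
    return mapping
```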
+
+ + + + + diff --git a/docs/extras.md b/docs/extras.md deleted file mode 100644 index e09c426..0000000 --- a/docs/extras.md +++ /dev/null @@ -1,48 +0,0 @@ -### Helpers for Ingestion Pipelines - -These helpers are part of the main package. Import them from `rmp_client`: - -```python -from rmp_client import ( - analyze_sentiment, - normalize_comment, - is_valid_comment, - build_course_mapping, - clean_course_label, -) -``` - -#### Sentiment - -Compute a sentiment score and label from comment text (uses TextBlob internally). - -```python -result = analyze_sentiment("Great prof, explains concepts clearly.") -print(result.score, result.label) # e.g. 0.65 "positive" -``` - -#### Dedupe helpers - -Normalize comments for deduplication and filter out low-quality entries. - -```python -raw = " This prof is AMAZING!!! " -normalized = normalize_comment(raw) # "this prof is amazing!!!" -if is_valid_comment(normalized, min_len=10): - print("Valid comment") -``` - -#### Course code helpers - -Map scraped RMP course labels to your course catalog. - -```python -scraped = ["ANAT 215 (12)", "phys115"] -valid = ["ANAT 215", "PHYS 115"] - -mapping = build_course_mapping(scraped, valid) -# {"ANAT 215 (12)": "ANAT 215", "phys115": "PHYS 115"} - -cleaned = clean_course_label("MATH 101 (5)") -# "MATH 101" -``` diff --git a/docs/index.html b/docs/index.html new file mode 100644 index 0000000..dadebd4 --- /dev/null +++ b/docs/index.html @@ -0,0 +1,92 @@ + + + + + + RateMyProfessors API Client (Python) + + + + + + +
+
+ + + + +
+
+ + + +
+

RateMyProfessors API Client

+
+ PyPI + downloads +
+

+ An unofficial, typed Python client for RateMyProfessors. + All data is fetched via RMP's GraphQL API — no HTML scraping or browser automation required. +

+ +
+ Disclaimer: This library is unofficial and may break if RMP changes their internal API. Use responsibly and respect rate limits. +
+ +

Features

+ + +

Requirements

+ + +

Installation

+
pip install ratemyprofessors-client
+ +

Quick Start

+
from rmp_client import RMPClient
+
+with RMPClient() as client:
+    prof = client.get_professor("2823076")
+    print(prof.name, prof.overall_rating)
+
+    for rating in client.iter_professor_ratings(prof.id):
+        print(rating.date, rating.quality, rating.comment)
+ +

Documentation

+ +
+ + + + + diff --git a/docs/index.md b/docs/index.md deleted file mode 100644 index faae2f6..0000000 --- a/docs/index.md +++ /dev/null @@ -1,35 +0,0 @@ -### RateMyProfessors API Client - -An unofficial, typed Python client for [RateMyProfessors](https://www.ratemyprofessors.com). - -All data is fetched via RMP's GraphQL API -- no HTML scraping or browser automation required. - -**Features:** - -- Strong typing via Pydantic models -- Automatic retries with configurable max attempts -- Token-bucket rate limiting (default 60 req/min) -- In-memory caching for ratings pages (pre-fetches all ratings on first load) -- Cursor-based pagination for all list/search endpoints -- Clear error hierarchy for precise exception handling -- Built-in helpers for ingestion workflows (sentiment, dedupe, course codes) - -**Quick start:** - -```python -from rmp_client import RMPClient - -with RMPClient() as client: - prof = client.get_professor("2823076") - print(prof.name, prof.overall_rating) - - for rating in client.iter_professor_ratings(prof.id): - print(rating.date, rating.quality, rating.comment) -``` - -**Documentation:** - -- [Usage](usage.md) — Quickstart examples for every endpoint -- [Configuration](configuration.md) — Tuning retries, rate limits, timeouts, and headers -- [API Reference](reference.md) — Full method and type reference -- [Extras](extras.md) — Ingestion helpers (sentiment, dedupe, course mapping) diff --git a/docs/reference.html b/docs/reference.html new file mode 100644 index 0000000..0a675f0 --- /dev/null +++ b/docs/reference.html @@ -0,0 +1,124 @@ + + + + + + API Reference — RMP Client + + + + + + +
+
+ + + + +
+
+ + + +
+

API Reference

+

The main entry point is RMPClient. Use it as a context manager, or call close() when you are done.

+
from rmp_client import RMPClient, RMPClientConfig
+
+with RMPClient(config=RMPClientConfig()) as client:
+    ...
+ +

School Methods

+ + + + + + + +
| Method | Returns | Description |
|--------|---------|-------------|
| `search_schools(query, *, page_size=20, cursor=None)` | `SchoolSearchResult` | Search schools by name |
| `get_school(school_id)` | `School` | Fetch a single school with category ratings |
| `get_compare_schools(school_id_1, school_id_2)` | `CompareSchoolsResult` | Fetch two schools side by side |
| `get_school_ratings_page(school_id, *, cursor=None, page_size=20)` | `SchoolRatingsPage` | Get one page of school ratings (cached) |
| `iter_school_ratings(school_id, *, page_size=20, since=None)` | `Iterator[SchoolRating]` | Iterate all school ratings |
+ +

Professor Methods

+ + + + + + + + +
| Method | Returns | Description |
|--------|---------|-------------|
| `search_professors(query, *, school_id=None, page_size=20, cursor=None)` | `ProfessorSearchResult` | Search professors by name |
| `list_professors_for_school(school_id, *, query=None, page_size=20, cursor=None)` | `ProfessorSearchResult` | List professors at a school |
| `iter_professors_for_school(school_id, *, query=None, page_size=20)` | `Iterator[Professor]` | Iterate all professors at a school |
| `get_professor(professor_id)` | `Professor` | Fetch a single professor |
| `get_professor_ratings_page(professor_id, *, cursor=None, page_size=20, course_filter=None)` | `ProfessorRatingsPage` | Get one page of professor ratings (cached) |
| `iter_professor_ratings(professor_id, *, page_size=20, since=None, course_filter=None)` | `Iterator[Rating]` | Iterate all professor ratings |
+ +

Low-level

+ + + + +
| Method | Returns | Description |
|--------|---------|-------------|
| `raw_query(payload)` | `dict` | Send a raw GraphQL payload |
| `close()` | `None` | Close the HTTP client and clear caches |
+ +
+ +

Models

+

All models are Pydantic BaseModel subclasses.

+ +

School

+

id, name, location, overall_quality, num_ratings, reputation, safety, happiness, facilities, social, location_rating, clubs, opportunities, internet, food

+ +

Professor

+

id, name, department, school (School), url, overall_rating, num_ratings, percent_take_again, level_of_difficulty, tags, rating_distribution

+ +

Rating

+

date, comment, quality, difficulty, tags, course_raw, details, thumbs_up, thumbs_down

+ +

SchoolRating

+

date, comment, overall, category_ratings (dict), thumbs_up, thumbs_down

+ +

ProfessorSearchResult / SchoolSearchResult

+

professors/schools, total, page_size, has_next_page, next_cursor

+ +

ProfessorRatingsPage / SchoolRatingsPage

+

professor/school, ratings, has_next_page, next_cursor

+ +

CompareSchoolsResult

+

school_1, school_2

+ +
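The `has_next_page` and `next_cursor` fields on the search results support a simple pagination loop. Below is a minimal sketch of that pattern. `iter_all_schools` is a hypothetical helper, not a library method; it assumes the callable it receives has the same signature as `search_schools` and returns an object with `schools`, `has_next_page`, and `next_cursor`.

```python
def iter_all_schools(search_page, query, page_size=20):
    """Yield every school for a query by following cursors page by page."""
    cursor = None
    while True:
        page = search_page(query, page_size=page_size, cursor=cursor)
        yield from page.schools
        if not page.has_next_page:
            break
        cursor = page.next_cursor


# Intended use with the client (requires rmp_client to be installed):
# with RMPClient() as client:
#     for school in iter_all_schools(client.search_schools, "queens"):
#         print(school.name)
```

Because this is a generator, later pages are requested only when the caller keeps iterating.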
+ +

Errors

+

All errors extend RMPError.

+ + + + + + + + +
| Error | Description |
|-------|-------------|
| `HttpError` | Non-2xx HTTP response. Has `status_code`, `url`, `body`. |
| `ParsingError` | Could not parse the GraphQL response. |
| `RateLimitError` | Local rate limiter blocked the request. |
| `RetryError` | All retry attempts exhausted. |
| `RMPAPIError` | GraphQL API returned an `errors` array. |
| `ConfigurationError` | Invalid client configuration. |
+
from rmp_client import RMPClient, HttpError, ParsingError
+
+with RMPClient() as client:
+    try:
+        prof = client.get_professor("999999")
+    except ParsingError:
+        print("Professor not found")
+    except HttpError as e:
+        print(f"HTTP {e.status_code}")
+
+ + + + + diff --git a/docs/reference.md b/docs/reference.md deleted file mode 100644 index bbe5356..0000000 --- a/docs/reference.md +++ /dev/null @@ -1,91 +0,0 @@ -### API Reference - -#### RMPClient - -The main entry point. Use as a context manager or call `close()` when done. - -```python -from rmp_client import RMPClient, RMPClientConfig - -with RMPClient(config=RMPClientConfig()) as client: - ... -``` - -**School methods:** - -| Method | Returns | Description | -|--------|---------|-------------| -| `search_schools(query, *, page_size=20, cursor=None)` | `SchoolSearchResult` | Search schools by name | -| `get_school(school_id)` | `School` | Fetch a single school with category ratings | -| `get_compare_schools(school_id_1, school_id_2)` | `CompareSchoolsResult` | Fetch two schools side by side | -| `get_school_ratings_page(school_id, *, cursor=None, page_size=20)` | `SchoolRatingsPage` | Get one page of school ratings (cached) | -| `iter_school_ratings(school_id, *, page_size=20, since=None)` | `Iterator[SchoolRating]` | Iterate all school ratings | - -**Professor methods:** - -| Method | Returns | Description | -|--------|---------|-------------| -| `search_professors(query, *, school_id=None, page_size=20, cursor=None)` | `ProfessorSearchResult` | Search professors by name | -| `list_professors_for_school(school_id, *, query=None, page_size=20, cursor=None)` | `ProfessorSearchResult` | List professors at a school | -| `iter_professors_for_school(school_id, *, query=None, page_size=20)` | `Iterator[Professor]` | Iterate all professors at a school | -| `get_professor(professor_id)` | `Professor` | Fetch a single professor | -| `get_professor_ratings_page(professor_id, *, cursor=None, page_size=20, course_filter=None)` | `ProfessorRatingsPage` | Get one page of professor ratings (cached) | -| `iter_professor_ratings(professor_id, *, page_size=20, since=None, course_filter=None)` | `Iterator[Rating]` | Iterate all professor ratings | - -**Low-level:** - -| Method 
| Returns | Description | -|--------|---------|-------------| -| `raw_query(payload)` | `dict` | Send a raw GraphQL payload | -| `close()` | `None` | Close the HTTP client and clear caches | - ---- - -#### Models - -All models are Pydantic `BaseModel` subclasses. - -**`School`** — `id`, `name`, `location`, `overall_quality`, `num_ratings`, `reputation`, `safety`, `happiness`, `facilities`, `social`, `location_rating`, `clubs`, `opportunities`, `internet`, `food` - -**`Professor`** — `id`, `name`, `department`, `school` (School), `url`, `overall_rating`, `num_ratings`, `percent_take_again`, `level_of_difficulty`, `tags`, `rating_distribution` - -**`Rating`** — `date`, `comment`, `quality`, `difficulty`, `tags`, `course_raw`, `details`, `thumbs_up`, `thumbs_down` - -**`SchoolRating`** — `date`, `comment`, `overall`, `category_ratings` (dict), `thumbs_up`, `thumbs_down` - -**`ProfessorSearchResult`** — `professors`, `total`, `page_size`, `has_next_page`, `next_cursor` - -**`SchoolSearchResult`** — `schools`, `total`, `page_size`, `has_next_page`, `next_cursor` - -**`ProfessorRatingsPage`** — `professor`, `ratings`, `has_next_page`, `next_cursor` - -**`SchoolRatingsPage`** — `school`, `ratings`, `has_next_page`, `next_cursor` - -**`CompareSchoolsResult`** — `school_1`, `school_2` - ---- - -#### Errors - -All errors extend `RMPError`. - -| Error | Description | -|-------|-------------| -| `HttpError` | Non-2xx HTTP response. Has `status_code`, `url`, `body`. | -| `ParsingError` | Could not parse the GraphQL response (e.g. entity not found). | -| `RateLimitError` | Local rate limiter blocked the request. | -| `RetryError` | All retry attempts exhausted. Wraps the last exception. | -| `RMPAPIError` | GraphQL API returned an `errors` array. Has `details`. | -| `ConfigurationError` | Invalid client configuration. 
| - -```python -from rmp_client import RMPClient, HttpError, ParsingError - -with RMPClient() as client: - try: - prof = client.get_professor("999999") - except ParsingError: - print("Professor not found") - except HttpError as e: - print(f"HTTP {e.status_code}") -``` diff --git a/docs/script.js b/docs/script.js new file mode 100644 index 0000000..738a89a --- /dev/null +++ b/docs/script.js @@ -0,0 +1,78 @@ +(function () { + var MOON = ''; + var SUN = ''; + + function isDark() { + return document.documentElement.classList.contains('dark'); + } + + function updateIcon() { + var btn = document.getElementById('theme-toggle'); + if (btn) btn.innerHTML = isDark() ? SUN : MOON; + } + + updateIcon(); + + document.addEventListener('DOMContentLoaded', function () { + updateIcon(); + + var toggle = document.getElementById('theme-toggle'); + if (toggle) { + toggle.addEventListener('click', function () { + var dark = !isDark(); + document.documentElement.classList.toggle('dark', dark); + localStorage.setItem('rmp-docs-theme', dark ? 
'dark' : 'light'); + updateIcon(); + }); + } + + buildTOC(); + if (typeof hljs !== 'undefined') hljs.highlightAll(); + }); + + function buildTOC() { + var list = document.getElementById('toc-list'); + var tocEl = document.querySelector('.toc'); + if (!list || !tocEl) return; + + var headings = document.querySelectorAll('main h2[id]'); + if (headings.length === 0) { + tocEl.style.display = 'none'; + document.querySelector('.content').style.marginRight = '0'; + return; + } + + headings.forEach(function (h) { + var li = document.createElement('li'); + var a = document.createElement('a'); + a.href = '#' + h.id; + var text = ''; + for (var i = 0; i < h.childNodes.length; i++) { + var n = h.childNodes[i]; + if (n.nodeType === 3) text += n.textContent; + else if (!n.classList || !n.classList.contains('anchor')) text += n.textContent; + } + a.textContent = text.trim(); + a.addEventListener('click', function (e) { + e.preventDefault(); + h.scrollIntoView({ behavior: 'smooth', block: 'start' }); + history.replaceState(null, '', '#' + h.id); + }); + li.appendChild(a); + list.appendChild(li); + }); + + var tocLinks = list.querySelectorAll('a'); + var observer = new IntersectionObserver(function (entries) { + entries.forEach(function (entry) { + if (entry.isIntersecting) { + tocLinks.forEach(function (a) { a.classList.remove('active'); }); + var link = list.querySelector('a[href="#' + entry.target.id + '"]'); + if (link) link.classList.add('active'); + } + }); + }, { rootMargin: '-60px 0px -75% 0px' }); + + headings.forEach(function (h) { observer.observe(h); }); + } +})(); diff --git a/docs/style.css b/docs/style.css new file mode 100644 index 0000000..9763133 --- /dev/null +++ b/docs/style.css @@ -0,0 +1,581 @@ +@import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&display=swap'); + +*, +*::before, +*::after { + box-sizing: border-box; + margin: 0; + padding: 0; +} + +:root { + --primary: #10b981; + --primary-hover: #059669; + --primary-subtle: 
#ecfdf5; + --bg: #ffffff; + --bg-surface: #f8fafc; + --bg-code: #1e1e2e; + --text: #1e293b; + --text-secondary: #64748b; + --text-code: #d4d4d8; + --border: #e2e8f0; + --border-subtle: #f1f5f9; + --shadow-sm: 0 1px 2px rgba(0, 0, 0, 0.05); + --shadow-md: 0 4px 6px -1px rgba(0, 0, 0, 0.07), 0 2px 4px -2px rgba(0, 0, 0, 0.05); + --shadow-lg: 0 10px 25px -5px rgba(0, 0, 0, 0.1), 0 8px 10px -6px rgba(0, 0, 0, 0.1); + --font-sans: 'Inter', -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif; + --font-mono: 'SF Mono', 'Cascadia Code', 'Fira Code', Consolas, monospace; + --topbar-height: 52px; + --sidebar-width: 240px; + --toc-width: 200px; + --radius: 8px; + --radius-sm: 6px; +} + +html.dark { + --primary: #34d399; + --primary-hover: #6ee7b7; + --primary-subtle: rgba(16, 185, 129, 0.1); + --bg: #0a0a0a; + --bg-surface: #141414; + --bg-code: #161616; + --text: #e5e5e5; + --text-secondary: #a0a0a0; + --text-code: #d4d4d8; + --border: #262626; + --border-subtle: #1a1a1a; + --shadow-sm: 0 1px 2px rgba(0, 0, 0, 0.4); + --shadow-md: 0 4px 6px -1px rgba(0, 0, 0, 0.5); + --shadow-lg: 0 10px 25px -5px rgba(0, 0, 0, 0.6); +} + +html { + font-size: 16px; + scroll-behavior: smooth; + -webkit-font-smoothing: antialiased; + -moz-osx-font-smoothing: grayscale; +} + +body { + font-family: var(--font-sans); + color: var(--text); + background: var(--bg); + line-height: 1.7; +} + +a { + color: var(--primary); + text-decoration: none; + transition: color 0.15s ease; +} + +a:hover { + color: var(--primary-hover); +} + +::selection { + background: var(--primary); + color: #fff; +} + +/* ------------------------------------------------------------------ */ +/* Top Bar */ +/* ------------------------------------------------------------------ */ + +.topbar { + position: fixed; + top: 0; + left: 0; + right: 0; + height: var(--topbar-height); + background: var(--bg); + border-bottom: 1px solid var(--border); + display: flex; + align-items: center; + justify-content: flex-end; + 
padding: 0 1.25rem; + z-index: 100; +} + +.topbar-right { + display: flex; + align-items: center; + gap: 6px; +} + +.topbar-btn { + display: inline-flex; + align-items: center; + justify-content: center; + gap: 6px; + padding: 6px 10px; + border-radius: var(--radius-sm); + border: 1px solid var(--border); + background: var(--bg-surface); + color: var(--text-secondary); + cursor: pointer; + font-family: var(--font-sans); + font-size: 0.8rem; + transition: all 0.15s ease; + text-decoration: none; + line-height: 1; +} + +.topbar-btn:hover { + color: var(--text); + border-color: var(--primary); + text-decoration: none; +} + +.topbar-btn svg { + flex-shrink: 0; +} + +/* ------------------------------------------------------------------ */ +/* Sidebar */ +/* ------------------------------------------------------------------ */ + +.sidebar { + width: var(--sidebar-width); + position: fixed; + top: var(--topbar-height); + left: 0; + bottom: 0; + background: var(--bg-surface); + border-right: 1px solid var(--border); + padding: 1.5rem 1rem; + overflow-y: auto; + display: flex; + flex-direction: column; + gap: 1.5rem; + z-index: 50; +} + +.sidebar-brand { + font-size: 1.15rem; + font-weight: 700; + color: var(--text); + letter-spacing: -0.02em; + padding: 0 0.5rem; +} + +.sidebar nav ul { + list-style: none; + display: flex; + flex-direction: column; + gap: 2px; +} + +.sidebar nav a { + display: block; + padding: 0.4rem 0.75rem; + border-radius: var(--radius-sm); + font-size: 0.875rem; + font-weight: 500; + color: var(--text-secondary); + transition: all 0.15s ease; +} + +.sidebar nav a:hover { + color: var(--text); + background: var(--border-subtle); + text-decoration: none; +} + +.sidebar nav a.active { + color: var(--primary); + background: var(--primary-subtle); + font-weight: 600; +} + +/* ------------------------------------------------------------------ */ +/* Table of Contents (right sidebar) */ +/* ------------------------------------------------------------------ 
*/ + +.toc { + width: var(--toc-width); + position: fixed; + top: var(--topbar-height); + right: 0; + bottom: 0; + padding: 1.5rem 1rem 1.5rem 0; + overflow-y: auto; + z-index: 50; +} + +.toc-title { + font-size: 0.7rem; + font-weight: 600; + text-transform: uppercase; + letter-spacing: 0.06em; + color: var(--text-secondary); + padding: 0 0 0.6rem 0.75rem; +} + +.toc ul { + list-style: none; + border-left: 1px solid var(--border); +} + +.toc a { + display: block; + padding: 0.25rem 0.75rem; + font-size: 0.78rem; + color: var(--text-secondary); + border-left: 2px solid transparent; + margin-left: -1px; + transition: all 0.15s ease; + text-decoration: none; +} + +.toc a:hover { + color: var(--text); +} + +.toc a.active { + color: var(--primary); + border-left-color: var(--primary); +} + +/* ------------------------------------------------------------------ */ +/* Main content */ +/* ------------------------------------------------------------------ */ + +.content { + margin-left: var(--sidebar-width); + margin-right: var(--toc-width); + max-width: 54rem; + padding: calc(var(--topbar-height) + 2.5rem) 3rem 4rem; +} + +/* ------------------------------------------------------------------ */ +/* Typography */ +/* ------------------------------------------------------------------ */ + +h1 { + font-size: 2rem; + font-weight: 700; + letter-spacing: -0.025em; + margin-bottom: 0.5rem; + line-height: 1.3; +} + +h1 + p { + font-size: 1.05rem; + color: var(--text-secondary); + margin-bottom: 2rem; +} + +h2 { + font-size: 1.35rem; + font-weight: 600; + margin-top: 3rem; + margin-bottom: 0.75rem; + letter-spacing: -0.015em; + display: flex; + align-items: center; + gap: 0.5rem; + scroll-margin-top: calc(var(--topbar-height) + 1.5rem); +} + +h2 .anchor { + color: var(--border); + font-weight: 400; + font-size: 1rem; + text-decoration: none; + opacity: 0; + transition: opacity 0.15s ease; +} + +h2:hover .anchor { + opacity: 1; + color: var(--primary); +} + +h3 { + font-size: 
1.1rem; + font-weight: 600; + margin-top: 2rem; + margin-bottom: 0.5rem; + letter-spacing: -0.01em; + scroll-margin-top: calc(var(--topbar-height) + 1.5rem); +} + +p { + margin-bottom: 0.85rem; +} + +ul, +ol { + margin-left: 1.5rem; + margin-bottom: 0.85rem; +} + +li { + margin-bottom: 0.35rem; +} + +strong { + font-weight: 600; +} + +hr { + border: none; + border-top: 1px solid var(--border); + margin: 2.5rem 0; +} + +/* ------------------------------------------------------------------ */ +/* Inline code */ +/* ------------------------------------------------------------------ */ + +code { + font-family: var(--font-mono); + font-size: 0.85em; + background: var(--bg-surface); + border: 1px solid var(--border); + padding: 0.15em 0.4em; + border-radius: 4px; + font-weight: 500; +} + +/* ------------------------------------------------------------------ */ +/* Code blocks */ +/* ------------------------------------------------------------------ */ + +pre { + background: var(--bg-code); + border: 1px solid var(--border); + border-radius: var(--radius); + padding: 1.25rem 1.5rem; + overflow-x: auto; + margin-bottom: 1.25rem; + box-shadow: var(--shadow-md); + line-height: 1.6; +} + +html.dark pre { + border-color: #2a2a2a; +} + +pre code { + background: none; + border: none; + padding: 0; + font-size: 0.84rem; + color: var(--text-code); + font-weight: 400; +} + +/* Styled scrollbar for code blocks */ +pre::-webkit-scrollbar { + height: 6px; +} + +pre::-webkit-scrollbar-track { + background: transparent; +} + +pre::-webkit-scrollbar-thumb { + background: #4a4a5a; + border-radius: 3px; +} + +pre::-webkit-scrollbar-thumb:hover { + background: #5a5a6a; +} + +.method-sig::-webkit-scrollbar { + height: 6px; +} + +.method-sig::-webkit-scrollbar-track { + background: transparent; +} + +.method-sig::-webkit-scrollbar-thumb { + background: #4a4a5a; + border-radius: 3px; +} + +/* Firefox scrollbar */ +pre, +.method-sig { + scrollbar-width: thin; + scrollbar-color: #4a4a5a 
transparent; +} + +/* ------------------------------------------------------------------ */ +/* Highlight.js overrides */ +/* ------------------------------------------------------------------ */ + +pre code.hljs { + background: transparent !important; + padding: 0 !important; +} + +.hljs { + background: transparent !important; +} + +/* ------------------------------------------------------------------ */ +/* Tables */ +/* ------------------------------------------------------------------ */ + +table { + width: 100%; + border-collapse: collapse; + margin-bottom: 1.25rem; + font-size: 0.875rem; + border-radius: var(--radius); + overflow: hidden; + box-shadow: var(--shadow-sm); + border: 1px solid var(--border); +} + +th, +td { + padding: 0.65rem 1rem; + text-align: left; +} + +th { + background: var(--bg-surface); + font-weight: 600; + color: var(--text); + border-bottom: 2px solid var(--border); + font-size: 0.8rem; + text-transform: uppercase; + letter-spacing: 0.04em; +} + +td { + border-bottom: 1px solid var(--border-subtle); +} + +tr:last-child td { + border-bottom: none; +} + +tr:hover td { + background: var(--bg-surface); +} + +/* ------------------------------------------------------------------ */ +/* Callout */ +/* ------------------------------------------------------------------ */ + +.callout { + border-left: 3px solid var(--primary); + background: var(--primary-subtle); + padding: 0.85rem 1.15rem; + border-radius: 0 var(--radius-sm) var(--radius-sm) 0; + margin-bottom: 1.25rem; + font-size: 0.9rem; + line-height: 1.6; +} + +/* ------------------------------------------------------------------ */ +/* Method signatures */ +/* ------------------------------------------------------------------ */ + +.method-sig { + background: var(--bg-code); + color: var(--text-code); + padding: 0.7rem 1.1rem; + border: 1px solid var(--border); + border-radius: var(--radius-sm); + font-family: var(--font-mono); + font-size: 0.84rem; + margin-bottom: 0.75rem; + overflow-x: 
auto; + box-shadow: var(--shadow-sm); +} + +html.dark .method-sig { + border-color: #2a2a2a; +} + +/* ------------------------------------------------------------------ */ +/* Badges */ +/* ------------------------------------------------------------------ */ + +.badges { + display: flex; + gap: 6px; + flex-wrap: wrap; + margin-bottom: 1.25rem; +} + +.badges img { + height: 22px; +} + +/* ------------------------------------------------------------------ */ +/* Responsive */ +/* ------------------------------------------------------------------ */ + +@media (max-width: 1100px) { + .toc { + display: none; + } + + .content { + margin-right: 0; + } +} + +@media (max-width: 860px) { + .topbar { + padding-left: 1.25rem; + } + + .sidebar { + position: static; + width: 100%; + border-right: none; + border-bottom: 1px solid var(--border); + padding: 1rem; + gap: 0.75rem; + } + + .sidebar nav ul { + flex-direction: row; + flex-wrap: wrap; + gap: 4px; + } + + .content { + margin-left: 0; + padding: calc(var(--topbar-height) + 1.5rem) 1.25rem 3rem; + } +} + +@media (max-width: 480px) { + h1 { + font-size: 1.6rem; + } + + h2 { + font-size: 1.2rem; + } + + .content { + padding-left: 1rem; + padding-right: 1rem; + } + + pre { + padding: 1rem; + border-radius: var(--radius-sm); + } + + table { + font-size: 0.8rem; + } + + th, + td { + padding: 0.5rem 0.6rem; + } +} diff --git a/docs/usage.html b/docs/usage.html new file mode 100644 index 0000000..d6e5dfb --- /dev/null +++ b/docs/usage.html @@ -0,0 +1,125 @@ + + + + + + Usage — RMP Client + + + + + + +
+
+ + + + +
+
+ + + +
+

Usage

+

All examples use the RMPClient context manager, which handles connection setup and teardown.

+ +

Search Schools

+
from rmp_client import RMPClient
+
+with RMPClient() as client:
+    result = client.search_schools("queens")
+    for school in result.schools:
+        print(school.name, school.location, school.overall_quality)
+
+    # Cursor pagination: pass the previous result's cursor to fetch the next page
+    if result.has_next_page:
+        page2 = client.search_schools("queens", cursor=result.next_cursor)
+ +

Get a School by ID

+
with RMPClient() as client:
+    school = client.get_school("1466")
+    print(school.name, school.location, school.overall_quality)
+    print(f"Reputation: {school.reputation}, Safety: {school.safety}")
+ +

Compare Two Schools

+
with RMPClient() as client:
+    result = client.get_compare_schools("1466", "1491")
+    print(result.school_1.name, "vs", result.school_2.name)
+ +

Search Professors

+
with RMPClient() as client:
+    result = client.search_professors("Smith")
+    for prof in result.professors:
+        print(prof.name, prof.overall_rating, prof.school.name if prof.school else "")
+
+    # Filter by school
+    result = client.search_professors("Smith", school_id="1530")
+ +

List Professors at a School

+
with RMPClient() as client:
+    result = client.list_professors_for_school(1466, page_size=20)
+    for prof in result.professors:
+        print(prof.name, prof.department)
+ +

Iterate All Professors at a School

+
with RMPClient() as client:
+    for prof in client.iter_professors_for_school(1466, page_size=50):
+        print(prof.name, prof.num_ratings)
+ +

Get a Professor by ID

+
with RMPClient() as client:
+    prof = client.get_professor("2823076")
+    print(prof.name, prof.department, prof.overall_rating)
+    print(f"Difficulty: {prof.level_of_difficulty}")
+    print(f"Would take again: {prof.percent_take_again}%")
+ +

Professor Ratings (Paginated, Cached)

+
with RMPClient() as client:
+    page = client.get_professor_ratings_page("2823076", page_size=10)
+    print(f"Professor: {page.professor.name}")
+    for rating in page.ratings:
+        print(rating.date, rating.quality, rating.comment[:50])
+
+    # Load more using the cursor from the previous page
+    if page.has_next_page:
+        page2 = client.get_professor_ratings_page("2823076", cursor=page.next_cursor)
+ +

Iterate All Professor Ratings

+
from datetime import date
+from rmp_client import RMPClient
+
+with RMPClient() as client:
+    for rating in client.iter_professor_ratings("2823076", since=date(2024, 1, 1)):
+        print(rating.date, rating.quality, rating.comment)
+ +

School Ratings (Paginated, Cached)

+
with RMPClient() as client:
+    page = client.get_school_ratings_page("1466", page_size=10)
+    for rating in page.ratings:
+        print(rating.date, rating.overall, rating.category_ratings)
+ +

Iterate All School Ratings

+
with RMPClient() as client:
+    for rating in client.iter_school_ratings("1466"):
+        print(rating.date, rating.overall, rating.comment[:50])
+ +

Raw GraphQL Query

+
with RMPClient() as client:
+    data = client.raw_query({"query": "query { viewer { id } }", "variables": {}})
+    print(data)
+
+ + + + + diff --git a/docs/usage.md b/docs/usage.md deleted file mode 100644 index cfecfa7..0000000 --- a/docs/usage.md +++ /dev/null @@ -1,127 +0,0 @@ -### Usage - -All examples use the `RMPClient` context manager, which handles connection setup and teardown. - -#### Search schools - -```python -from rmp_client import RMPClient - -with RMPClient() as client: - result = client.search_schools("queens") - for school in result.schools: - print(school.name, school.location, school.overall_quality) - - # Cursor pagination - if result.has_next_page: - page2 = client.search_schools("queens", cursor=result.next_cursor) -``` - -#### Get a school by ID - -```python -with RMPClient() as client: - school = client.get_school("1466") - print(school.name, school.location, school.overall_quality) - print(f"Reputation: {school.reputation}, Safety: {school.safety}") -``` - -#### Compare two schools - -```python -with RMPClient() as client: - result = client.get_compare_schools("1466", "1491") - print(result.school_1.name, "vs", result.school_2.name) -``` - -#### Search professors - -```python -with RMPClient() as client: - result = client.search_professors("Smith") - for prof in result.professors: - print(prof.name, prof.overall_rating, prof.school.name if prof.school else "") - - # Filter by school - result = client.search_professors("Smith", school_id="1530") -``` - -#### List professors at a school - -```python -with RMPClient() as client: - result = client.list_professors_for_school(1466, page_size=20) - for prof in result.professors: - print(prof.name, prof.department) -``` - -#### Iterate all professors at a school - -```python -with RMPClient() as client: - for prof in client.iter_professors_for_school(1466, page_size=50): - print(prof.name, prof.num_ratings) -``` - -#### Get a professor by ID - -```python -with RMPClient() as client: - prof = client.get_professor("2823076") - print(prof.name, prof.department, prof.overall_rating) - print(f"Difficulty: 
{prof.level_of_difficulty}") - print(f"Would take again: {prof.percent_take_again}%") -``` - -#### Fetch professor ratings (paginated, cached) - -```python -with RMPClient() as client: - page = client.get_professor_ratings_page("2823076", page_size=10) - print(f"Professor: {page.professor.name}") - for rating in page.ratings: - print(rating.date, rating.quality, rating.comment[:50]) - - # Load more (served from cache, no extra network request) - if page.has_next_page: - page2 = client.get_professor_ratings_page("2823076", cursor=page.next_cursor) -``` - -#### Iterate all professor ratings - -```python -from datetime import date -from rmp_client import RMPClient - -with RMPClient() as client: - for rating in client.iter_professor_ratings("2823076", since=date(2024, 1, 1)): - print(rating.date, rating.quality, rating.comment) -``` - -#### Fetch school ratings (paginated, cached) - -```python -with RMPClient() as client: - page = client.get_school_ratings_page("1466", page_size=10) - for rating in page.ratings: - print(rating.date, rating.overall, rating.category_ratings) -``` - -#### Iterate all school ratings - -```python -with RMPClient() as client: - for rating in client.iter_school_ratings("1466"): - print(rating.date, rating.overall, rating.comment[:50]) -``` - -#### Send a raw GraphQL query - -```python -with RMPClient() as client: - data = client.raw_query({ - "query": "query { viewer { id } }", - "variables": {}, - }) - print(data) -``` diff --git a/pyproject.toml b/pyproject.toml index 0bc5526..1f6af5b 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "hatchling.build" [project] name = "ratemyprofessors-client" -version = "2.0.0" +version = "2.1.0" description = "Typed, retrying, rate-limited unofficial Python client for the RateMyProfessors GraphQL API." 
readme = "README.md" requires-python = ">=3.10" diff --git a/src/rmp_client/__init__.py b/src/rmp_client/__init__.py index 3174647..d1e638d 100644 --- a/src/rmp_client/__init__.py +++ b/src/rmp_client/__init__.py @@ -15,6 +15,8 @@ from .extras import ( SentimentResult, analyze_sentiment, + CommentIssue, + ValidationResult, is_valid_comment, normalize_comment, build_course_mapping, @@ -34,6 +36,8 @@ "TokenBucket", "SentimentResult", "analyze_sentiment", + "CommentIssue", + "ValidationResult", "is_valid_comment", "normalize_comment", "build_course_mapping", diff --git a/src/rmp_client/extras/__init__.py b/src/rmp_client/extras/__init__.py index 3abf6b9..0f89db8 100644 --- a/src/rmp_client/extras/__init__.py +++ b/src/rmp_client/extras/__init__.py @@ -1,13 +1,15 @@ -# Ingestion helpers: sentiment, dedupe, course_codes. +# Ingestion helpers: sentiment, helpers, course_codes. # Re-exported from rmp_client so you can: from rmp_client import analyze_sentiment, ... from .sentiment import SentimentResult, analyze_sentiment -from .dedupe import is_valid_comment, normalize_comment +from .helpers import CommentIssue, ValidationResult, is_valid_comment, normalize_comment from .course_codes import build_course_mapping, clean_course_label __all__ = [ "SentimentResult", "analyze_sentiment", + "CommentIssue", + "ValidationResult", "is_valid_comment", "normalize_comment", "build_course_mapping", diff --git a/src/rmp_client/extras/dedupe.py b/src/rmp_client/extras/dedupe.py deleted file mode 100644 index 8f1e6e2..0000000 --- a/src/rmp_client/extras/dedupe.py +++ /dev/null @@ -1,14 +0,0 @@ -from __future__ import annotations - -import re - - -def normalize_comment(text: str) -> str: - """Lowercase and collapse whitespace for comment comparison.""" - return re.sub(r"\s+", " ", text.strip().lower()) - - -def is_valid_comment(text: str, *, min_len: int = 10) -> bool: - """Basic heuristic to filter out empty/very short comments.""" - return bool(text and len(text.strip()) >= min_len) - 
diff --git a/src/rmp_client/extras/helpers.py b/src/rmp_client/extras/helpers.py new file mode 100644 index 0000000..b27be42 --- /dev/null +++ b/src/rmp_client/extras/helpers.py @@ -0,0 +1,100 @@ +"""Helpers for normalizing and validating rating comments.""" + +from __future__ import annotations + +import re +from dataclasses import dataclass, field +from typing import Literal + + +def _strip_html(text: str) -> str: + """Strip HTML tags from text (RMP comments occasionally contain markup).""" + return re.sub(r"<[^>]*>", "", text) + + +def normalize_comment( + text: str, + *, + strip_html: bool = True, + strip_punctuation: bool = False, +) -> str: + """Normalize a comment for comparison or deduplication. + + - Trims leading/trailing whitespace + - Strips HTML tags (opt-out via *strip_html*) + - Lowercases + - Collapses runs of whitespace to a single space + - Optionally strips punctuation for looser matching + """ + out = text.strip() + if strip_html: + out = _strip_html(out) + out = re.sub(r"\s+", " ", out.lower()) + if strip_punctuation: + out = re.sub(r"[^\w\s]", "", out) + return out + + +IssueCode = Literal[ + "empty", + "too_short", + "all_caps", + "excessive_repeats", + "no_alpha", +] + + +@dataclass +class CommentIssue: + code: IssueCode + message: str + + +@dataclass +class ValidationResult: + valid: bool + issues: list[CommentIssue] = field(default_factory=list) + + +def is_valid_comment(text: str, *, min_len: int = 10) -> ValidationResult: + """Validate a comment and return detailed diagnostics. + + Checks for: + - Empty or whitespace-only text + - Below minimum length (*min_len*, default 10) + - All uppercase (shouting) + - Excessive repeated characters (e.g. 
"aaaaaaa") + - No alphabetic characters at all + """ + issues: list[CommentIssue] = [] + trimmed = (text or "").strip() + + if not trimmed: + issues.append(CommentIssue(code="empty", message="Comment is empty")) + return ValidationResult(valid=False, issues=issues) + + if len(trimmed) < min_len: + issues.append( + CommentIssue( + code="too_short", + message=f"Comment is {len(trimmed)} chars (minimum {min_len})", + ) + ) + + if len(trimmed) > 3 and trimmed == trimmed.upper() and re.search(r"[A-Z]", trimmed): + issues.append(CommentIssue(code="all_caps", message="Comment is all uppercase")) + + if re.search(r"(.)\1{4,}", trimmed, re.IGNORECASE): + issues.append( + CommentIssue( + code="excessive_repeats", + message="Comment contains excessive repeated characters", + ) + ) + + if not re.search(r"[a-zA-Z]", trimmed): + issues.append( + CommentIssue(code="no_alpha", message="Comment contains no alphabetic characters") + ) + + return ValidationResult(valid=len(issues) == 0, issues=issues)