Add rel=nofollow to robots.txt-blocked links on crawlable pages #103

Merged
mlissner merged 2 commits into main from 83-nofollow-robots-blocked-links-20260506
May 7, 2026

@mlissner mlissner commented May 7, 2026

Fixes

Fixes: #83 (the "Blocked by robots.txt" / "Not found" rows)

Summary

Google Search Console flags pages whose only inbound links are robots.txt-disallowed without rel="nofollow". Audited the templates rendered to anonymous crawlers (root, /c/<path>/, 404) and found seven <a> tags that hit Disallow patterns without nofollow:

  • assets/templates/cotton/header.html — Sign in (/u/login/)
  • assets/templates/404.html — "Sign In to Find Out" (/u/login/)
  • pages/templates/pages/detail.html — Feedback button (/c/*/feedback/), "What links here" (/c/*/backlinks/), creator/admins/editors badges (/activity/<user>/)

All of these are action/activity URLs we don't want indexed regardless, so adding nofollow is the right fix (vs. removing the Disallow rules).

The directory-detail action links are all gated by {% if user.is_authenticated %} / {% if can_edit %}, so crawlers don't see them — no edit needed there.
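
The audit described above was done by hand; a minimal sketch of how it might be automated follows. It scans rendered HTML for `<a>` tags whose `href` hits a Disallow pattern but whose `rel` lacks `nofollow`. The pattern list is an assumption inferred from the paths named in this PR (the actual robots.txt is not shown here), and `find_unmarked_blocked_links` and `BlockedLinkAuditor` are hypothetical names, not part of the codebase.

```python
import re
from html.parser import HTMLParser

# Assumed Disallow patterns, inferred from the paths named in this PR;
# the real robots.txt may differ.
DISALLOW_PATTERNS = ["/u/", "/c/*/feedback/", "/c/*/backlinks/", "/activity/"]


def disallow_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a prefix-matching regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


class BlockedLinkAuditor(HTMLParser):
    """Collect hrefs that hit a Disallow pattern but lack rel=nofollow."""

    def __init__(self, patterns):
        super().__init__()
        self.regexes = [disallow_to_regex(p) for p in patterns]
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Only root-relative hrefs are checked; absolute URLs are ignored.
        href = attr_map.get("href") or ""
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        blocked = any(rx.match(href) for rx in self.regexes)
        if blocked and "nofollow" not in rel_tokens:
            self.missing.append(href)


def find_unmarked_blocked_links(html: str, patterns=DISALLOW_PATTERNS):
    """Return hrefs in `html` that are disallowed but not marked nofollow."""
    auditor = BlockedLinkAuditor(patterns)
    auditor.feed(html)
    auditor.close()
    return auditor.missing
```

Run against the HTML that an anonymous request receives (so that links gated behind `user.is_authenticated` never appear), a check like this could serve as a regression test for the templates touched here.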

Adjacent (not in this PR)

wiki/lib/markdown.py:594-597 deliberately skips action URLs in the user-content nofollow pass, on the assumption that robots.txt is enough. That's the same assumption Google is now pushing back on. Worth a follow-up if a user writes a markdown link like [edit me](/c/foo/edit/) in page content.
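
For the follow-up, one possible shape is a post-processing step that adds `nofollow` to action links in rendered user content. This is only a sketch under stated assumptions: it is not the code in wiki/lib/markdown.py (which is not shown in this PR), the `ACTION_URL_RE` pattern is guessed from the paths mentioned above, and `add_nofollow_to_action_links` is a hypothetical helper. Regex-based rewriting is used for brevity and assumes double-quoted attributes.

```python
import re

# Assumption: action URLs look like the paths named in this PR.
ACTION_URL_RE = re.compile(r"^/c/.+/(edit|feedback|backlinks)/$|^/u/|^/activity/")
ANCHOR_RE = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
HREF_RE = re.compile(r'\shref="([^"]*)"', re.IGNORECASE)
REL_RE = re.compile(r'\srel="([^"]*)"', re.IGNORECASE)


def add_nofollow_to_action_links(html: str) -> str:
    """Add rel="nofollow" to anchors whose href matches ACTION_URL_RE."""

    def rewrite(match: re.Match) -> str:
        attrs = match.group(1)
        href = HREF_RE.search(attrs)
        if not href or not ACTION_URL_RE.search(href.group(1)):
            return match.group(0)  # not an action link; leave untouched
        rel = REL_RE.search(attrs)
        if rel is None:
            return f'<a{attrs} rel="nofollow">'
        tokens = rel.group(1).split()
        if "nofollow" in (t.lower() for t in tokens):
            return match.group(0)  # already marked
        # Preserve existing rel tokens and append nofollow.
        new_rel = " ".join(tokens + ["nofollow"])
        new_attrs = attrs[: rel.start()] + f' rel="{new_rel}"' + attrs[rel.end():]
        return f"<a{new_attrs}>"

    return ANCHOR_RE.sub(rewrite, html)
```

With this sketch, the rendered form of `[edit me](/c/foo/edit/)` would gain `rel="nofollow"`, while an ordinary page link such as `/c/foo/` would pass through unchanged.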

Deployment

This PR should:

  • skip-deploy (skips everything below)
    • skip-web-deploy
    • skip-daemon-deploy

Template-only change; no code paths or migrations.

🤖 Generated with Claude Code

mlissner and others added 2 commits May 6, 2026 21:30
The /u/ path is disallowed in robots.txt. Without nofollow, Google
treats the link as a discovery signal it can't follow, then reports
the page as "Blocked by robots.txt" or "Not found" in Search Console.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Feedback button (anonymous fallback for non-editors), the "What
links here" backlinks dropdown item, and the creator/admin/editor
badges all link to URLs disallowed in robots.txt (/c/*/feedback/,
/c/*/backlinks/, /activity/<user>/). Without nofollow, Google
discovers them as crawl candidates it can't follow, then flags the
parent page in Search Console.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mlissner mlissner marked this pull request as ready for review May 7, 2026 04:32
@mlissner mlissner merged commit f672ae3 into main May 7, 2026
9 checks passed
@mlissner mlissner deleted the 83-nofollow-robots-blocked-links-20260506 branch May 7, 2026 15:55

Development

Successfully merging this pull request may close these issues.

Google not indexing for various reasons