Fix malformed journal links with double http:// prefix#1357
Fix malformed journal links with double http:// prefix#1357robertatakenaka merged 3 commits intomainfrom
Conversation
The collection.domain field is a URLField storing full URLs with protocol (e.g., http://www.scielo.org.pe). Code was prepending 'http://' again, creating malformed URLs like 'http://http://www.scielo.org.pe/scielo.php?...'. Fixed in: - core/home/views.py: use domain directly without prepending http:// - journals_table.html: remove hardcoded http:// from href - journals_table_by_publisher.html: remove hardcoded http:// from href - journal/sources/classic_website.py: use collection directly without prepending http:// Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
|
@copilot adicione testes unitários |
There was a problem hiding this comment.
Pull request overview
This PR aims to fix malformed SciELO journal links caused by blindly prepending http:// to collection.domain, which can already include a scheme, resulting in URLs like http://http//....
Changes:
- Updated URL construction in
core/home/views.pyto stop hardcodinghttp://and strip trailing/. - Updated classic website XML fetch URLs to stop hardcoding
http://and strip trailing/. - Updated two journal list templates to stop hardcoding
http://in generated links.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
core/home/views.py |
Adjusts generated scielo_url to avoid double-scheme URLs. |
journal/sources/classic_website.py |
Adjusts scielo.php request URLs to avoid double-scheme URLs. |
core/templates/home/include/list_journal_page/journals_table.html |
Removes hardcoded http:// prefix in journal link href. |
core/templates/home/include/list_journal_page/journals_table_by_publisher.html |
Removes hardcoded http:// prefix in journal link href. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| {% endif %} | ||
| <a target="_blank" href="http://{{ journal.collection__domain }}/scielo.php?script=sci_serial&pid={{journal.issn_scielo}}&lng=en">{{ journal.journal__title }}</a><span style="color: #888; font-size: 0.9em; opacity: 0.7; padding: 4px 8px;">{{journal.collection__main_name}}</span> | ||
| <a target="_blank" href="{{ journal.collection__domain }}/scielo.php?script=sci_serial&pid={{journal.issn_scielo}}&lng=en">{{ journal.journal__title }}</a><span style="color: #888; font-size: 0.9em; opacity: 0.7; padding: 4px 8px;">{{journal.collection__main_name}}</span> | ||
| </th> |
There was a problem hiding this comment.
This assumes journal.collection__domain already includes a scheme. In this codebase it can be stored as host-only (e.g. www.scielo.br in tests/fixtures), in which case this href becomes a relative URL and breaks. Consider adding a conditional to prefix http:// only when the domain lacks ://, or (preferably) pass a pre-normalized scielo_url from the view/context.
| <span class="material-icons align-middle fs-5" style="color: #c63800;">fiber_manual_record</span> | ||
| {% endif %} | ||
| <a target="_blank" href="http://{{ scielojournal.collection.domain }}/scielo.php?script=sci_serial&pid={{ scielojournal.issn_scielo }}&lng=en">{{ scielojournal.journal.title }}</a><span style="color: #888; font-size: 0.9em; opacity: 0.7; padding: 4px 8px;">{{ scielojournal.collection.main_name }}</span> | ||
| <a target="_blank" href="{{ scielojournal.collection.domain }}/scielo.php?script=sci_serial&pid={{ scielojournal.issn_scielo }}&lng=en">{{ scielojournal.journal.title }}</a><span style="color: #888; font-size: 0.9em; opacity: 0.7; padding: 4px 8px;">{{ scielojournal.collection.main_name }}</span> |
There was a problem hiding this comment.
Same issue here: scielojournal.collection.domain may be stored without http(s):// (host-only), so this href can become relative/invalid. Add logic to prepend a scheme only when missing (e.g. check for ://), or build and provide a normalized URL from Python to keep templates simple and consistent.
| <a target="_blank" href="{{ scielojournal.collection.domain }}/scielo.php?script=sci_serial&pid={{ scielojournal.issn_scielo }}&lng=en">{{ scielojournal.journal.title }}</a><span style="color: #888; font-size: 0.9em; opacity: 0.7; padding: 4px 8px;">{{ scielojournal.collection.main_name }}</span> | |
| {% if "://" in scielojournal.collection.domain %} | |
| <a target="_blank" href="{{ scielojournal.collection.domain }}/scielo.php?script=sci_serial&pid={{ scielojournal.issn_scielo }}&lng=en">{{ scielojournal.journal.title }}</a> | |
| {% else %} | |
| <a target="_blank" href="https://{{ scielojournal.collection.domain }}/scielo.php?script=sci_serial&pid={{ scielojournal.issn_scielo }}&lng=en">{{ scielojournal.journal.title }}</a> | |
| {% endif %} | |
| <span style="color: #888; font-size: 0.9em; opacity: 0.7; padding: 4px 8px;">{{ scielojournal.collection.main_name }}</span> |
| collections = requests.get( | ||
| f"http://{collection}/scielo.php?script=sci_alphabetic&lng=es&nrm=iso&debug=xml", | ||
| f"{collection.rstrip('/')}/scielo.php?script=sci_alphabetic&lng=es&nrm=iso&debug=xml", | ||
| timeout=10, |
There was a problem hiding this comment.
collection can be stored without a URL scheme (e.g. tests/fixtures use www.scielo.br). With the new f-string, requests.get() will receive a URL like www.scielo.br/scielo.php?... (missing scheme) and fail. Consider normalizing before building the URL: strip trailing / and prepend a scheme only when one is missing (e.g., if "://" not in collection: collection = "http://" + collection). Also guard against None values (e.g., collection = (collection or "")).
| official_journal = requests.get( | ||
| f"http://{collection}/scielo.php?script=sci_serial&pid={issn}&lng=es&nrm=iso&debug=xml", | ||
| f"{collection.rstrip('/')}/scielo.php?script=sci_serial&pid={issn}&lng=es&nrm=iso&debug=xml", | ||
| timeout=10, |
There was a problem hiding this comment.
Same issue as above: if collection lacks a scheme (common in this codebase, e.g. Collection(domain="www.scielo.br") in tests), this will generate a URL without http(s):// and requests.get() will fail. Normalize collection once (strip trailing /, add scheme if missing) before constructing the request URL, and handle None safely.
| scielo_url = ( | ||
| f"http://{domain}/scielo.php?script=sci_serial&pid={issn_scielo}&lng=en" | ||
| f"{domain.rstrip('/')}/scielo.php?script=sci_serial&pid={issn_scielo}&lng=en" | ||
| ) |
There was a problem hiding this comment.
collection__domain is not consistently stored with a URL scheme in this repo (tests/fixtures use values like www.scielo.br). After this change, if domain has no scheme the generated URL becomes www.scielo.br/scielo.php?... (relative/invalid in many contexts). Also, if domain is None, calling .rstrip('/') will raise. Suggest normalizing safely: domain = (domain or "").rstrip("/") and prepend a scheme only when missing (e.g. if domain and "://" not in domain: domain = "http://" + domain).
- core/home/tests.py: Tests for _get_scielo_journals_data() verifying that domain URLs with http:// prefix are not double-prefixed, that trailing slashes are stripped, and that https:// domains work - journal/tests.py: Tests for classic_website.get_issn() and get_journal_xml() verifying URLs are built correctly when the domain already contains http:// or https:// Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Added unit tests in commit
|
collection.domainis aURLFieldstoring full URLs (e.g.,http://www.scielo.org.pe), but multiple code paths were blindly prependinghttp://, producing broken URLs likehttp://http//www.scielo.org.pe/scielo.php?....O que esse PR faz?
Remove os prefixos
http://duplicados em todos os locais ondecollection.domainé usado para construir URLs doscielo.php. Antes/depois:A mesma correção foi aplicada nos dois templates HTML (removido
http://antes de{{ journal.collection__domain }}).Também foram adicionados testes unitários para cobrir os casos corrigidos.
Onde a revisão poderia começar?
core/home/views.py→_get_scielo_journals_data(), depois os dois templates emcore/templates/home/include/list_journal_page/, e os novos testes emcore/home/tests.pyejournal/tests.py.Como este poderia ser testado manualmente?
/pt-br/listar-por-ordem-alfabética/)http://www.scielo.org.pe/scielo.php?...) sem duplo prefixohttp://Algum cenário de contexto que queira dar?
A causa raiz é que
domainé umURLFielddo Django, que persiste a URL completa incluindo o esquema. Todos os quatro locais afetados (views.py,journals_table.html,journals_table_by_publisher.html,classic_website.py) assumiam que o campo continha apenas o hostname.Screenshots
N/A — problema de renderização de URL, visível via inspetor do navegador ou ao passar o mouse sobre os links na lista de periódicos.
Quais são tickets relevantes?
Formação do link para os periódicos a partir de scielo.org está incorreta
Referências
URLFieldarmazena a URL completa incluindo o esquema: https://docs.djangoproject.com/en/stable/ref/models/fields/#urlfieldOriginal prompt
🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.