Fix malformed journal links with double http:// prefix #1357

Merged
robertatakenaka merged 3 commits into main from copilot/fix-scie-lo-generate-link
Feb 24, 2026

Contributor

Copilot AI commented Feb 24, 2026

collection.domain is a URLField storing full URLs (e.g., http://www.scielo.org.pe), but multiple code paths were blindly prepending http://, producing broken URLs like http://http//www.scielo.org.pe/scielo.php?....

What does this PR do?

Removes the duplicated http:// prefixes everywhere collection.domain is used to build scielo.php URLs. Before/after:

# Before
f"http://{domain}/scielo.php?script=sci_serial&pid={issn_scielo}&lng=en"

# After
f"{domain.rstrip('/')}/scielo.php?script=sci_serial&pid={issn_scielo}&lng=en"

The same fix was applied to the two HTML templates (the http:// before {{ journal.collection__domain }} was removed).

Unit tests were also added to cover the fixed cases.

Where could the review start?

_get_scielo_journals_data() in core/home/views.py, then the two templates in core/templates/home/include/list_journal_page/, and the new tests in core/home/tests.py and journal/tests.py.

How could this be tested manually?

  1. Go to the alphabetical journal listing page (e.g., /pt-br/listar-por-ordem-alfabética/)
  2. Search for any term and inspect the generated links
  3. Confirm that the URLs are well formed (e.g., http://www.scielo.org.pe/scielo.php?...) with no doubled http:// prefix
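
The manual check in step 3 could also be scripted. The following is a minimal sketch, not part of this PR: it assumes the page HTML has already been saved or fetched, and the names HrefCollector and malformed_scielo_links are hypothetical helpers introduced only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(
                value for name, value in attrs if name == "href" and value
            )


def malformed_scielo_links(html):
    """Return scielo.php hrefs with no http(s) scheme or with a scheme as host."""
    collector = HrefCollector()
    collector.feed(html)
    bad = []
    for href in collector.hrefs:
        if "scielo.php" not in href:
            continue
        parts = urlsplit(href)
        # A doubled prefix makes the parsed host "http" instead of the real site.
        if parts.scheme not in ("http", "https") or parts.hostname in (None, "http", "https"):
            bad.append(href)
    return bad


# Illustrative input only; real pages would come from the journal listing view.
sample = (
    '<a href="http://http//www.scielo.org.pe/scielo.php?script=sci_serial">broken</a>'
    '<a href="http://www.scielo.org.pe/scielo.php?script=sci_serial">ok</a>'
)
print(malformed_scielo_links(sample))
```

An empty list would indicate that no scielo.php link on the page has a doubled or missing scheme.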

Is there any context you would like to give?

The root cause is that domain is a Django URLField, which stores the full URL including the scheme. All four affected locations (views.py, journals_table.html, journals_table_by_publisher.html, classic_website.py) assumed the field held only the hostname.
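
To make the root cause concrete, here is a small sketch that reproduces the old and new f-strings from this PR outside Django. The domain and ISSN values are taken from the linked issue; the use of urlsplit to inspect the result is an addition for illustration only.

```python
from urllib.parse import urlsplit

# Value as stored by the URLField: the scheme is already part of it.
domain = "http://www.scielo.org.pe"
issn_scielo = "2709-3689"

# Old construction: assumes `domain` is a bare hostname.
broken = f"http://{domain}/scielo.php?script=sci_serial&pid={issn_scielo}&lng=en"

# Construction after this PR: use the stored URL as the base.
fixed = f"{domain.rstrip('/')}/scielo.php?script=sci_serial&pid={issn_scielo}&lng=en"

print(broken)
print(urlsplit(broken).netloc)  # the parsed "host" is the leftover scheme text
print(urlsplit(fixed).netloc)   # the real site
```

In the broken string the authority component is parsed as the leftover scheme text rather than www.scielo.org.pe, which is why the link cannot resolve.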

Screenshots

N/A: this is a URL rendering problem, visible in the browser inspector or by hovering over the links in the journal list.

What are the relevant tickets?

The link to journals from scielo.org is built incorrectly

References

Original prompt

This section details the original issue you should resolve

<issue_title>The link to journals from scielo.org is built incorrectly</issue_title>
<issue_description>### Problem description

Clicking here works:
http://www.scielo.org.pe/scielo.php?script=sci_serial&pid=2709-3689&lng=es&nrm=iso

Clicking here does not work:
https://www.scielo.org/pt-br/listar-por-ordem-alfab%C3%A9tica/?search_term=kaw

Note that the link that opens is incorrect:

http://http//www.scielo.org.pe/scielo.php?script=sci_serial&pid=2709-3689&lng=en</issue_description>

Comments on the Issue (you are @copilot in this section)


The collection.domain field is a URLField storing full URLs with
protocol (e.g., http://www.scielo.org.pe). Code was prepending
'http://' again, creating malformed URLs like
'http://http://www.scielo.org.pe/scielo.php?...'.

Fixed in:
- core/home/views.py: use domain directly without prepending http://
- journals_table.html: remove hardcoded http:// from href
- journals_table_by_publisher.html: remove hardcoded http:// from href
- journal/sources/classic_website.py: use collection directly without
  prepending http://

Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix incorrect link generation for scielo.org journals" to "Fix malformed journal links with double http:// prefix" Feb 24, 2026
@robertatakenaka robertatakenaka marked this pull request as ready for review February 24, 2026 13:43
Copilot AI review requested due to automatic review settings February 24, 2026 13:43
@robertatakenaka
Member

@copilot add unit tests

Contributor

Copilot AI left a comment

Pull request overview

This PR aims to fix malformed SciELO journal links caused by blindly prepending http:// to collection.domain, which can already include a scheme, resulting in URLs like http://http//....

Changes:

  • Updated URL construction in core/home/views.py to stop hardcoding http:// and strip trailing /.
  • Updated classic website XML fetch URLs to stop hardcoding http:// and strip trailing /.
  • Updated two journal list templates to stop hardcoding http:// in generated links.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| core/home/views.py | Adjusts generated scielo_url to avoid double-scheme URLs. |
| journal/sources/classic_website.py | Adjusts scielo.php request URLs to avoid double-scheme URLs. |
| core/templates/home/include/list_journal_page/journals_table.html | Removes hardcoded http:// prefix in journal link href. |
| core/templates/home/include/list_journal_page/journals_table_by_publisher.html | Removes hardcoded http:// prefix in journal link href. |

Comment on lines 28 to 30
  {% endif %}
- <a target="_blank" href="http://{{ journal.collection__domain }}/scielo.php?script=sci_serial&pid={{journal.issn_scielo}}&lng=en">{{ journal.journal__title }}</a><span style="color: #888; font-size: 0.9em; opacity: 0.7; padding: 4px 8px;">{{journal.collection__main_name}}</span>
+ <a target="_blank" href="{{ journal.collection__domain }}/scielo.php?script=sci_serial&pid={{journal.issn_scielo}}&lng=en">{{ journal.journal__title }}</a><span style="color: #888; font-size: 0.9em; opacity: 0.7; padding: 4px 8px;">{{journal.collection__main_name}}</span>
  </th>

Copilot AI Feb 24, 2026

This assumes journal.collection__domain already includes a scheme. In this codebase it can be stored as host-only (e.g. www.scielo.br in tests/fixtures), in which case this href becomes a relative URL and breaks. Consider adding a conditional to prefix http:// only when the domain lacks ://, or (preferably) pass a pre-normalized scielo_url from the view/context.

  <span class="material-icons align-middle fs-5" style="color: #c63800;">fiber_manual_record</span>
  {% endif %}
- <a target="_blank" href="http://{{ scielojournal.collection.domain }}/scielo.php?script=sci_serial&pid={{ scielojournal.issn_scielo }}&lng=en">{{ scielojournal.journal.title }}</a><span style="color: #888; font-size: 0.9em; opacity: 0.7; padding: 4px 8px;">{{ scielojournal.collection.main_name }}</span>
+ <a target="_blank" href="{{ scielojournal.collection.domain }}/scielo.php?script=sci_serial&pid={{ scielojournal.issn_scielo }}&lng=en">{{ scielojournal.journal.title }}</a><span style="color: #888; font-size: 0.9em; opacity: 0.7; padding: 4px 8px;">{{ scielojournal.collection.main_name }}</span>

Copilot AI Feb 24, 2026

Same issue here: scielojournal.collection.domain may be stored without http(s):// (host-only), so this href can become relative/invalid. Add logic to prepend a scheme only when missing (e.g. check for ://), or build and provide a normalized URL from Python to keep templates simple and consistent.

Suggested change
- <a target="_blank" href="{{ scielojournal.collection.domain }}/scielo.php?script=sci_serial&pid={{ scielojournal.issn_scielo }}&lng=en">{{ scielojournal.journal.title }}</a><span style="color: #888; font-size: 0.9em; opacity: 0.7; padding: 4px 8px;">{{ scielojournal.collection.main_name }}</span>
+ {% if "://" in scielojournal.collection.domain %}
+ <a target="_blank" href="{{ scielojournal.collection.domain }}/scielo.php?script=sci_serial&pid={{ scielojournal.issn_scielo }}&lng=en">{{ scielojournal.journal.title }}</a>
+ {% else %}
+ <a target="_blank" href="https://{{ scielojournal.collection.domain }}/scielo.php?script=sci_serial&pid={{ scielojournal.issn_scielo }}&lng=en">{{ scielojournal.journal.title }}</a>
+ {% endif %}
+ <span style="color: #888; font-size: 0.9em; opacity: 0.7; padding: 4px 8px;">{{ scielojournal.collection.main_name }}</span>

Comment on lines 21 to 23
  collections = requests.get(
-     f"http://{collection}/scielo.php?script=sci_alphabetic&lng=es&nrm=iso&debug=xml",
+     f"{collection.rstrip('/')}/scielo.php?script=sci_alphabetic&lng=es&nrm=iso&debug=xml",
      timeout=10,

Copilot AI Feb 24, 2026

collection can be stored without a URL scheme (e.g. tests/fixtures use www.scielo.br). With the new f-string, requests.get() will receive a URL like www.scielo.br/scielo.php?... (missing scheme) and fail. Consider normalizing before building the URL: strip trailing / and prepend a scheme only when one is missing (e.g., if "://" not in collection: collection = "http://" + collection). Also guard against None values (e.g., collection = (collection or "")).

Comment on lines 54 to 56
  official_journal = requests.get(
-     f"http://{collection}/scielo.php?script=sci_serial&pid={issn}&lng=es&nrm=iso&debug=xml",
+     f"{collection.rstrip('/')}/scielo.php?script=sci_serial&pid={issn}&lng=es&nrm=iso&debug=xml",
      timeout=10,

Copilot AI Feb 24, 2026

Same issue as above: if collection lacks a scheme (common in this codebase, e.g. Collection(domain="www.scielo.br") in tests), this will generate a URL without http(s):// and requests.get() will fail. Normalize collection once (strip trailing /, add scheme if missing) before constructing the request URL, and handle None safely.
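
One way to "normalize collection once" for both classic-website requests is sketched below. This is an illustration of the reviewer's suggestion, not the code that was merged; classic_url is a hypothetical helper, and falling back to http:// for host-only values is an assumption.

```python
from urllib.parse import urlencode


def classic_url(collection, script, **params):
    """Build a scielo.php debug-XML URL from a collection domain.

    Hypothetical helper: accepts a full URL or a bare hostname.
    """
    base = (collection or "").strip().rstrip("/")
    if not base:
        raise ValueError("collection domain is empty")
    if "://" not in base:
        # Assumption: host-only values are served over plain http.
        base = "http://" + base
    query = urlencode(
        {"script": script, **params, "lng": "es", "nrm": "iso", "debug": "xml"}
    )
    return f"{base}/scielo.php?{query}"


# The two request URLs touched by this PR, built from one normalized base.
print(classic_url("http://www.scielo.org.pe/", "sci_alphabetic"))
print(classic_url("www.scielo.br", "sci_serial", pid="2709-3689"))
```

Raising on an empty value keeps a None domain from turning into a request for a relative path; the resulting string could then be passed to requests.get() with the existing timeout.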

Comment on lines 77 to 79
  scielo_url = (
-     f"http://{domain}/scielo.php?script=sci_serial&pid={issn_scielo}&lng=en"
+     f"{domain.rstrip('/')}/scielo.php?script=sci_serial&pid={issn_scielo}&lng=en"
  )

Copilot AI Feb 24, 2026

collection__domain is not consistently stored with a URL scheme in this repo (tests/fixtures use values like www.scielo.br). After this change, if domain has no scheme the generated URL becomes www.scielo.br/scielo.php?... (relative/invalid in many contexts). Also, if domain is None, calling .rstrip('/') will raise. Suggest normalizing safely: domain = (domain or "").rstrip("/") and prepend a scheme only when missing (e.g. if domain and "://" not in domain: domain = "http://" + domain).
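
The safe normalization suggested here could look roughly like the sketch below. It is not what the merged commit does (the merged code calls domain.rstrip('/') directly); normalize_domain and build_scielo_url are hypothetical names, and returning None for an empty domain is an assumption about how the view would want to handle missing data.

```python
def normalize_domain(domain, default_scheme="http"):
    """Return a base URL with a scheme and no trailing slash, or "" if empty."""
    domain = (domain or "").strip().rstrip("/")
    if domain and "://" not in domain:
        # Only add a scheme when the stored value is host-only.
        domain = f"{default_scheme}://{domain}"
    return domain


def build_scielo_url(domain, issn_scielo):
    """Build the journal home URL, or None when no domain is available."""
    base = normalize_domain(domain)
    if not base:
        return None
    return f"{base}/scielo.php?script=sci_serial&pid={issn_scielo}&lng=en"


print(build_scielo_url("http://www.scielo.org.pe/", "2709-3689"))
print(build_scielo_url("www.scielo.br", "2709-3689"))
print(build_scielo_url(None, "2709-3689"))
```

Computing scielo_url once in Python would also let both templates render the value as-is instead of concatenating the domain themselves.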

- core/home/tests.py: Tests for _get_scielo_journals_data() verifying
  that domain URLs with http:// prefix are not double-prefixed, that
  trailing slashes are stripped, and that https:// domains work
- journal/tests.py: Tests for classic_website.get_issn() and
  get_journal_xml() verifying URLs are built correctly when the domain
  already contains http:// or https://

Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 24, 2026

@copilot add unit tests

Added unit tests in commit 1da37de:

  • core/home/tests.py (TestGetScieloJournalsData): 4 tests for _get_scielo_journals_data() — verifies no double http:// prefix, correct URL format, trailing slash stripping, and https:// domains.
  • journal/tests.py (TestClassicWebsiteGetIssn, TestClassicWebsiteGetJournalXml): 6 tests for get_issn() and get_journal_xml() — verifies the HTTP request URLs are well-formed when the collection domain already includes the protocol (http:// or https://).
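
The tests themselves are not shown in this conversation. As a rough idea of the style described above, the sketch below checks the URL passed to the HTTP getter using unittest.mock. fetch_alphabetic is a hypothetical stand-in that takes the getter as a parameter; it is not the project's get_issn() or get_journal_xml(), whose signatures are not given here.

```python
import unittest
from unittest import mock


def fetch_alphabetic(collection, http_get):
    """Hypothetical stand-in for the code under test (not the project's API)."""
    url = (
        f"{collection.rstrip('/')}"
        "/scielo.php?script=sci_alphabetic&lng=es&nrm=iso&debug=xml"
    )
    return http_get(url, timeout=10)


class TestRequestUrl(unittest.TestCase):
    EXPECTED_PATH = "/scielo.php?script=sci_alphabetic&lng=es&nrm=iso&debug=xml"

    def _called_url(self, collection):
        # Replace the HTTP getter with a mock and capture the URL it receives.
        http_get = mock.Mock()
        fetch_alphabetic(collection, http_get)
        (url,), _kwargs = http_get.call_args
        return url

    def test_no_double_http_prefix(self):
        url = self._called_url("http://www.scielo.org.pe")
        self.assertFalse(url.startswith("http://http"))

    def test_trailing_slash_is_stripped(self):
        url = self._called_url("http://www.scielo.org.pe/")
        self.assertEqual(url, "http://www.scielo.org.pe" + self.EXPECTED_PATH)

    def test_https_domain_is_preserved(self):
        url = self._called_url("https://www.scielo.br")
        self.assertEqual(url, "https://www.scielo.br" + self.EXPECTED_PATH)
```

A file containing this sketch could be run with python -m unittest.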

@robertatakenaka robertatakenaka merged commit 6d5da52 into main Feb 24, 2026
2 checks passed
Development

Successfully merging this pull request may close these issues.

The link to journals from scielo.org is built incorrectly

3 participants