Skip to content

Corrige coleta journal logo#1365

Merged
robertatakenaka merged 10 commits intoscieloorg:mainfrom
robertatakenaka:corrige_coleta_journal_logo
Feb 27, 2026
Merged

Corrige coleta journal logo#1365
robertatakenaka merged 10 commits intoscieloorg:mainfrom
robertatakenaka:corrige_coleta_journal_logo

Conversation

@robertatakenaka
Copy link
Member

O que esse PR faz?

Corrige o download e armazenamento do logo de periódicos e a geração da URL pública na API. Problemas resolvidos:

  • Arquivos baixados tinham extensão .gif forçada, causando erro de validação no Wagtail quando o arquivo era JPEG. Agora o formato real é detectado via Pillow antes de salvar.
  • A URL do logo na API era construída sem considerar protocolo e porta corretamente. Agora usa request.build_absolute_uri e site.root_url como fallback.
  • A classe intermediária JournalLogo foi removida em favor de uma relação direta Journal.logo → wagtailimages.Image.
  • URL do admin de coleções atualizada de ModelAdmin para SnippetViewSet.

Onde a revisão poderia começar?

journal/models.py — remoção de JournalLogo e adição de get_url_logo em Journal. Em seguida journal/tasks.py para a detecção de formato via Pillow, e journal/api/v1/serializers.py para a geração da URL pública.

Como este poderia ser testado manualmente?

  1. Execute a task de importação de logo para um periódico:
>>> from journal.tasks import fetch_and_process_journal_logo
>>> fetch_and_process_journal_logo(journal_id=<id>)
  1. Verifique no admin do Wagtail (/cms/images/) se a imagem foi salva com a extensão correta.
  2. Confirme que o campo url_logo no endpoint GET /api/v1/journals/ retorna uma URL absoluta e acessível.

Algum cenário de contexto que queira dar?

O sistema legado armazena logos com extensão .gif, mas muitos são JPEG renomeados. O Wagtail valida o formato real via Pillow no upload e rejeita arquivos com extensão incorreta. A solução inspeciona o conteúdo em memória (BytesIO) antes de nomear o arquivo.

Screenshots

N/A — alterações de backend.

Quais são tickets relevantes?

Informar a issue relacionada.

Referências

Copilot AI review requested due to automatic review settings February 26, 2026 23:19
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes journal logo download and storage by addressing file extension mismatches, improving URL construction, and simplifying the data model. The changes remove the intermediate JournalLogo model in favor of a direct relationship between Journal and Wagtail's Image model.

Changes:

  • Implemented format detection using Pillow to determine actual image format instead of forcing .gif extension
  • Updated URL generation in the API to use request.build_absolute_uri() for proper protocol and domain handling
  • Removed intermediate JournalLogo model and established direct Journal.logo → wagtailimages.Image relationship

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
journal/tasks.py Added BytesIO-based image format detection with Pillow; removed JournalLogo model usage; simplified domain handling in _build_logo_url
journal/models.py Removed JournalLogo class; added get_url_logo() method to Journal model
journal/api/v1/serializers.py Updated get_url_logo() to use request context for building absolute URIs with proper fallback
journal/sources/am_to_core.py Refactored update_logo() to use Journal.get_url_logo() instead of JournalLogo
journal/tests.py Commented out JournalLogo-related tests
core/templates/wagtailadmin/summary_items/collection_summary_item.html Updated admin URL from ModelAdmin to SnippetViewSet pattern

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

journal/tasks.py Outdated
except Exception:
pass

logo_fiename = f"{journal_acron}_logo{correct_ext}"
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in variable name: 'logo_fiename' should be 'logo_filename'.

Copilot uses AI. Check for mistakes.
journal/tasks.py Outdated
title=journal_acron,
defaults={
"file": ContentFile(response, name=f"{journal_acron}_glogo.gif"),
"file": ContentFile(response, name=logo_fiename),
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BytesIO object 'img_bytes' is created and used to detect the image format, but then the original 'response' bytes are passed to ContentFile instead of the BytesIO content. After Pillow reads from BytesIO and it's seeked back to 0, you should use 'img_bytes.read()' or 'img_bytes.getvalue()' in ContentFile, not 'response'. Otherwise, the seek(0) operation serves no purpose and the code creates unnecessary confusion about which byte stream is being used.

Suggested change
"file": ContentFile(response, name=logo_fiename),
"file": ContentFile(img_bytes.getvalue(), name=logo_fiename),

Copilot uses AI. Check for mistakes.
Comment on lines 808 to 810
site = Site.objects.get(is_default_site=True)
# Garantir que o root_url esteja devidamente informado
return f"{site.root_url}{rendition.url}"
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to Site.objects.get(is_default_site=True) can raise Site.DoesNotExist if no default site is configured. This should be wrapped in a try-except block or use .filter().first() to handle cases where the default site might not exist, returning None gracefully instead of crashing.

Copilot uses AI. Check for mistakes.
Comment on lines 187 to 192
img_wagtail, created = Image.objects.get_or_create(
title=journal_acron,
defaults={
"file": ContentFile(response, name=f"{journal_acron}_glogo.gif"),
"file": ContentFile(response, name=logo_fiename),
},
)
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_or_create logic using 'title' as the lookup field means that if an Image with that journal_acron title already exists, it won't be updated even if the logo file has changed. The 'defaults' parameter only applies when creating a new record. If logos can change over time, this should either use update_or_create, or delete the old image before creating a new one, or use a different unique identifier that includes versioning information.

Copilot uses AI. Check for mistakes.
Comment on lines 124 to 130
domain = collection.domain
collection_acron3 = collection.acron3

if collection_acron3 == "scl":
return f"https://{domain}/media/images/{journal_acron}_glogo.gif"
return f"{domain}/media/images/{journal_acron}_glogo.gif"
else:
return f"http://{domain}/img/revistas/{journal_acron}/glogo.gif"
return f"{domain}/img/revistas/{journal_acron}/glogo.gif"
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code directly uses 'collection.domain' without checking if it's None. Since the Collection model defines domain as 'null=True', this could result in constructing URLs like "None/media/images/..." if a collection has no domain set. Consider adding a null check: 'if not collection.domain: return None' before using it.

Copilot uses AI. Check for mistakes.
Comment on lines +276 to 299
# """Test that ValueError is raised when collection does not exist"""
# with self.assertRaises(ValueError) as context:
# fetch_and_process_journal_logos_in_collection(
# collection_acron3="invalid_acron",
# username=self.user.username,
# )
# self.assertIn("does not exist", str(context.exception))

# @patch("journal.tasks.group")
# def test_fetch_and_process_journal_logos_with_valid_collection(self, mock_group):
# """Test that task executes with valid collection"""
# # Mock the group to avoid actually running the celery tasks
# mock_group.return_value.return_value = None

# result = fetch_and_process_journal_logos_in_collection(
# collection_acron3=self.collection.acron3,
# username=self.user.username,
# )

# # The task should complete without raising an exception
# # and should call group with the task signatures
# self.assertTrue(mock_group.called)


Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests for journal logo functionality were commented out rather than updated to reflect the model changes. Commented-out tests should either be removed entirely or updated to work with the new direct relationship between Journal and wagtailimages.Image. Leaving large blocks of commented code reduces maintainability and creates confusion about whether this functionality is tested.

Suggested change
# """Test that ValueError is raised when collection does not exist"""
# with self.assertRaises(ValueError) as context:
# fetch_and_process_journal_logos_in_collection(
# collection_acron3="invalid_acron",
# username=self.user.username,
# )
# self.assertIn("does not exist", str(context.exception))
# @patch("journal.tasks.group")
# def test_fetch_and_process_journal_logos_with_valid_collection(self, mock_group):
# """Test that task executes with valid collection"""
# # Mock the group to avoid actually running the celery tasks
# mock_group.return_value.return_value = None
# result = fetch_and_process_journal_logos_in_collection(
# collection_acron3=self.collection.acron3,
# username=self.user.username,
# )
# # The task should complete without raising an exception
# # and should call group with the task signatures
# self.assertTrue(mock_group.called)

Copilot uses AI. Check for mistakes.
Comment on lines +198 to +202
try:
request = self.context.get('request')
return obj.get_url_logo(request.build_absolute_uri('/') if request else None)
except Exception:
return obj.get_url_logo()
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bare 'except Exception' catches all exceptions silently without logging them. If an error occurs while building the URL (e.g., Site.DoesNotExist from get_url_logo), it will be silently swallowed and the fallback will be attempted. Consider logging the exception before falling back, or catching more specific exceptions. This will help with debugging if the URL generation fails.

Copilot uses AI. Check for mistakes.
Comment on lines 123 to 130
# collection.domain contém https:// ou http://
domain = collection.domain
collection_acron3 = collection.acron3

if collection_acron3 == "scl":
return f"https://{domain}/media/images/{journal_acron}_glogo.gif"
return f"{domain}/media/images/{journal_acron}_glogo.gif"
else:
return f"http://{domain}/img/revistas/{journal_acron}/glogo.gif"
return f"{domain}/img/revistas/{journal_acron}/glogo.gif"
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states "collection.domain contém https:// ou http://" (contains https:// or http://), but the Collection model uses URLField which doesn't enforce this, and test data shows domains stored without protocol (e.g., "www.scielo.br" in journal/tests.py:190,196). This inconsistency could lead to malformed URLs like "www.scielo.br/media/images/..." instead of "http://www.scielo.br/media/images/...". Consider either: 1) Adding validation to ensure domain always includes protocol, 2) Using urlparse to normalize the domain, or 3) Explicitly prepending the protocol based on collection type.

Copilot uses AI. Check for mistakes.
Comment on lines +140 to +146
EXT_MAP = {
'JPEG': '.jpg',
'GIF': '.gif',
'PNG': '.png',
'WEBP': '.webp',
'ICO': '.ico',
}
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EXT_MAP dictionary is defined inside the function and recreated on every function call. Since this is a Celery task that could be called many times, consider moving this constant to module level (above the function) to avoid unnecessary recreation. This is a minor performance optimization.

Copilot uses AI. Check for mistakes.
Comment on lines 18 to 23
AMJournal,
Journal,
JournalLicense,
JournalLogo,
SciELOJournal,
)
from journal.sources import classic_website
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing database migration: The JournalLogo model was removed from models.py but no migration file was included in this PR to drop the corresponding database table. This will cause the database schema to be out of sync with the models. Run 'python manage.py makemigrations' to generate the migration that removes the JournalLogo model and its database table.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Comments suppressed due to low confidence (1)

journal/tasks.py:216

  • In the exception handler, the code references journal.title, url_logo, and collection.domain, but if the exception occurs early in the function (e.g., when fetching the journal object on line 156-164 or when scielo_journal is None on line 165), these variables may not be defined, leading to additional exceptions that mask the original error. Consider adding checks for variable existence or using getattr with defaults in the exception handler.
    except Exception as e:
        exc_type, exc_value, exc_traceback = sys.exc_info()
        UnexpectedEvent.create(
            exception=e,
            exc_traceback=exc_traceback,
            action="journal.tasks.fetch_and_process_journal_logo",
            detail={
                "function": "journal.tasks.fetch_and_process_journal_logo",
                "journal_title": journal.title,
                "url_logo": url_logo,
                "domain": collection.domain,
            },
        )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +16
# Generated by Django 5.2.7 on 2026-02-26 21:01

from django.db import migrations


class Migration(migrations.Migration):

dependencies = [
("journal", "0055_journalproxyadminonly_and_more"),
]

operations = [
migrations.DeleteModel(
name="JournalLogo",
),
]
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration deletes the JournalLogo model without migrating existing data. If there are existing JournalLogo records in the database that link journals to their logo images, this data will be lost. Before applying this migration in production, ensure that:

  1. All existing JournalLogo relationships have been migrated to the direct Journal.logo field
  2. A data migration script has been run to copy logo references from JournalLogo to Journal.logo
  3. Or confirm that no production data exists in the JournalLogo table

Consider adding a data migration step before the DeleteModel operation to preserve any existing logo associations.

Copilot uses AI. Check for mistakes.
)
if not journal:
return None

Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's trailing whitespace on this line (after return None). This should be removed to maintain code cleanliness and follow PEP 8 style guidelines.

Suggested change

Copilot uses AI. Check for mistakes.
Comment on lines +801 to +806
def get_url_logo(self, root_url=None):
if not self.logo:
return None
rendition = self.logo.get_rendition('original')
if root_url:
return f"{root_url}{rendition.url}"
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When root_url is provided and contains a trailing slash (as request.build_absolute_uri('/') does), and rendition.url starts with a slash (which is typical for Django media URLs), this will result in a double slash in the URL (e.g., http://example.com//media/...). Consider stripping trailing slashes from root_url before concatenation, or use urljoin for proper URL construction.

Copilot uses AI. Check for mistakes.
Comment on lines 197 to +202
def get_url_logo(self, obj):
if obj.logo:
domain = Site.objects.get(is_default_site=True).hostname
domain = f"http://{domain}"
return f"{domain}{obj.logo.file.url}"
return None
try:
request = self.context.get('request')
return obj.get_url_logo(request.build_absolute_uri('/') if request else None)
except Exception:
return obj.get_url_logo()
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception handler catches all exceptions but doesn't log them, making debugging difficult if an error occurs. Consider logging the exception before falling back to calling obj.get_url_logo() without arguments, so that issues (like incorrect root_url construction) can be diagnosed.

Copilot uses AI. Check for mistakes.
@@ -1,6 +1,8 @@
import logging
import sys
from io import BytesIO
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The from io import BytesIO import is added but there's already a blank line before line 3. The import should follow PEP 8 conventions by grouping standard library imports together. Consider placing this import immediately after import sys on line 2 without the extra blank line, maintaining consistency with the rest of the file's import style.

Copilot uses AI. Check for mistakes.
with PilImage.open(img_bytes) as pil_img:
real_format = pil_img.format # 'JPEG', 'GIF', etc
correct_ext = EXT_MAP.get(real_format, '.jpg')

Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's trailing whitespace on this line (after the closing brace on line 182 or after the except clause). This should be removed to maintain code cleanliness and follow PEP 8 style guidelines.

Copilot uses AI. Check for mistakes.
img_bytes.seek(0) # reset após leitura do Pillow
except Exception:
pass

Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's trailing whitespace on this line. This should be removed to maintain code cleanliness and follow PEP 8 style guidelines.

Copilot uses AI. Check for mistakes.
.filter(collection__is_active=True)
.values_list("collection__acron3", flat=True)
)

Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's trailing whitespace on this line (after the closing parenthesis). This should be removed to maintain code cleanliness and follow PEP 8 style guidelines.

Suggested change

Copilot uses AI. Check for mistakes.
Comment on lines +131 to 133
if not domain:
logger.warning(f"Collection {collection.acron3} has no domain defined")
return None
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a redundant check for collection.domain on lines 124 and 131-132. The second check (lines 131-133) is unnecessary since the first check (lines 124-126) would have already returned None if the domain was empty. Consider removing the duplicate check.

Suggested change
if not domain:
logger.warning(f"Collection {collection.acron3} has no domain defined")
return None

Copilot uses AI. Check for mistakes.
Comment on lines 195 to 200
img_wagtail, created = Image.objects.get_or_create(
title=journal_acron,
defaults={
"file": ContentFile(response, name=f"{journal_acron}_glogo.gif"),
"file": ContentFile(img_bytes.read(), name=logo_filename),
},
)
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Image.objects.get_or_create() call uses only title as the lookup field, which means if an image with this journal_acron already exists, it will be reused without updating its file. This could be problematic if:

  1. The logo image changes but keeps the same journal acronym
  2. An old logo has the wrong extension (e.g., .gif) and you want to replace it with the corrected extension

Consider using update_or_create() instead, or add logic to delete/update the existing image when needed, especially since this PR is specifically about fixing incorrect extensions on existing logos.

Copilot uses AI. Check for mistakes.
@robertatakenaka robertatakenaka merged commit 75a3c4b into scieloorg:main Feb 27, 2026
7 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants