Corrige coleta journal logo by robertatakenaka · Pull Request #1365 · scieloorg/core

robertatakenaka · 2026-02-26T23:19:13Z

O que esse PR faz?

Corrige o download e armazenamento do logo de periódicos e a geração da URL pública na API. Problemas resolvidos:

Arquivos baixados tinham extensão .gif forçada, causando erro de validação no Wagtail quando o arquivo era JPEG. Agora o formato real é detectado via Pillow antes de salvar.
A URL do logo na API era construída sem considerar protocolo e porta corretamente. Agora usa request.build_absolute_uri e site.root_url como fallback.
A classe intermediária JournalLogo foi removida em favor de uma relação direta Journal.logo → wagtailimages.Image.
URL do admin de coleções atualizada de ModelAdmin para SnippetViewSet.

Onde a revisão poderia começar?

journal/models.py — remoção de JournalLogo e adição de get_url_logo em Journal. Em seguida journal/tasks.py para a detecção de formato via Pillow, e journal/api/v1/serializers.py para a geração da URL pública.

Como este poderia ser testado manualmente?

Execute a task de importação de logo para um periódico:

>>> from journal.tasks import fetch_and_process_journal_logo
>>> fetch_and_process_journal_logo(journal_id=<id>)

Verifique no admin do Wagtail (/cms/images/) se a imagem foi salva com a extensão correta.
Confirme que o campo url_logo no endpoint GET /api/v1/journals/ retorna uma URL absoluta e acessível.

Algum cenário de contexto que queira dar?

O sistema legado armazena logos com extensão .gif, mas muitos são JPEG renomeados. O Wagtail valida o formato real via Pillow no upload e rejeita arquivos com extensão incorreta. A solução inspeciona o conteúdo em memória (BytesIO) antes de nomear o arquivo.

Screenshots

N/A — alterações de backend.

Quais são tickets relevantes?

Informar a issue relacionada.

Referências

…ction_summary_item

…Logo

…e uso de JournalLogo

…iódico, remove JournalLogo

…L do logo na API

…para revisão futura

Copilot

Pull request overview

This PR fixes journal logo download and storage by addressing file extension mismatches, improving URL construction, and simplifying the data model. The changes remove the intermediate JournalLogo model in favor of a direct relationship between Journal and Wagtail's Image model.

Changes:

Implemented format detection using Pillow to determine actual image format instead of forcing .gif extension
Updated URL generation in the API to use request.build_absolute_uri() for proper protocol and domain handling
Removed intermediate JournalLogo model and established direct Journal.logo → wagtailimages.Image relationship

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
journal/tasks.py	Added BytesIO-based image format detection with Pillow; removed JournalLogo model usage; simplified domain handling in _build_logo_url
journal/models.py	Removed JournalLogo class; added get_url_logo() method to Journal model
journal/api/v1/serializers.py	Updated get_url_logo() to use request context for building absolute URIs with proper fallback
journal/sources/am_to_core.py	Refactored update_logo() to use Journal.get_url_logo() instead of JournalLogo
journal/tests.py	Commented out JournalLogo-related tests
core/templates/wagtailadmin/summary_items/collection_summary_item.html	Updated admin URL from ModelAdmin to SnippetViewSet pattern

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-26T23:24:35Z

journal/tasks.py

+            except Exception:
+                pass
+
+        logo_fiename = f"{journal_acron}_logo{correct_ext}"


Typo in variable name: 'logo_fiename' should be 'logo_filename'.

Copilot · 2026-02-26T23:24:35Z

journal/tasks.py

            title=journal_acron,
            defaults={
-                "file": ContentFile(response, name=f"{journal_acron}_glogo.gif"),
+                "file": ContentFile(response, name=logo_fiename),


The BytesIO object 'img_bytes' is created and used to detect the image format, but then the original 'response' bytes are passed to ContentFile instead of the BytesIO content. After Pillow reads from BytesIO and it's seeked back to 0, you should use 'img_bytes.read()' or 'img_bytes.getvalue()' in ContentFile, not 'response'. Otherwise, the seek(0) operation serves no purpose and the code creates unnecessary confusion about which byte stream is being used.

Suggested change

"file": ContentFile(response, name=logo_fiename),

"file": ContentFile(img_bytes.getvalue(), name=logo_fiename),

Copilot · 2026-02-26T23:24:36Z

journal/models.py

+        site = Site.objects.get(is_default_site=True)
+        # Garantir que o root_url esteja devidamente informado
+        return f"{site.root_url}{rendition.url}"


The call to Site.objects.get(is_default_site=True) can raise Site.DoesNotExist if no default site is configured. This should be wrapped in a try-except block or use .filter().first() to handle cases where the default site might not exist, returning None gracefully instead of crashing.

Copilot · 2026-02-26T23:24:36Z

journal/tasks.py

        img_wagtail, created = Image.objects.get_or_create(
            title=journal_acron,
            defaults={
-                "file": ContentFile(response, name=f"{journal_acron}_glogo.gif"),
+                "file": ContentFile(response, name=logo_fiename),
            },
        )


The get_or_create logic using 'title' as the lookup field means that if an Image with that journal_acron title already exists, it won't be updated even if the logo file has changed. The 'defaults' parameter only applies when creating a new record. If logos can change over time, this should either use update_or_create, or delete the old image before creating a new one, or use a different unique identifier that includes versioning information.

Copilot · 2026-02-26T23:24:36Z

journal/tasks.py

+    domain = collection.domain
    collection_acron3 = collection.acron3

    if collection_acron3 == "scl":
-        return f"https://{domain}/media/images/{journal_acron}_glogo.gif"
+        return f"{domain}/media/images/{journal_acron}_glogo.gif"
    else:
-        return f"http://{domain}/img/revistas/{journal_acron}/glogo.gif"
+        return f"{domain}/img/revistas/{journal_acron}/glogo.gif"


The code directly uses 'collection.domain' without checking if it's None. Since the Collection model defines domain as 'null=True', this could result in constructing URLs like "None/media/images/..." if a collection has no domain set. Consider adding a null check: 'if not collection.domain: return None' before using it.

Copilot · 2026-02-26T23:24:36Z

journal/tests.py

+#         """Test that ValueError is raised when collection does not exist"""
+#         with self.assertRaises(ValueError) as context:
+#             fetch_and_process_journal_logos_in_collection(
+#                 collection_acron3="invalid_acron",
+#                 username=self.user.username,
+#             )
+#         self.assertIn("does not exist", str(context.exception))
+
+#     @patch("journal.tasks.group")
+#     def test_fetch_and_process_journal_logos_with_valid_collection(self, mock_group):
+#         """Test that task executes with valid collection"""
+#         # Mock the group to avoid actually running the celery tasks
+#         mock_group.return_value.return_value = None
+
+#         result = fetch_and_process_journal_logos_in_collection(
+#             collection_acron3=self.collection.acron3,
+#             username=self.user.username,
+#         )
+
+#         # The task should complete without raising an exception
+#         # and should call group with the task signatures
+#         self.assertTrue(mock_group.called)




Tests for journal logo functionality were commented out rather than updated to reflect the model changes. Commented-out tests should either be removed entirely or updated to work with the new direct relationship between Journal and wagtailimages.Image. Leaving large blocks of commented code reduces maintainability and creates confusion about whether this functionality is tested.

Suggested change

# """Test that ValueError is raised when collection does not exist"""

# with self.assertRaises(ValueError) as context:

# fetch_and_process_journal_logos_in_collection(

# collection_acron3="invalid_acron",

# username=self.user.username,

# )

# self.assertIn("does not exist", str(context.exception))

# @patch("journal.tasks.group")

# def test_fetch_and_process_journal_logos_with_valid_collection(self, mock_group):

# """Test that task executes with valid collection"""

# # Mock the group to avoid actually running the celery tasks

# mock_group.return_value.return_value = None

# result = fetch_and_process_journal_logos_in_collection(

# collection_acron3=self.collection.acron3,

# username=self.user.username,

# )

# # The task should complete without raising an exception

# # and should call group with the task signatures

# self.assertTrue(mock_group.called)

Copilot · 2026-02-26T23:24:36Z

journal/api/v1/serializers.py

+        try:
+            request = self.context.get('request')
+            return obj.get_url_logo(request.build_absolute_uri('/') if request else None)
+        except Exception:
+            return obj.get_url_logo()


The bare 'except Exception' catches all exceptions silently without logging them. If an error occurs while building the URL (e.g., Site.DoesNotExist from get_url_logo), it will be silently swallowed and the fallback will be attempted. Consider logging the exception before falling back, or catching more specific exceptions. This will help with debugging if the URL generation fails.

Copilot · 2026-02-26T23:24:37Z

journal/tasks.py

+    # collection.domain contém https:// ou http://
+    domain = collection.domain
    collection_acron3 = collection.acron3

    if collection_acron3 == "scl":
-        return f"https://{domain}/media/images/{journal_acron}_glogo.gif"
+        return f"{domain}/media/images/{journal_acron}_glogo.gif"
    else:
-        return f"http://{domain}/img/revistas/{journal_acron}/glogo.gif"
+        return f"{domain}/img/revistas/{journal_acron}/glogo.gif"


The comment states "collection.domain contém https:// ou http://" (contains https:// or http://), but the Collection model uses URLField which doesn't enforce this, and test data shows domains stored without protocol (e.g., "www.scielo.br" in journal/tests.py:190,196). This inconsistency could lead to malformed URLs like "www.scielo.br/media/images/..." instead of "http://www.scielo.br/media/images/...". Consider either: 1) Adding validation to ensure domain always includes protocol, 2) Using urlparse to normalize the domain, or 3) Explicitly prepending the protocol based on collection type.

Copilot · 2026-02-26T23:24:37Z

journal/tasks.py

+    EXT_MAP = {
+        'JPEG': '.jpg',
+        'GIF': '.gif',
+        'PNG': '.png',
+        'WEBP': '.webp',
+        'ICO': '.ico',
+    }


The EXT_MAP dictionary is defined inside the function and recreated on every function call. Since this is a Celery task that could be called many times, consider moving this constant to module level (above the function) to avoid unnecessary recreation. This is a minor performance optimization.

Copilot · 2026-02-26T23:24:37Z

journal/tasks.py

    AMJournal,
    Journal,
    JournalLicense,
-    JournalLogo,
    SciELOJournal,
 )
 from journal.sources import classic_website


Missing database migration: The JournalLogo model was removed from models.py but no migration file was included in this PR to drop the corresponding database table. This will cause the database schema to be out of sync with the models. Run 'python manage.py makemigrations' to generate the migration that removes the JournalLogo model and its database table.

…on url

… processing

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Comments suppressed due to low confidence (1)

journal/tasks.py:216

In the exception handler, the code references journal.title, url_logo, and collection.domain, but if the exception occurs early in the function (e.g., when fetching the journal object on line 156-164 or when scielo_journal is None on line 165), these variables may not be defined, leading to additional exceptions that mask the original error. Consider adding checks for variable existence or using getattr with defaults in the exception handler.

    except Exception as e:
        exc_type, exc_value, exc_traceback = sys.exc_info()
        UnexpectedEvent.create(
            exception=e,
            exc_traceback=exc_traceback,
            action="journal.tasks.fetch_and_process_journal_logo",
            detail={
                "function": "journal.tasks.fetch_and_process_journal_logo",
                "journal_title": journal.title,
                "url_logo": url_logo,
                "domain": collection.domain,
            },
        )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-27T11:26:55Z

journal/migrations/0056_delete_journallogo.py

+# Generated by Django 5.2.7 on 2026-02-26 21:01
+
+from django.db import migrations
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ("journal", "0055_journalproxyadminonly_and_more"),
+    ]
+
+    operations = [
+        migrations.DeleteModel(
+            name="JournalLogo",
+        ),
+    ]


The migration deletes the JournalLogo model without migrating existing data. If there are existing JournalLogo records in the database that link journals to their logo images, this data will be lost. Before applying this migration in production, ensure that:

All existing JournalLogo relationships have been migrated to the direct Journal.logo field

A data migration script has been run to copy logo references from JournalLogo to Journal.logo

Or confirm that no production data exists in the JournalLogo table

Consider adding a data migration step before the DeleteModel operation to preserve any existing logo associations.

Copilot · 2026-02-27T11:26:55Z

journal/sources/am_to_core.py

-            )
+        if not journal:
+            return None
+


There's trailing whitespace on this line (after return None). This should be removed to maintain code cleanliness and follow PEP 8 style guidelines.

Suggested change

Copilot · 2026-02-27T11:26:55Z

journal/models.py

+    def get_url_logo(self, root_url=None):
+        if not self.logo:
+            return None
+        rendition = self.logo.get_rendition('original')
+        if root_url:
+            return f"{root_url}{rendition.url}"


When root_url is provided and contains a trailing slash (as request.build_absolute_uri('/') does), and rendition.url starts with a slash (which is typical for Django media URLs), this will result in a double slash in the URL (e.g., http://example.com//media/...). Consider stripping trailing slashes from root_url before concatenation, or use urljoin for proper URL construction.

Copilot · 2026-02-27T11:26:56Z

journal/api/v1/serializers.py

    def get_url_logo(self, obj):
-        if obj.logo:
-            domain = Site.objects.get(is_default_site=True).hostname
-            domain = f"http://{domain}"
-            return f"{domain}{obj.logo.file.url}"
-        return None
+        try:
+            request = self.context.get('request')
+            return obj.get_url_logo(request.build_absolute_uri('/') if request else None)
+        except Exception:
+            return obj.get_url_logo()


The exception handler catches all exceptions but doesn't log them, making debugging difficult if an error occurs. Consider logging the exception before falling back to calling obj.get_url_logo() without arguments, so that issues (like incorrect root_url construction) can be diagnosed.

Copilot · 2026-02-27T11:26:56Z

journal/tasks.py

@@ -1,6 +1,8 @@
 import logging
 import sys
+from io import BytesIO


The from io import BytesIO import is added but there's already a blank line before line 3. The import should follow PEP 8 conventions by grouping standard library imports together. Consider placing this import immediately after import sys on line 2 without the extra blank line, maintaining consistency with the rest of the file's import style.

Copilot · 2026-02-27T11:26:56Z

journal/tasks.py

+            with PilImage.open(img_bytes) as pil_img:
+                real_format = pil_img.format  # 'JPEG', 'GIF', etc
+                correct_ext = EXT_MAP.get(real_format, '.jpg')
+


There's trailing whitespace on this line (after the closing brace on line 182 or after the except clause). This should be removed to maintain code cleanliness and follow PEP 8 style guidelines.

Copilot · 2026-02-27T11:26:56Z

journal/tasks.py

+                img_bytes.seek(0)  # reset após leitura do Pillow
+            except Exception:
+                pass
+


There's trailing whitespace on this line. This should be removed to maintain code cleanliness and follow PEP 8 style guidelines.

Copilot · 2026-02-27T11:26:57Z

journal/models.py

            .filter(collection__is_active=True)
            .values_list("collection__acron3", flat=True)
        )
+


There's trailing whitespace on this line (after the closing parenthesis). This should be removed to maintain code cleanliness and follow PEP 8 style guidelines.

Suggested change

Copilot · 2026-02-27T11:26:57Z

journal/tasks.py

+    if not domain:
+        logger.warning(f"Collection {collection.acron3} has no domain defined")
        return None


There's a redundant check for collection.domain on lines 124 and 131-132. The second check (lines 131-133) is unnecessary since the first check (lines 124-126) would have already returned None if the domain was empty. Consider removing the duplicate check.

Suggested change

if not domain:

logger.warning(f"Collection {collection.acron3} has no domain defined")

return None

Copilot · 2026-02-27T11:26:57Z

journal/tasks.py

        img_wagtail, created = Image.objects.get_or_create(
            title=journal_acron,
            defaults={
-                "file": ContentFile(response, name=f"{journal_acron}_glogo.gif"),
+                "file": ContentFile(img_bytes.read(), name=logo_filename),
            },
        )


The Image.objects.get_or_create() call uses only title as the lookup field, which means if an image with this journal_acron already exists, it will be reused without updating its file. This could be problematic if:

The logo image changes but keeps the same journal acronym

An old logo has the wrong extension (e.g., .gif) and you want to replace it with the corrected extension

Consider using update_or_create() instead, or add logic to delete/update the existing image when needed, especially since this PR is specifically about fixing incorrect extensions on existing logos.

robertatakenaka added 6 commits February 26, 2026 20:16

fix: atualiza URL do admin de ModelAdmin para SnippetViewSet em colle…

01ae5d1

…ction_summary_item

feat: adiciona método get_url_logo em Journal e remove classe Journal…

3e2db1c

…Logo

refactor: simplifica update_logo usando logo direto no Journal, remov…

095983c

…e uso de JournalLogo

feat: detecta formato real da imagem via Pillow ao baixar logo do per…

c7f4251

…iódico, remove JournalLogo

refactor: usa get_url_logo e request.build_absolute_uri para gerar UR…

5ce2e0c

…L do logo na API

chore: comenta testes de TestFetchAndProcessJournalLogosInCollection …

7045a96

…para revisão futura

Copilot AI review requested due to automatic review settings February 26, 2026 23:19

Copilot started reviewing on behalf of robertatakenaka February 26, 2026 23:19 View session

Copilot AI reviewed Feb 26, 2026

View reviewed changes

robertatakenaka added 4 commits February 26, 2026 21:18

Corrige o nome da variável filename

df97543

Adiciona a migração de journal que apaga JournalLogo

e3dbf78

fix(models): add fallback for missing default site in Journal renditi…

b7b2ea7

…on url

fix(tasks): add domain validation and fix ContentFile reading in logo…

8f26636

… processing

robertatakenaka requested a review from Copilot February 27, 2026 11:21

Copilot started reviewing on behalf of robertatakenaka February 27, 2026 11:21 View session

Copilot AI reviewed Feb 27, 2026

View reviewed changes

robertatakenaka merged commit 75a3c4b into scieloorg:main Feb 27, 2026
7 of 9 checks passed

	"file": ContentFile(response, name=logo_fiename),
	"file": ContentFile(img_bytes.getvalue(), name=logo_fiename),

	if not domain:
	logger.warning(f"Collection {collection.acron3} has no domain defined")
	return None

Conversation

robertatakenaka commented Feb 26, 2026

O que esse PR faz?

Onde a revisão poderia começar?

Como este poderia ser testado manualmente?

Algum cenário de contexto que queira dar?

Screenshots

Quais são tickets relevantes?

Referências

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!