Corrige coleta journal logo#1365
Conversation
…ction_summary_item
…e uso de JournalLogo
…iódico, remove JournalLogo
…para revisão futura
There was a problem hiding this comment.
Pull request overview
This PR fixes journal logo download and storage by addressing file extension mismatches, improving URL construction, and simplifying the data model. The changes remove the intermediate JournalLogo model in favor of a direct relationship between Journal and Wagtail's Image model.
Changes:
- Implemented format detection using Pillow to determine actual image format instead of forcing
.gifextension - Updated URL generation in the API to use
request.build_absolute_uri()for proper protocol and domain handling - Removed intermediate
JournalLogomodel and established directJournal.logo → wagtailimages.Imagerelationship
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| journal/tasks.py | Added BytesIO-based image format detection with Pillow; removed JournalLogo model usage; simplified domain handling in _build_logo_url |
| journal/models.py | Removed JournalLogo class; added get_url_logo() method to Journal model |
| journal/api/v1/serializers.py | Updated get_url_logo() to use request context for building absolute URIs with proper fallback |
| journal/sources/am_to_core.py | Refactored update_logo() to use Journal.get_url_logo() instead of JournalLogo |
| journal/tests.py | Commented out JournalLogo-related tests |
| core/templates/wagtailadmin/summary_items/collection_summary_item.html | Updated admin URL from ModelAdmin to SnippetViewSet pattern |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
journal/tasks.py
Outdated
| except Exception: | ||
| pass | ||
|
|
||
| logo_fiename = f"{journal_acron}_logo{correct_ext}" |
There was a problem hiding this comment.
Typo in variable name: 'logo_fiename' should be 'logo_filename'.
journal/tasks.py
Outdated
| title=journal_acron, | ||
| defaults={ | ||
| "file": ContentFile(response, name=f"{journal_acron}_glogo.gif"), | ||
| "file": ContentFile(response, name=logo_fiename), |
There was a problem hiding this comment.
The BytesIO object 'img_bytes' is created and used to detect the image format, but then the original 'response' bytes are passed to ContentFile instead of the BytesIO content. After Pillow reads from BytesIO and it's seeked back to 0, you should use 'img_bytes.read()' or 'img_bytes.getvalue()' in ContentFile, not 'response'. Otherwise, the seek(0) operation serves no purpose and the code creates unnecessary confusion about which byte stream is being used.
| "file": ContentFile(response, name=logo_fiename), | |
| "file": ContentFile(img_bytes.getvalue(), name=logo_fiename), |
journal/models.py
Outdated
| site = Site.objects.get(is_default_site=True) | ||
| # Garantir que o root_url esteja devidamente informado | ||
| return f"{site.root_url}{rendition.url}" |
There was a problem hiding this comment.
The call to Site.objects.get(is_default_site=True) can raise Site.DoesNotExist if no default site is configured. This should be wrapped in a try-except block or use .filter().first() to handle cases where the default site might not exist, returning None gracefully instead of crashing.
| img_wagtail, created = Image.objects.get_or_create( | ||
| title=journal_acron, | ||
| defaults={ | ||
| "file": ContentFile(response, name=f"{journal_acron}_glogo.gif"), | ||
| "file": ContentFile(response, name=logo_fiename), | ||
| }, | ||
| ) |
There was a problem hiding this comment.
The get_or_create logic using 'title' as the lookup field means that if an Image with that journal_acron title already exists, it won't be updated even if the logo file has changed. The 'defaults' parameter only applies when creating a new record. If logos can change over time, this should either use update_or_create, or delete the old image before creating a new one, or use a different unique identifier that includes versioning information.
| domain = collection.domain | ||
| collection_acron3 = collection.acron3 | ||
|
|
||
| if collection_acron3 == "scl": | ||
| return f"https://{domain}/media/images/{journal_acron}_glogo.gif" | ||
| return f"{domain}/media/images/{journal_acron}_glogo.gif" | ||
| else: | ||
| return f"http://{domain}/img/revistas/{journal_acron}/glogo.gif" | ||
| return f"{domain}/img/revistas/{journal_acron}/glogo.gif" |
There was a problem hiding this comment.
The code directly uses 'collection.domain' without checking if it's None. Since the Collection model defines domain as 'null=True', this could result in constructing URLs like "None/media/images/..." if a collection has no domain set. Consider adding a null check: 'if not collection.domain: return None' before using it.
| # """Test that ValueError is raised when collection does not exist""" | ||
| # with self.assertRaises(ValueError) as context: | ||
| # fetch_and_process_journal_logos_in_collection( | ||
| # collection_acron3="invalid_acron", | ||
| # username=self.user.username, | ||
| # ) | ||
| # self.assertIn("does not exist", str(context.exception)) | ||
|
|
||
| # @patch("journal.tasks.group") | ||
| # def test_fetch_and_process_journal_logos_with_valid_collection(self, mock_group): | ||
| # """Test that task executes with valid collection""" | ||
| # # Mock the group to avoid actually running the celery tasks | ||
| # mock_group.return_value.return_value = None | ||
|
|
||
| # result = fetch_and_process_journal_logos_in_collection( | ||
| # collection_acron3=self.collection.acron3, | ||
| # username=self.user.username, | ||
| # ) | ||
|
|
||
| # # The task should complete without raising an exception | ||
| # # and should call group with the task signatures | ||
| # self.assertTrue(mock_group.called) | ||
|
|
||
|
|
There was a problem hiding this comment.
Tests for journal logo functionality were commented out rather than updated to reflect the model changes. Commented-out tests should either be removed entirely or updated to work with the new direct relationship between Journal and wagtailimages.Image. Leaving large blocks of commented code reduces maintainability and creates confusion about whether this functionality is tested.
| # """Test that ValueError is raised when collection does not exist""" | |
| # with self.assertRaises(ValueError) as context: | |
| # fetch_and_process_journal_logos_in_collection( | |
| # collection_acron3="invalid_acron", | |
| # username=self.user.username, | |
| # ) | |
| # self.assertIn("does not exist", str(context.exception)) | |
| # @patch("journal.tasks.group") | |
| # def test_fetch_and_process_journal_logos_with_valid_collection(self, mock_group): | |
| # """Test that task executes with valid collection""" | |
| # # Mock the group to avoid actually running the celery tasks | |
| # mock_group.return_value.return_value = None | |
| # result = fetch_and_process_journal_logos_in_collection( | |
| # collection_acron3=self.collection.acron3, | |
| # username=self.user.username, | |
| # ) | |
| # # The task should complete without raising an exception | |
| # # and should call group with the task signatures | |
| # self.assertTrue(mock_group.called) |
| try: | ||
| request = self.context.get('request') | ||
| return obj.get_url_logo(request.build_absolute_uri('/') if request else None) | ||
| except Exception: | ||
| return obj.get_url_logo() |
There was a problem hiding this comment.
The bare 'except Exception' catches all exceptions silently without logging them. If an error occurs while building the URL (e.g., Site.DoesNotExist from get_url_logo), it will be silently swallowed and the fallback will be attempted. Consider logging the exception before falling back, or catching more specific exceptions. This will help with debugging if the URL generation fails.
| # collection.domain contém https:// ou http:// | ||
| domain = collection.domain | ||
| collection_acron3 = collection.acron3 | ||
|
|
||
| if collection_acron3 == "scl": | ||
| return f"https://{domain}/media/images/{journal_acron}_glogo.gif" | ||
| return f"{domain}/media/images/{journal_acron}_glogo.gif" | ||
| else: | ||
| return f"http://{domain}/img/revistas/{journal_acron}/glogo.gif" | ||
| return f"{domain}/img/revistas/{journal_acron}/glogo.gif" |
There was a problem hiding this comment.
The comment states "collection.domain contém https:// ou http://" (contains https:// or http://), but the Collection model uses URLField which doesn't enforce this, and test data shows domains stored without protocol (e.g., "www.scielo.br" in journal/tests.py:190,196). This inconsistency could lead to malformed URLs like "www.scielo.br/media/images/..." instead of "http://www.scielo.br/media/images/...". Consider either: 1) Adding validation to ensure domain always includes protocol, 2) Using urlparse to normalize the domain, or 3) Explicitly prepending the protocol based on collection type.
| EXT_MAP = { | ||
| 'JPEG': '.jpg', | ||
| 'GIF': '.gif', | ||
| 'PNG': '.png', | ||
| 'WEBP': '.webp', | ||
| 'ICO': '.ico', | ||
| } |
There was a problem hiding this comment.
The EXT_MAP dictionary is defined inside the function and recreated on every function call. Since this is a Celery task that could be called many times, consider moving this constant to module level (above the function) to avoid unnecessary recreation. This is a minor performance optimization.
| AMJournal, | ||
| Journal, | ||
| JournalLicense, | ||
| JournalLogo, | ||
| SciELOJournal, | ||
| ) | ||
| from journal.sources import classic_website |
There was a problem hiding this comment.
Missing database migration: The JournalLogo model was removed from models.py but no migration file was included in this PR to drop the corresponding database table. This will cause the database schema to be out of sync with the models. Run 'python manage.py makemigrations' to generate the migration that removes the JournalLogo model and its database table.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.
Comments suppressed due to low confidence (1)
journal/tasks.py:216
- In the exception handler, the code references
journal.title,url_logo, andcollection.domain, but if the exception occurs early in the function (e.g., when fetching the journal object on line 156-164 or whenscielo_journalis None on line 165), these variables may not be defined, leading to additional exceptions that mask the original error. Consider adding checks for variable existence or using getattr with defaults in the exception handler.
except Exception as e:
exc_type, exc_value, exc_traceback = sys.exc_info()
UnexpectedEvent.create(
exception=e,
exc_traceback=exc_traceback,
action="journal.tasks.fetch_and_process_journal_logo",
detail={
"function": "journal.tasks.fetch_and_process_journal_logo",
"journal_title": journal.title,
"url_logo": url_logo,
"domain": collection.domain,
},
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Generated by Django 5.2.7 on 2026-02-26 21:01 | ||
|
|
||
| from django.db import migrations | ||
|
|
||
|
|
||
| class Migration(migrations.Migration): | ||
|
|
||
| dependencies = [ | ||
| ("journal", "0055_journalproxyadminonly_and_more"), | ||
| ] | ||
|
|
||
| operations = [ | ||
| migrations.DeleteModel( | ||
| name="JournalLogo", | ||
| ), | ||
| ] |
There was a problem hiding this comment.
The migration deletes the JournalLogo model without migrating existing data. If there are existing JournalLogo records in the database that link journals to their logo images, this data will be lost. Before applying this migration in production, ensure that:
- All existing
JournalLogorelationships have been migrated to the directJournal.logofield - A data migration script has been run to copy logo references from
JournalLogotoJournal.logo - Or confirm that no production data exists in the
JournalLogotable
Consider adding a data migration step before the DeleteModel operation to preserve any existing logo associations.
| ) | ||
| if not journal: | ||
| return None | ||
|
|
There was a problem hiding this comment.
There's trailing whitespace on this line (after return None). This should be removed to maintain code cleanliness and follow PEP 8 style guidelines.
| def get_url_logo(self, root_url=None): | ||
| if not self.logo: | ||
| return None | ||
| rendition = self.logo.get_rendition('original') | ||
| if root_url: | ||
| return f"{root_url}{rendition.url}" |
There was a problem hiding this comment.
When root_url is provided and contains a trailing slash (as request.build_absolute_uri('/') does), and rendition.url starts with a slash (which is typical for Django media URLs), this will result in a double slash in the URL (e.g., http://example.com//media/...). Consider stripping trailing slashes from root_url before concatenation, or use urljoin for proper URL construction.
| def get_url_logo(self, obj): | ||
| if obj.logo: | ||
| domain = Site.objects.get(is_default_site=True).hostname | ||
| domain = f"http://{domain}" | ||
| return f"{domain}{obj.logo.file.url}" | ||
| return None | ||
| try: | ||
| request = self.context.get('request') | ||
| return obj.get_url_logo(request.build_absolute_uri('/') if request else None) | ||
| except Exception: | ||
| return obj.get_url_logo() |
There was a problem hiding this comment.
The exception handler catches all exceptions but doesn't log them, making debugging difficult if an error occurs. Consider logging the exception before falling back to calling obj.get_url_logo() without arguments, so that issues (like incorrect root_url construction) can be diagnosed.
| @@ -1,6 +1,8 @@ | |||
| import logging | |||
| import sys | |||
| from io import BytesIO | |||
There was a problem hiding this comment.
The from io import BytesIO import is added but there's already a blank line before line 3. The import should follow PEP 8 conventions by grouping standard library imports together. Consider placing this import immediately after import sys on line 2 without the extra blank line, maintaining consistency with the rest of the file's import style.
| with PilImage.open(img_bytes) as pil_img: | ||
| real_format = pil_img.format # 'JPEG', 'GIF', etc | ||
| correct_ext = EXT_MAP.get(real_format, '.jpg') | ||
|
|
There was a problem hiding this comment.
There's trailing whitespace on this line (after the closing brace on line 182 or after the except clause). This should be removed to maintain code cleanliness and follow PEP 8 style guidelines.
| img_bytes.seek(0) # reset após leitura do Pillow | ||
| except Exception: | ||
| pass | ||
|
|
There was a problem hiding this comment.
There's trailing whitespace on this line. This should be removed to maintain code cleanliness and follow PEP 8 style guidelines.
| .filter(collection__is_active=True) | ||
| .values_list("collection__acron3", flat=True) | ||
| ) | ||
|
|
There was a problem hiding this comment.
There's trailing whitespace on this line (after the closing parenthesis). This should be removed to maintain code cleanliness and follow PEP 8 style guidelines.
| if not domain: | ||
| logger.warning(f"Collection {collection.acron3} has no domain defined") | ||
| return None |
There was a problem hiding this comment.
There's a redundant check for collection.domain on lines 124 and 131-132. The second check (lines 131-133) is unnecessary since the first check (lines 124-126) would have already returned None if the domain was empty. Consider removing the duplicate check.
| if not domain: | |
| logger.warning(f"Collection {collection.acron3} has no domain defined") | |
| return None |
| img_wagtail, created = Image.objects.get_or_create( | ||
| title=journal_acron, | ||
| defaults={ | ||
| "file": ContentFile(response, name=f"{journal_acron}_glogo.gif"), | ||
| "file": ContentFile(img_bytes.read(), name=logo_filename), | ||
| }, | ||
| ) |
There was a problem hiding this comment.
The Image.objects.get_or_create() call uses only title as the lookup field, which means if an image with this journal_acron already exists, it will be reused without updating its file. This could be problematic if:
- The logo image changes but keeps the same journal acronym
- An old logo has the wrong extension (e.g., .gif) and you want to replace it with the corrected extension
Consider using update_or_create() instead, or add logic to delete/update the existing image when needed, especially since this PR is specifically about fixing incorrect extensions on existing logos.
O que esse PR faz?
Corrige o download e armazenamento do logo de periódicos e a geração da URL pública na API. Problemas resolvidos:
.gifforçada, causando erro de validação no Wagtail quando o arquivo era JPEG. Agora o formato real é detectado via Pillow antes de salvar.request.build_absolute_uriesite.root_urlcomo fallback.JournalLogofoi removida em favor de uma relação diretaJournal.logo → wagtailimages.Image.ModelAdminparaSnippetViewSet.Onde a revisão poderia começar?
journal/models.py— remoção deJournalLogoe adição deget_url_logoemJournal. Em seguidajournal/tasks.pypara a detecção de formato via Pillow, ejournal/api/v1/serializers.pypara a geração da URL pública.Como este poderia ser testado manualmente?
/cms/images/) se a imagem foi salva com a extensão correta.url_logono endpointGET /api/v1/journals/retorna uma URL absoluta e acessível.Algum cenário de contexto que queira dar?
O sistema legado armazena logos com extensão
.gif, mas muitos são JPEG renomeados. O Wagtail valida o formato real via Pillow no upload e rejeita arquivos com extensão incorreta. A solução inspeciona o conteúdo em memória (BytesIO) antes de nomear o arquivo.Screenshots
N/A — alterações de backend.
Quais são tickets relevantes?
Informar a issue relacionada.
Referências