Merged
53 commits
0075e77
Update the SQL for the ME-BYO item type script to the latest version
ayumi-nishida Oct 29, 2025
4607a04
Remove unnecessary changes
ayumi-nishida Oct 29, 2025
7ad9920
Fix comments
ayumi-nishida Oct 29, 2025
6bcc722
Fix choice options
ayumi-nishida Oct 29, 2025
1ba0a8c
Fix multiple-selection settings
ayumi-nishida Oct 29, 2025
b431b82
Fix broken choice options
ayumi-nishida Oct 29, 2025
0e1195a
Fix jsonld
ayumi-nishida Oct 29, 2025
8a3f60f
add extract_text_from_pdf, extract_text_with_tika from v2.0.0
ivis-futagami Oct 29, 2025
841c919
Update SQL files to the latest version
ayumi-nishida Oct 30, 2025
d2f2200
Implementation for extended metadata
ivis-futagami Oct 30, 2025
afe5428
Add extended metadata
ayumi-nishida Oct 30, 2025
e68176d
Fix AllowMulthpul
ayumi-nishida Oct 30, 2025
d6c6499
Fix incorrect item type choice options
ayumi-nishida Oct 31, 2025
33386fd
Fix display of sections and subsections on the ME-BYO front end
ayumi-nishida Oct 31, 2025
e574383
Merge pull request #1048 from ivis-futagami/feature/ams_w2025_61a_ext…
ivis-futagami Oct 31, 2025
ebff6f8
Merge pull request #1039 from ayumi-nishida/feature/ams_itemtype_mapp…
ivis-futagami Nov 5, 2025
4ad61f0
Add work done on develop_v2.0.0
ayumi-nishida Nov 5, 2025
68a08b0
Make m_index support dataset metadata
ayumi-nishida Nov 5, 2025
d1fd6b2
Adopt refactoring
ayumi-nishida Nov 7, 2025
fafb318
Create properties for dataset metadata
ayumi-nishida Nov 7, 2025
3c4676d
Fix dataset metadata properties
ayumi-nishida Nov 7, 2025
41bbf50
Fix JSONLD mapping
ayumi-nishida Nov 7, 2025
0d771b5
Display metadata field names (parent labels) on the item detail screen.
ayumi-nishida Nov 10, 2025
80b9346
Fix dataset metadata properties
ayumi-nishida Nov 10, 2025
0cf456a
Fix swapped ja and en
ayumi-nishida Nov 10, 2025
8a931ca
Remove extra whitespace
ayumi-nishida Nov 10, 2025
7daf266
Update the item type update script
ayumi-nishida Nov 10, 2025
6d2d89f
Allow import even when an int is sent
ayumi-nishida Nov 11, 2025
8d73a2e
Update item_type.sql to the latest version / fix places where some en values had not been changed
ayumi-nishida Nov 11, 2025
b680e08
Fix misaligned indentation / remove print statements
ayumi-nishida Nov 11, 2025
26739e5
Resolve conflicts
ayumi-nishida Nov 11, 2025
84ccc59
Add unit tests; fix an incorrect check
ayumi-nishida Nov 11, 2025
54d5e32
fix: parse file id for ro-crate. #56045
ivis-kuroda Nov 11, 2025
3f3f5ef
Fix condition check
ayumi-nishida Nov 11, 2025
b05d965
Remove the mapper changes
ayumi-nishida Nov 11, 2025
b5d1419
fix: divide the mapping process.
ivis-kuroda Nov 11, 2025
ad3afa3
update: unit tests.
ivis-kuroda Nov 11, 2025
4b4a72e
Merge pull request #1058 from ivis-kuroda/fix/mapping_process_for_ams
ivis-futagami Nov 11, 2025
7d3cdcf
Fix test schema definitions
ayumi-nishida Nov 12, 2025
28e912b
Resolve conflicts
ayumi-nishida Nov 12, 2025
6f10a36
Fix translations / update language files
ayumi-nishida Nov 12, 2025
4b372ee
Fix bug where text was converted to Unicode
ayumi-nishida Nov 12, 2025
dd80256
Merge branch 'fix/ams-detail-tab' of https://github.com/ayumi-nishida…
ayumi-nishida Nov 12, 2025
b79e672
Fix broken display layout
ayumi-nishida Nov 12, 2025
842b874
Fix misaligned indentation
ayumi-nishida Nov 12, 2025
b9d25e7
Merge pull request #1059 from ivis-futagami/Fix/ams-detail-tab
ivis-futagami Nov 12, 2025
4f60e09
Merge branch 'feature/ams_w2025_61a_metadata' of https://github.com/i…
ayumi-nishida Nov 12, 2025
0655b2b
Update SQL files
ayumi-nishida Nov 12, 2025
959cbac
fix build error
ivis-futagami Nov 12, 2025
45a60a2
Merge pull request #1060 from ivis-futagami/fix/fix_build_error
ivis-futagami Nov 12, 2025
bf0381c
Merge pull request #1055 from ayumi-nishida/feature/ams_itemtype_mapp…
ivis-futagami Nov 12, 2025
9b1f920
fix item type sql
ivis-futagami Nov 12, 2025
e127673
Merge pull request #1061 from ivis-futagami/fix/item_type_sql
ivis-futagami Nov 12, 2025
12 changes: 7 additions & 5 deletions docker-compose.yml
@@ -311,11 +311,14 @@ services:

pgpool:
restart: "always"
image: bitnami/pgpool
image: pgpool/pgpool:4.2.2
environment:
- PGPOOL_BACKEND_NODES=0:postgresql:5432
- PGPOOL_SR_CHECK_USER=invenio
- PGPOOL_SR_CHECK_PASSWORD=dbpass123
- PGPOOL_PARAMS_BACKEND_HOSTNAME0=postgresql
- PGPOOL_PARAMS_BACKEND_PORT0=5432
- PGPOOL_PARAMS_BACKEND_WEIGHT0=1
- PGPOOL_PARAMS_SR_CHECK_USER=invenio
- PGPOOL_PARAMS_SR_CHECK_PASSWORD=dbpass123
- PGPOOL_PARAMS_PORT=5432
- PGPOOL_ENABLE_LDAP=no
- PGPOOL_POSTGRES_USERNAME=postgres
- PGPOOL_POSTGRES_PASSWORD=dbpass123
@@ -452,4 +455,3 @@ volumes:
mongo_data:
# letsencrypt_etc:
# letsencrypt_html:

12 changes: 7 additions & 5 deletions docker-compose2.yml
@@ -311,11 +311,14 @@ services:

pgpool:
restart: "always"
image: bitnami/pgpool
image: pgpool/pgpool:4.2.2
environment:
- PGPOOL_BACKEND_NODES=0:postgresql:5432
- PGPOOL_SR_CHECK_USER=invenio
- PGPOOL_SR_CHECK_PASSWORD=dbpass123
- PGPOOL_PARAMS_BACKEND_HOSTNAME0=postgresql
- PGPOOL_PARAMS_BACKEND_PORT0=5432
- PGPOOL_PARAMS_BACKEND_WEIGHT0=1
- PGPOOL_PARAMS_SR_CHECK_USER=invenio
- PGPOOL_PARAMS_SR_CHECK_PASSWORD=dbpass123
- PGPOOL_PARAMS_PORT=5432
- PGPOOL_ENABLE_LDAP=no
- PGPOOL_POSTGRES_USERNAME=postgres
- PGPOOL_POSTGRES_PASSWORD=dbpass123
@@ -452,4 +455,3 @@ volumes:
mongo_data:
# letsencrypt_etc:
# letsencrypt_html:

6 changes: 3 additions & 3 deletions modules/weko-accounts/weko_accounts/utils.py
@@ -30,7 +30,6 @@
import hashlib

from .config import WEKO_API_LIMIT_RATE_DEFAULT
from weko_admin.models import AdminSettings

limiter = Limiter(
app=None,
@@ -111,6 +110,7 @@ def parse_attributes():
error = False

# Get attribute mapping from admin settings
from weko_admin.models import AdminSettings
admin_settings = AdminSettings.get('attribute_mapping', dict_to_object=False)

for header, attr in current_app.config[
@@ -221,7 +221,7 @@ def decorated_view(*args, **kwargs):

def get_sp_info():
"""Get Service Provider (SP) information for Shibboleth login.

Returns:
dict: A dictionary containing SP entityID, handlerURL, and return URL.
"""
@@ -233,7 +233,7 @@ def get_sp_info():
sp_entityID = 'https://' + current_app.config['WEB_HOST_NAME'] + '/shibboleth-sp'
if 'SP_ENTITYID' in current_app.config:
sp_entityID = current_app.config['SP_ENTITYID']

sp_handlerURL = 'https://' + current_app.config['WEB_HOST_NAME'] + '/Shibboleth.sso'
if 'SP_HANDLERURL' in current_app.config:
sp_handlerURL = current_app.config['SP_HANDLERURL']
1 change: 1 addition & 0 deletions modules/weko-deposit/requirements2.txt
@@ -287,3 +287,4 @@ xmlschema==0.9.30
xmltodict==0.12.0
zipp==3.6.0
zope.interface==5.5.2
pypdfium2==4.30.0
Binary file not shown.
61 changes: 58 additions & 3 deletions modules/weko-deposit/tests/test_utils.py
@@ -19,12 +19,14 @@
# MA 02111-1307, USA.

from weko_deposit.api import WekoDeposit
from weko_deposit.utils import update_pdf_contents_es
from weko_deposit.utils import update_pdf_contents_es, extract_text_from_pdf, extract_text_with_tika

from mock import patch
from mock import MagicMock
import uuid
from tests.helpers import create_record_with_pdf

import os
import pytest

# .tox/c1/bin/pytest --cov=weko_deposit tests/test_utils.py::test_update_pdf_contents_es -vv -s --cov-branch --cov-report=term --basetemp=/code/modules/weko-deposit/.tox/c1/tmp
def test_update_pdf_contents_es(app, db, location, mocker):
@@ -46,4 +48,57 @@ def test_update_pdf_contents_es(app, db, location, mocker):
for args, _ in args_list:
test = pdf_file_infos[i]
assert args[0] == (test,str(record_ids[i]))
i+=1
i+=1


# .tox/c1/bin/pytest --cov=weko_deposit tests/test_utils.py::test_extract_text_from_pdf -vv -s --cov-branch --cov-report=term --basetemp=/code/modules/weko-deposit/.tox/c1/tmp
def test_extract_text_from_pdf():
filepath = os.path.join(os.path.dirname(__file__),"data","test_files","test_file_1.2M.pdf")

# file size > max_size
data = extract_text_from_pdf(filepath, 10000)
assert len(data.encode('utf-8')) <= 10000
assert data.count("test file page") < 1240

# file size <= max_size
data = extract_text_from_pdf(filepath, 100000000)
assert len(data.encode('utf-8')) == 19561
assert data.count("test file page") == 1240

# file does not exist
filepath = "not_exist_file.pdf"
with pytest.raises(FileNotFoundError) as e:
data = extract_text_from_pdf(filepath, 10000)
assert str(e.value) == "/code/modules/weko-deposit/not_exist_file.pdf"


# .tox/c1/bin/pytest --cov=weko_deposit tests/test_utils.py::test_extract_text_with_tika -vv -s --cov-branch --cov-report=term --basetemp=/code/modules/weko-deposit/.tox/c1/tmp
def test_extract_text_with_tika():
filepath = os.path.join(os.path.dirname(__file__),"data","test_files","sample_word.docx")
# Tika jar file does not exist.
mock_env_not_exist_tika = {"TIKA_JAR_FILE_PATH": "/not/exist/path/tika-server.jar"}
with patch.dict(os.environ, mock_env_not_exist_tika, clear=False):
with pytest.raises(Exception) as e:
data = extract_text_with_tika(filepath, 100)
assert str(e.value) == "not exist tika jar file."

mock_env_tika = {"TIKA_JAR_FILE_PATH": "/code/tika/tika-app-2.6.0.jar"}
with patch.dict(os.environ, mock_env_tika, clear=False):
# error with subprocess
mock_run = MagicMock()
mock_run.returncode = 1
mock_run.stderr.decode.return_value="test_error"
with patch("weko_deposit.utils.subprocess.run", return_value=mock_run):
with pytest.raises(Exception) as e:
data = extract_text_with_tika(filepath, 100)
assert str(e.value) == "raise in tika: test_error"

# file size > max_size
data = extract_text_with_tika(filepath, 50)
assert len(data.encode('utf-8')) < 50
assert data == "これはテスト用のサンプルwordファイ"

# file size <= max_size
data = extract_text_with_tika(filepath, 5000)
assert len(data.encode('utf-8')) > 50
assert data == "これはテスト用のサンプルwordファイルです中身は特に意味がありません"
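`extract_text_with_tika` reads `result.returncode` as a plain attribute, so a mock standing in for the result of `subprocess.run` needs that attribute set directly. Configuring `returncode.return_value` only affects calls to `returncode()`, and an unconfigured `MagicMock` attribute already compares unequal to `0`, so a failure-path test built that way passes whatever value it configures. A minimal standard-library sketch, assuming a hypothetical `run_tool` wrapper that mirrors the return-code check (it is not part of `weko_deposit.utils`):

```python
import subprocess
from unittest.mock import MagicMock, patch


def run_tool(args):
    """Hypothetical wrapper that mirrors the return-code check in extract_text_with_tika."""
    result = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        raise Exception("raise in tika: {}".format(result.stderr.decode("utf-8")))
    return result.stdout.decode("utf-8")


def failure_message():
    """Return the error raised when the mocked process exits with status 1."""
    mock_result = MagicMock()
    mock_result.returncode = 1          # plain attribute, like subprocess.CompletedProcess
    mock_result.stderr = b"test_error"  # real bytes, so .decode("utf-8") works unchanged
    with patch("subprocess.run", return_value=mock_result):
        try:
            run_tool(["java", "-jar", "tika-app.jar", "-t", "sample.docx"])
        except Exception as e:
            return str(e)
    return None


def unconfigured_attribute_is_nonzero():
    """Show the pitfall: return_value does not change the attribute itself."""
    mock_result = MagicMock()
    mock_result.returncode.return_value = 0  # only affects mock_result.returncode()
    return mock_result.returncode != 0
```

Here `failure_message()` returns `"raise in tika: test_error"`, while `unconfigured_attribute_is_nonzero()` returns `True` even though the configured value is `0`.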
75 changes: 74 additions & 1 deletion modules/weko-deposit/weko_deposit/utils.py
@@ -20,6 +20,10 @@

from .tasks import extract_pdf_and_update_file_contents
from .api import WekoDeposit
import pypdfium2
import os
import subprocess


def update_pdf_contents_es(record_ids):
"""register the contents of the record PDF file in elasticsearch
@@ -29,4 +33,73 @@ def update_pdf_contents_es(record_ids):
deposits = WekoDeposit.get_records(record_ids)
for dep in deposits:
file_infos = dep.get_pdf_info()
extract_pdf_and_update_file_contents.apply_async((file_infos, str(dep.id)))
extract_pdf_and_update_file_contents.apply_async((file_infos, str(dep.id)))


def extract_text_from_pdf(filepath, max_size):
"""Read a PDF file and extract its text.

Args:
filepath (str): Path to the PDF file.
max_size (int): Maximum size of the extracted text in bytes.

Returns:
str: Extracted text from the PDF file.

"""
reader = None
data = ""
try:
reader = pypdfium2.PdfDocument(filepath)
texts = []
total_bytes = 0
for page in reader:
text = page.get_textpage().get_text_range()
encoded = text.encode('utf-8', errors='replace')
if total_bytes + len(encoded) > max_size:
remain = max_size - total_bytes
texts.append(encoded[:remain].decode('utf-8', errors='ignore'))
break
else:
texts.append(text)
total_bytes += len(encoded)
data = "".join(texts)
data = "".join(data.splitlines())
finally:
if reader is not None:
reader.close()

return data


def extract_text_with_tika(filepath, max_size):
"""Read a non-PDF file and extract its text using Apache Tika.

Args:
filepath (str): Path to the file.
max_size (int): Maximum size of the extracted text in bytes.

Raises:
Exception: If Tika jar file does not exist.
Exception: If Tika processing fails.

Returns:
str: Extracted text from the non-PDF file.
"""
tika_jar_path = os.environ.get("TIKA_JAR_FILE_PATH")
if not tika_jar_path or not os.path.isfile(tika_jar_path):
raise Exception("not exist tika jar file.")
args = ["java", "-jar", tika_jar_path, "-t", filepath]
result = subprocess.run(
args,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
)
if result.returncode != 0:
raise Exception("raise in tika: {}".format(result.stderr.decode("utf-8")))
data = "".join(result.stdout.decode("utf-8").splitlines())
if len(data.encode('utf-8')) > max_size:
encoded = data.encode('utf-8')
data = encoded[:max_size].decode('utf-8', errors='ignore')

return data
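Both functions cap the result at `max_size` bytes by slicing the UTF-8 encoding and decoding the slice with `errors='ignore'`, which drops a multibyte character that the cut would otherwise split (`extract_text_from_pdf` applies this to the remaining budget of the current page, `extract_text_with_tika` to the whole output). A minimal standalone sketch of that truncation step; the name `truncate_utf8` is hypothetical and not part of `weko_deposit.utils`:

```python
def truncate_utf8(text, max_size):
    """Return the longest prefix of text whose UTF-8 encoding fits in max_size bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_size:
        return text
    # Cutting the byte string can split a multibyte character;
    # errors="ignore" discards the incomplete bytes at the end.
    return encoded[:max_size].decode("utf-8", errors="ignore")
```

For example, `truncate_utf8("これはテスト", 10)` returns `"これは"`: each kana takes 3 bytes, so byte 10 starts a fourth character, and its incomplete bytes are dropped.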
Original file line number Diff line number Diff line change
@@ -72,7 +72,7 @@
{%- endfor -%}
{%- else -%}
{{ output_attribute_value_mlt_Init( record_detail_data ) }}
{%- endif -%}
{%- endif -%}
{%- endif -%}
{% endmacro %}

@@ -83,7 +83,7 @@
{% if parrent_name %}
{%- set labels = parrent_name.split('.') -%}
{%- if labels|length == 1 -%}
{{ child_data(parrent_name, '', level) }}
{{ child_data(parrent_name, ' ', level) }}
{%- else -%}
{%- set displayflag = False -%}
{%- endif -%}
@@ -118,7 +118,7 @@
<td><a href="{{content | encode_filename }}">{{ content | escape_str }}</a></td>
{%- elif content|url_to_link -%}
<td><a href="{{content}}" target='_blank'>{{ content | escape_str }}</a></td>
{%- else -%}
{%- else -%}
<td class="multiple-line">{{ content | escape_str }}</td>
{%- endif -%}
{% endautoescape %}
@@ -207,7 +207,7 @@
{%- set nsflg.dispflg = True -%}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
{%- endif -%}
{%- endif -%}
{%- endfor -%}
{%- for language_value in language_data -%}
@@ -245,7 +245,7 @@
{%- if value is string -%}
{%- if attribute_data|length == 2 -%}
{{ output_attribute_value_mlt(attribute_data, level) }}
{%- else -%}
{%- else -%}
{{ output_attribute_value_mlt_exceptlang(attribute_data,level) }}
{%- endif -%}
{%- else -%}
1 change: 1 addition & 0 deletions modules/weko-search-ui/requirements2.txt
@@ -287,3 +287,4 @@ xmlschema==0.9.30
xmltodict==0.12.0
zipp==3.6.0
zope.interface==5.5.2
pypdfium2==4.30.0
13 changes: 11 additions & 2 deletions modules/weko-search-ui/tests/conftest.py
@@ -136,7 +136,13 @@
from weko_deposit.api import WekoDeposit
from weko_deposit.api import WekoDeposit as aWekoDeposit
from weko_deposit.api import WekoIndexer, WekoRecord
from weko_deposit.config import WEKO_BUCKET_QUOTA_SIZE, WEKO_MAX_FILE_SIZE
from weko_deposit.config import (
WEKO_BUCKET_QUOTA_SIZE,
WEKO_MAX_FILE_SIZE,
WEKO_DEPOSIT_FILESIZE_LIMIT,
WEKO_MIMETYPE_WHITELIST_FOR_ES,
WEKO_DEPOSIT_TEXTMIMETYPE_WHITELIST_FOR_ES
)
from weko_groups import WekoGroups
from weko_index_tree import WekoIndexTree, WekoIndexTreeREST
from weko_index_tree.api import Indexes
@@ -694,7 +700,10 @@ def base_app(instance_path, search_class, request):
WEKO_ITEMS_UI_INDEX_PATH_SPLIT = '///',
WEKO_SEARCH_UI_BULK_EXPORT_RETRY = 5,
WEKO_SEARCH_UI_BULK_EXPORT_LIMIT = 100,
RECORDS_UI_ENDPOINTS = RECORDS_UI_ENDPOINTS
RECORDS_UI_ENDPOINTS = RECORDS_UI_ENDPOINTS,
WEKO_DEPOSIT_FILESIZE_LIMIT = WEKO_DEPOSIT_FILESIZE_LIMIT,
WEKO_MIMETYPE_WHITELIST_FOR_ES = WEKO_MIMETYPE_WHITELIST_FOR_ES,
WEKO_DEPOSIT_TEXTMIMETYPE_WHITELIST_FOR_ES = WEKO_DEPOSIT_TEXTMIMETYPE_WHITELIST_FOR_ES
)
app_.url_map.converters["pid"] = PIDConverter
app_.config["RECORDS_REST_ENDPOINTS"]["recid"]["search_class"] = search_class
Binary file not shown.
Binary file added modules/weko-search-ui/tests/data/ams/png_file.pdf
Binary file not shown.
Binary file added modules/weko-search-ui/tests/data/ams/png_file.txt
Binary file not shown.
2 changes: 2 additions & 0 deletions modules/weko-search-ui/tests/data/ams/sample.txt
@@ -0,0 +1,2 @@
This is a
text file.