Commit f9725de

Merge pull request #10 from dialogoo/docs/apache-license-update
chore: embrace open source - migrate to Apache 2.0
2 parents f436df8 + b7d4a1b commit f9725de

10 files changed (+95, -77 lines)


CONTRIBUTING.md

Lines changed: 60 additions & 4 deletions
@@ -5,23 +5,79 @@ Thanks for helping improve **laiive**!
 ---
 
 ## Agreement
-By cloning, forking, or making a pull request, you **accept** the [Collaborator Agreement](./LICENSES/COLLABORATOR_AGREEMENT.md). If you don’t agree, please don’t contribute.
+Take a look at the [Collaborator Agreement](./LICENSES/COLLABORATOR_AGREEMENT.md).
 
 ---
 
 ## How to Contribute
 - Open issues for bugs, ideas, or questions.
 - Make pull requests with clear, focused changes.
-- Keep commit messages simple and descriptive.
+- Follow our commit message guidelines below.
+
+---
+
+## Commit Message Guidelines
+
+We prioritize self-documenting code and meaningful commit messages. Code should clearly express **what** it does, while commit messages explain **why**.
+
+### Format
+
+```
+<type>: <brief summary>
+
+<detailed explanation>
+- Why was this change necessary?
+- What problem does it solve?
+- What alternatives were considered?
+```
+
+### Types
+- `feat:` New feature
+- `fix:` Bug fix
+- `refactor:` Code restructuring
+- `perf:` Performance improvement
+- `docs:` Documentation
+- `test:` Tests
+- `chore:` Maintenance
+
+### Example
+```
+refactor: Replace nested loops with hash map in findDuplicates
+
+The O(n²) approach was causing timeouts on large datasets.
+HashMap provides O(1) lookups and reduces execution time
+from 2.3s to 45ms on 10k items.
+
+Refs: #156
+```
+
+### Code Documentation Rules
+
+**Avoid:**
+- Comments restating obvious code
+- Commented-out code (use git history)
+- Vague TODOs and FIXMEs without context
+
+**Include:**
+- Why non-obvious decisions were made
+- Complex business logic explanations
+- Known limitations or edge cases
+- Public API documentation
+- Complex algorithms or patterns
 
 ---
 
 ## Respect & Kindness
 - Be kind and polite.
 - Listen to others and assume good intent.
 - Disagree with ideas, not with people.
-- No harassment, insults, or sharing private project info.
 
 ---
 
-That’s it. Let’s keep things simple, safe, and respectful while building together.
+That's it. Let's keep things simple, safe, and respectful while building together.
+
+---
+
+## Recommended reading to be a good contributor
+
+- Hunt, A., & Thomas, D. (2019). The pragmatic programmer: Your journey to mastery (20th anniversary ed.). Addison-Wesley.
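The `<type>: <brief summary>` format introduced above lends itself to an automated pre-commit check. A minimal sketch (hypothetical helper, not part of this repo; the allowed type list is taken from the guidelines):

```python
import re

# Commit types allowed by the guidelines above.
TYPES = ("feat", "fix", "refactor", "perf", "docs", "test", "chore")

# The first commit line must look like "<type>: <brief summary>".
SUMMARY_RE = re.compile(rf"^({'|'.join(TYPES)}): \S.*$")

def check_summary(line: str) -> bool:
    """Return True if a commit's first line follows the guidelines."""
    return bool(SUMMARY_RE.match(line))

print(check_summary("refactor: Replace nested loops with hash map"))  # True
print(check_summary("update stuff"))  # False
```

Commitizen, already listed among the project's development tools, enforces a similar convention out of the box, so this check is only illustrative.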

LICENSES/CHANGE_LICENSE_DATE.txt

Lines changed: 0 additions & 2 deletions
This file was deleted.

LICENSES/LICENSE_PRECHANGE.txt

Lines changed: 0 additions & 12 deletions
This file was deleted.

LICENSES/THIRD_PARTY_LICENSES.md

Lines changed: 18 additions & 29 deletions
@@ -1,35 +1,24 @@
-#### Third-Party Licenses
+# Third-Party Licenses
 
-This project uses the following third-party libraries:
+This project includes and depends on the following open source libraries:
 
-- pre-commit-hooks (MIT License)
-  https://github.com/pre-commit/pre-commit-hooks
+## Development Tools
+- **pre-commit-hooks** (MIT) - https://github.com/pre-commit/pre-commit-hooks
+- **ruff-pre-commit** (MIT) - https://github.com/astral-sh/ruff-pre-commit
+- **mypy** (MIT) - https://github.com/python/mypy
+- **bandit** (Apache-2.0) - https://github.com/PyCQA/bandit
+- **detect-secrets** (Apache-2.0) - https://github.com/Yelp/detect-secrets
+- **sqlfluff** (MIT) - https://github.com/sqlfluff/sqlfluff
+- **commitizen** (MIT) - https://github.com/commitizen-tools/commitizen
 
-- ruff-pre-commit (MIT License)
-  https://github.com/astral-sh/ruff-pre-commit
+## Runtime Dependencies
+- **streamlit** (Apache-2.0) - https://github.com/streamlit/streamlit
+- **fastapi** (MIT) - https://github.com/tiangolo/fastapi
+- **pgvector** (PostgreSQL License) - https://github.com/ankane/pgvector
 
-- mypy (MIT License)
-  https://github.com/python/mypy
+[Add other runtime dependencies from your requirements.txt/pyproject.toml]
 
-- bandit (Apache-2.0 License)
-  https://github.com/PyCQA/bandit
+---
 
-- detect-secrets (Apache-2.0 License)
-  https://github.com/Yelp/detect-secrets
-
-- sqlfluff (MIT License)
-  https://github.com/sqlfluff/sqlfluff
-
-- commitizen (MIT License)
-  https://github.com/commitizen-tools/commitizen
-
-- streamlit (Apache-2.0 License)
-  https://github.com/streamlit/streamlit
-
-- fastapi (MIT License)
-  https://github.com/tiangolo/fastapi
-
-- pgvector (PostgreSQL License)
-  https://github.com/ankane/pgvector
-
-Each third-party library is subject to its own license, which is included here for reference.
+All third-party software is used in compliance with their respective licenses.
+Full license texts are available in the linked repositories.

README.md

Lines changed: 7 additions & 1 deletion
@@ -52,9 +52,15 @@ A Postgres db is the heart of this dynamics and stores all the system knowledge.
 
 ---
 
-[^*]: © laiive. All strategic documents, diagrams, mockups, and planning materials contained in this repository are the intellectual property of Laiive and are provided solely for reference purposes. Unauthorized reproduction, distribution, reverse engineering, or commercial use of these materials is strictly prohibited without prior written consent from Laiive. For questions regarding usage rights or licensing, please contact: [email protected].
+[^*]: © 2025 Oscar Arroyo Vega. This project is licensed under the Apache License 2.0. You are free to use, modify, and distribute this software for any purpose, including commercial use, under the terms of the Apache 2.0 License. See the LICENSE file for details.
 
+## License
 
+This project is licensed under the **Apache License 2.0** - see the [LICENSE](LICENSES/LICENSE) file for details.
+
+Copyright 2025 Oscar Arroyo Vega
+
+---
 
 ### instructions
 
File renamed without changes.

services/scraper/db_parser/config.py renamed to services/parser/config.py

Lines changed: 2 additions & 5 deletions
@@ -1,16 +1,13 @@
 from pydantic_settings import BaseSettings, SettingsConfigDict
 
 
-# services/scraper/db_parser/config.py
 
 from pydantic import BaseSettings
 
 class DbParserSettings(BaseSettings):
-    # … (other settings fields)
+    # TODO
 
-    # … (any other module-level code)
 
-# Replace instantiation to match the new class name
 settings = DbParserSettings()
     POSTGRES_URL: str
 
@@ -32,7 +29,7 @@ class db_parser_settings(BaseSettings):
 
     def __init__(self, **kwargs):
         super().__init__(**kwargs)
-        # Fix the URL format for psycopg2
+        # FIXME the URL format for psycopg2
         if self.POSTGRES_URL.startswith("postgresql+asyncpg://"):
             self.POSTGRES_URL = self.POSTGRES_URL.replace(
                 "postgresql+asyncpg://", "postgresql://"
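The hunk above keeps an asyncpg-to-psycopg2 URL rewrite inside `__init__`. The same conversion as a standalone sketch (`to_psycopg2_url` is a hypothetical name; the behavior mirrors the `startswith`/`replace` logic in the diff):

```python
def to_psycopg2_url(url: str) -> str:
    """psycopg2 does not understand SQLAlchemy's async driver prefix,
    so strip '+asyncpg' before handing the URL to psycopg2.connect()."""
    prefix = "postgresql+asyncpg://"
    if url.startswith(prefix):
        return "postgresql://" + url[len(prefix):]
    return url

print(to_psycopg2_url("postgresql+asyncpg://user:pw@host:5432/db"))
# postgresql://user:pw@host:5432/db
```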

services/scraper/db_parser/parser.py renamed to services/parser/parser.py

Lines changed: 3 additions & 19 deletions
@@ -17,13 +17,10 @@ def __init__(self, database_url: str = None):
         self.PERFECT_SIMILARITY_THRESHOLD = 1.0
 
     def get_connection(self):
-        """Get database connection"""
         return psycopg2.connect(self.database_url)
 
     def insert_events(self, events_data, source_website="www.ecodibergamo.it"):
-        """Insert events with duplicate checking with perfect match and rapidfuzz similarity.
-        returns: inserted count"""
-        # TODO improve the insertion logic, check artist and date filtering by place.
+        # TODO improve the insertion logic, check artist and date filtering by place. add pydantic for type verification
 
         timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
         self.review_file = (
@@ -118,7 +115,6 @@ def insert_events(self, events_data, source_website="www.ecodibergamo.it"):
         return inserted_count
 
     def get_events_count(self) -> int:
-        """Get total number of events in database"""
         conn = self.get_connection()
         cursor = conn.cursor()
         cursor.execute("SELECT COUNT(*) FROM events")
@@ -134,7 +130,6 @@ def _log_for_review(
         similarity: float,
         event_data: dict = None,
     ):
-        """function to save the logs for review in case of insert duplicate doubt"""
 
         review_entry = {
             "timestamp": datetime.now().isoformat(),
@@ -162,7 +157,6 @@ def get_review_summary(
     ) -> Dict[
         str, Any
     ]:  # TODO add all the data for review and check reviews for artists and venues
-        """Get summary of items pending review"""
         try:
             with open(self.review_file, "r", encoding="utf-8") as f:
                 lines = f.readlines()
@@ -172,12 +166,11 @@
                 if line.strip():
                     try:
                         entry = json.loads(line.strip())
-                        if entry.get("decision") is None:  # Not yet reviewed
+                        if entry.get("decision") is None:  # TODO Not yet reviewed
                             pending_reviews.append(entry)
                     except json.JSONDecodeError:
                         continue
 
-        # Group by table
         by_table = {}
         for review in pending_reviews:
             table = review["table"]
@@ -196,21 +189,18 @@
             return {"total_pending": 0, "by_table": {}, "pending_reviews": []}
 
     def remove_review_file(self):
-        """Clear the review file"""
         if os.path.exists(self.review_file):
             os.remove(self.review_file)
             logger.info("Cleared review file")
 
     def _normalize_text(self, text: str) -> str:
-        """Normalize text for comparison"""
         if not text:
             return ""
         normalized = re.sub(r"[^\w\s]", "", text.lower().strip())
         normalized = re.sub(r"\s+", " ", normalized)
         return normalized
 
     def _calculate_similarity(self, text1: str, text2: str) -> float:
-        """Calculate similarity between two texts"""
         if not text1 or not text2:
             return 0.0
         return fuzz.ratio(
@@ -220,10 +210,6 @@ def _calculate_similarity(self, text1: str, text2: str) -> float:
     def _find_duplicate_entity(
         self, cursor, table_name: str, name: str, entity_type: str = "entity"
     ) -> Optional[tuple]:
-        """this method is to find duplicates with exact match and with fuzzysearch
-        level one checks for exact match
-        second level check for rapidfuzz match
-        """
         if not name or not name.strip():
             return None
 
@@ -235,13 +221,12 @@
                 f"{entity_type.title()} exact match found: '{name}' (ID: {exact_match[0]})"
             )
             return (exact_match[0], "exact", exact_match[1])
-        # Level 2: Similarity-based match (only if no exact matches found)
+
         self._find_similar_entity(cursor, table_name, name, entity_type)
 
     def _find_similar_entity(
         self, cursor, table_name: str, name: str, entity_type: str
     ) -> Optional[tuple]:
-        """Find similar entities using similarity calculation"""
         cursor.execute(f"SELECT id, name FROM {table_name}")
         all_entities = cursor.fetchall()
 
@@ -268,7 +253,6 @@ def _find_similar_entity(
         return None
 
     def _is_duplicate_event(self, cursor, event) -> bool:
-        """Check if event is a duplicate using standardized 4-level detection"""
         event_name = event.get("name")
         if not event_name:
             return False
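The docstrings removed above described a two-level duplicate check: an exact match on normalized names first, then a rapidfuzz similarity pass. A self-contained sketch of that flow (all names and the 85.0 threshold are assumptions; the repo uses `rapidfuzz.fuzz.ratio`, approximated here with stdlib `difflib` so the example runs without extra dependencies):

```python
import re
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 85.0  # assumed value; the real threshold lives on the parser class

def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (mirrors _normalize_text)."""
    if not text:
        return ""
    text = re.sub(r"[^\w\s]", "", text.lower().strip())
    return re.sub(r"\s+", " ", text)

def similarity(a: str, b: str) -> float:
    """0-100 score; stands in for rapidfuzz.fuzz.ratio on normalized text."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() * 100

def find_duplicate(name, known):
    """Level 1: exact normalized match. Level 2: best fuzzy match over threshold."""
    target = normalize(name)
    for entity_id, existing in known.items():
        if normalize(existing) == target:
            return entity_id, "exact"
    best = max(known.items(), key=lambda kv: similarity(name, kv[1]), default=None)
    if best and similarity(name, best[1]) >= SIMILARITY_THRESHOLD:
        return best[0], "similar"
    return None

venues = {1: "Teatro Donizetti", 2: "Druso Circus"}
print(find_duplicate("teatro donizetti!", venues))  # (1, 'exact')
print(find_duplicate("Teatro Donizeti", venues))    # (1, 'similar')
```

Normalizing before comparing is what lets "teatro donizetti!" hit the exact-match fast path instead of falling through to the slower fuzzy scan.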

services/scraper/event_scraper/spiders/eppen_music.py

Lines changed: 3 additions & 3 deletions
@@ -5,8 +5,8 @@
 import sys
 import os
 
-sys.path.append(os.path.join(os.path.dirname(__file__), "..", ".."))
-from db_parser.parser import DatabaseParser
+sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
+from parser.parser import DatabaseParser
 
 
 class EppenMusicSpider(scrapy.Spider):
@@ -223,7 +223,7 @@ def parse_to_db(self):
             self.logger.warning("No event data to parse")
             return
 
-        from db_parser.config import settings
+        from parser.config import settings
 
         db_parser = DatabaseParser(settings.POSTGRES_URL)

tests/test_parser.py

Lines changed: 2 additions & 2 deletions
@@ -8,8 +8,8 @@
 sys.path.insert(0, workspace_root)
 
 # Now import from services
-from services.scraper.db_parser.parser import DatabaseParser
-from services.scraper.db_parser.config import settings
+from services.parser.parser import DatabaseParser
+from services.parser.config import settings
 from loguru import logger