mypy-and-pylint #2

Open
brent.edwards wants to merge 12 commits from mypy-and-pylint into 16-error-messages-and-logging
Member

Some work toward cleaning the scripts.
brent.edwards changed target branch from master to 16-error-messages-and-logging 2025-11-21 02:36:38 +00:00
aditya approved these changes 2025-11-21 08:42:52 +00:00
brent.edwards left a comment
Author
Member

A few initial comments...
@ -90,3 +93,3 @@
# Handle compressed files
if file_path.suffix == ".gz":
file_obj = gzip.open(file_path, "rt", encoding="utf-8")
with gzip.open(file_path, "rt", encoding="utf-8") as file_obj:
Author
Member

Why was this added?
Member

Thanks for the feedback. The duplication came from fixing Ruff's SIM115 (use a context manager), which required opening each file in a with statement. I've since removed the duplication by extracting the shared logic into _process_ntriples_file() while keeping the context managers, so the change addresses both the Ruff error and the code duplication.
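
For readers following along, here is a minimal sketch of the pattern described above: both branches open the file in a with statement (which is what SIM115 asks for) and delegate to one shared helper. It is not the code from this PR. The wrapper name iter_ntriples_lines and the helper's signature and body are assumptions made for illustration; only the helper's name, _process_ntriples_file, comes from the comment.

```python
import gzip
from collections.abc import Iterator
from pathlib import Path
from typing import TextIO


def _process_ntriples_file(file_obj: TextIO) -> Iterator[str]:
    """Shared logic (simplified stand-in): yield non-empty, stripped lines."""
    for line in file_obj:
        line = line.strip()
        if line:
            yield line


def iter_ntriples_lines(file_path: Path) -> Iterator[str]:
    """Hypothetical wrapper: pick the opener, keep the context manager."""
    # Handle compressed files
    if file_path.suffix == ".gz":
        with gzip.open(file_path, "rt", encoding="utf-8") as file_obj:
            yield from _process_ntriples_file(file_obj)
    else:
        with open(file_path, encoding="utf-8") as file_obj:
            yield from _process_ntriples_file(file_obj)
```

Because the with block lives inside a generator, the file stays open while lines are being consumed and is closed once the generator is exhausted or explicitly closed.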
@ -258,4 +267,0 @@
if current_batch:
yield current_batch
else:
Author
Member

This is the point where the comparer got confused...
Member

This is part of the refactoring: this code has been moved into a helper function. The logic is unchanged.
@ -276,0 +280,4 @@
# Yield remaining lines
if current_batch:
yield current_batch
else:
Author
Member

Again, why was this added?
Member

This is part of the refactoring: this code has been moved into a helper function. The logic is unchanged.
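
To make the moved block easier to review, here is a rough sketch of what the "Yield remaining lines" step does in a batching generator. The function name batch_lines and its signature are hypothetical and are not taken from the PR. Only the trailing check on current_batch mirrors the diff.

```python
from collections.abc import Iterable, Iterator


def batch_lines(lines: Iterable[str], chunk_size: int) -> Iterator[list[str]]:
    """Group lines into lists of at most chunk_size items."""
    current_batch: list[str] = []
    for line in lines:
        current_batch.append(line)
        if len(current_batch) >= chunk_size:
            yield current_batch
            current_batch = []
    # Yield remaining lines: without this check, a final partial
    # batch would be silently dropped.
    if current_batch:
        yield current_batch
```

With five input lines and chunk_size=2, this yields two full batches followed by one batch holding the single leftover line. A test of that shape is a cheap way to confirm that the logic is unchanged after the move.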
@ -344,3 +403,2 @@
def stream_turtle_chunks(file_path: Path, chunk_size: int = 10000) -> Iterator[list[dict[str, str]]]:
"""Stream Turtle file in chunks using incremental parsing.
def _process_turtle_file(
Author
Member

Again, this is different. What changed?
Member

The body of stream_turtle_chunks was extracted into a _process_turtle_file() helper to remove the duplication introduced when fixing SIM115.
This pull request has changes conflicting with the target branch.
  • scripts/convert_rdf_to_hf_dataset_streaming_parallel.py
  • scripts/upload_all_datasets.py
Reference
cleverdatasets/dataset-uploader!2