Fix download resume functionality in rdf_dataset_downloader.py #32

Open
opened 2025-12-10 09:04:52 +00:00 by aditya · 0 comments
Member

Description:

The download resume feature fails because of two bugs: (1) an early-exit check returns as soon as the destination file exists, so the resume logic never runs for partial downloads, and (2) restarting the download for a file that is already complete re-enters an httpx stream that has been closed, which crashes with a "stream has been closed" error.
Fix: Remove the early-exit check and, before triggering the retry logic, detect whether the file is complete by comparing its size on disk with the Content-Length reported by the server.
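
A minimal sketch of the proposed fix, not the actual code in rdf_dataset_downloader.py. The names `plan_download` and `download_with_resume` are hypothetical, `client` is assumed to be an `httpx.Client`, and the server is assumed to answer HEAD requests with a Content-Length header and to support Range requests. Restarting from scratch when the remote size is unknown or the local file is larger than expected is an assumption as well.

```python
import os


def plan_download(local_size, remote_size):
    """Decide what to do with an existing file (hypothetical helper).

    Returns "skip" for a complete file, "resume" for a partial one,
    and "fresh" when the download should start from byte 0.
    """
    if local_size == 0 or remote_size is None:
        # Nothing on disk, or completeness cannot be verified (assumption: restart).
        return "fresh"
    if local_size == remote_size:
        return "skip"
    if local_size < remote_size:
        return "resume"
    # Local file is larger than the remote one: treat it as stale (assumption).
    return "fresh"


def download_with_resume(client, url, dest):
    """Download url to dest, resuming a partial file when possible.

    client is assumed to be an httpx.Client; it is passed in so this
    sketch has no import-time dependency on httpx.
    """
    local_size = os.path.getsize(dest) if os.path.exists(dest) else 0

    # Ask for the remote size first instead of exiting early when the file exists.
    head = client.head(url, follow_redirects=True)
    head.raise_for_status()
    length = head.headers.get("Content-Length")
    remote_size = int(length) if length is not None else None

    action = plan_download(local_size, remote_size)
    if action == "skip":
        # Complete file: return before any stream is opened.
        return action

    headers = {"Range": f"bytes={local_size}-"} if action == "resume" else {}

    # Open a new stream for this attempt; a closed response is never reused.
    with client.stream("GET", url, headers=headers, follow_redirects=True) as response:
        response.raise_for_status()
        # 206 means the server honoured the Range header, so append.
        # Any other success status carries the full body, so overwrite.
        mode = "ab" if response.status_code == 206 else "wb"
        with open(dest, mode) as fh:
            for chunk in response.iter_bytes():
                fh.write(chunk)
    return action
```

Keeping the decision in a pure function makes the three acceptance criteria easy to check without network access: a partial file maps to "resume", a complete file to "skip", and a missing file to "fresh".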

Acceptance Criteria:

[ ] Partial downloads resume correctly after interruption
[ ] Complete files are detected and skipped without crash
[ ] No regression in normal download functionality

aditya self-assigned this 2025-12-10 09:37:05 +00:00
Reference
cleverdatasets/dataset-uploader#32