The --rm parameter should remove cached files. #18

Open
opened 2025-11-22 00:03:07 +00:00 by brent.edwards · 0 comments
Member

`upload_all_datasets.py` has an `--rm` parameter.

That parameter removes the files from `base_dir/downloads`.

However, a call to `Dataset.from_parquet(str(temp_chunks_dir / filename))` also creates cache files under `/home/brent.edwards/.cache/huggingface/datasets`.

The `--rm` parameter should either ensure that the cache files are never created or remove them when the `upload_all_datasets.py` program is done.
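
One possible approach, sketched below, is to point `Dataset.from_parquet` at a scratch cache directory through its `cache_dir` argument and delete that directory once the upload finishes. The helper names `scratch_cache_dir` and `load_chunk` are hypothetical and do not come from the repository, and `remove` is assumed to carry the value of the `--rm` flag. This is a sketch under those assumptions, not the script's actual code.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def scratch_cache_dir(remove: bool = True) -> Iterator[Path]:
    """Yield a throwaway directory for Hugging Face cache files.

    Hypothetical helper: `remove` stands in for the value of --rm.
    """
    cache_dir = Path(tempfile.mkdtemp(prefix="hf_cache_"))
    try:
        yield cache_dir
    finally:
        if remove:
            # Delete everything that was written under the scratch directory.
            shutil.rmtree(cache_dir, ignore_errors=True)


def load_chunk(parquet_path: Path, cache_dir: Path):
    """Load one parquet chunk, keeping its cache files under cache_dir."""
    # Imported lazily so the context manager above has no third-party dependency.
    from datasets import Dataset

    # cache_dir redirects the cache away from ~/.cache/huggingface/datasets.
    return Dataset.from_parquet(str(parquet_path), cache_dir=str(cache_dir))
```

The dataset would need to be loaded and uploaded inside the `with scratch_cache_dir(remove=args.rm) as cache_dir:` block (`args.rm` is again an assumed name), because the loaded dataset reads from the cache files that are deleted on exit.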

Reference
cleverdatasets/dataset-uploader#18