How are temporary files used? #30

Open
opened 2025-12-09 22:07:04 +00:00 by brent.edwards · 0 comments
Member

The big problem with the `dataset-uploader` program is that it takes too much disk space.

We have a goal of translating truly huge data sets, even bigger than Wikipedia.

There are two questions to ask:

  1. Is it possible to remove temporary files once they have been used? (The disk would still need to hold all of the temporary files at one point, but as temporary files are used and removed, the required disk space would not grow any further.)
  2. Is it possible to use only one temporary file at a time -- converting the code from running each stage to completion to streaming? (The disk would only need enough space for one temporary file at a time.)
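
To make the two options concrete, here is a minimal Python sketch. It is not taken from the `dataset-uploader` code: the stage names (`normalize`, `translate`), the one-record-per-line file format, and the use of `str.upper()` as a stand-in for the real translation step are all assumptions for illustration.

```python
import os
import tempfile
from typing import Iterable, Iterator


def staged_with_cleanup(records: Iterable[str], workdir: str) -> list[str]:
    """Option 1: stages still run to completion, but each temporary
    file is removed as soon as the next stage has consumed it."""
    # Hypothetical stage 1: normalize records into a temporary file.
    fd, stage1_path = tempfile.mkstemp(dir=workdir, suffix=".stage1")
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        for record in records:
            out.write(record.strip() + "\n")

    # Hypothetical stage 2: read stage 1 output, write stage 2 output.
    fd, stage2_path = tempfile.mkstemp(dir=workdir, suffix=".stage2")
    with open(stage1_path, encoding="utf-8") as src, \
            os.fdopen(fd, "w", encoding="utf-8") as out:
        for line in src:
            out.write(line.upper())  # stand-in for the real translation
    os.remove(stage1_path)  # stage 1 output is no longer needed

    # Final step: consume stage 2 output, then remove it as well.
    with open(stage2_path, encoding="utf-8") as src:
        result = [line.rstrip("\n") for line in src]
    os.remove(stage2_path)
    return result


def normalize(records: Iterable[str]) -> Iterator[str]:
    """Hypothetical streaming stage: yields one cleaned record at a time."""
    for record in records:
        yield record.strip()


def translate(records: Iterable[str]) -> Iterator[str]:
    """Hypothetical streaming stage; upper() stands in for translation."""
    for record in records:
        yield record.upper()


def streamed(records: Iterable[str]) -> Iterator[str]:
    """Option 2: chain the stages as generators so each record flows
    through the whole pipeline without an intermediate file."""
    return translate(normalize(records))
```

In the first function, at most two temporary files exist at once (the input and the output of the stage currently running). In the second, no intermediate files are written at all, but every stage must be able to work on one record at a time, which may not hold for stages that need to see the whole data set (for example, sorting or deduplication).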
Reference
cleverdatasets/dataset-uploader#30