Commit 16fd958

Merge pull request #1 from johnhe-dev/johnhe-dev-patch-1
Update 2023-07-25-announcing-cpp.md
2 parents 9048288 + cb23617 commit 16fd958

1 file changed, +2 -2 lines changed

_posts/2023-07-25-announcing-cpp.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 ---
 layout: blog_detail
 title: "Announcing CPP-based S3 IO DataPipes"
-author: John He, Khaled ElGalaind Roshani Nagmote, Daiming Yang
+author: John He, Khaled ElGalaind, Roshani Nagmote, Daiming Yang
 ---
 
 Training large deep learning models requires large datasets. [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3) is a scalable cloud object store service used for storing large training datasets. Machine learning (ML) practitioners need an efficient data pipe that can download data from Amazon S3, transform the data, and feed the data to GPUs for training models with high throughput and low latency.
@@ -48,7 +48,7 @@ The following code snippet provides a typical usage of `load_files_by_s3()`:
 from torch.utils.data import DataLoader
 from torchdata.datapipes.iter import IterableWrapper
 
-s3_shard_urls = IterableWrapper(["s3://bucket/prefix/",])
+s3_shard_urls = IterableWrapper(["s3://bucket/prefix/",]).list_files_by_s3()
 s3_shards = s3_shard_urls.load_files_by_s3()
 # text data
 training_data = s3_shards.readlines(return_path=False)

0 commit comments
