data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
Quick Overview
Meta-PyTorch/data is a repository containing datasets and data loading utilities for PyTorch, specifically tailored for meta-learning tasks. It provides a collection of popular meta-learning datasets and tools to efficiently load and preprocess data for meta-learning experiments.
Pros
- Specialized for meta-learning tasks, saving time on dataset preparation
- Includes popular meta-learning datasets like Omniglot and Mini-ImageNet
- Offers efficient data loading and preprocessing utilities
- Integrates seamlessly with PyTorch ecosystem
Cons
- Limited to meta-learning datasets, not suitable for general-purpose machine learning tasks
- May require additional dependencies for specific datasets
- Documentation could be more comprehensive for some datasets
- Updates and maintenance may not be as frequent as larger, more general-purpose libraries
Code Examples
Loading the Omniglot dataset:
```python
from meta_pytorch.data import OmniglotDataset

dataset = OmniglotDataset(root='./data', download=True)
```
Creating a meta-learning task sampler:
```python
from meta_pytorch.data import TaskSampler

sampler = TaskSampler(dataset, n_way=5, k_shot=1, query_size=15)
```
Using a data loader for meta-learning:
```python
from torch.utils.data import DataLoader

dataloader = DataLoader(dataset, batch_sampler=sampler, num_workers=4)
```
Getting Started
To get started with meta-pytorch/data, follow these steps:
1. Install the library:

```
pip install meta-pytorch
```

2. Import and use the datasets:

```python
from meta_pytorch.data import OmniglotDataset, MiniImageNetDataset

omniglot = OmniglotDataset(root='./data', download=True)
mini_imagenet = MiniImageNetDataset(root='./data', download=True)
```

3. Create a task sampler and data loader:

```python
from meta_pytorch.data import TaskSampler
from torch.utils.data import DataLoader

sampler = TaskSampler(omniglot, n_way=5, k_shot=1, query_size=15)
dataloader = DataLoader(omniglot, batch_sampler=sampler, num_workers=4)
```

4. Iterate through the data in your meta-learning experiment:

```python
for batch in dataloader:
    # Your meta-learning training loop here
    pass
```
Competitor Comparisons
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
Pros of data
- More comprehensive dataset handling capabilities
- Better integration with PyTorch ecosystem
- Actively maintained with regular updates
Cons of data
- Potentially more complex API for simple use cases
- May have a steeper learning curve for beginners
- Larger codebase, which could impact performance in some scenarios
Code Comparison
data:
```python
from torchdata.datapipes.iter import IterableWrapper, FileOpener

dp = IterableWrapper(["file1.txt", "file2.txt"])
dp = FileOpener(dp, mode="r")
# FileOpener yields (path, stream) tuples, not raw strings.
for path, stream in dp:
    print(stream.read())
```
data>:
```python
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    """Map-style dataset that returns the text content of each file."""

    def __init__(self, file_list):
        self.file_list = file_list

    def __getitem__(self, index):
        # Read the whole file at the given index as one sample.
        with open(self.file_list[index], "r") as f:
            return f.read()

    def __len__(self):
        return len(self.file_list)
```
The data repository offers a more flexible and powerful approach to data handling, utilizing DataPipes for efficient data processing. On the other hand, data> provides a simpler, more traditional Dataset implementation that may be easier to understand for those familiar with PyTorch's basic data utilities.
The fastai deep learning library
Pros of fastai
- More comprehensive library with a wider range of deep learning applications
- Higher-level API, making it easier for beginners to get started
- Extensive documentation and educational resources
Cons of fastai
- Less flexible for low-level customization
- Potentially slower execution compared to pure PyTorch implementations
- Steeper learning curve for understanding the entire ecosystem
Code Comparison
fastai:
```python
from fastai.vision.all import *

path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
```
meta-pytorch/data:
```python
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
dataset = datasets.ImageFolder('path/to/data', transform=transform)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
```
The fastai code showcases its high-level API for quick model creation and training, while the meta-pytorch/data example demonstrates a more low-level approach to data loading and preprocessing.
README
TorchData
What is TorchData? | Stateful DataLoader | Install guide | Contributing | License
What is TorchData?
The TorchData project is an iterative enhancement to the PyTorch torch.utils.data.DataLoader and torch.utils.data.Dataset/IterableDataset to make them scalable, performant dataloading solutions. We will be iterating on the enhancements under the torchdata repo.
Our first change adds checkpointing to torch.utils.data.DataLoader via stateful_dataloader, a drop-in replacement for torch.utils.data.DataLoader. It defines load_state_dict and state_dict methods that enable mid-epoch checkpointing, along with an API for users to track custom iteration progress and other custom state from the dataloader workers, such as token buffers and/or RNG states.
Stateful DataLoader
torchdata.stateful_dataloader.StatefulDataLoader is a drop-in replacement for torch.utils.data.DataLoader which
provides state_dict and load_state_dict functionality. See
the Stateful DataLoader main page for more information and examples. Also check out the
examples
in this Colab notebook.
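The checkpoint/resume pattern that StatefulDataLoader provides can be illustrated with a minimal, library-free sketch. This is a conceptual illustration only, not TorchData's actual implementation: the real StatefulDataLoader also captures worker state, RNG state, and prefetch buffers.

```python
# Conceptual sketch of mid-epoch checkpointing: an iterator that can
# export its position as a state dict and resume from it later.
# Hypothetical class, NOT part of torchdata.

class ResumableLoader:
    def __init__(self, data):
        self.data = list(data)
        self._pos = 0

    def __iter__(self):
        while self._pos < len(self.data):
            item = self.data[self._pos]
            self._pos += 1
            yield item

    def state_dict(self):
        # Snapshot of iteration progress, safe to serialize.
        return {"pos": self._pos}

    def load_state_dict(self, state):
        # Restore progress so iteration resumes mid-epoch.
        self._pos = state["pos"]

loader = ResumableLoader(range(6))
it = iter(loader)
first = [next(it) for _ in range(3)]   # consume half the epoch
ckpt = loader.state_dict()             # save a mid-epoch checkpoint

resumed = ResumableLoader(range(6))
resumed.load_state_dict(ckpt)          # restore progress in a fresh loader
rest = list(resumed)

print(first, rest)  # [0, 1, 2] [3, 4, 5]
```

With the real StatefulDataLoader, the same pattern applies: call `state_dict()` at any point during iteration, serialize it with your model checkpoint, and pass it to `load_state_dict()` on a new loader to resume where you left off.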
torchdata.nodes
torchdata.nodes is a library of composable iterators (not iterables!) that lets you chain together common dataloading and pre-processing operations. It follows a streaming programming model, although a "sampler + Map-style" setup can still be configured if you desire. See the torchdata.nodes main page for more details. Stay tuned for a tutorial on torchdata.nodes, coming soon!
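The idea of chaining composable iterator nodes can be sketched in plain Python. The `Mapper` and `Batcher` classes below are hypothetical illustrations of the streaming style, not the torchdata.nodes API:

```python
# Minimal sketch of composable iterator "nodes": each node wraps an
# upstream iterable and applies one dataloading/pre-processing step.
# Hypothetical classes, NOT torchdata.nodes itself.

class Mapper:
    def __init__(self, source, fn):
        self.source, self.fn = source, fn

    def __iter__(self):
        # Stream items through, applying fn one at a time.
        for item in self.source:
            yield self.fn(item)

class Batcher:
    def __init__(self, source, batch_size):
        self.source, self.batch_size = source, batch_size

    def __iter__(self):
        batch = []
        for item in self.source:
            batch.append(item)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch:  # emit the final partial batch
            yield batch

# Chain nodes: square each element, then group into batches of 3.
pipeline = Batcher(Mapper(range(7), lambda x: x * x), batch_size=3)
print(list(pipeline))  # [[0, 1, 4], [9, 16, 25], [36]]
```

Because each node only holds one item (or one batch) at a time, the chain processes data in a streaming fashion without materializing intermediate results.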
Installation
Version Compatibility
The following table lists the torchdata version and supported Python versions corresponding to each torch release.
| torch | torchdata | python |
|---|---|---|
| master / nightly | main / nightly | >=3.9, <=3.13 |
| 2.6.0 | 0.11.0 | >=3.9, <=3.13 |
| 2.5.0 | 0.10.0 | >=3.9, <=3.12 |
| 2.5.0 | 0.9.0 | >=3.9, <=3.12 |
| 2.4.0 | 0.8.0 | >=3.8, <=3.12 |
| 2.0.0 | 0.6.0 | >=3.8, <=3.11 |
| 1.13.1 | 0.5.1 | >=3.7, <=3.10 |
| 1.12.1 | 0.4.1 | >=3.7, <=3.10 |
| 1.12.0 | 0.4.0 | >=3.7, <=3.10 |
| 1.11.0 | 0.3.0 | >=3.7, <=3.10 |
Local pip or conda
First, set up an environment. We will be installing a PyTorch binary as well as torchdata. If you're using conda, create a conda environment:
```
conda create --name torchdata
conda activate torchdata
```
If you wish to use venv instead:
```
python -m venv torchdata-env
source torchdata-env/bin/activate
```
Install torchdata:
Using pip:
```
pip install torchdata
```
Using conda:
```
conda install -c pytorch torchdata
```
From source
```
pip install .
```
In case building TorchData from source fails, install the nightly version of PyTorch following the linked guide on the contributing page.
From nightly
The nightly version of TorchData is also provided and updated daily from main branch.
Using pip:
```
pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu
```
Using conda:
```
conda install torchdata -c pytorch-nightly
```
Contributing
We welcome PRs! See the CONTRIBUTING file.
Beta Usage and Feedback
We'd love to hear from and work with early adopters to shape our designs. Please reach out by raising an issue if you're interested in using this tooling for your project.
License
TorchData is BSD licensed, as found in the LICENSE file.