meta-pytorch/data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Top Related Projects

  • data: A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
  • fastai: The fastai deep learning library

Quick Overview

Meta-PyTorch/data is a repository containing datasets and data loading utilities for PyTorch, specifically tailored for meta-learning tasks. It provides a collection of popular meta-learning datasets and tools to efficiently load and preprocess data for meta-learning experiments.

Pros

  • Specialized for meta-learning tasks, saving time on dataset preparation
  • Includes popular meta-learning datasets like Omniglot and Mini-ImageNet
  • Offers efficient data loading and preprocessing utilities
  • Integrates seamlessly with PyTorch ecosystem

Cons

  • Limited to meta-learning datasets, not suitable for general-purpose machine learning tasks
  • May require additional dependencies for specific datasets
  • Documentation could be more comprehensive for some datasets
  • Updates and maintenance may not be as frequent as larger, more general-purpose libraries

Code Examples

Loading the Omniglot dataset:

from meta_pytorch.data import OmniglotDataset

dataset = OmniglotDataset(root='./data', download=True)

Creating a meta-learning task sampler:

from meta_pytorch.data import TaskSampler

sampler = TaskSampler(dataset, n_way=5, k_shot=1, query_size=15)

Using a data loader for meta-learning:

from torch.utils.data import DataLoader

dataloader = DataLoader(dataset, batch_sampler=sampler, num_workers=4)

Getting Started

To get started with meta-pytorch/data, follow these steps:

  1. Install the library:

    pip install meta-pytorch
    
  2. Import and use the datasets:

    from meta_pytorch.data import OmniglotDataset, MiniImageNetDataset
    
    omniglot = OmniglotDataset(root='./data', download=True)
    mini_imagenet = MiniImageNetDataset(root='./data', download=True)
    
  3. Create a task sampler and data loader:

    from meta_pytorch.data import TaskSampler
    from torch.utils.data import DataLoader
    
    sampler = TaskSampler(omniglot, n_way=5, k_shot=1, query_size=15)
    dataloader = DataLoader(omniglot, batch_sampler=sampler, num_workers=4)
    
  4. Iterate through the data in your meta-learning experiment:

    for batch in dataloader:
        # Your meta-learning training loop here
        pass
    

Competitor Comparisons

data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Pros of data

  • More comprehensive dataset handling capabilities
  • Better integration with PyTorch ecosystem
  • Actively maintained with regular updates

Cons of data

  • Potentially more complex API for simple use cases
  • May have a steeper learning curve for beginners
  • Larger codebase, which could impact performance in some scenarios

Code Comparison

data:

from torchdata.datapipes.iter import IterableWrapper, FileOpener

# FileOpener yields (filename, stream) pairs for each input path
dp = IterableWrapper(["file1.txt", "file2.txt"])
dp = FileOpener(dp, mode="r")
for filename, stream in dp:
    print(stream.read())

meta-pytorch/data:

from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, file_list):
        self.file_list = file_list

    def __getitem__(self, index):
        with open(self.file_list[index], "r") as f:
            return f.read()

    def __len__(self):
        return len(self.file_list)

The data repository offers a more flexible and powerful approach to data handling, utilizing DataPipes for efficient data processing. The meta-pytorch/data example, on the other hand, provides a simpler, more traditional Dataset implementation that may be easier to understand for those familiar with PyTorch's basic data utilities.
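
To make the contrast concrete, the DataPipes example above can be extended by chaining further operations. This is a minimal sketch using the functional DataPipe forms (readlines, map, shuffle, batch) from the torchdata releases that shipped DataPipes; note that DataPipes have been deprecated in recent torchdata versions, and the line-oriented parsing here is only an assumption about the file contents.

from torchdata.datapipes.iter import IterableWrapper, FileOpener

# Open the files, read them line by line, then shuffle and batch the lines
dp = IterableWrapper(["file1.txt", "file2.txt"])
dp = FileOpener(dp, mode="r")
dp = dp.readlines()                # yields (filename, line) pairs
dp = dp.map(lambda pair: pair[1])  # keep only the line text
dp = dp.shuffle().batch(4)         # buffer-shuffle and group into batches of 4

for batch in dp:
    print(batch)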

fastai

The fastai deep learning library

Pros of fastai

  • More comprehensive library with a wider range of deep learning applications
  • Higher-level API, making it easier for beginners to get started
  • Extensive documentation and educational resources

Cons of fastai

  • Less flexible for low-level customization
  • Potentially slower execution compared to pure PyTorch implementations
  • Steeper learning curve for understanding the entire ecosystem

Code Comparison

fastai:

from fastai.vision.all import *
path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)

meta-pytorch/data:

import torch
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
dataset = datasets.ImageFolder('path/to/data', transform=transform)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

The fastai code showcases its high-level API for quick model creation and training, while the meta-pytorch/data example demonstrates a lower-level approach to data loading and preprocessing built on standard torchvision and PyTorch utilities.

README

TorchData

What is TorchData? | Stateful DataLoader | Install guide | Contributing | License

What is TorchData?

The TorchData project is an iterative enhancement to the PyTorch torch.utils.data.DataLoader and torch.utils.data.Dataset/IterableDataset to make them scalable, performant dataloading solutions. We will be iterating on the enhancements under the torchdata repo.

Our first change adds checkpointing to torch.utils.data.DataLoader through stateful_dataloader, a drop-in replacement that defines state_dict and load_state_dict methods to enable mid-epoch checkpointing. It also provides an API for users to track custom iteration progress and other custom state from the dataloader workers, such as token buffers and/or RNG states.

Stateful DataLoader

torchdata.stateful_dataloader.StatefulDataLoader is a drop-in replacement for torch.utils.data.DataLoader which provides state_dict and load_state_dict functionality. See the Stateful DataLoader main page for more information and examples. Also check out the examples in this Colab notebook.
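
As a minimal sketch of the checkpoint-and-resume workflow this enables (the toy dataset and the stopping point below are placeholders, not taken from the TorchData docs):

from torchdata.stateful_dataloader import StatefulDataLoader

train_dataset = list(range(10_000))  # toy stand-in for a real Dataset

dataloader = StatefulDataLoader(train_dataset, batch_size=32, num_workers=2)

state = None
for i, batch in enumerate(dataloader):
    # ... training step ...
    if i == 100:
        state = dataloader.state_dict()  # snapshot mid-epoch progress, including worker state
        break

# Later, e.g. after a restart, resume iteration from the snapshot
resumed = StatefulDataLoader(train_dataset, batch_size=32, num_workers=2)
resumed.load_state_dict(state)
for batch in resumed:
    pass  # iteration continues from where the snapshot was taken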

torchdata.nodes

torchdata.nodes is a library of composable iterators (not iterables!) that let you chain together common dataloading and pre-processing operations. It follows a streaming programming model, although a "sampler + Map-style" setup can still be configured if you desire. See the torchdata.nodes main page for more details. A tutorial on torchdata.nodes is coming soon!
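
A rough sketch of what composing iterators with torchdata.nodes looks like; the class names and signatures below (IterableWrapper, ParallelMapper, Batcher, Loader) are assumptions about the torchdata.nodes API and may differ between releases, so consult the torchdata.nodes main page for the authoritative version.

from torchdata.nodes import IterableWrapper, ParallelMapper, Batcher, Loader

node = IterableWrapper(range(10_000))                  # source iterator
node = ParallelMapper(node, map_fn=lambda x: x * 2,
                      num_workers=4, method="thread")  # parallel pre-processing
node = Batcher(node, batch_size=16)                    # group items into batches
loader = Loader(node)                                  # wrap as a reusable Iterable

for batch in loader:
    pass  # each batch is a list of 16 mapped items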

Installation

Version Compatibility

The following table lists each torch release alongside the corresponding torchdata version and the supported Python versions.

torch              torchdata        python
master / nightly   main / nightly   >=3.9, <=3.13
2.6.0              0.11.0           >=3.9, <=3.13
2.5.0              0.10.0           >=3.9, <=3.12
2.5.0              0.9.0            >=3.9, <=3.12
2.4.0              0.8.0            >=3.8, <=3.12
2.0.0              0.6.0            >=3.8, <=3.11
1.13.1             0.5.1            >=3.7, <=3.10
1.12.1             0.4.1            >=3.7, <=3.10
1.12.0             0.4.0            >=3.7, <=3.10
1.11.0             0.3.0            >=3.7, <=3.10
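
For example, to install a matched pair from the table above with pip (using the torch 2.6.0 / torchdata 0.11.0 row purely as an illustration):

pip install torch==2.6.0 torchdata==0.11.0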

Local pip or conda

First, set up an environment. We will be installing a PyTorch binary as well as torchdata. If you're using conda, create a conda environment:

conda create --name torchdata
conda activate torchdata

If you wish to use venv instead:

python -m venv torchdata-env
source torchdata-env/bin/activate

Install torchdata:

Using pip:

pip install torchdata

Using conda:

conda install -c pytorch torchdata
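
A quick way to confirm which version ended up in the environment (this assumes the package exposes a __version__ attribute):

python -c "import torchdata; print(torchdata.__version__)"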

From source

pip install .
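
The command above assumes the repository has already been cloned and that it is run from the repository root, for example (the GitHub URL is inferred from the organization and repository name shown on this page):

git clone https://github.com/meta-pytorch/data.git
cd data
pip install .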

If building TorchData from source fails, install the nightly version of PyTorch by following the guide on the contributing page.

From nightly

The nightly version of TorchData is also provided and is updated daily from the main branch.

Using pip:

pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu

Using conda:

conda install torchdata -c pytorch-nightly

Contributing

We welcome PRs! See the CONTRIBUTING file.

Beta Usage and Feedback

We'd love to hear from and work with early adopters to shape our designs. Please reach out by raising an issue if you're interested in using this tooling for your project.

License

TorchData is BSD licensed, as found in the LICENSE file.