
PaddleFormers

PaddleFormers is an easy-to-use library of pre-trained large language models built on PaddlePaddle.


Top Related Projects

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.


DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.


Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Ongoing research training transformer models at scale


An open-source NLP research library, built on PyTorch.


TensorFlow code and pre-trained models for BERT

Quick Overview

PaddleFormers is an open-source library for natural language processing (NLP) tasks based on the PaddlePaddle deep learning framework. It provides a collection of pre-trained models and tools for various NLP applications, including text classification, named entity recognition, and machine translation.

Pros

  • Offers a wide range of pre-trained models for different NLP tasks
  • Built on PaddlePaddle, which provides efficient deep learning capabilities
  • Includes easy-to-use APIs for quick implementation of NLP solutions
  • Supports both Chinese and English language processing

Cons

  • Less popular compared to other NLP libraries like Hugging Face Transformers
  • Documentation and community support may be limited compared to more established libraries
  • Primarily focused on PaddlePaddle ecosystem, which may limit integration with other frameworks
  • Learning curve may be steeper for those unfamiliar with PaddlePaddle

Code Examples

  1. Text Classification:
from paddlenlp.transformers import ErnieForSequenceClassification, ErnieTokenizer

# Load a pre-trained ERNIE model with a binary classification head
model = ErnieForSequenceClassification.from_pretrained('ernie-1.0', num_classes=2)
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')

text = "This is a great movie!"
inputs = tokenizer(text, return_tensors="pd")  # Paddle tensors, ready to feed the model
outputs = model(**inputs)
print(outputs)  # class logits
  2. Named Entity Recognition:
from paddlenlp.transformers import ErnieForTokenClassification, ErnieTokenizer

model = ErnieForTokenClassification.from_pretrained('ernie-1.0')
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')

text = "Steve Jobs was the co-founder of Apple Inc."
inputs = tokenizer(text, return_tensors="pd")  # Paddle tensors, ready to feed the model
outputs = model(**inputs)
print(outputs)  # per-token logits over the label set
  3. Machine Translation:
from paddlenlp.transformers import MBartForConditionalGeneration, MBartTokenizer

model = MBartForConditionalGeneration.from_pretrained('mbart-large-cc25')
tokenizer = MBartTokenizer.from_pretrained('mbart-large-cc25')

src_text = "Hello, how are you?"
inputs = tokenizer(src_text, return_tensors="pd")
outputs = model.generate(**inputs)  # generate() returns (token ids, scores)
translated_text = tokenizer.batch_decode(outputs[0], skip_special_tokens=True)[0]
print(translated_text)

Getting Started

To get started with PaddleFormers:

  1. Install PaddlePaddle and PaddleNLP:
pip install paddlepaddle paddlenlp
  2. Import the required modules:
from paddlenlp.transformers import ErnieForSequenceClassification, ErnieTokenizer
  3. Load a pre-trained model and tokenizer:
model = ErnieForSequenceClassification.from_pretrained('ernie-1.0')
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
  4. Process your text and get predictions:
text = "Your input text here"
inputs = tokenizer(text, return_tensors="pd")
outputs = model(**inputs)
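To turn the raw model output into a label prediction, a minimal sketch (continuing from step 4, and assuming the sequence-classification model from step 3, whose forward pass returns class logits):

import paddle
import paddle.nn.functional as F

# 'outputs' holds the class logits produced in step 4
probs = F.softmax(outputs, axis=-1)           # convert logits to probabilities
pred = paddle.argmax(probs, axis=-1).item()   # index of the most likely class
print(pred, probs.numpy())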

For more detailed instructions and examples, refer to the PaddleNLP documentation and examples in the GitHub repository.

Competitor Comparisons

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.

Pros of transformers

  • Larger community and more extensive documentation
  • Supports multiple deep learning frameworks (PyTorch, TensorFlow, JAX)
  • More comprehensive model zoo with pre-trained models

Cons of transformers

  • Can be more complex for beginners due to its extensive features
  • Potentially slower inference speed compared to PaddleFormers
  • Larger package size and dependencies

Code Comparison

transformers:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

PaddleFormers:

from paddlenlp.transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

Usage is nearly identical between the two libraries; the main difference is the import path. transformers imports from the transformers package, while the PaddlePaddle example imports from paddlenlp.transformers (PaddleFormers itself exposes the same style of interface under paddleformers.transformers, as shown in the README below). Both libraries provide similar APIs for loading pre-trained models and tokenizers, making it relatively easy to switch between them if needed.
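For example, tokenization follows the same call pattern in both libraries; the main visible difference is the tensor type requested (a minimal sketch: "pt" returns PyTorch tensors, "pd" returns Paddle tensors):

# Hugging Face Transformers: PyTorch tensors
from transformers import BertTokenizer
hf_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
hf_inputs = hf_tokenizer("Hello world", return_tensors="pt")

# PaddlePaddle ecosystem (paddlenlp.transformers): Paddle tensors
from paddlenlp.transformers import BertTokenizer as PaddleBertTokenizer
paddle_tokenizer = PaddleBertTokenizer.from_pretrained('bert-base-uncased')
paddle_inputs = paddle_tokenizer("Hello world", return_tensors="pd")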


DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

  • More extensive optimization techniques, including ZeRO-Offload and 3D parallelism
  • Better support for large-scale distributed training across multiple GPUs and nodes
  • More active development and frequent updates

Cons of DeepSpeed

  • Steeper learning curve due to more advanced features
  • Primarily focused on PyTorch, while PaddleFormers targets the PaddlePaddle framework
  • May require more fine-tuning for optimal performance in specific use cases

Code Comparison

DeepSpeed:

import deepspeed
model_engine, optimizer, _, _ = deepspeed.initialize(args=args, model=model, model_parameters=params)
for step, batch in enumerate(data_loader):
    loss = model_engine(batch)
    model_engine.backward(loss)
    model_engine.step()

PaddleFormers:

import paddle
from paddlenlp.transformers import ErnieForSequenceClassification

model = ErnieForSequenceClassification.from_pretrained('ernie-1.0')
optimizer = paddle.optimizer.AdamW(learning_rate=1e-4, parameters=model.parameters())
for batch in train_data_loader:
    # depending on the library version, the forward pass may return (loss, logits)
    loss = model(input_ids=batch['input_ids'], labels=batch['labels'])
    loss.backward()
    optimizer.step()
    optimizer.clear_grad()  # reset gradients before the next step

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

  • More extensive documentation and examples
  • Larger community and more frequent updates
  • Supports a wider range of NLP tasks and architectures

Cons of fairseq

  • Steeper learning curve for beginners
  • Requires more computational resources for some models
  • Less integrated with other deep learning frameworks

Code Comparison

PaddleFormers:

import paddle
from paddlenlp.transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_classes=2)

fairseq:

from fairseq.models.roberta import RobertaModel

roberta = RobertaModel.from_pretrained('/path/to/roberta/model', checkpoint_file='model.pt')

Both repositories provide high-level APIs for working with transformer models, but fairseq offers more flexibility in model architecture and training options. PaddleFormers is more tightly integrated with the PaddlePaddle ecosystem, making it easier to use for those already familiar with that framework.

fairseq has a larger collection of pre-implemented models and supports more advanced features like distributed training and mixed precision. However, PaddleFormers may be more accessible for users in certain regions due to its origins in the Chinese tech industry.

Ultimately, the choice between these repositories depends on the specific requirements of your project, your familiarity with the underlying frameworks, and the level of customization you need.

Ongoing research training transformer models at scale

Pros of Megatron-LM

  • Optimized for NVIDIA GPUs, offering better performance on NVIDIA hardware
  • Supports larger model sizes and distributed training across multiple GPUs
  • More extensive documentation and examples for various model architectures

Cons of Megatron-LM

  • Limited to NVIDIA hardware, reducing flexibility for users with different setups
  • Steeper learning curve due to its focus on large-scale models and distributed training
  • Less integration with other deep learning frameworks compared to PaddleFormers

Code Comparison

Megatron-LM (model initialization):

from megatron.model.language_model import get_language_model  # import path may vary across Megatron-LM versions

model = get_language_model(
    attention_mask_func, num_tokentypes=num_tokentypes,
    add_pooler=add_pooler, init_method=init_method,
    scaled_init_method=scaled_init_method)

PaddleFormers (model initialization):

from paddlenlp.transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    model_name_or_path,
    num_classes=num_classes)

Both repositories provide powerful tools for working with transformer-based models, but they cater to different use cases. Megatron-LM is more focused on large-scale models and distributed training, while PaddleFormers offers a more user-friendly approach with easier integration into existing workflows. The choice between the two depends on the specific requirements of your project and the available hardware resources.


An open-source NLP research library, built on PyTorch.

Pros of AllenNLP

  • More extensive documentation and tutorials
  • Larger community and ecosystem of pre-built models
  • Better integration with PyTorch and other popular NLP libraries

Cons of AllenNLP

  • Steeper learning curve for beginners
  • Less focus on performance optimization compared to PaddleFormers
  • More complex setup and configuration process

Code Comparison

AllenNLP:

from typing import Iterable

from allennlp.data import DatasetReader, Instance
from allennlp.data.fields import TextField
from allennlp.data.token_indexers import SingleIdTokenIndexer

class MyDatasetReader(DatasetReader):
    def _read(self, file_path: str) -> Iterable[Instance]:
        with open(file_path, "r") as f:
            for line in f:
                # text_to_instance (defined on the reader) builds an Instance from raw text
                yield self.text_to_instance(line.strip())

PaddleFormers:

from paddlenlp.datasets import MapDataset

class MyDataset(MapDataset):
    def __init__(self, data_path):
        # read one example per line and hand the list to MapDataset
        with open(data_path, 'r', encoding='utf-8') as f:
            lines = f.readlines()
        super().__init__(lines)

    def __getitem__(self, idx):
        return {"text": self.data[idx].strip()}

TensorFlow code and pre-trained models for BERT

Pros of BERT

  • Widely adopted and well-documented, with extensive research and community support
  • Provides pre-trained models for various languages and tasks
  • Offers a straightforward implementation of the BERT architecture

Cons of BERT

  • Limited to BERT-specific models and tasks
  • Less flexibility for customization and experimentation with different architectures
  • Older codebase with fewer recent updates

Code Comparison

BERT:

import tensorflow as tf
from bert import modeling

# input_ids: an int32 tensor of shape [batch_size, seq_length] holding token ids
bert_config = modeling.BertConfig.from_json_file("bert_config.json")
model = modeling.BertModel(config=bert_config, is_training=True, input_ids=input_ids)

PaddleFormers:

import paddle
from paddlenlp.transformers import BertModel

model = BertModel.from_pretrained('bert-base-uncased')
input_ids = paddle.to_tensor([[1, 2, 3, 4, 5, 6]])
output = model(input_ids)

PaddleFormers offers a more modern and flexible approach, supporting various transformer architectures beyond BERT. It provides easier integration with the PaddlePaddle framework and includes more recent advancements in NLP. However, BERT remains a solid choice for those specifically focused on BERT-based models and looking for a well-established implementation.


README


Latest Updates | Features | Installation | Quick Start | Community

PaddleFormers

📝 Introduction

PaddleFormers is a Transformers library built on Baidu's deep learning framework PaddlePaddle. It aims to give the PaddlePaddle ecosystem a model interface and feature experience on par with the Hugging Face Transformers project, and it supports training of large language models (LLMs) and vision-language models (VLMs). PaddleFormers takes full advantage of PaddlePaddle's built-in strengths in high-performance training: it supports the mainstream distributed training strategies for large models, including tensor parallelism, pipeline parallelism, and expert parallelism, as well as acceleration techniques such as automatic mixed precision. On key models such as DeepSeek-V3 and GLM-4.5-Air, its training performance clearly exceeds Megatron-LM, delivering efficient pre-training and post-training.

Combining mainstream optimization methods from the industry with the efficient features PaddlePaddle has accumulated in production practice, PaddleFormers aims to deliver a **high-performance, low-resource-footprint** training experience, helping users train large models efficiently and conveniently without having to deal with complex low-level optimization details.

🆕 Latest Updates

  • 2026.01.21 - PaddleFormers v1.0 is released! It provides training capabilities for LLMs and VLMs. For key models such as DeepSeek-V3 and GLM-4.5-Air we have implemented extensive performance optimizations (training performance clearly exceeds Megatron-LM). For PaddleOCR-VL we have added support for domestic compute chips such as Kunlunxin P800 and Iluvatar Tiangai 150 to better meet the needs of users in China.

✨ Features

  • Broad model support: PaddleFormers supports training for 100+ mainstream large language models and vision-language models, covering cutting-edge models such as DeepSeek-V3, the GLM-4.5 series, the Qwen2 and Qwen3 series, and Qwen3-VL. It also provides complete training capabilities for ERNIE-family models such as ERNIE-4.5, ERNIE-4.5-VL, and PaddleOCR-VL.
  • High-performance model implementations: FP8 low-precision training, high-performance operator optimizations, communication-computation overlap, and fine-grained compute/memory balancing significantly improve the compute, communication, and memory efficiency of large-model training. On models such as DeepSeek-V3 and GLM-4.5-Air, training performance clearly exceeds Megatron-LM.
  • Full-pipeline support: PaddleFormers covers the entire training pipeline from pre-training to post-training; post-training supports mainstream methods such as CPT / SFT / SFT-LoRA / DPO / DPO-LoRA, helping users iterate and optimize large models efficiently. PaddleFormers also provides full support for the Safetensors format: trained models are stored in the same weight format as models hosted on Hugging Face and can be used in any framework or tool that supports that format (e.g. FastDeploy / vLLM / SGLang); see the sketch after this list.
  • Complete training capabilities: PaddleFormers supports training of frontier LLM capabilities such as Function Call and Thinking, and significantly improves training throughput through data-pipeline techniques such as Data Packing and Padding Free.
  • Deep adaptation to domestic chips: supports domestic compute platforms such as Kunlunxin P800, Iluvatar Tiangai 150, and MetaX C550; SFT of DeepSeek-V3 on 128 Kunlunxin P800 cards makes it the post-training solution requiring the least domestic compute resources.
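To illustrate the Safetensors interoperability mentioned above, here is a minimal sketch; the save_pretrained call and the Hugging Face-compatible layout of the saved directory are assumptions based on the compatibility claim in this README, not an API shown here:

from paddleformers.transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed workflow: the exact save_pretrained call is an assumption; the README
# only states that trained weights use the same format as Hugging Face-hosted weights.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B-Base", dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")

# ... fine-tune the model ...

model.save_pretrained("./qwen3-0.6b-sft")      # safetensors, HF-compatible layout (assumed)
tokenizer.save_pretrained("./qwen3-0.6b-sft")

# The saved directory can then be consumed by safetensors-aware tools such as
# FastDeploy / vLLM / SGLang, or loaded in a PyTorch environment:
#   from transformers import AutoModelForCausalLM
#   hf_model = AutoModelForCausalLM.from_pretrained("./qwen3-0.6b-sft")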

📋 Model List

| Model Type | Model Series | Model Name | Chat Template |
| --- | --- | --- | --- |
| LLM | DeepSeekv3 | deepseek-ai/DeepSeek-V3-Base, deepseek-ai/DeepSeek-V3, deepseek-ai/DeepSeek-V3-0324 | deepseek3 |
| | 🏛️ERNIE-4.5 | baidu/ERNIE-4.5-0.3B-Base-PT, baidu/ERNIE-4.5-0.3B-PT, baidu/ERNIE-4.5-21B-A3B-Base-PT, baidu/ERNIE-4.5-21B-A3B-PT, baidu/ERNIE-4.5-300B-A47B-Base-PT, baidu/ERNIE-4.5-300B-A47B-PT, baidu/ERNIE-4.5-21B-A3B-Thinking | ernie, ernie_nothink |
| | gemma3 | google/gemma-3-270m, google/gemma-3-270m-it, google/gemma-3-1b-pt, google/gemma-3-1b-it, google/gemma-3-4b-pt, google/gemma-3-4b-it, google/gemma-3-12b-pt, google/gemma-3-12b-it, google/gemma-3-27b-pt, google/gemma-3-27b-it | gemma |
| | GLM-4.5 | zai-org/GLM-4.5-Air-Base, zai-org/GLM-4.5-Air, zai-org/GLM-4.5-Base, zai-org/GLM-4.5 | glm4_moe |
| | gpt-oss | openai/gpt-oss-20b, openai/gpt-oss-120b | gpt |
| | Llama-3 | meta-llama/Meta-Llama-3-8B, meta-llama/Meta-Llama-3-8B-Instruct, meta-llama/Meta-Llama-3-70B, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Llama-3.1-8B, meta-llama/Llama-3.1-8B-Instruct, meta-llama/Llama-3.1-70B, meta-llama/Llama-3.1-70B-Instruct, meta-llama/Llama-3.1-405B, meta-llama/Llama-3.1-405B-Instruct, meta-llama/Llama-3.2-1B, meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.2-3B, meta-llama/Llama-3.2-3B-Instruct, meta-llama/Llama-3.3-70B-Instruct | llama3 |
| | phi-4 | microsoft/phi-4 | phi4 |
| | Qwen2 | Qwen/Qwen2-0.5B, Qwen/Qwen2-0.5B-Instruct, Qwen/Qwen2-1.5B, Qwen/Qwen2-1.5B-Instruct, Qwen/Qwen2-7B, Qwen/Qwen2-7B-Instruct, Qwen/Qwen2-57B-A14B, Qwen/Qwen2-57B-A14B-Instruct, Qwen/Qwen2-72B, Qwen/Qwen2-0.5B-Instruct | qwen |
| | Qwen3 | Qwen/Qwen3-0.6B-Base, Qwen/Qwen3-0.6B, Qwen/Qwen3-1.7B-Base, Qwen/Qwen3-1.7B, Qwen/Qwen3-4B-Base, Qwen/Qwen3-4B, Qwen/Qwen3-4B-Instruct-2507, Qwen/Qwen3-4B-Thinking-2507, Qwen/Qwen3-8B-Base, Qwen/Qwen3-8B, Qwen/Qwen3-14B-Base, Qwen/Qwen3-14B, Qwen/Qwen3-32B, Qwen/Qwen3-30B-A3B-Base, Qwen/Qwen3-30B-A3B, Qwen/Qwen3-30B-A3B-Instruct-2507, Qwen/Qwen3-30B-A3B-Thinking-2507, Qwen/Qwen3-235B-A22B, Qwen/Qwen3-235B-A22B-Instruct-2507, Qwen/Qwen3-235B-A22B-Thinking-2507 | qwen3, qwen3_nothink |
| | Qwen3-Next | Qwen/Qwen3-Next-80B-A3B-Instruct, Qwen/Qwen3-Next-80B-A3B-Thinking | qwen3, qwen3_nothink |
| VLM | 🏛️ERNIE-4.5-VL | baidu/ERNIE-4.5-VL-28B-A3B-Base-PT, baidu/ERNIE-4.5-VL-28B-A3B-PT, baidu/ERNIE-4.5-VL-424B-A47B-Base-PT, baidu/ERNIE-4.5-VL-424B-A47B-PT, baidu/ERNIE-4.5-VL-28B-A3B-Thinking | ernie_vl, ernie_vl_nothink |
| | 🏛️PaddleOCR-VL | PaddlePaddle/PaddleOCR-VL | paddleocr_vl |
| | Qwen2.5-VL | Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-7B-Instruct, Qwen/Qwen2.5-VL-32B-Instruct, Qwen/Qwen2.5-VL-72B-Instruct | qwen2_vl |
| | Qwen3-VL | Qwen/Qwen3-VL-2B-Instruct, Qwen/Qwen3-VL-2B-Thinking, Qwen/Qwen3-VL-4B-Instruct, Qwen/Qwen3-VL-4B-Thinking, Qwen/Qwen3-VL-8B-Instruct, Qwen/Qwen3-VL-8B-Thinking, Qwen/Qwen3-VL-32B-Instruct, Qwen/Qwen3-VL-32B-Thinking, Qwen/Qwen3-VL-30B-A3B-Instruct, Qwen/Qwen3-VL-30B-A3B-Thinking, Qwen/Qwen3-VL-235B-A22B-Instruct, Qwen/Qwen3-VL-235B-A22B-Thinking | qwen3_vl, qwen3_vl_nothink |
  • For more details on the training capabilities supported per model, see the PaddleFormers model capability matrix.
  • Models marked with 🏛️ are officially maintained by PaddleFormers.

💾 Installation

Requirements

  • python ≥ 3.10
  • CUDA ≥ 12.0
  • PaddleFleet ≥ 0.1 (required only for GPU training)

Installing dependencies (GPU)

Using a Docker container (recommended)

To avoid conflicts with your local environment, we recommend preparing the environment with the prebuilt PaddleFormers image; the container already has the PaddleFormers repository cloned and installed:

# Using cuda12.6 as an example
docker run --gpus all --name paddleformers-work -v $(pwd):/work  \
    -w=/work --shm-size=512G --network=host -it \
    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.3.0-gpu-cuda12.6-cudnn9.5 /bin/bash

# cuda12.9 image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.3.0-gpu-cuda12.9-cudnn9.9
# cuda13.0 image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.3.0-gpu-cuda13.0-cudnn9.13

Installing via pip / from source

We recommend managing your Python environment with a virtual-environment tool such as conda / venv / uv.

# conda
conda create -n paddleformers-work python=3.10  # python 3.10-3.13 supported
conda activate paddleformers-work
# venv
python -m venv .paddleformers-work
source .paddleformers-work/bin/activate
# uv
uv venv .paddleformers-work
source .paddleformers-work/bin/activate

Option 1: install from source

# Install development version
git clone https://github.com/PaddlePaddle/PaddleFormers.git
cd PaddleFormers
# cuda12.6
python -m pip install -e '.[paddlefleet]' --extra-index-url https://www.paddlepaddle.org.cn/packages/nightly/cu126/ --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu126/
# cuda12.9
# python -m pip install -e '.[paddlefleet]' --extra-index-url https://www.paddlepaddle.org.cn/packages/nightly/cu129/ --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu129/
# cuda13.0
# python -m pip install -e '.[paddlefleet]' --extra-index-url https://www.paddlepaddle.org.cn/packages/nightly/cu130/ --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu130/

Option 2: if you prefer not to clone the source, install PaddleFormers and PaddleFleet with the commands below.

# Install via pip
# cuda12.6
python -m pip install paddleformers[paddlefleet] --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu126/
# cuda12.9
# python -m pip install paddleformers[paddlefleet] --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu129/
# cuda13.0
# python -m pip install paddleformers[paddlefleet] --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu130/
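After either GPU install option, a quick sanity check (run in Python; paddle.utils.run_check() is PaddlePaddle's built-in environment self-test):

import paddle
paddle.utils.run_check()   # verifies the PaddlePaddle install (and GPU, if present)

import paddleformers       # confirms PaddleFormers imports cleanly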

Option 3: if you only need the tokenizer or processor, install with the command below; training-related dependencies will not be installed, so installation is faster.

python -m pip install paddleformers

Installing dependencies (XPU & ILUVATAR-GPU & Metax GPU)

⚡ Quick Start

PaddleFormers keeps its API design highly consistent with Hugging Face Transformers. Usage examples:

Using the tokenizer

from paddleformers.transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")
print(tokenizer.encode("中华人民共和国"))
# "中华人民共和国" will be encoded into two tokens:
# [105492, 104773]
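A round-trip check (assuming decode mirrors the Hugging Face API, which is consistent with the compatibility claim above):

print(tokenizer.decode([105492, 104773]))
# expected to reproduce the original string: 中华人民共和国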

Text generation

from paddleformers.transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B-Base", dtype="bfloat16").eval()

input_features = tokenizer("请给我一段大模型的简短介绍:", return_tensors="pd")  # prompt: "Please give me a brief introduction to large language models:"
outputs = model.generate(**input_features, max_new_tokens=128)

print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))
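For the chat models in the model list above, the Chat Template column names the conversation template each family uses. A hypothetical sketch of chat-style generation, assuming PaddleFormers mirrors Hugging Face's apply_chat_template API (that method is not shown in this README):

from paddleformers.transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct", dtype="bfloat16").eval()

# apply_chat_template is assumed here, by analogy with Hugging Face Transformers
messages = [{"role": "user", "content": "Give me a one-sentence introduction to large language models."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pd")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))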

Model training

paddleformers-cli train ./examples/config/sft/full.yaml

📊 Data Processing

🚀 Model Training & Deployment

💻 Multi-hardware Support

🔍 Best Practices

➕ Miscellaneous

💬 Community

Contributing

  • Community contributions to PaddleFormers are welcome; see the Contribution Guide for details.

Contact Us

  • Scan the QR code below on WeChat and fill in the questionnaire to join the discussion group and talk with community developers and the official team.
(QR code image)

🙏 Acknowledgements

We drew on the excellent design of Hugging Face's 🤗 Transformers for working with pre-trained models, and we thank the Hugging Face authors and their open-source community.

📜 License

PaddleFormers is released under the Apache-2.0 open-source license.