Baichuan-13B

A 13B large language model developed by Baichuan Intelligent Technology


Top Related Projects

gpt-neox

7,328

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

ChatGLM-6B

41,153

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

Yi

7,842

A series of large language models trained from scratch by developers @01-ai

llama-cookbook

18,145

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. It also shows how to solve end-to-end problems using the Llama model family on various provider services.

Quick Overview

Baichuan-13B is an open-source large language model (LLM) developed by Baichuan Intelligent Technology as the successor to Baichuan-7B. It has 13 billion parameters, was trained on 1.4 trillion tokens of Chinese and English text, uses ALiBi position encoding, and supports a 4,096-token context window. It is released in two versions: a pretrained base model (Baichuan-13B-Base) and an aligned chat model (Baichuan-13B-Chat).

Pros

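Open-source and free for academic research; commercial use is also free once an official license has been obtained by email
Strong results for its size on Chinese and English benchmarks (C-Eval, MMLU, CMMLU)
int8 and int4 quantization allow deployment on consumer-grade GPUs such as the Nvidia 3090
Chat weights were updated in August 2023, and a successor family, Baichuan 2, was released in September 2023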

Cons

Limited documentation and examples compared to some other popular LLMs
May require fine-tuning for specific domain applications
Performance in languages other than Chinese and English may be less robust
Potential biases and limitations inherent to large language models

Getting Started

To use Baichuan-13B, follow these steps:

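Install the required dependencies (sentencepiece is used by the tokenizer, accelerate by device_map="auto"):

pip install transformers torch sentencepiece accelerate

Load the model and tokenizer:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True is required because the model ships its own modeling code
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True)
# Load in fp16 and spread the weights across the available GPUs
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)

Generate text:

# Move the tokenized prompt to the model's device before generating
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)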

Note: This is a basic example. For more advanced usage and fine-tuning, refer to the project's documentation and the Hugging Face Transformers library documentation.

Competitor Comparisons

llama

59,042

Inference code for Llama models

Pros of Llama

More extensive documentation and community support
Broader language support and multilingual capabilities
Higher flexibility for fine-tuning and customization

Cons of Llama

Larger variants (up to 70B parameters for Llama 2) require more computational resources
Potentially slower inference time for certain tasks
More complex licensing and usage restrictions

Code Comparison

Baichuan-13B example:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)

Llama example:

from transformers import LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

Both repositories provide pre-trained language models, but they differ in focus. Llama is the more widely supported model family, while Baichuan-13B concentrates on Chinese and English. The code examples follow the same pattern; the main differences are the tokenizer and model classes and Baichuan-13B's need for trust_remote_code=True.

gpt-neox

7,328

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Pros of gpt-neox

More extensive documentation and examples for training and fine-tuning
Broader community support and contributions
Designed for distributed training across multiple GPUs and nodes

Cons of gpt-neox

The flagship GPT-NeoX-20B model is larger (20B parameters) and requires more computational resources
More complex setup and configuration process
Less focus on multilingual capabilities compared to Baichuan-13B

Code Comparison

gpt-neox:

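# Simplified from the repository's train.py entry point
from megatron.neox_arguments import NeoXArgs
from megatron.training import pretrain

# Parse the configuration files passed on the command line
neox_args = NeoXArgs.consume_neox_args()
neox_args.configure_distributed_args()
neox_args.build_tokenizer()
neox_args.initialize_tensorboard_writer()
pretrain(neox_args=neox_args)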

Baichuan-13B:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)

The code snippets highlight the different goals of the two projects. gpt-neox drives training through its own Megatron- and DeepSpeed-based code, while Baichuan-13B is loaded through the Hugging Face Transformers library for easier integration and use.

ChatGLM-6B

41,153

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

Pros of ChatGLM-6B

Smaller model size (6B parameters) requires fewer computational resources
Designed for efficient inference on consumer-grade hardware
Supports both Chinese and English languages

Cons of ChatGLM-6B

Lower parameter count may result in less sophisticated responses
Limited fine-tuning options compared to Baichuan-13B
Less extensive training data, potentially affecting performance on diverse tasks

Code Comparison

ChatGLM-6B:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

Baichuan-13B:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True).half().cuda()

The code snippets show similar usage patterns, with minor differences in model loading and class names. Both repositories utilize the Hugging Face Transformers library for easy integration and deployment.

Yi

7,842

A series of large language models trained from scratch by developers @01-ai

Pros of Yi

More extensive documentation and examples provided in the repository
Offers pre-trained models in both 6B and 34B parameter sizes
Includes detailed model cards with performance metrics and benchmarks

Cons of Yi

Less community engagement and fewer third-party contributions
Limited multilingual support compared to Baichuan-13B
Fewer fine-tuned variants available for specific tasks

Code Comparison

Yi:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-6B")
model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-6B")

Baichuan-13B:

from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True is needed because Baichuan-13B ships custom model code
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)

Both repositories use similar code for loading models and tokenizers through the Hugging Face Transformers library. The main differences are the model names and sizes available, plus the trust_remote_code=True argument that Baichuan-13B requires. Yi offers 6B and 34B variants, while Baichuan-13B focuses on the 13B parameter size with a chat-specific aligned version.

llama-cookbook

18,145

Pros of llama-cookbook

Comprehensive documentation and examples for working with LLaMA models
Broader scope, covering various aspects of LLM usage and fine-tuning
Active community support and regular updates

Cons of llama-cookbook

Not a standalone model, requires access to Meta's LLaMA models
May have higher computational requirements for running examples

Code Comparison

Baichuan-13B (model loading):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)

llama-cookbook (model loading):

from transformers import LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("path/to/llama/tokenizer")
model = LlamaForCausalLM.from_pretrained("path/to/llama/model")

The main difference is that the Baichuan-13B snippet loads directly from the Hugging Face model hub, while the llama-cookbook snippet points at local copies of the Llama tokenizer and weights. Baichuan-13B also passes trust_remote_code=True because it relies on a custom model implementation.


README

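Baichuan-13B

🤗 Baichuan-13B-Base • 🤗 Baichuan-13B-Chat • 🤖 ModelScope • 💬 WeChat

中文 | English

Updates

[2023.09.06] We released Baichuan 2, our new generation of open-source models, in 7B and 13B sizes 🔥🔥🔥
[2023.08.01] Updated the weights of the aligned model Baichuan-13B-Chat, improving results in some scenarios

Contents

Introduction
Benchmark Results
Model Details
Inference and Deployment
Fine-tuning the Model
Disclaimer
License

Introduction

Baichuan-13B is an open-source, commercially usable large language model with 13 billion parameters, developed by Baichuan Intelligent Technology as the successor to Baichuan-7B. It achieves the best results among models of its size on authoritative Chinese and English benchmarks. This release includes two versions: a pretrained model (Baichuan-13B-Base) and an aligned model (Baichuan-13B-Chat). Baichuan-13B has the following features:

Larger size, more data: Building on Baichuan-7B, Baichuan-13B expands the parameter count to 13 billion and is trained on 1.4 trillion tokens of high-quality corpora, 40% more than LLaMA-13B, making it the open-source 13B-scale model trained on the most data to date. It supports both Chinese and English, uses ALiBi position encoding, and has a context window of 4,096 tokens.
Pretrained and aligned models released together: The pretrained model is a "base" for developers, whereas most ordinary users have a stronger need for an aligned model with dialogue capabilities. This release therefore also includes the aligned model (Baichuan-13B-Chat), which has strong conversational ability, works out of the box, and can be deployed with a few lines of code.
More efficient inference: To support a wider range of users, we have also open-sourced int8 and int4 quantized versions. Compared with the non-quantized version, they greatly lower the hardware requirements for deployment with almost no loss in quality, and can run on consumer-grade GPUs such as the Nvidia 3090.
Open source, free, and commercially usable: Baichuan-13B is fully open for academic research, and developers can also use it commercially free of charge after applying by email and obtaining an official commercial license.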

Benchmark Results

C-Eval

Model 5-shot	STEM	Social Sciences	Humanities	Others	Average
Baichuan-7B	38.2	52.0	46.2	39.3	42.8
Chinese-Alpaca-Plus-13B	35.2	45.6	40.0	38.2	38.8
Vicuna-13B	30.5	38.2	32.5	32.5	32.8
Chinese-LLaMA-Plus-13B	30.3	38.0	32.9	29.1	32.1
Ziya-LLaMA-13B-Pretrain	27.6	34.4	32.0	28.6	30.0
LLaMA-13B	27.0	33.6	27.7	27.6	28.5
moss-moon-003-base (16B)	27.0	29.1	27.2	26.9	27.4
Baichuan-13B-Base	45.9	63.5	57.2	49.3	52.4
Baichuan-13B-Chat	43.7	64.6	56.2	49.2	51.5

MMLU

Model 5-shot	STEM	Social Sciences	Humanities	Others	Average
Vicuna-13B	40.4	60.5	49.5	58.4	52.0
LLaMA-13B	36.1	53.0	44.0	52.8	46.3
Chinese-Alpaca-Plus-13B	36.9	48.9	40.5	50.5	43.9
Ziya-LLaMA-13B-Pretrain	35.6	47.6	40.1	49.4	42.9
Baichuan-7B	35.6	48.9	38.4	48.1	42.3
Chinese-LLaMA-Plus-13B	33.1	42.8	37.0	44.6	39.2
moss-moon-003-base (16B)	22.4	22.8	24.2	24.4	23.6
Baichuan-13B-Base	41.6	60.9	47.4	58.5	51.6
Baichuan-13B-Chat	40.9	60.9	48.8	59.0	52.1

Note: We used the official MMLU evaluation scheme.

CMMLU

Model 5-shot	STEM	Humanities	Social Sciences	Others	China Specific	Average
Baichuan-7B	34.4	47.5	47.6	46.6	44.3	44.0
Vicuna-13B	31.8	36.2	37.6	39.5	34.3	36.3
Chinese-Alpaca-Plus-13B	29.8	33.4	33.2	37.9	32.1	33.4
Chinese-LLaMA-Plus-13B	28.1	33.1	35.4	35.1	33.5	33.0
Ziya-LLaMA-13B-Pretrain	29.0	30.7	33.8	34.4	31.9	32.1
LLaMA-13B	29.2	30.8	31.6	33.0	30.5	31.2
moss-moon-003-base (16B)	27.2	30.4	28.8	32.6	28.7	29.6
Baichuan-13B-Base	41.7	61.1	59.8	59.0	56.4	55.3
Baichuan-13B-Chat	42.8	62.6	59.7	59.0	56.1	55.8

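Note: CMMLU is a comprehensive Chinese evaluation benchmark designed specifically to assess the knowledge and reasoning abilities of language models in a Chinese-language context. We used its official evaluation scheme.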

Model Details

Model	Hidden size	Layers	Attention heads	Vocabulary size	Total parameters	Training data (tokens)	Position encoding	Max length
Baichuan-7B	4,096	32	32	64,000	7,000,559,616	1.2 trillion	RoPE	4,096
Baichuan-13B	5,120	40	40	64,000	13,264,901,120	1.4 trillion	ALiBi	4,096
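
As a sanity check on the table, the total parameter counts can be reproduced from the listed dimensions. The sketch below is only an illustration under stated assumptions: a LLaMA-style decoder layout (four hidden-by-hidden attention projections, a three-matrix gated MLP, two norm weight vectors per layer, untied input and output embeddings, no biases) and MLP intermediate sizes of 11,008 and 13,696. Neither the layout nor the intermediate sizes are given in this README; the function name is ours.

```python
def decoder_param_count(hidden: int, layers: int, vocab: int, intermediate: int) -> int:
    """Count weights in a LLaMA-style decoder-only transformer (no biases)."""
    attention = 4 * hidden * hidden      # Q, K, V and output projections
    mlp = 3 * hidden * intermediate      # gate, up and down projections
    norms = 2 * hidden                   # two norm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * vocab * hidden      # input embedding + untied output head
    final_norm = hidden
    return layers * per_layer + embeddings + final_norm


# Intermediate sizes are assumptions (they are not listed in the table above).
baichuan_7b = decoder_param_count(hidden=4096, layers=32, vocab=64000, intermediate=11008)
baichuan_13b = decoder_param_count(hidden=5120, layers=40, vocab=64000, intermediate=13696)

print(f"{baichuan_7b:,}")   # 7,000,559,616
print(f"{baichuan_13b:,}")  # 13,264,901,120
```

Under these assumptions both totals match the "Total parameters" column exactly, which suggests the table counts every weight, including both embedding matrices.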

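Inference and Deployment

The model weights, source code, and configuration needed for inference are published on Hugging Face: Baichuan-13B-Base and Baichuan-13B-Chat. The examples below use Baichuan-13B-Chat to demonstrate several inference methods. The program downloads the required resources from Hugging Face automatically.

Install the dependencies before running inference: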

pip install -r requirements.txt

Python code

>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from transformers.generation.utils import GenerationConfig
>>> tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
>>> model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat")
>>> messages = []
>>> messages.append({"role": "user", "content": "世界上第二高的山峰是哪座"})
>>> response = model.chat(tokenizer, messages)
>>> print(response)
乔戈里峰。世界第二高峰———乔戈里峰西方登山者称其为k2峰，海拔高度是8611米，位于喀喇昆仑山脉的中巴边境上

The prompt asks "Which is the second-highest mountain in the world?", and the model replies: "Mount Qogir. The world's second-highest peak, Qogir, known to Western climbers as K2, has an elevation of 8,611 meters and lies on the China-Pakistan border in the Karakoram range."

In the code above, the model is loaded with device_map='auto', which uses all available GPUs. To choose which devices are used, set something like export CUDA_VISIBLE_DEVICES=0,1 (this uses GPUs 0 and 1).
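
For multi-turn conversations, the messages list from the example above can simply keep growing, alternating user and model turns. The following is a minimal sketch of that bookkeeping, not code from the repository: chat_turn and echo_model are hypothetical helpers, and the "assistant" role name for model replies is an assumption. In real use, the generate argument would wrap model.chat(tokenizer, messages).

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(history: List[Message], user_text: str,
              generate: Callable[[List[Message]], str]) -> str:
    """Append the user turn, obtain a reply, and record the reply in the history."""
    history.append({"role": "user", "content": user_text})
    reply = generate(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def echo_model(messages: List[Message]) -> str:
    """Stand-in for `lambda msgs: model.chat(tokenizer, msgs)`.

    It only echoes the last message, so the sketch runs without any weights.
    """
    return f"(reply to: {messages[-1]['content']})"


history: List[Message] = []
chat_turn(history, "Which is the second-highest mountain in the world?", echo_model)
chat_turn(history, "How tall is it?", echo_model)
print([m["role"] for m in history])  # ['user', 'assistant', 'user', 'assistant']
```

Keeping the full history in one list means each call sees the earlier turns, at the cost of a prompt that grows with the conversation and must stay within the 4,096-token context window.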

Command-line tool

python cli_demo.py

[Screenshot: sample output of the command-line demo]

Web demo

streamlit run web_demo.py

[Screenshot: the web demo in use]

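Baichuan-13B-Chat sample outputs

Content creation

User:
You are a content creator. Write a short-video script about Musk's and Bill Gates's visits to China.

Baichuan-13B-Chat:
Exploring Musk and Bill Gates's Trip to China

Shot 1: (background music) The picture cuts in, showing a bustling Chinese cityscape.

Narration: Welcome to our program. Today we will take you through a special visit to China by two technology giants, Elon Musk and Bill Gates.

Shot 2: (background music) The picture switches to Musk and Bill Gates arriving at an airport in China.

Narration: This visit is the first time they have set foot in China together. Here they will begin a dialogue about technological innovation, environmental protection, and education.

Shot 3: (background music) The picture shows Musk and Bill Gates touring an electric vehicle factory.

Narration: During this visit, they first toured an electric vehicle factory and learned about China's latest developments in new energy vehicles.

Shot 4: (background music) The picture shows Musk and Bill Gates attending a forum on artificial intelligence.

Narration: Next, they attended a forum on artificial intelligence and discussed the field's development trends and its impact on the future.

Shot 5: (background music) The picture shows Musk and Bill Gates visiting a school.

Narration: During this visit, they also visited a school and learned about China's innovative practices in education.

Shot 6: (background music) The picture shows Musk and Bill Gates being interviewed by the media.

Narration: Finally, in the media interview, they said they hoped this visit would offer some inspiration for cooperation between China and the United States in technological innovation, environmental protection, education, and other fields.

Shot 7: (background music) The picture cuts back to the bustling cityscape.

Narration: That was Musk and Bill Gates's trip to China. Their visit not only showed us China's enormous potential in technological innovation but also gave us more to think about regarding future development. Thanks for watching, and see you next time!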

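Ad copy

User:
Acting as a salesperson, promote a laundry detergent by highlighting its advantages: clean, residue-free, odor-free, and antibacterial.

Baichuan-13B-Chat:
Dear customer, hello! I am a laundry detergent salesperson, and I am delighted to introduce our product to you. Our laundry detergent has the following notable advantages:

Clean: Our detergent uses advanced washing technology that penetrates deep into fabric fibers and effectively removes all kinds of stains, leaving your clothes looking brand new. At the same time, it does no harm to the color or texture of your clothes, so they keep their original quality.

Residue-free: Compared with traditional washing powder, our detergent does not produce excessive foam during cleaning, so washed clothes are softer and do not feel sticky. In addition, its strong stain-removing power thoroughly cleans stains from clothes, avoiding the problem of traditional washing powder easily leaving residue on clothes.

Odor-free: Our detergent is made from natural plant extracts and contains no harmful chemicals, so it produces no pungent smell during use. Even people with sensitive skin can use it with confidence, and your clothes will give off a light, pleasant fragrance.

Antibacterial: To meet modern people's demand for a healthy life, our detergent contains highly effective antibacterial ingredients that kill bacteria on clothes and protect the health of you and your family. It is safe to use on both baby clothes and underwear.

In short, with its advantages of being clean, residue-free, odor-free, and antibacterial, our detergent has become the first choice of many consumers. If you have not yet tried our product, now is the best time. We believe it will bring you a brand-new laundry experience!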

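User:
Prepare a report on pet food, covering potential consumers, market size, market opportunities, and profit models.

Baichuan-13B-Chat:

I. Introduction

As living standards rise and people care more and more for their pets, the pet food industry has developed rapidly. This report analyzes the pet food industry's potential consumers, market size, market opportunities, and profit models, in order to provide useful reference information for relevant companies.

II. Potential consumers

1. Pet owners: As the number of pet owners grows, demand for pet food keeps increasing. According to statistics, the number of urban pet (dog and cat) owners in China reached 73.55 million in 2019, up 4.6% year on year. This figure is expected to reach 87.42 million by 2023.

2. Younger-generation consumers: Younger consumers pay more attention to health, environmental protection, and personalization, which makes them more willing to buy high-quality food for their pets. They are also more inclined to learn about and buy pet food through online platforms.

III. Market size

1. According to data from the National Bureau of Statistics, China's pet food market reached 102 billion yuan in 2019, up about 10% year on year. The market is expected to reach 150 billion yuan by 2023.

2. By product type, dry pet food has the largest share at about 70% of the total market, followed by wet pet food at about 20% and pet treats at about 10%.

IV. Market opportunities

1. Functional pet food: As consumers pay increasing attention to health, demand is gradually growing for pet food with specific functions, such as prescription diets and weight-loss diets.

2. Natural and organic pet food: More and more consumers prefer natural, additive-free pet food, which also brings new business opportunities to the market.

3. Customized pet food: Launching customized pet food for pets of different breeds, ages, and sizes meets consumers' personalized needs.

V. Profit models

1. Traditional sales channels: Selling pet food through traditional channels such as supermarkets, pet stores, and e-commerce platforms.

2. Direct sales: Selling pet food directly to consumers through a self-built website or in cooperation with third-party platforms.

3. Membership: Setting up a membership program that offers members exclusive discounts and services to increase customer loyalty.

4. Value-added services: Offering additional services such as pet grooming, boarding, and training to diversify revenue.

VI. Conclusion

The pet food industry has enormous market potential and room for growth. Companies should seize market opportunities, pay attention to changes in consumer demand, innovate products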

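Precise Q&A

User:
What is the second-highest mountain in the world?

Baichuan-13B-Chat:
Mount Qogir (K2), at 8,611 meters above sea level, lies on the China-Pakistan border in the Karakoram range. It is the greatest test for climbers from outside China and one of the hardest peaks in the world to conquer. Its height is second only to Mount Everest, ranking second in the world.

User:
What does 意兴阑珊 mean?

Baichuan-13B-Chat:
"意兴阑珊" is a Chinese idiom. It describes a person whose spirits are low and whose interest is fading. The phrase is usually used to describe someone who has lost their original enthusiasm and interest in some activity.

User:
What does Reward Shaping mean in reinforcement learning?

Baichuan-13B-Chat:
Reward shaping is a concept in reinforcement learning. It refers to guiding an agent toward behavior that better matches human expectations by adjusting the environment or designing a specific reward function. In traditional reinforcement learning, the agent learns an optimal policy by interacting with the environment, but this approach may lead the agent to learn suboptimal or even undesirable behavior.

The purpose of reward shaping is to solve this problem: by manually adjusting the reward function, it becomes easier for the agent to learn the behavior we want it to learn. This can be achieved in the following ways:

1. Increasing positive rewards: when the agent exhibits the desired behavior, give it more reward to encourage that behavior.
2. Reducing negative rewards: when the agent exhibits undesired behavior, reduce the punishment applied to it, lowering its preference for that behavior.
3. Introducing new rewards: give the agent additional rewards to better guide it toward learning specific behavior.
4. Changing the reward distribution: adjust the reward function so that it has a different reward distribution, thereby influencing the agent's learning process.

In short, reward shaping is a method of guiding an agent to learn better behavior by adjusting the environment and the reward function, which helps improve the performance of reinforcement learning.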

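Language understanding

User:
Xiao Zhang bought a gift for his boss and said "意思一下" ("just a small token"). The boss asked "什么意思" ("what do you mean?"). Xiao Zhang answered: "没什么意思，就是随便意思意思" ("nothing special, just a little gesture"). Please explain in turn the meaning of each "意思" that Xiao Zhang used.

Baichuan-13B-Chat:
1. The first "意思": refers to the gift Xiao Zhang gave his boss, indicating that he has bought the gift and is ready to give it to him.
2. The second "意思": refers to Xiao Zhang's courtesy and respect toward his boss, indicating that he has not forgotten his boss's status and position and took this into account when buying the gift.
3. The third "意思": refers to Xiao Zhang's attitude toward the act of giving the gift itself, namely that he does not want it to attract too much attention or controversy and only wants to express his regard.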

Inference performance

Baichuan-13B uses ALiBi linear biases, which require less computation than Rotary Embedding and noticeably improve inference performance. Compared with a standard LLaMA-13B, the measured average inference speed (tokens/s) is 31.6% higher:

Model	tokens/s
LLaMA-13B	19.4
Baichuan-13B	25.4

Test environment and parameters: GPU A100-SXM4-80G, PyTorch 2.0.0+cu117, transformers 4.29.1, batch size = 1, generation length = 2048, precision fp16, based on Baichuan-13B-Base
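
To make the ALiBi idea concrete, here is a small pure-Python sketch of how the per-head slopes and the linear attention bias are usually computed. It follows the general scheme described in the ALiBi paper (geometric slopes, with an interleaving rule for head counts that are not a power of two) and is not taken from Baichuan-13B's source; the function names are ours.

```python
import math
from typing import List


def alibi_slopes(num_heads: int) -> List[float]:
    """Per-head slopes following the geometric scheme from the ALiBi paper."""
    def power_of_two_slopes(n: int) -> List[float]:
        start = 2.0 ** (-8.0 / n)
        return [start ** (i + 1) for i in range(n)]

    if math.log2(num_heads).is_integer():
        return power_of_two_slopes(num_heads)
    # Non-power-of-two head counts (e.g. 40): use the slopes of the closest
    # smaller power of two, then fill up with every other slope of the next one.
    closest = 2 ** math.floor(math.log2(num_heads))
    extra = alibi_slopes(2 * closest)[0::2][: num_heads - closest]
    return power_of_two_slopes(closest) + extra


def alibi_bias(slope: float, seq_len: int) -> List[List[float]]:
    """Causal bias added to attention scores: slope * (key_pos - query_pos)."""
    return [
        [slope * (j - i) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(40)       # the model details table lists 40 attention heads
print(len(slopes))              # 40
print(alibi_bias(0.5, 3)[2])    # [-1.0, -0.5, 0.0]
```

The bias depends only on the distance between query and key, so it is a simple additive term on the attention scores rather than a rotation applied to the query and key vectors, which is the intuition behind the lower computational cost mentioned above.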

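Quantized deployment

Baichuan-13B supports int8 and int4 quantization; users only need to change two lines of the inference code.

Important note for users of quantization: the model-loading line in the examples below differs from the inference example above, as it does not pass device_map="auto".

To use int8 quantization: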

model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda()

Similarly, to use int4 quantization:

model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()
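
To give a feel for what weight quantization does numerically, the sketch below applies generic symmetric "absmax" quantization to a handful of made-up weights. It is an illustration of the general technique only; we make no claim that model.quantize() is implemented this way, and the helper names and sample values are hypothetical.

```python
from typing import List, Tuple


def quantize_absmax(weights: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integers using a single scale for the whole row."""
    qmax = 2 ** (bits - 1) - 1                       # 127 for int8, 7 for int4
    scale = max(abs(w) for w in weights) / qmax or 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


row = [0.82, -1.27, 0.05, 0.40]   # made-up weights for illustration
for bits in (8, 4):
    q, scale = quantize_absmax(row, bits)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(bits, q, f"max error {worst:.4f}")
```

With fewer bits the grid of representable values is coarser, so the rounding error per weight grows, which is consistent with int4 costing more benchmark accuracy than int8.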

In addition, if you do not want to call quantize for on-the-fly quantization, a pre-quantized int8 Chat model is available, Baichuan-13B-Chat-int8:

model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat-int8", torch_dtype=torch.float16, trust_remote_code=True).cuda()

GPU memory usage before and after quantization:

Precision	GPU Mem (GB)
bf16 / fp16	26.0
int8	15.8
int4	9.7

Benchmark results after quantization, compared with the original version:

Model 5-shot	C-Eval	MMLU	CMMLU
Baichuan-13B-Base	52.4	51.6	55.3
Baichuan-13B-Base-int8	51.2	49.9	54.5
Baichuan-13B-Base-int4	47.6	46.0	51.0
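
The size of the accuracy drop is easier to judge as a difference in points. The short sketch below only re-computes numbers from the table above; the function name is ours.

```python
# Scores copied from the table above (5-shot).
scores = {
    "Baichuan-13B-Base":      {"C-Eval": 52.4, "MMLU": 51.6, "CMMLU": 55.3},
    "Baichuan-13B-Base-int8": {"C-Eval": 51.2, "MMLU": 49.9, "CMMLU": 54.5},
    "Baichuan-13B-Base-int4": {"C-Eval": 47.6, "MMLU": 46.0, "CMMLU": 51.0},
}


def drop_vs_base(model: str) -> dict:
    """Absolute score drop (in points) relative to the unquantized model."""
    base = scores["Baichuan-13B-Base"]
    return {bench: round(base[bench] - value, 1) for bench, value in scores[model].items()}


print(drop_vs_base("Baichuan-13B-Base-int8"))  # {'C-Eval': 1.2, 'MMLU': 1.7, 'CMMLU': 0.8}
print(drop_vs_base("Baichuan-13B-Base-int4"))  # {'C-Eval': 4.8, 'MMLU': 5.6, 'CMMLU': 4.3}
```

In this table, int8 stays within about two points of the original on every benchmark, while int4 gives up roughly four to six points in exchange for the lower memory footprint.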

CPU deployment

model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float32, trust_remote_code=True)

Inference on CPU requires roughly 60 GB of RAM.
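
A back-of-the-envelope estimate helps put the memory figures in this section into context. The sketch below multiplies the total parameter count from the model details table by the bytes each precision needs per weight. It is a rough estimate of weight storage only, under the assumption that every parameter is stored at the given precision; measured usage also includes runtime buffers and any tensors that are not quantized, so it will differ from these numbers.

```python
TOTAL_PARAMS = 13_264_901_120  # "Total parameters" for Baichuan-13B in the table

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(precision: str, params: int = TOTAL_PARAMS) -> float:
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(precision):.1f} GB")
```

The fp32 estimate of about 53 GB is in line with the roughly 60 GB quoted for CPU inference, and the fp16 estimate of about 26.5 GB is close to the 26.0 GB reported in the GPU memory table.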

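Fine-tuning the Model

Developers can fine-tune either Baichuan-13B-Base or Baichuan-13B-Chat. We tested LLaMA Efficient Tuning, a fine-tuning tool compatible with Baichuan-13B, and provide two examples: full-parameter fine-tuning and LoRA fine-tuning. The input data is a JSON file containing a list of samples, for example: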

[
    {
        "instruction": "What are the three primary colors?",
        "input": "",
        "output": "The three primary colors are red, blue, and yellow."
    },
    ....
]
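
Before launching a long training run, it can be worth checking that a dataset file really follows this structure. The sketch below is one possible check, not part of LLaMA Efficient Tuning: the validate_samples helper is hypothetical, and treating "instruction" and "output" as required non-empty strings (with "input" optional) is our assumption based on the sample above.

```python
import json
from typing import Any, Dict, List


def validate_samples(samples: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found; an empty list means the data looks fine."""
    problems = []
    for index, sample in enumerate(samples):
        for key in ("instruction", "output"):
            value = sample.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"sample {index}: '{key}' must be a non-empty string")
        # "input" may be empty or missing, but if present it must be a string.
        if not isinstance(sample.get("input", ""), str):
            problems.append(f"sample {index}: 'input' must be a string")
    return problems


samples = [
    {
        "instruction": "What are the three primary colors?",
        "input": "",
        "output": "The three primary colors are red, blue, and yellow.",
    },
]

print(validate_samples(samples))  # []
# ensure_ascii=False keeps Chinese text readable in the written file.
serialized = json.dumps(samples, ensure_ascii=False, indent=4)
print(serialized)
```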

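Below are example scripts that we tested successfully for the two fine-tuning scenarios.

Full-parameter fine-tuning

We tested full-parameter fine-tuning in an environment with 8 × Nvidia A100 80 GB GPUs and DeepSpeed.

Example training launch script: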

deepspeed --num_gpus=8 src/train_bash.py \
    --stage sft \
    --model_name_or_path baichuan-inc/Baichuan-13B-Base \
    --do_train \
    --dataset alpaca_gpt4_en,alpaca_gpt4_zh \
    --finetuning_type full \
    --output_dir path_to_your_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --preprocessing_num_workers 16 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --learning_rate 5e-5 \
    --max_grad_norm 0.5 \
    --num_train_epochs 2.0 \
    --dev_ratio 0.01 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --plot_loss \
    --fp16 \
    --deepspeed deepspeed.json

Example deepspeed.json configuration:

{
  "train_micro_batch_size_per_gpu": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16, 
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },  
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients" : true
  }
}
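
The "auto" values in this configuration are filled in from the launch arguments by the Hugging Face Trainer's DeepSpeed integration, so train_micro_batch_size_per_gpu takes the value of --per_device_train_batch_size. As a rough worked example (a sketch assuming plain data parallelism across all GPUs; the helper name is ours), the global batch size implied by the full-parameter launch script is:

```python
def effective_batch_size(num_gpus: int, per_device_batch: int, grad_accum_steps: int) -> int:
    """Samples contributing to one optimizer step under data parallelism."""
    return num_gpus * per_device_batch * grad_accum_steps


# Values from the full-parameter launch script: --num_gpus=8,
# --per_device_train_batch_size 4, --gradient_accumulation_steps 8
full_finetune_batch = effective_batch_size(8, 4, 8)
print(full_finetune_batch)  # 256
```

When running on fewer GPUs, raising --gradient_accumulation_steps by the same factor keeps this number, and therefore the meaning of the learning rate, roughly unchanged.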

LoRA fine-tuning

We tested LoRA fine-tuning on a single Nvidia A100 80G GPU.

Example training launch script:

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path baichuan-inc/Baichuan-13B-Base \
    --do_train \
    --dataset alpaca_gpt4_en,alpaca_gpt4_zh \
    --finetuning_type lora \
    --lora_rank 8 \
    --lora_target W_pack \
    --output_dir path_to_your_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --preprocessing_num_workers 16 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --learning_rate 5e-5 \
    --max_grad_norm 0.5 \
    --num_train_epochs 2.0 \
    --dev_ratio 0.01 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --plot_loss \
    --fp16
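
To see why LoRA fits on a single GPU, it helps to count how many parameters the script above actually trains. The sketch below assumes that W_pack is the fused query/key/value projection in each layer, mapping the hidden size to three times the hidden size without a bias; that shape is our assumption and is not stated in this README. The helper name is hypothetical.

```python
def lora_trainable_params(in_features: int, out_features: int, rank: int, num_layers: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) next to each adapted weight."""
    return rank * (in_features + out_features) * num_layers


HIDDEN = 5120                   # hidden size from the model details table
TOTAL_PARAMS = 13_264_901_120   # total parameters from the same table

# --lora_rank 8 applied to W_pack in each of the 40 layers
# (assumed shape: HIDDEN -> 3 * HIDDEN).
trainable = lora_trainable_params(HIDDEN, 3 * HIDDEN, rank=8, num_layers=40)
print(f"{trainable:,} trainable ({trainable / TOTAL_PARAMS:.3%} of the base model)")
# 6,553,600 trainable (0.049% of the base model)
```

Under this assumption only a tiny fraction of the weights receive gradients and optimizer state, which is what makes the single-GPU setup feasible.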

For more detailed usage of LLaMA Efficient Tuning, see the documentation on its project homepage.

Disclaimer

License

Use of the source code in this repository is governed by the Apache 2.0 open-source license. Community use of the Baichuan-13B model is governed by the Baichuan-13B Model Community License Agreement. Baichuan-13B supports commercial use. If you use the Baichuan-13B model or its derivatives for commercial purposes, please contact the licensor to register and apply for written authorization: opensource@baichuan-inc.com.
