Top Related Projects
Inference code for Llama models
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
A series of large language models trained from scratch by developers @01-ai
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family and how to run them on various provider services
Quick Overview
Baichuan-13B is an open-source large language model (LLM) developed by Baichuan Intelligence. It is a 13 billion parameter model trained on a diverse multilingual dataset, with a focus on Chinese and English languages. The model aims to provide high-quality natural language processing capabilities for various applications.
Pros
- Open-source and freely available for research and commercial use
- Strong performance in both Chinese and English language tasks
- Supports efficient inference on consumer-grade hardware
- Actively maintained and updated by the Baichuan team
Cons
- Limited documentation and examples compared to some other popular LLMs
- May require fine-tuning for specific domain applications
- Performance in languages other than Chinese and English may be less robust
- Potential biases and limitations inherent to large language models
Getting Started
To use Baichuan-13B, follow these steps:
- Install the required dependencies:
pip install transformers torch
- Load the model and tokenizer:
from transformers import AutoTokenizer, AutoModelForCausalLM
# trust_remote_code is needed because the Baichuan repo ships custom modeling code
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
- Generate text:
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
Note: This is a basic example. For more advanced usage and fine-tuning, refer to the project's documentation and the Hugging Face Transformers library documentation.
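For chat-style use, the Baichuan-13B-Chat checkpoint also ships a chat helper through its custom remote code (the same chat() pattern appears in the official README excerpt later on this page); a minimal sketch, assuming a GPU with enough memory for fp16 weights:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation.utils import GenerationConfig

# trust_remote_code pulls in Baichuan's custom modeling code, including chat()
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat")

messages = [{"role": "user", "content": "Hello, how are you?"}]
response = model.chat(tokenizer, messages)  # chat() formats the conversation and generates a reply
print(response)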
Competitor Comparisons
Inference code for Llama models
Pros of Llama
- More extensive documentation and community support
- Broader language support and multilingual capabilities
- Higher flexibility for fine-tuning and customization
Cons of Llama
- Larger model size, requiring more computational resources
- Potentially slower inference time for certain tasks
- More complex licensing and usage restrictions
Code Comparison
Baichuan-13B example:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
Llama example:
from transformers import LlamaTokenizer, LlamaForCausalLM
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
Both repositories provide pre-trained language models, but they differ in their approach and focus. Llama offers a more versatile and widely-supported model, while Baichuan-13B is more specialized for certain tasks and languages. The code examples demonstrate similar usage patterns, with minor differences in model initialization and tokenizer selection.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Pros of gpt-neox
- More extensive documentation and examples for training and fine-tuning
- Broader community support and contributions
- Designed for distributed training across multiple GPUs and nodes
Cons of gpt-neox
- Larger model size, requiring more computational resources
- More complex setup and configuration process
- Less focus on multilingual capabilities compared to Baichuan-13B
Code Comparison
gpt-neox:
from megatron import get_args
from megatron import print_rank_0
from megatron import get_tokenizer
from megatron import get_model
from megatron.training import train
train(model_provider=get_model,
      optimizer=None,
      model_type='GPT')
Baichuan-13B:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
The code snippets highlight the different approaches to model initialization and training. gpt-neox uses a custom training loop with Megatron-LM, while Baichuan-13B leverages the Hugging Face Transformers library for easier integration and use.
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Pros of ChatGLM-6B
- Smaller model size (6B parameters) requires less computational resources
- Designed for efficient inference on consumer-grade hardware
- Supports both Chinese and English languages
Cons of ChatGLM-6B
- Lower parameter count may result in less sophisticated responses
- Limited fine-tuning options compared to Baichuan-13B
- Less extensive training data, potentially affecting performance on diverse tasks
Code Comparison
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
Baichuan-13B:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True).half().cuda()
The code snippets show similar usage patterns, with minor differences in model loading and class names. Both repositories utilize the Hugging Face Transformers library for easy integration and deployment.
A series of large language models trained from scratch by developers @01-ai
Pros of Yi
- More extensive documentation and examples provided in the repository
- Offers pre-trained models in both 6B and 34B parameter sizes
- Includes detailed model cards with performance metrics and benchmarks
Cons of Yi
- Less community engagement and fewer third-party contributions
- Limited multilingual support compared to Baichuan-13B
- Fewer fine-tuned variants available for specific tasks
Code Comparison
Yi:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-6B")
model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-6B")
Baichuan-13B:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat")
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat")
Both repositories use similar code for loading models and tokenizers through the Hugging Face Transformers library. The main difference lies in the model names and sizes available. Yi offers 6B and 34B variants, while Baichuan-13B focuses on the 13B parameter size with chat-specific fine-tuning.
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family and how to run them on various provider services
Pros of llama-cookbook
- Comprehensive documentation and examples for working with LLaMA models
- Broader scope, covering various aspects of LLM usage and fine-tuning
- Active community support and regular updates
Cons of llama-cookbook
- Not a standalone model, requires access to Meta's LLaMA models
- May have higher computational requirements for running examples
Code Comparison
Baichuan-13B (model loading):
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
llama-cookbook (model loading):
from transformers import LlamaTokenizer, LlamaForCausalLM
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama/tokenizer")
model = LlamaForCausalLM.from_pretrained("path/to/llama/model")
The main difference in code is that Baichuan-13B can be directly loaded from the Hugging Face model hub, while llama-cookbook requires local model files. Additionally, Baichuan-13B uses the trust_remote_code=True parameter for custom model implementations.
README
Baichuan-13B
🤗 Baichuan-13B-Base • 🤗 Baichuan-13B-Chat • 🤖 ModelScope • 💬 WeChat
News
- [2023.09.06] We released Baichuan 2, our new generation of open-source models, in 7B and 13B sizes 🔥🔥🔥
- [2023.08.01] We updated the weights of the aligned model Baichuan-13B-Chat, improving results in some scenarios
Introduction
Baichuan-13B is an open-source, commercially usable large language model with 13 billion parameters, developed by Baichuan Intelligence as the successor to Baichuan-7B. It achieves the best results for its size on authoritative Chinese and English benchmarks. This release includes two versions: the pre-trained model (Baichuan-13B-Base) and the aligned model (Baichuan-13B-Chat). Baichuan-13B has the following features:
- **Larger size, more data**: Baichuan-13B expands the parameter count to 13 billion on top of Baichuan-7B and was trained on 1.4 trillion tokens of high-quality corpus, 40% more than LLaMA-13B and the largest training volume among open-source 13B models to date. It supports both Chinese and English, uses ALiBi positional encoding, and has a context window of 4,096 tokens.
- **Pre-trained and aligned models released together**: The pre-trained model is a "base" aimed at developers, while most ordinary users want an aligned model with dialogue capabilities. This release therefore also includes the aligned model Baichuan-13B-Chat, which has strong conversational ability, works out of the box, and can be deployed with just a few lines of code.
- **More efficient inference**: To reach a wider range of users, we also open-sourced int8 and int4 quantized versions, which greatly lower the hardware bar for deployment with almost no loss in quality relative to the unquantized model; they can run on consumer GPUs such as the Nvidia 3090.
- **Open-source and free for commercial use**: Baichuan-13B is fully open for academic research, and developers need only apply by email for an official commercial license to use it commercially free of charge.
Benchmark Results
We ran 5-shot evaluations on authoritative Chinese and English benchmarks for large language models. The results are as follows:
C-Eval
| Model (5-shot) | STEM | Social Sciences | Humanities | Others | Average |
|---|---|---|---|---|---|
| Baichuan-7B | 38.2 | 52.0 | 46.2 | 39.3 | 42.8 |
| Chinese-Alpaca-Plus-13B | 35.2 | 45.6 | 40.0 | 38.2 | 38.8 |
| Vicuna-13B | 30.5 | 38.2 | 32.5 | 32.5 | 32.8 |
| Chinese-LLaMA-Plus-13B | 30.3 | 38.0 | 32.9 | 29.1 | 32.1 |
| Ziya-LLaMA-13B-Pretrain | 27.6 | 34.4 | 32.0 | 28.6 | 30.0 |
| LLaMA-13B | 27.0 | 33.6 | 27.7 | 27.6 | 28.5 |
| moss-moon-003-base (16B) | 27.0 | 29.1 | 27.2 | 26.9 | 27.4 |
| Baichuan-13B-Base | 45.9 | 63.5 | 57.2 | 49.3 | 52.4 |
| Baichuan-13B-Chat | 43.7 | 64.6 | 56.2 | 49.2 | 51.5 |
MMLU
| Model (5-shot) | STEM | Social Sciences | Humanities | Others | Average |
|---|---|---|---|---|---|
| Vicuna-13B | 40.4 | 60.5 | 49.5 | 58.4 | 52.0 |
| LLaMA-13B | 36.1 | 53.0 | 44.0 | 52.8 | 46.3 |
| Chinese-Alpaca-Plus-13B | 36.9 | 48.9 | 40.5 | 50.5 | 43.9 |
| Ziya-LLaMA-13B-Pretrain | 35.6 | 47.6 | 40.1 | 49.4 | 42.9 |
| Baichuan-7B | 35.6 | 48.9 | 38.4 | 48.1 | 42.3 |
| Chinese-LLaMA-Plus-13B | 33.1 | 42.8 | 37.0 | 44.6 | 39.2 |
| moss-moon-003-base (16B) | 22.4 | 22.8 | 24.2 | 24.4 | 23.6 |
| Baichuan-13B-Base | 41.6 | 60.9 | 47.4 | 58.5 | 51.6 |
| Baichuan-13B-Chat | 40.9 | 60.9 | 48.8 | 59.0 | 52.1 |
Note: We used the official MMLU evaluation protocol.
CMMLU
| Model (5-shot) | STEM | Humanities | Social Sciences | Others | China Specific | Average |
|---|---|---|---|---|---|---|
| Baichuan-7B | 34.4 | 47.5 | 47.6 | 46.6 | 44.3 | 44.0 |
| Vicuna-13B | 31.8 | 36.2 | 37.6 | 39.5 | 34.3 | 36.3 |
| Chinese-Alpaca-Plus-13B | 29.8 | 33.4 | 33.2 | 37.9 | 32.1 | 33.4 |
| Chinese-LLaMA-Plus-13B | 28.1 | 33.1 | 35.4 | 35.1 | 33.5 | 33.0 |
| Ziya-LLaMA-13B-Pretrain | 29.0 | 30.7 | 33.8 | 34.4 | 31.9 | 32.1 |
| LLaMA-13B | 29.2 | 30.8 | 31.6 | 33.0 | 30.5 | 31.2 |
| moss-moon-003-base (16B) | 27.2 | 30.4 | 28.8 | 32.6 | 28.7 | 29.6 |
| Baichuan-13B-Base | 41.7 | 61.1 | 59.8 | 59.0 | 56.4 | 55.3 |
| Baichuan-13B-Chat | 42.8 | 62.6 | 59.7 | 59.0 | 56.1 | 55.8 |
Note: CMMLU is a comprehensive Chinese evaluation benchmark designed specifically to assess a language model's knowledge and reasoning in Chinese contexts. We used its official evaluation protocol.
Model Details
| Model | Hidden Size | Layers | Attention Heads | Vocab Size | Total Params | Training Data (tokens) | Positional Encoding | Max Length |
|---|---|---|---|---|---|---|---|---|
| Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | RoPE | 4,096 |
| Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | ALiBi | 4,096 |
Inference and Deployment
The model weights, source code, and configuration required for inference are published on Hugging Face: Baichuan-13B-Base and Baichuan-13B-Chat. The following demonstrates several inference methods, using Baichuan-13B-Chat as the example; the program downloads the required resources from Hugging Face automatically.
Install the dependencies before running inference:
pip install -r requirements.txt
Python Code
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from transformers.generation.utils import GenerationConfig
>>> tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
>>> model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat")
>>> messages = []
>>> messages.append({"role": "user", "content": "What is the second highest mountain in the world?"})
>>> response = model.chat(tokenizer, messages)
>>> print(response)
Mount Qogir (K2). The world's second highest peak, called K2 by Western climbers, has an elevation of 8,611 meters and lies on the China-Pakistan border in the Karakoram range.
In the code above, the model is loaded with device_map='auto', which uses all available GPUs. To restrict which devices are used, control visibility with something like export CUDA_VISIBLE_DEVICES=0,1 (to use GPUs 0 and 1).
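If you would rather pin the model to a specific device from Python instead of through the environment variable, one option is an explicit device_map (a minimal sketch; the explicit mapping is our assumption, the README itself only shows device_map='auto'):
import torch
from transformers import AutoModelForCausalLM

# Place the entire model on GPU 0 instead of sharding across all visible GPUs
model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat",
    device_map={"": 0},  # "" maps the whole module tree to cuda:0
    torch_dtype=torch.float16,
    trust_remote_code=True,
)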
Command-Line Tool
python cli_demo.py
Sample output: (screenshot omitted)
Web Demo
Run the following command with streamlit; it starts a local web service that you can reach by opening the address printed to the console in a browser.
streamlit run web_demo.py
The result: (screenshot omitted)
Baichuan-13B-Chat Sample Outputs
Content Creation
User:
You are a content creator. Write a video script presenting a perspective on Elon Musk's and Bill Gates's visit to China.
Baichuan-13B-Chat:
Exploring Musk and Gates's Journey to China
Shot 1: (background music) The scene opens on China's bustling cityscapes.
Narrator: Welcome to our program. Today we will take you through a special visit to China by two tech giants, Elon Musk and Bill Gates.
Shot 2: (background music) Cut to Musk and Gates arriving at an airport in China.
Narrator: This visit marks the first time the two have set foot on Chinese soil together. Here they will open a dialogue on technological innovation, environmental protection, and education.
Shot 3: (background music) The scene shows Musk and Gates touring an electric vehicle factory.
Narrator: During the visit they first toured an electric vehicle factory, learning about China's latest progress in new-energy vehicles.
Shot 4: (background music) The scene shows Musk and Gates attending a forum on artificial intelligence.
Narrator: Next, they took part in a forum on artificial intelligence, discussing the field's development trends and its impact on the future.
Shot 5: (background music) The scene shows Musk and Gates visiting a school.
Narrator: They also visited a school to learn about China's innovative practices in education.
Shot 6: (background music) The scene shows Musk and Gates giving media interviews.
Narrator: Finally, in the interview session they expressed the hope that this visit could offer some inspiration for cooperation between China and the United States in technological innovation, environmental protection, and education.
Shot 7: (background music) The scene returns to the bustling cityscape.
Narrator: That was Musk and Gates's trip to China. Their visit not only showed us China's enormous potential in technological innovation, it also gave us more to think about for the future. Thanks for watching, and see you next time!
Advertising Copy
User:
In the role of a salesperson, promote a laundry detergent's advantages: it cleans thoroughly, leaves no residue, has no odor, and kills germs.
Baichuan-13B-Chat:
Dear customer, hello! I am a laundry detergent salesperson, and I am delighted to introduce our product. Our detergent has several notable advantages:
Thorough cleaning: Our detergent uses advanced washing technology that penetrates deep into fabric fibers and effectively removes all kinds of stains, leaving your clothes looking brand new. At the same time, it does no damage to the color or texture of your clothes, so they keep their original quality.
No residue: Unlike traditional washing powder, our detergent does not produce excessive foam during washing, so washed clothes feel softer with no sticky residue. Its strong cleaning power thoroughly removes stains and avoids the residue that washing powder tends to leave on clothes.
No odor: Our detergent is made from natural plant extracts and contains no harmful chemicals, so it gives off no pungent smell during use. Even people with sensitive skin can use it with confidence, and it leaves your clothes with a light, fresh fragrance.
Germ removal: To meet modern demands for healthy living, our detergent is specially formulated with highly effective antibacterial ingredients that kill bacteria on clothes and protect your family's health. It is safe for everything from infants' clothing to underwear.
In short, with its thorough cleaning, lack of residue, lack of odor, and germ-killing power, our detergent has become the first choice of consumers everywhere. If you have not yet tried our product, now is the best time. We are confident it will bring you a whole new laundry experience!
User:
Prepare a report on pet food, covering potential consumers, market size, market opportunities, and profit models.
Baichuan-13B-Chat:
I. Introduction
With rising living standards and people's deepening care for their pets, the pet food industry has grown rapidly. This report analyzes the pet food industry's potential consumers, market size, market opportunities, and profit models, to provide valuable reference information for relevant enterprises.
II. Potential Consumers
1. Pet owners: As the number of pet owners grows, demand for pet food keeps rising. According to statistics, China's urban pet-owning (dog and cat) population reached 73.55 million in 2019, up 4.6% year over year, and is expected to reach 87.42 million by 2023.
2. The new generation of consumers: Younger consumers place more weight on health, environmental protection, and personalization, which makes them more willing to buy high-quality food for their pets. They also prefer to research and buy pet food through online platforms.
III. Market Size
1. According to National Bureau of Statistics data, China's pet food market reached 102 billion RMB in 2019, up about 10% year over year, and is expected to reach 150 billion RMB by 2023.
2. By product type, dry pet food holds the largest share at about 70% of the total market, followed by wet pet food at about 20% and pet snacks at about 10%.
IV. Market Opportunities
1. Functional pet food: As consumers pay ever more attention to health, demand for pet food with specific functions, such as prescription and weight-control diets, is gradually growing.
2. Natural and organic pet food: More and more consumers prefer natural, additive-free pet food, which brings new opportunities to the market.
3. Customized pet food: Offering customized pet food for pets of different breeds, ages, and body types meets consumers' demand for personalization.
V. Profit Models
1. Traditional retail channels: Selling pet food through supermarkets, pet stores, e-commerce platforms, and other traditional channels.
2. Direct sales: Selling pet food directly to consumers through self-built websites or in cooperation with third-party platforms.
3. Membership programs: Setting up membership systems that give members exclusive discounts and services, increasing customer loyalty.
4. Value-added services: Offering extra services such as pet grooming, boarding, and training for diversified revenue.
VI. Conclusion
The pet food industry has enormous market potential and room for growth. Enterprises should seize market opportunities, track changes in consumer demand, and innovate their products.
Precise Q&A
User:
What is the second highest mountain in the world?
Baichuan-13B-Chat:
Mount Qogir (K2), at 8,611 meters, lies on the China-Pakistan border in the Karakoram range. It is the greatest challenge for climbers from outside China and one of the hardest mountains in the world to climb. Its height is second only to Mount Everest, making it the world's second highest peak.
User:
What does the idiom 意兴阑珊 mean?
Baichuan-13B-Chat:
"意兴阑珊" is a Chinese idiom describing a person whose enthusiasm is low and whose interest has faded. It is usually used of someone who has lost their original passion and interest in an activity.
User:
What does Reward Shaping mean in reinforcement learning?
Baichuan-13B-Chat:
Reward shaping is a concept in reinforcement learning: it refers to guiding an agent toward behavior more in line with human expectations by adjusting the environment or designing specific reward functions. In traditional reinforcement learning, the agent learns an optimal policy by interacting with the environment, but this can lead the agent to learn suboptimal or even undesirable behaviors.
The purpose of reward shaping is to solve this problem: by deliberately adjusting the reward function, the agent can more easily learn the behaviors we want it to learn. This can be achieved in several ways:
1. Increasing positive rewards: give the agent more reward when it exhibits the desired behavior, to encourage that behavior.
2. Reducing negative rewards: punish the agent less when it exhibits undesired behavior, lowering its preference for that behavior.
3. Introducing new rewards: provide the agent with extra rewards to better guide it toward specific behaviors.
4. Changing the reward distribution: adjust the reward function so that it has a different reward distribution, thereby influencing the agent's learning process.
In short, reward shaping is a method of guiding an agent toward better behavior by adjusting the environment and the reward function, which helps improve reinforcement learning performance.
Language Understanding
User:
Xiao Zhang bought a gift for his boss and said "yìsi yìsi" (a token of my regard). The boss asked "shénme yìsi" (what do you mean?), and Xiao Zhang replied "méi shénme yìsi, jiùshì suíbiàn yìsi yìsi" (nothing special, just a small token). Please interpret, in order, what each of Xiao Zhang's uses of "yìsi" (意思) means.
Baichuan-13B-Chat:
1. The first "yìsi" refers to the gift Xiao Zhang is giving his boss, indicating that he has bought it and intends to present it.
2. The second "yìsi" refers to Xiao Zhang's politeness and respect toward his boss, showing that he has not forgotten the boss's position and status and took them into account when buying the gift.
3. The third "yìsi" refers to Xiao Zhang's attitude toward the act of gift-giving itself: he does not want the matter to attract too much attention or controversy; he merely wishes to express a token of goodwill.
Inference Performance
Baichuan-13B uses the ALiBi linear-bias technique, which requires less computation than Rotary Embedding and noticeably improves inference performance; compared with the standard LLaMA-13B, the measured average inference speed (tokens/s) is 31.6% higher:
| Model | tokens/s |
|---|---|
| LLaMA-13B | 19.4 |
| Baichuan-13B | 25.4 |
Test environment and parameters: A100-SXM4-80G GPU, PyTorch 2.0.0+cu117, transformers 4.29.1, batch size = 1, generation length = 2048, fp16 precision, based on Baichuan-13B-Base.
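For readers who want to reproduce a comparable tokens/s number, here is a rough sketch of such a measurement (ours, not the script behind the table above; the prompt is arbitrary and the figure will vary with hardware and settings):
import time
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Base", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Base", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
torch.cuda.synchronize()
start = time.time()
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)  # greedy decoding, generation length 2048 as in the test setup
torch.cuda.synchronize()
elapsed = time.time() - start
new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s")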
Quantized Deployment
Baichuan-13B supports int8 and int4 quantization; users only need to change two lines of the inference code.
If you use quantization, please note:
Read the example code below carefully, especially the model-loading line in the first row, which differs from the inference examples above. You can adapt the loading to your own needs, but note: if you are quantizing in order to save GPU memory, load the original-precision model onto the CPU first and then quantize; avoid passing device_map='auto' (or any other argument that loads the original-precision model directly onto the GPU) to from_pretrained.
To use int8 quantization:
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float16, trust_remote_code=True)  # no device_map: keep the fp16 model on CPU
model = model.quantize(8).cuda()  # quantize on CPU, then move the quantized model to the GPU
Similarly, to use int4 quantization:
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()
Alternatively, if you prefer not to quantize on the fly with quantize, a pre-quantized int8 Chat model is available: Baichuan-13B-Chat-int8:
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat-int8", torch_dtype=torch.float16, trust_remote_code=True).cuda()
GPU memory usage before and after quantization:
| Precision | GPU Mem (GB) |
|---|---|
| bf16 / fp16 | 26.0 |
| int8 | 15.8 |
| int4 | 9.7 |
Benchmark results for the quantized models compared with the original version:
| Model (5-shot) | C-Eval | MMLU | CMMLU |
|---|---|---|---|
| Baichuan-13B-Base | 52.4 | 51.6 | 55.3 |
| Baichuan-13B-Base-int8 | 51.2 | 49.9 | 54.5 |
| Baichuan-13B-Base-int4 | 47.6 | 46.0 | 51.0 |
CPU Deployment
Baichuan-13B supports CPU inference, but be aware that inference on CPU is relatively slow. Modify the model loading as follows:
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float32, trust_remote_code=True)
Running inference on CPU requires roughly 60 GB of RAM.
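An end-to-end CPU run under this loading pattern might look like the following (a minimal sketch; the prompt and generation settings are our assumptions):
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True)
# float32 on CPU: no GPU required, but expect slow generation and roughly 60 GB of RAM
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float32, trust_remote_code=True)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))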
Fine-Tuning the Model
Developers can fine-tune Baichuan-13B-Base or Baichuan-13B-Chat. We tested LLaMA Efficient Tuning, a fine-tuning tool compatible with Baichuan-13B, and provide examples of both full-parameter fine-tuning and LoRA fine-tuning.
Before starting, download the LLaMA Efficient Tuning project and install its dependencies as required.
The input data is one or more json files placed in the project's data directory and specified with the --dataset option (see the example below); separate multiple files with commas. An example of the json format and its fields:
[
  {
    "instruction": "What are the three primary colors?",
    "input": "",
    "output": "The three primary colors are red, blue, and yellow."
  },
  ....
]
The json file contains a list, and each element of the list is one sample. The instruction field represents the user input; input is optional, and if both instruction and input are given, they are joined with \n to form the user input; output is the desired model output.
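To make the joining rule concrete, here is a small illustration of how one sample could be turned into the user prompt (the helper function is hypothetical, not part of LLaMA Efficient Tuning):
# Hypothetical helper mirroring the rule described above
def build_user_input(sample: dict) -> str:
    # instruction is the user input; a non-empty input is appended after "\n"
    if sample.get("input"):
        return sample["instruction"] + "\n" + sample["input"]
    return sample["instruction"]

sample = {
    "instruction": "What are the three primary colors?",
    "input": "",
    "output": "The three primary colors are red, blue, and yellow.",
}
print(build_user_input(sample))  # -> What are the three primary colors?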
Below are example scripts that we have verified to run in the two fine-tuning scenarios.
Full-Parameter Fine-Tuning
We tested full-parameter fine-tuning on 8 * Nvidia A100 80 GB GPUs with deepspeed.
Example training launch script:
deepspeed --num_gpus=8 src/train_bash.py \
--stage sft \
--model_name_or_path baichuan-inc/Baichuan-13B-Base \
--do_train \
--dataset alpaca_gpt4_en,alpaca_gpt4_zh \
--finetuning_type full \
--output_dir path_to_your_sft_checkpoint \
--overwrite_cache \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 8 \
--preprocessing_num_workers 16 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 100 \
--eval_steps 100 \
--learning_rate 5e-5 \
--max_grad_norm 0.5 \
--num_train_epochs 2.0 \
--dev_ratio 0.01 \
--evaluation_strategy steps \
--load_best_model_at_end \
--plot_loss \
--fp16 \
--deepspeed deepspeed.json
Example deepspeed.json configuration (the file passed to the --deepspeed flag above):
{
  "train_micro_batch_size_per_gpu": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true
  }
}
LoRA Fine-Tuning
We tested LoRA fine-tuning on a single Nvidia A100 80G GPU.
Example training launch script:
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage sft \
--model_name_or_path baichuan-inc/Baichuan-13B-Base \
--do_train \
--dataset alpaca_gpt4_en,alpaca_gpt4_zh \
--finetuning_type lora \
--lora_rank 8 \
--lora_target W_pack \
--output_dir path_to_your_sft_checkpoint \
--overwrite_cache \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 8 \
--preprocessing_num_workers 16 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 100 \
--eval_steps 100 \
--learning_rate 5e-5 \
--max_grad_norm 0.5 \
--num_train_epochs 2.0 \
--dev_ratio 0.01 \
--evaluation_strategy steps \
--load_best_model_at_end \
--plot_loss \
--fp16
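Once LoRA training finishes, the adapter saved to --output_dir can be attached to the base model for inference; a minimal sketch using the peft library (our assumption; LLaMA Efficient Tuning also provides its own export utilities):
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Base", torch_dtype=torch.float16, trust_remote_code=True)
model = PeftModel.from_pretrained(base, "path_to_your_sft_checkpoint")  # the LoRA weights from the run above
model = model.merge_and_unload()  # optionally fold the LoRA deltas back into the base weights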
For more detailed usage of LLaMA Efficient Tuning, see the documentation on its project homepage.
Disclaimer
We hereby declare that our development team has not built any application on the Baichuan-13B model, whether for iOS, Android, the web, or any other platform. We strongly urge all users not to use the Baichuan-13B model for any activity that endangers national or social security or is unlawful. We also ask users not to use the Baichuan-13B model in internet services that have not passed appropriate security review and filing. We hope all users abide by these principles so that technological development proceeds in a regulated and lawful environment.
We have done our best to ensure the compliance of the data used in training the model. Nevertheless, despite our great efforts, unforeseen issues may still arise given the complexity of the model and data. Therefore, we assume no liability for any problems caused by the use of the Baichuan-13B open-source model, including but not limited to data security issues, risks to public opinion, or any risks and problems arising from the model being misled, misused, disseminated, or improperly exploited.
License
Use of the source code in this repository follows the Apache 2.0 open-source license. Community use of the Baichuan-13B model is governed by the Baichuan-13B Model Community License Agreement. Baichuan-13B supports commercial use; if you use the Baichuan-13B model or its derivatives for commercial purposes, please contact the licensor at opensource@baichuan-inc.com to register and apply for written authorization.