Top Related Projects
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
A 13B large language model developed by Baichuan Intelligent Technology
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
Quick Overview
ChatGLM3 is an open-source, bilingual (Chinese and English) large language model developed by Zhipu AI together with Tsinghua University's KEG Lab. It is the latest iteration in the ChatGLM series, featuring improved performance, expanded knowledge, and enhanced instruction-following capabilities compared to its predecessors.
Pros
- Bilingual support for Chinese and English
- Open-source and freely available for research and commercial use
- Improved performance and expanded knowledge base compared to previous versions
- Supports various deployment options, including quantized versions for efficiency
Cons
- May require significant computational resources for optimal performance
- Documentation is primarily in Chinese, which could be a barrier for non-Chinese speakers
- As with all large language models, potential for biases and inaccuracies in generated content
- Limited support for languages other than Chinese and English
Code Examples
```python
# Loading and using the ChatGLM3 model
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()

response, history = model.chat(tokenizer, "Hello! How are you?", history=[])
print(response)
```
```python
# Using ChatGLM3 with streaming output
for response, history in model.stream_chat(tokenizer, "Tell me about artificial intelligence", history=[]):
    print(response)
```
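Note that `stream_chat` yields the full response accumulated so far on each step, so printing it directly repeats earlier text. A small sketch that prints only the newly generated characters (the slicing logic is our own addition, not part of the API):

```python
# Print only the delta on each iteration; stream_chat yields the full
# response-so-far, so we track how much has already been printed.
printed_len = 0
for response, history in model.stream_chat(tokenizer, "Tell me about artificial intelligence", history=[]):
    print(response[printed_len:], end="", flush=True)
    printed_len = len(response)
print()
```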
```python
# Quantized inference for improved efficiency (4-bit)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
response, history = model.chat(tokenizer, "Explain quantum computing", history=[])
print(response)
```
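The returned `history` carries the conversation state; passing it back into the next call continues a multi-turn dialogue (continuing from the first example above, and mirroring the transcript in the upstream README below):

```python
# Multi-turn conversation: feed the returned history into the next call
response, history = model.chat(tokenizer, "What is machine learning?", history=[])
response, history = model.chat(tokenizer, "Give me a concrete example.", history=history)
print(response)
```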
Getting Started
To get started with ChatGLM3, follow these steps:
1. Install the required dependencies:
```bash
pip install transformers torch
```
2. Load the model and tokenizer:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
```
3. Start chatting with the model:
```python
response, history = model.chat(tokenizer, "Hello, ChatGLM3!", history=[])
print(response)
```
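If no CUDA GPU is available, the model can also be loaded in FP32 on the CPU (much slower, and requiring roughly 32 GB of RAM, per the deployment notes in the README below):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).float()
```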
For more advanced usage and deployment options, refer to the project's documentation on GitHub.
Competitor Comparisons
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Pros of ChatGLM-6B
- More established and stable, with a longer development history
- Better documented, making it easier for new users to get started
- Potentially more optimized for specific use cases
Cons of ChatGLM-6B
- Same 6B parameter scale as ChatGLM3-6B, but an earlier architecture and training recipe
- Lacks the advanced features introduced in ChatGLM3, such as native tool calling (Function Call) and Code Interpreter support
- Generally lower performance than ChatGLM3-6B on benchmark tasks
Code Comparison
ChatGLM-6B:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
```
ChatGLM3:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
```
The code snippets are very similar, with the main difference being the model name used in the from_pretrained method. ChatGLM3 uses "THUDM/chatglm3-6b" instead of "THUDM/chatglm-6b". This reflects the evolution of the model while maintaining a similar API for ease of use and compatibility.
ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
Pros of ChatGLM2-6B
- More established and stable, with a longer development history
- Well-tested behavior at the 6B scale, with well-understood performance characteristics
- Wider community adoption and more extensive documentation
Cons of ChatGLM2-6B
- Less advanced features compared to the newer ChatGLM3
- May lack some of the latest improvements in language modeling
- Potentially slower inference speed for certain applications
Code Comparison
ChatGLM2-6B:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```
ChatGLM3:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```
The code structure is similar, with the main difference being the model name used in the from_pretrained method. ChatGLM3 uses "THUDM/chatglm3-6b" instead of "THUDM/chatglm2-6b".
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
Pros of FlagAI
- Broader scope: FlagAI is a comprehensive AI framework supporting various tasks beyond language models
- More extensive documentation and examples for different AI applications
- Active community with frequent updates and contributions
Cons of FlagAI
- Steeper learning curve due to its broader feature set
- Potentially more complex setup and configuration for specific use cases
- May have higher computational requirements for some tasks
Code Comparison
ChatGLM3:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
```
FlagAI:
```python
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

# AutoLoader takes the task name first, then the model name
auto_loader = AutoLoader("lm", model_name="GLM-large-ch")
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()
```
Both repositories provide easy-to-use interfaces for loading and using language models. ChatGLM3 focuses specifically on the ChatGLM3 model, while FlagAI offers a more generalized approach for various AI models and tasks. The code snippets demonstrate the simplicity of model initialization in both frameworks, with FlagAI providing a slightly more abstracted interface through its AutoLoader class.
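To complete the FlagAI picture, here is a hedged sketch of running generation through the `Predictor` helper imported above; the `predict_generate_randomsample` call and its parameters follow FlagAI's published examples and should be treated as assumptions rather than a definitive recipe:

```python
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

# Load model and tokenizer as in the snippet above, then wrap them in a Predictor
auto_loader = AutoLoader("lm", model_name="GLM-large-ch")
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()

predictor = Predictor(model, tokenizer)
# Sample a continuation; method name and parameters follow FlagAI's examples
text = predictor.predict_generate_randomsample("今天天气不错", out_max_length=64)
print(text)
```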
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Pros of Qwen
- Offers a wider range of model sizes, including smaller variants for resource-constrained environments
- Provides more extensive documentation and usage examples
- Includes pre-trained models for specific tasks like text generation and question answering
Cons of Qwen
- Less focus on multilingual capabilities compared to ChatGLM3
- May require more computational resources for larger model variants
- Has a shorter development history, potentially leading to fewer community contributions
Code Comparison
ChatGLM3:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
```
Qwen:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True).half().cuda()
```
Both repositories use similar approaches for loading models and tokenizers, with minor differences in the specific classes and model names used.
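For dialogue, Qwen's chat-tuned checkpoints expose a `chat()` helper much like ChatGLM's. A sketch using the Qwen-7B-Chat variant (note that Qwen's helper takes `history=None` rather than `history=[]` for a fresh conversation):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True).half().cuda().eval()

# history=None starts a new conversation; the helper returns (response, history)
response, history = model.chat(tokenizer, "Hello!", history=None)
print(response)
```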
A 13B large language model developed by Baichuan Intelligent Technology
Pros of Baichuan-13B
- Larger model size (13B parameters) potentially offering more advanced language understanding and generation capabilities
- Supports both Chinese and English languages, making it more versatile for multilingual applications
- Provides pre-trained models and fine-tuning scripts, enabling easier customization for specific use cases
Cons of Baichuan-13B
- Less extensive documentation and examples compared to ChatGLM3, which may make it more challenging for beginners to use
- Fewer optimization techniques for inference speed and memory usage, potentially limiting its deployment on resource-constrained devices
- Limited community contributions and updates, possibly resulting in slower development and issue resolution
Code Comparison
ChatGLM3:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```
Baichuan-13B:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
inputs = tokenizer("你好", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Both repositories offer powerful language models, but ChatGLM3 focuses more on optimization and ease of use, while Baichuan-13B provides a larger model with multilingual support. The code examples demonstrate similar usage patterns, with slight differences in model initialization and generation methods.
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
Pros of GLM-130B
- Larger model size (130B parameters) potentially offering more advanced capabilities
- Designed for general language tasks, providing broader applicability
- Supports multiple languages, enhancing its versatility
Cons of GLM-130B
- Requires more computational resources due to its size
- May have slower inference times compared to smaller models
- Less optimized for specific chat-based applications
Code Comparison
GLM-130B:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-130b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/glm-130b", trust_remote_code=True)
```
ChatGLM3:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
```
The code snippets show similar usage patterns for both models, with the main difference being the model name in the from_pretrained method. GLM-130B uses "THUDM/glm-130b", while ChatGLM3 uses "THUDM/chatglm3-6b", reflecting their respective model sizes and versions.
README
ChatGLM3
Report • HF Repo • ModelScope • WiseModel • Document • OpenXLab • Twitter
Join our Discord and WeChat.
Experience larger-scale ChatGLM models at chatglm.cn.
For more detailed usage information about ChatGLM3-6B, please refer to the project documentation.
GLM-4 Open-Source Models and API
We have released the latest GLM-4 models, which achieve new breakthroughs on multiple benchmarks. You can experience our latest models through the following channels:
- GLM-4 open-source models: We have open-sourced the GLM-4-9B series of models, which show clear improvements on benchmark tests. Feel free to try them.
- Zhipu Qingyan (智谱清言): Experience the latest GLM-4, including features such as GLMs and All Tools.
- API platform: The new-generation API platform is now live, where you can try new models such as GLM-4-0520, GLM-4-air, GLM-4-airx, GLM-4-flash, GLM-4, GLM-3-Turbo, CharacterGLM-3, and CogView-3. Among them, GLM-4 and GLM-3-Turbo support new features such as System Prompt, Function Call, Retrieval, and Web Search.
- GLM-4 API open-source tutorial: tutorials and basic applications for the GLM-4 API. API-related questions can be raised in the tutorial repository, or you can use the GLM-4 API AI assistant for help with common issues.
ChatGLM3 Introduction
ChatGLM3 is a dialogue pre-training model jointly released by Zhipu AI and the Tsinghua University KEG Lab. ChatGLM3-6B is the open-source model in the ChatGLM3 series. While retaining many excellent features of the previous two generations, such as fluent dialogue and a low deployment threshold, ChatGLM3-6B introduces the following features:
- A stronger base model: ChatGLM3-6B-Base, the base model of ChatGLM3-6B, is trained with more diverse data, more training steps, and a more reasonable training strategy. Evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge show that **ChatGLM3-6B-Base has the strongest performance among base models below 10B parameters**.
- More complete function support: ChatGLM3-6B adopts a newly designed Prompt format. In addition to normal multi-turn dialogue, it natively supports complex scenarios such as tool invocation (Function Call), code execution (Code Interpreter), and Agent tasks.
- A more comprehensive open-source series: In addition to the dialogue model ChatGLM3-6B, the base model ChatGLM3-6B-Base, the long-text dialogue model ChatGLM3-6B-32K, and ChatGLM3-6B-128K, which further strengthens long-text understanding, have also been open-sourced. All of these weights are fully open for academic research, and **free commercial use is also permitted** after registering via a questionnaire.
The ChatGLM3 open-source models aim to advance large-model technology together with the open-source community. We ask developers and users to comply with the open-source license and not to use the open-source models, code, or derivatives of this project for any purpose that could harm the nation or society, or for any service that has not undergone safety assessment and filing. At present, this project's team has not developed any applications based on the ChatGLM3 open-source models, including web, Android, Apple iOS, or Windows apps.
Although every effort has been made at all stages of training to ensure the compliance and accuracy of the data, the output of ChatGLM3-6B cannot be guaranteed to be accurate because of its relatively small scale and the probabilistic, random nature of generation, and the model's output is easily misled by user input. *This project assumes no liability for data-security or public-opinion risks caused by the open-source models and code, or for any risks arising from the models being misled, misused, disseminated, or improperly exploited.*
Model List
| Model | Seq Length | Download |
|---|---|---|
| ChatGLM3-6B | 8k | HuggingFace \| ModelScope \| WiseModel \| OpenXLab |
| ChatGLM3-6B-Base | 8k | HuggingFace \| ModelScope \| WiseModel \| OpenXLab |
| ChatGLM3-6B-32K | 32k | HuggingFace \| ModelScope \| WiseModel \| OpenXLab |
| ChatGLM3-6B-128K | 128k | HuggingFace \| ModelScope \| OpenXLab |
Please note that the latest updates to all models are released on Hugging Face first. Because ModelScope and WiseModel are not synchronized with Hugging Face and must be updated manually by the developers, they may be updated some time after the Hugging Face release.
Friendly Links
The following excellent open-source repositories already provide deep support for the ChatGLM3-6B model, and everyone is welcome to explore them:
Inference acceleration:
- chatglm.cpp: a quantized, accelerated inference solution similar to llama.cpp, enabling real-time chat on a laptop
- ChatGLM3-TPU: a TPU-accelerated inference solution that runs in real time at about 7.5 tokens/s on the Sophgo edge-side chip BM1684X (16T@FP16, 16 GB memory)
- TensorRT-LLM: NVIDIA's high-performance GPU-accelerated inference solution; you can follow these steps to deploy the ChatGLM3-6B model
- OpenVINO: Intel's high-performance CPU and GPU accelerated inference solution; you can follow these steps to deploy the ChatGLM3-6B model
Efficient fine-tuning:
- LLaMA-Factory: an excellent, easy-to-use, and efficient fine-tuning framework.
Application frameworks:
- LangChain-Chatchat: a retrieval-augmented generation (RAG) knowledge-base project built on large language models such as ChatGLM and application frameworks such as LangChain; it is open-source and can be deployed offline.
- BISHENG: an open-source platform for developing large-model applications. It empowers and accelerates the development and deployment of large-model applications, helping users enter the next generation of application development with the best possible experience.
- RAGFlow: an open-source RAG (Retrieval-Augmented Generation) engine built on deep document understanding. It provides a streamlined RAG workflow for enterprises and individuals of any scale, combining LLMs to deliver reliable question answering with well-founded citations over users' complex data in various formats.
Evaluation Results
Typical Tasks
We selected 8 typical Chinese and English datasets and ran performance tests on the ChatGLM3-6B (base) version.
| Model | GSM8K | MATH | BBH | MMLU | C-Eval | CMMLU | MBPP | AGIEval |
|---|---|---|---|---|---|---|---|---|
| ChatGLM2-6B-Base | 32.4 | 6.5 | 33.7 | 47.9 | 51.7 | 50.0 | - | - |
| Best Baseline | 52.1 | 13.1 | 45.0 | 60.1 | 63.5 | 62.2 | 47.5 | 45.8 |
| ChatGLM3-6B-Base | 72.3 | 25.7 | 66.1 | 61.4 | 69.0 | 67.5 | 52.4 | 53.7 |
Best Baseline refers to the pre-trained model that performed best on the corresponding dataset as of October 27, 2023, among models with fewer than 10B parameters, excluding models trained only for a single task without maintaining general capabilities.
In the tests of ChatGLM3-6B-Base, BBH used a 3-shot setup; GSM8K and MATH, which require reasoning, used 0-shot CoT; MBPP used 0-shot generation followed by running the test cases to compute Pass@1; all other multiple-choice datasets used a 0-shot setup.
We conducted manual evaluations of ChatGLM3-6B-32K in several long-text application scenarios. Compared with the second-generation model, its performance improved by more than 50% on average, and the improvement is especially significant in applications such as paper reading, document summarization, and financial-report analysis. We also evaluated the model on the LongBench benchmark; the results are shown in the table below.
| Model | Average | Summary | Single-Doc QA | Multi-Doc QA | Code | Few-shot | Synthetic |
|---|---|---|---|---|---|---|---|
| ChatGLM2-6B-32K | 41.5 | 24.8 | 37.6 | 34.7 | 52.8 | 51.3 | 47.7 |
| ChatGLM3-6B-32K | 50.2 | 26.6 | 45.8 | 46.1 | 56.2 | 61.2 | 65.0 |
Usage
Environment Setup
First, clone this repository:
```bash
git clone https://github.com/THUDM/ChatGLM3
cd ChatGLM3
```
Then use pip to install the dependencies:
```bash
pip install -r requirements.txt
```
- To ensure that your version of torch is correct, please install it strictly according to the instructions in the official documentation.
Integrated Demo
We provide an integrated demo that combines the following three capabilities; for how to run it, please refer to the Integrated Demo documentation:
- Chat: dialogue mode, in which you can converse with the model.
- Tool: tool mode, in which the model can perform operations through tools in addition to dialogue.
- Code Interpreter: code interpreter mode, in which the model can execute code in a Jupyter environment and use the results to complete complex tasks.
Usage via Code
You can call the ChatGLM model to generate a dialogue with the following code:
```python
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device='cuda')
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "Hello", history=[])
>>> print(response)
Hello! I'm the AI assistant ChatGLM3-6B. Nice to meet you. Feel free to ask me any questions.
>>> response, history = model.chat(tokenizer, "What should I do if I can't sleep at night?", history=history)
>>> print(response)
Having trouble sleeping at night can make you feel anxious or uncomfortable, but here are some methods that can help you fall asleep:

1. Keep a regular sleep schedule: a consistent schedule helps you build healthy sleep habits and makes it easier to fall asleep. Try to go to bed and get up at the same time every day.
2. Create a comfortable sleep environment: make sure the bedroom is comfortable, quiet, dark, and at a suitable temperature. Use comfortable bedding and keep the room ventilated.
3. Relax your body and mind: do something relaxing before bed, such as taking a hot bath, listening to soft music, or reading an enjoyable book, to ease tension and anxiety and fall asleep more easily.
4. Avoid caffeinated drinks: caffeine is a stimulant that affects sleep quality. Try to avoid drinks containing caffeine before bed, such as coffee, tea, and cola.
5. Avoid doing things unrelated to sleep in bed: activities such as watching movies, playing games, or working in bed can interfere with your sleep.
6. Try breathing techniques: deep breathing is a relaxation technique that can ease tension and anxiety and help you fall asleep. Try inhaling slowly, holding your breath for a few seconds, and then exhaling slowly.

If these methods do not help you fall asleep, consider consulting a doctor or a sleep specialist for further advice.
```
Load the Model Locally
The code above automatically downloads the model implementation and weights via transformers. The complete model implementation is available on the Hugging Face Hub. If your network environment is poor, downloading the model weights may take a long time or even fail. In that case, you can download the model to your local machine first and then load it from there.
To download the model from the Hugging Face Hub, first install Git LFS and then run:
```bash
git clone https://huggingface.co/THUDM/chatglm3-6b
```
If downloading from Hugging Face is slow, you can also download the model from ModelScope.
Model Fine-tuning
We provide a basic kit for fine-tuning the ChatGLM3-6B model. For how to use it, please refer to the fine-tuning kit documentation.
Web Demo
You can launch a Gradio-based web demo with the following command:
```bash
python web_demo_gradio.py
```
You can launch a Streamlit-based web demo with the following command:
```bash
streamlit run web_demo_streamlit.py
```
The web demo runs a web server and prints its address. Open the printed address in a browser to use it. In our testing, the Streamlit-based web demo runs more smoothly.
Command-Line Demo
Run cli_demo.py in the repository:
```bash
python cli_demo.py
```
The program conducts an interactive conversation in the command line: type a prompt and press Enter to generate a reply, type clear to clear the conversation history, and type stop to exit the program.
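For reference, a minimal sketch of the interaction loop that cli_demo.py implements (the actual script in the repository adds streaming output and nicer formatting):

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device="cuda").eval()

history = []
while True:
    query = input("User: ").strip()
    if query == "stop":    # terminate the program
        break
    if query == "clear":   # reset the conversation history
        history = []
        continue
    response, history = model.chat(tokenizer, query, history=history)
    print(f"ChatGLM3-6B: {response}")
```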
LangChain Demo
For the code implementation, please refer to LangChain Demo.
Tool Invocation
For how to invoke tools, please refer to Tool Invocation.
OpenAI API / Zhipu API Demo
We provide open-source model API deployment code in the OpenAI / ZhipuAI format, which can serve as the backend for any ChatGPT-based application. Currently, it can be deployed by running api_server.py in the repository:
```bash
cd openai_api_demo
python api_server.py
```
We have also written sample code for testing the performance of API calls:
- OpenAI test script: openai_api_request.py
- ZhipuAI test script: zhipu_api_request.py
- Testing with curl
- chat curl test:
```bash
curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"system\", \"content\": \"You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.\"}, {\"role\": \"user\", \"content\": \"你好，给我讲一个故事，大概100字\"}], \"stream\": false, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
```
- Standard OpenAI-interface agent-chat curl test:
```bash
curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"user\", \"content\": \"37乘以8加7除2等于多少？\"}], \"tools\": [{\"name\": \"track\", \"description\": \"追踪指定股票的实时价格\", \"parameters\": {\"type\": \"object\", \"properties\": {\"symbol\": {\"description\": \"需要追踪的股票代码\"}}, \"required\": []}}, {\"name\": \"Calculator\", \"description\": \"数学计算器，计算数学问题\", \"parameters\": {\"type\": \"object\", \"properties\": {\"symbol\": {\"description\": \"要计算的数学公式\"}}, \"required\": []}}], \"stream\": true, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
```
- OpenAI-style custom-interface agent-chat curl test (you need to implement the custom tool descriptions in openai_api_demo/tools/schema.py and set AGENT_CONTROLLER in api_server.py to 'true'):
```bash
curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"user\", \"content\": \"37乘以8加7除2等于多少？\"}], \"stream\": true, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
```
This endpoint performs autonomous dispatch over an OpenAI-style custom toolbox. It can recover and reply on its own when dispatch goes wrong, so no separate scheduling algorithm needs to be implemented, and users do not need an api_key.
- Testing with Python:
```bash
cd openai_api_demo
python openai_api_request.py
```
If the test succeeds, the model should return a story.
Low-Cost Deployment
Model Quantization
By default, the model is loaded at FP16 precision, and running the code above requires about 13 GB of GPU memory. If your GPU memory is limited, you can try loading the model with quantization, as follows:
```python
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
```
Model quantization brings some performance loss. In our tests, ChatGLM3-6B can still generate naturally and fluently under 4-bit quantization.
CPU Deployment
If you do not have GPU hardware, you can also run inference on the CPU, although the inference speed will be much slower. Usage is as follows (about 32 GB of RAM is required):
```python
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).float()
```
Mac Deployment
For Macs equipped with Apple Silicon or an AMD GPU, you can use the MPS backend to run ChatGLM3-6B on the GPU. Refer to Apple's official instructions to install PyTorch-Nightly (the correct version number should be 2.x.x.dev2023xxxx, not 2.x.x).
Currently, only loading the model locally is supported on macOS. Change the model loading in the code to load from a local path, and use the mps backend:
```python
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')
```
Loading the half-precision ChatGLM3-6B model requires about 13 GB of RAM. Machines with less RAM (such as a MacBook Pro with 16 GB) will fall back to virtual memory on disk when free RAM runs out, slowing inference down severely.
Multi-GPU Deployment
If you have multiple GPUs but none of them has enough memory to hold the complete model, you can split the model across several GPUs, as sketched below. First install accelerate (pip install accelerate); the model can then be loaded normally.
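One way to do this once accelerate is installed is Transformers' `device_map="auto"`, which shards layers across the visible GPUs. A sketch under that assumption; the repository may also provide its own multi-GPU loading helper:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained(
    "THUDM/chatglm3-6b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",  # shard layers across all visible GPUs
).eval()
```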
OpenVINO Demo
ChatGLM3-6B supports accelerated inference with the OpenVINO toolkit, which delivers a sizeable inference speedup on Intel CPUs and GPUs. For details, please refer to the OpenVINO Demo.
TensorRT-LLM Demo
ChatGLM3-6B supports accelerated inference with NVIDIA's TensorRT-LLM toolkit, which speeds up model inference several-fold. For details, please refer to the TensorRT-LLM Demo and the official technical documentation.
Citation
If you find our work helpful, please consider citing the following paper:
```bibtex
@misc{glm2024chatglm,
      title={ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools},
      author={Team GLM and Aohan Zeng and Bin Xu and Bowen Wang and Chenhui Zhang and Da Yin and Diego Rojas and Guanyu Feng and Hanlin Zhao and Hanyu Lai and Hao Yu and Hongning Wang and Jiadai Sun and Jiajie Zhang and Jiale Cheng and Jiayi Gui and Jie Tang and Jing Zhang and Juanzi Li and Lei Zhao and Lindong Wu and Lucen Zhong and Mingdao Liu and Minlie Huang and Peng Zhang and Qinkai Zheng and Rui Lu and Shuaiqi Duan and Shudan Zhang and Shulin Cao and Shuxun Yang and Weng Lam Tam and Wenyi Zhao and Xiao Liu and Xiao Xia and Xiaohan Zhang and Xiaotao Gu and Xin Lv and Xinghan Liu and Xinyi Liu and Xinyue Yang and Xixuan Song and Xunkai Zhang and Yifan An and Yifan Xu and Yilin Niu and Yuantao Yang and Yueyan Li and Yushi Bai and Yuxiao Dong and Zehan Qi and Zhaoyu Wang and Zhen Yang and Zhengxiao Du and Zhenyu Hou and Zihan Wang},
      year={2024},
      eprint={2406.12793},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```