Top Related Projects
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
A 13B large language model developed by Baichuan Intelligent Technology
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
Quick Overview
ChatGLM3 is an open-source, bilingual (Chinese and English) large language model developed by Zhipu AI together with Tsinghua University's KEG Lab. It is the latest iteration in the ChatGLM series, featuring improved performance, expanded knowledge, and enhanced instruction-following capabilities compared to its predecessors.
Pros
- Bilingual support for Chinese and English
- Open-source and freely available for research and commercial use
- Improved performance and expanded knowledge base compared to previous versions
- Supports various deployment options, including quantized versions for efficiency
Cons
- May require significant computational resources for optimal performance
- Documentation is primarily in Chinese, which could be a barrier for non-Chinese speakers
- As with all large language models, potential for biases and inaccuracies in generated content
- Limited support for languages other than Chinese and English
Code Examples
```python
# Loading and using the ChatGLM3 model
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()

response, history = model.chat(tokenizer, "Hello! How are you?", history=[])
print(response)
```
```python
# Using ChatGLM3 with streaming output
for response, history in model.stream_chat(tokenizer, "Tell me about artificial intelligence", history=[]):
    print(response)
```
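Note that `stream_chat` yields the full response accumulated so far on each step, so printing it directly repeats earlier text. A small sketch that prints only the newly generated characters (the slicing logic is our own addition, not part of the API):

```python
# Print only the delta on each iteration; stream_chat yields the full
# response-so-far, so we track how much has already been printed.
printed_len = 0
for response, history in model.stream_chat(tokenizer, "Tell me about artificial intelligence", history=[]):
    print(response[printed_len:], end="", flush=True)
    printed_len = len(response)
print()
```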
```python
# Quantized inference for improved efficiency (4-bit)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
response, history = model.chat(tokenizer, "Explain quantum computing", history=[])
print(response)
```
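The returned `history` carries the conversation state; passing it back into the next call continues a multi-turn dialogue (continuing from the first example above, and mirroring the transcript in the upstream README below):

```python
# Multi-turn conversation: feed the returned history into the next call
response, history = model.chat(tokenizer, "What is machine learning?", history=[])
response, history = model.chat(tokenizer, "Give me a concrete example.", history=history)
print(response)
```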
Getting Started
To get started with ChatGLM3, follow these steps:
1. Install the required dependencies:
```bash
pip install transformers torch
```
2. Load the model and tokenizer:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
```
3. Start chatting with the model:
```python
response, history = model.chat(tokenizer, "Hello, ChatGLM3!", history=[])
print(response)
```
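If no CUDA GPU is available, the model can also be loaded in FP32 on the CPU (much slower, and requiring roughly 32 GB of RAM, per the deployment notes in the README below):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).float()
```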
For more advanced usage and deployment options, refer to the project's documentation on GitHub.
Competitor Comparisons
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Pros of ChatGLM-6B
- More established and stable, with a longer development history
- Better documented, making it easier for new users to get started
- Potentially more optimized for specific use cases
Cons of ChatGLM-6B
- Same 6B parameter scale as ChatGLM3-6B, but an earlier architecture and training recipe
- Lacks the advanced features introduced in ChatGLM3, such as native tool calling (Function Call) and Code Interpreter support
- Generally lower performance than ChatGLM3-6B on benchmark tasks
Code Comparison
ChatGLM-6B:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
```
ChatGLM3:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
```
The code snippets are very similar, with the main difference being the model name used in the from_pretrained method. ChatGLM3 uses "THUDM/chatglm3-6b" instead of "THUDM/chatglm-6b". This reflects the evolution of the model while maintaining a similar API for ease of use and compatibility.
ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
Pros of ChatGLM2-6B
- More established and stable, with a longer development history
- Well-tested behavior at the 6B scale, with well-understood performance characteristics
- Wider community adoption and more extensive documentation
Cons of ChatGLM2-6B
- Less advanced features compared to the newer ChatGLM3
- May lack some of the latest improvements in language modeling
- Potentially slower inference speed for certain applications
Code Comparison
ChatGLM2-6B:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```
ChatGLM3:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```
The code structure is similar, with the main difference being the model name used in the from_pretrained method. ChatGLM3 uses "THUDM/chatglm3-6b" instead of "THUDM/chatglm2-6b".
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
Pros of FlagAI
- Broader scope: FlagAI is a comprehensive AI framework supporting various tasks beyond language models
- More extensive documentation and examples for different AI applications
- Active community with frequent updates and contributions
Cons of FlagAI
- Steeper learning curve due to its broader feature set
- Potentially more complex setup and configuration for specific use cases
- May have higher computational requirements for some tasks
Code Comparison
ChatGLM3:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
```
FlagAI:
```python
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

# AutoLoader takes the task name first, then the model name
auto_loader = AutoLoader("lm", model_name="GLM-large-ch")
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()
```
Both repositories provide easy-to-use interfaces for loading and using language models. ChatGLM3 focuses specifically on the ChatGLM3 model, while FlagAI offers a more generalized approach for various AI models and tasks. The code snippets demonstrate the simplicity of model initialization in both frameworks, with FlagAI providing a slightly more abstracted interface through its AutoLoader class.
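To complete the FlagAI picture, here is a hedged sketch of running generation through the `Predictor` helper imported above; the `predict_generate_randomsample` call and its parameters follow FlagAI's published examples and should be treated as assumptions rather than a definitive recipe:

```python
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

# Load model and tokenizer as in the snippet above, then wrap them in a Predictor
auto_loader = AutoLoader("lm", model_name="GLM-large-ch")
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()

predictor = Predictor(model, tokenizer)
# Sample a continuation; method name and parameters follow FlagAI's examples
text = predictor.predict_generate_randomsample("今天天气不错", out_max_length=64)
print(text)
```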
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Pros of Qwen
- Offers a wider range of model sizes, including smaller variants for resource-constrained environments
- Provides more extensive documentation and usage examples
- Includes pre-trained models for specific tasks like text generation and question answering
Cons of Qwen
- Less focus on multilingual capabilities compared to ChatGLM3
- May require more computational resources for larger model variants
- Has a shorter development history, potentially leading to fewer community contributions
Code Comparison
ChatGLM3:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
```
Qwen:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True).half().cuda()
```
Both repositories use similar approaches for loading models and tokenizers, with minor differences in the specific classes and model names used.
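For dialogue, Qwen's chat-tuned checkpoints expose a `chat()` helper much like ChatGLM's. A sketch using the Qwen-7B-Chat variant (note that Qwen's helper takes `history=None` rather than `history=[]` for a fresh conversation):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True).half().cuda().eval()

# history=None starts a new conversation; the helper returns (response, history)
response, history = model.chat(tokenizer, "Hello!", history=None)
print(response)
```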
A 13B large language model developed by Baichuan Intelligent Technology
Pros of Baichuan-13B
- Larger model size (13B parameters) potentially offering more advanced language understanding and generation capabilities
- Supports both Chinese and English languages, making it more versatile for multilingual applications
- Provides pre-trained models and fine-tuning scripts, enabling easier customization for specific use cases
Cons of Baichuan-13B
- Less extensive documentation and examples compared to ChatGLM3, which may make it more challenging for beginners to use
- Fewer optimization techniques for inference speed and memory usage, potentially limiting its deployment on resource-constrained devices
- Limited community contributions and updates, possibly resulting in slower development and issue resolution
Code Comparison
ChatGLM3:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```
Baichuan-13B:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
inputs = tokenizer("你好", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Both repositories offer powerful language models, but ChatGLM3 focuses more on optimization and ease of use, while Baichuan-13B provides a larger model with multilingual support. The code examples demonstrate similar usage patterns, with slight differences in model initialization and generation methods.
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
Pros of GLM-130B
- Larger model size (130B parameters) potentially offering more advanced capabilities
- Designed for general language tasks, providing broader applicability
- Supports multiple languages, enhancing its versatility
Cons of GLM-130B
- Requires more computational resources due to its size
- May have slower inference times compared to smaller models
- Less optimized for specific chat-based applications
Code Comparison
GLM-130B:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-130b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/glm-130b", trust_remote_code=True)
```
ChatGLM3:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
```
The code snippets show similar usage patterns for both models, with the main difference being the model name in the from_pretrained method. GLM-130B uses "THUDM/glm-130b", while ChatGLM3 uses "THUDM/chatglm3-6b", reflecting their respective model sizes and versions.
README
ChatGLM3
Report • HF Repo • ModelScope • WiseModel • Document • OpenXLab • Twitter
Join our Discord and WeChat.
Experience larger-scale ChatGLM models at chatglm.cn.
For more detailed usage information about ChatGLM3-6B, please refer to the project documentation.
GLM-4 Open-Source Models and API
We have released the latest GLM-4 models, which achieve new breakthroughs on multiple benchmarks. You can experience our latest models through the following channels:
- GLM-4 open-source models: We have open-sourced the GLM-4-9B series of models, which show clear improvements on benchmark tests. Feel free to try them.
- Zhipu Qingyan (智谱清言): Experience the latest GLM-4, including features such as GLMs and All Tools.
- API platform: The new-generation API platform is now live, where you can try new models such as GLM-4-0520, GLM-4-air, GLM-4-airx, GLM-4-flash, GLM-4, GLM-3-Turbo, CharacterGLM-3, and CogView-3. Among them, GLM-4 and GLM-3-Turbo support new features such as System Prompt, Function Call, Retrieval, and Web Search.
- GLM-4 API open-source tutorial: tutorials and basic applications for the GLM-4 API. API-related questions can be raised in the tutorial repository, or you can use the GLM-4 API AI assistant for help with common issues.
ChatGLM3 Introduction
ChatGLM3 is a dialogue pre-training model jointly released by Zhipu AI and the Tsinghua University KEG Lab. ChatGLM3-6B is the open-source model in the ChatGLM3 series. While retaining many excellent features of the previous two generations, such as fluent dialogue and a low deployment threshold, ChatGLM3-6B introduces the following features:
- A stronger base model: ChatGLM3-6B-Base, the base model of ChatGLM3-6B, is trained with more diverse data, more training steps, and a more reasonable training strategy. Evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge show that **ChatGLM3-6B-Base has the strongest performance among base models below 10B parameters**.
- More complete function support: ChatGLM3-6B adopts a newly designed Prompt format. In addition to normal multi-turn dialogue, it natively supports complex scenarios such as tool invocation (Function Call), code execution (Code Interpreter), and Agent tasks.
- A more comprehensive open-source series: In addition to the dialogue model ChatGLM3-6B, the base model ChatGLM3-6B-Base, the long-text dialogue model ChatGLM3-6B-32K, and ChatGLM3-6B-128K, which further strengthens long-text understanding, have also been open-sourced. All of these weights are fully open for academic research, and **free commercial use is also permitted** after registering via a questionnaire.
The ChatGLM3 open-source models aim to advance large-model technology together with the open-source community. We ask developers and users to comply with the open-source license and not to use the open-source models, code, or derivatives of this project for any purpose that could harm the nation or society, or for any service that has not undergone safety assessment and filing. At present, this project's team has not developed any applications based on the ChatGLM3 open-source models, including web, Android, Apple iOS, or Windows apps.
Although every effort has been made at all stages of training to ensure the compliance and accuracy of the data, the output of ChatGLM3-6B cannot be guaranteed to be accurate because of its relatively small scale and the probabilistic, random nature of generation, and the model's output is easily misled by user input. *This project assumes no liability for data-security or public-opinion risks caused by the open-source models and code, or for any risks arising from the models being misled, misused, disseminated, or improperly exploited.*
Model List
| Model | Seq Length | Download |
|---|---|---|
| ChatGLM3-6B | 8k | HuggingFace \| ModelScope \| WiseModel \| OpenXLab |
| ChatGLM3-6B-Base | 8k | HuggingFace \| ModelScope \| WiseModel \| OpenXLab |
| ChatGLM3-6B-32K | 32k | HuggingFace \| ModelScope \| WiseModel \| OpenXLab |
| ChatGLM3-6B-128K | 128k | HuggingFace \| ModelScope \| OpenXLab |
Please note that the latest updates to all models are released on Hugging Face first. Because ModelScope and WiseModel are not synchronized with Hugging Face and must be updated manually by the developers, they may be updated some time after the Hugging Face release.
Friendly Links
The following excellent open-source repositories already provide deep support for the ChatGLM3-6B model, and everyone is welcome to explore them:
Inference acceleration:
- chatglm.cpp: a quantized, accelerated inference solution similar to llama.cpp, enabling real-time chat on a laptop
- ChatGLM3-TPU: a TPU-accelerated inference solution that runs in real time at about 7.5 tokens/s on the Sophgo edge-side chip BM1684X (16T@FP16, 16 GB memory)
- TensorRT-LLM: NVIDIA's high-performance GPU-accelerated inference solution; you can follow these steps to deploy the ChatGLM3-6B model
- OpenVINO: Intel's high-performance CPU and GPU accelerated inference solution; you can follow these steps to deploy the ChatGLM3-6B model
Efficient fine-tuning:
- LLaMA-Factory: an excellent, easy-to-use, and efficient fine-tuning framework.
Application frameworks:
- LangChain-Chatchat: a retrieval-augmented generation (RAG) knowledge-base project built on large language models such as ChatGLM and application frameworks such as LangChain; it is open-source and can be deployed offline.
- BISHENG: an open-source platform for developing large-model applications. It empowers and accelerates the development and deployment of large-model applications, helping users enter the next generation of application development with the best possible experience.
- RAGFlow: an open-source RAG (Retrieval-Augmented Generation) engine built on deep document understanding. It provides a streamlined RAG workflow for enterprises and individuals of any scale, combining LLMs to deliver reliable question answering with well-founded citations over users' complex data in various formats.
Evaluation Results
Typical Tasks
We selected 8 typical Chinese and English datasets and ran performance tests on the ChatGLM3-6B (base) version.
| Model | GSM8K | MATH | BBH | MMLU | C-Eval | CMMLU | MBPP | AGIEval |
|---|---|---|---|---|---|---|---|---|
| ChatGLM2-6B-Base | 32.4 | 6.5 | 33.7 | 47.9 | 51.7 | 50.0 | - | - |
| Best Baseline | 52.1 | 13.1 | 45.0 | 60.1 | 63.5 | 62.2 | 47.5 | 45.8 |
| ChatGLM3-6B-Base | 72.3 | 25.7 | 66.1 | 61.4 | 69.0 | 67.5 | 52.4 | 53.7 |
Best Baseline refers to the pre-trained model that performed best on the corresponding dataset as of October 27, 2023, among models with fewer than 10B parameters, excluding models trained only for a single task without maintaining general capabilities.
In the tests of ChatGLM3-6B-Base, BBH used a 3-shot setup; GSM8K and MATH, which require reasoning, used 0-shot CoT; MBPP used 0-shot generation followed by running the test cases to compute Pass@1; all other multiple-choice datasets used a 0-shot setup.
We conducted manual evaluations of ChatGLM3-6B-32K in several long-text application scenarios. Compared with the second-generation model, its performance improved by more than 50% on average, and the improvement is especially significant in applications such as paper reading, document summarization, and financial-report analysis. We also evaluated the model on the LongBench benchmark; the results are shown in the table below.
| Model | Average | Summary | Single-Doc QA | Multi-Doc QA | Code | Few-shot | Synthetic |
|---|---|---|---|---|---|---|---|
| ChatGLM2-6B-32K | 41.5 | 24.8 | 37.6 | 34.7 | 52.8 | 51.3 | 47.7 |
| ChatGLM3-6B-32K | 50.2 | 26.6 | 45.8 | 46.1 | 56.2 | 61.2 | 65.0 |
Usage
Environment Setup
First, clone this repository:
```bash
git clone https://github.com/THUDM/ChatGLM3
cd ChatGLM3
```
Then use pip to install the dependencies:
```bash
pip install -r requirements.txt
```
- To ensure that your version of torch is correct, please install it strictly according to the instructions in the official documentation.
Integrated Demo
We provide an integrated demo that combines the following three capabilities; for how to run it, please refer to the Integrated Demo documentation:
- Chat: dialogue mode, in which you can converse with the model.
- Tool: tool mode, in which the model can perform operations through tools in addition to dialogue.
- Code Interpreter: code interpreter mode, in which the model can execute code in a Jupyter environment and use the results to complete complex tasks.
Usage via Code
You can call the ChatGLM model to generate a dialogue with the following code:
```python
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device='cuda')
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "Hello", history=[])
>>> print(response)
Hello! I'm the AI assistant ChatGLM3-6B. Nice to meet you. Feel free to ask me any questions.
>>> response, history = model.chat(tokenizer, "What should I do if I can't sleep at night?", history=history)
>>> print(response)
Having trouble sleeping at night can make you feel anxious or uncomfortable, but here are some methods that can help you fall asleep:

1. Keep a regular sleep schedule: a consistent schedule helps you build healthy sleep habits and makes it easier to fall asleep. Try to go to bed and get up at the same time every day.
2. Create a comfortable sleep environment: make sure the bedroom is comfortable, quiet, dark, and at a suitable temperature. Use comfortable bedding and keep the room ventilated.
3. Relax your body and mind: do something relaxing before bed, such as taking a hot bath, listening to soft music, or reading an enjoyable book, to ease tension and anxiety and fall asleep more easily.
4. Avoid caffeinated drinks: caffeine is a stimulant that affects sleep quality. Try to avoid drinks containing caffeine before bed, such as coffee, tea, and cola.
5. Avoid doing things unrelated to sleep in bed: activities such as watching movies, playing games, or working in bed can interfere with your sleep.
6. Try breathing techniques: deep breathing is a relaxation technique that can ease tension and anxiety and help you fall asleep. Try inhaling slowly, holding your breath for a few seconds, and then exhaling slowly.

If these methods do not help you fall asleep, consider consulting a doctor or a sleep specialist for further advice.
```
Load the Model Locally
The code above automatically downloads the model implementation and weights via transformers. The complete model implementation is available on the Hugging Face Hub. If your network environment is poor, downloading the model weights may take a long time or even fail. In that case, you can download the model to your local machine first and then load it from there.
To download the model from the Hugging Face Hub, first install Git LFS and then run:
```bash
git clone https://huggingface.co/THUDM/chatglm3-6b
```
If downloading from Hugging Face is slow, you can also download the model from ModelScope.
Model Fine-tuning
We provide a basic kit for fine-tuning the ChatGLM3-6B model. For how to use it, please refer to the fine-tuning kit documentation.
Web Demo
You can launch a Gradio-based web demo with the following command:
```bash
python web_demo_gradio.py
```
You can launch a Streamlit-based web demo with the following command:
```bash
streamlit run web_demo_streamlit.py
```
The web demo runs a web server and prints its address. Open the printed address in a browser to use it. In our testing, the Streamlit-based web demo runs more smoothly.
Command-Line Demo
Run cli_demo.py in the repository:
```bash
python cli_demo.py
```
The program conducts an interactive conversation in the command line: type a prompt and press Enter to generate a reply, type clear to clear the conversation history, and type stop to exit the program.
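For reference, a minimal sketch of the interaction loop that cli_demo.py implements (the actual script in the repository adds streaming output and nicer formatting):

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device="cuda").eval()

history = []
while True:
    query = input("User: ").strip()
    if query == "stop":    # terminate the program
        break
    if query == "clear":   # reset the conversation history
        history = []
        continue
    response, history = model.chat(tokenizer, query, history=history)
    print(f"ChatGLM3-6B: {response}")
```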
LangChain Demo
For the code implementation, please refer to LangChain Demo.
Tool Invocation
For how to invoke tools, please refer to Tool Invocation.
OpenAI API / Zhipu API Demo
We provide open-source model API deployment code in the OpenAI / ZhipuAI format, which can serve as the backend for any ChatGPT-based application. Currently, it can be deployed by running api_server.py in the repository:
```bash
cd openai_api_demo
python api_server.py
```
We have also written sample code for testing the performance of API calls:
- OpenAI test script: openai_api_request.py
- ZhipuAI test script: zhipu_api_request.py
- Testing with curl
- chat curl test:
```bash
curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"system\", \"content\": \"You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.\"}, {\"role\": \"user\", \"content\": \"你好，给我讲一个故事，大概100字\"}], \"stream\": false, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
```
- Standard OpenAI-interface agent-chat curl test:
```bash
curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"user\", \"content\": \"37乘以8加7除2等于多少？\"}], \"tools\": [{\"name\": \"track\", \"description\": \"追踪指定股票的实时价格\", \"parameters\": {\"type\": \"object\", \"properties\": {\"symbol\": {\"description\": \"需要追踪的股票代码\"}}, \"required\": []}}, {\"name\": \"Calculator\", \"description\": \"数学计算器，计算数学问题\", \"parameters\": {\"type\": \"object\", \"properties\": {\"symbol\": {\"description\": \"要计算的数学公式\"}}, \"required\": []}}], \"stream\": true, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
```
- OpenAI-style custom-interface agent-chat curl test (you need to implement the custom tool descriptions in openai_api_demo/tools/schema.py and set AGENT_CONTROLLER in api_server.py to 'true'):
```bash
curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"user\", \"content\": \"37乘以8加7除2等于多少？\"}], \"stream\": true, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
```
This endpoint performs autonomous dispatch over an OpenAI-style custom toolbox. It can recover and reply on its own when dispatch goes wrong, so no separate scheduling algorithm needs to be implemented, and users do not need an api_key.
- Testing with Python:
```bash
cd openai_api_demo
python openai_api_request.py
```
If the test succeeds, the model should return a story.
Low-Cost Deployment
Model Quantization
By default, the model is loaded at FP16 precision, and running the code above requires about 13 GB of GPU memory. If your GPU memory is limited, you can try loading the model with quantization, as follows:
```python
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
```
Model quantization brings some performance loss. In our tests, ChatGLM3-6B can still generate naturally and fluently under 4-bit quantization.
CPU Deployment
If you do not have GPU hardware, you can also run inference on the CPU, although the inference speed will be much slower. Usage is as follows (about 32 GB of RAM is required):
```python
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).float()
```
Mac Deployment
For Macs equipped with Apple Silicon or an AMD GPU, you can use the MPS backend to run ChatGLM3-6B on the GPU. Refer to Apple's official instructions to install PyTorch-Nightly (the correct version number should be 2.x.x.dev2023xxxx, not 2.x.x).
Currently, only loading the model locally is supported on macOS. Change the model loading in the code to load from a local path, and use the mps backend:
```python
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')
```
Loading the half-precision ChatGLM3-6B model requires about 13 GB of RAM. Machines with less RAM (such as a MacBook Pro with 16 GB) will fall back to virtual memory on disk when free RAM runs out, slowing inference down severely.
Multi-GPU Deployment
If you have multiple GPUs but none of them has enough memory to hold the complete model, you can split the model across several GPUs, as sketched below. First install accelerate (pip install accelerate); the model can then be loaded normally.
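One way to do this once accelerate is installed is Transformers' `device_map="auto"`, which shards layers across the visible GPUs. A sketch under that assumption; the repository may also provide its own multi-GPU loading helper:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained(
    "THUDM/chatglm3-6b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",  # shard layers across all visible GPUs
).eval()
```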
OpenVINO Demo
ChatGLM3-6B supports accelerated inference with the OpenVINO toolkit, which delivers a sizeable inference speedup on Intel CPUs and GPUs. For details, please refer to the OpenVINO Demo.
TensorRT-LLM Demo
ChatGLM3-6B supports accelerated inference with NVIDIA's TensorRT-LLM toolkit, which speeds up model inference several-fold. For details, please refer to the TensorRT-LLM Demo and the official technical documentation.
Citation
If you find our work helpful, please consider citing the following paper:
```bibtex
@misc{glm2024chatglm,
      title={ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools},
      author={Team GLM and Aohan Zeng and Bin Xu and Bowen Wang and Chenhui Zhang and Da Yin and Diego Rojas and Guanyu Feng and Hanlin Zhao and Hanyu Lai and Hao Yu and Hongning Wang and Jiadai Sun and Jiajie Zhang and Jiale Cheng and Jiayi Gui and Jie Tang and Jing Zhang and Juanzi Li and Lei Zhao and Lindong Wu and Lucen Zhong and Mingdao Liu and Minlie Huang and Peng Zhang and Qinkai Zheng and Rui Lu and Shuaiqi Duan and Shudan Zhang and Shulin Cao and Shuxun Yang and Weng Lam Tam and Wenyi Zhao and Xiao Liu and Xiao Xia and Xiaohan Zhang and Xiaotao Gu and Xin Lv and Xinghan Liu and Xinyi Liu and Xinyue Yang and Xixuan Song and Xunkai Zhang and Yifan An and Yifan Xu and Yilin Niu and Yuantao Yang and Yueyan Li and Yushi Bai and Yuxiao Dong and Zehan Qi and Zhaoyu Wang and Zhen Yang and Zhengxiao Du and Zhenyu Hou and Zihan Wang},
      year={2024},
      eprint={2406.12793},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```