
eugeneyan/open-llms

📋 A list of open LLMs available for commercial use.


Top Related Projects

  • 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
  • DeepSpeed: a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  • gpt-neox: an implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries.
  • llama: inference code for Llama models.
  • bert: TensorFlow code and pre-trained models for BERT.

Quick Overview

The eugeneyan/open-llms repository is a curated list of open Large Language Models (LLMs) available for commercial use. It provides an overview of various open-source LLMs, their characteristics, and licensing information, serving as a valuable resource for developers and researchers interested in working with these models.

Pros

  • Comprehensive list of open LLMs with key details
  • Regular updates to include new models and information
  • Clear licensing information for each model
  • Includes links to model repositories and relevant papers

Cons

  • Limited technical details about model architecture and performance
  • No direct code examples or implementation guidance
  • May not include all available open LLMs
  • Requires users to navigate to external sources for more in-depth information

Code Examples

This repository is not a code library, so code examples are not applicable.

Getting Started

This repository is a curated list and does not require installation or setup. To use the information:

  1. Visit the repository at https://github.com/eugeneyan/open-llms
  2. Browse the table of open LLMs (or filter it programmatically, as sketched after these steps)
  3. Click on the provided links for more information about specific models
  4. Check the licensing information before using any model in your project
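
Because the list itself is a Markdown table in the repository's README, it can also be filtered programmatically. Below is a minimal sketch, assuming the raw README URL and the current eight-column table layout; both are assumptions that may change as the list evolves:

# Minimal sketch: fetch the open-llms README and list models under a given licence.
# Assumes the raw URL below and the current 8-column layout; cells still contain Markdown link syntax.
import requests

RAW_README = "https://raw.githubusercontent.com/eugeneyan/open-llms/main/README.md"  # assumed path

def models_by_licence(keyword: str = "Apache") -> list[str]:
    lines = requests.get(RAW_README, timeout=10).text.splitlines()
    models = []
    for line in lines:
        # Data rows look like: | Model | Date | Checkpoints | Paper | Params | Context | Licence | Try it |
        if line.startswith("|") and not set(line) <= {"|", "-", " ", ":"}:
            cells = [c.strip() for c in line.strip("|").split("|")]
            if len(cells) >= 7 and keyword.lower() in cells[6].lower():
                models.append(cells[0])
    return models

if __name__ == "__main__":
    print(models_by_licence("Apache"))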

Competitor Comparisons

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Pros of transformers

  • Comprehensive library with support for numerous models and tasks
  • Extensive documentation and community support
  • Regular updates and new model implementations

Cons of transformers

  • Larger codebase, potentially more complex for beginners
  • May include unnecessary features for users focused solely on LLMs

Code comparison

transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

open-llms:

# No direct code implementation; primarily a curated list of open-source LLMs

Summary

transformers is a comprehensive library for various NLP tasks, including but not limited to LLMs. It offers extensive functionality, documentation, and community support. However, its broad scope may be overwhelming for users specifically interested in LLMs.

open-llms, on the other hand, is a curated list of open-source LLMs. It doesn't provide direct code implementation but serves as a valuable resource for discovering and comparing available open-source language models.

While transformers offers a complete toolkit for working with models, open-llms focuses on providing an overview of available open-source LLMs, making it easier for users to find and evaluate suitable models for their projects.
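
In practice the two complement each other: a model discovered in the open-llms list can usually be loaded with transformers. A minimal sketch, using GPT-J-6B (an Apache 2.0 model from the list) purely as an illustrative checkpoint:

from transformers import AutoModelForCausalLM, AutoTokenizer

# "EleutherAI/gpt-j-6b" is only an example checkpoint; swap in any listed model whose licence fits your use case.
model_id = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open LLMs are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))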


DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

  • Comprehensive optimization toolkit for deep learning
  • Supports distributed training and inference
  • Actively maintained with frequent updates

Cons of DeepSpeed

  • Steeper learning curve due to complexity
  • Requires more setup and configuration
  • Primarily focused on performance optimization

Code Comparison

DeepSpeed:

import deepspeed

# Wrap an existing PyTorch model for distributed training; `args`, `model`, and `params`
# are assumed to be defined elsewhere (e.g., parsed CLI args and a torch.nn.Module).
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=params
)

open-llms:

# No specific code implementation
# Primarily a curated list of open-source LLMs

Summary

DeepSpeed is a powerful toolkit for optimizing deep learning models, offering advanced features for distributed training and inference. It's actively maintained but requires more setup and expertise to use effectively. In contrast, open-llms is a curated list of open-source language models, serving as a reference rather than a tool. While DeepSpeed provides code for implementation, open-llms focuses on cataloging available models without specific code examples.
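
To give a sense of the setup DeepSpeed expects, here is a minimal sketch of a training configuration passed to deepspeed.initialize; the values are illustrative placeholders rather than recommendations, and a real run is typically launched with the deepspeed CLI:

import torch
import deepspeed

# Tiny placeholder model so the sketch is self-contained.
model = torch.nn.Linear(128, 2)

# Illustrative config: fp16 plus ZeRO stage 2; tune these for your own workload.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# The returned engine owns the optimizer and the distributed training state.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)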

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Pros of gpt-neox

  • Comprehensive codebase for training large language models from scratch
  • Includes advanced features like distributed training and model parallelism
  • Actively maintained with regular updates and improvements

Cons of gpt-neox

  • Steeper learning curve due to its complexity and advanced features
  • Requires significant computational resources for training large models
  • Focused on model training rather than on providing a curated list of pre-trained models

Code comparison

gpt-neox:

# gpt-neox is driven by YAML configs; NeoXArgs merges them into one arguments object
# that the pretrain() entry point consumes.
from megatron.neox_arguments import NeoXArgs
from megatron.global_vars import set_global_variables, get_tokenizer
from megatron.training import pretrain

args_defaults = NeoXArgs.from_ymls(["configs/your_config.yml"])

open-llms:

| Model | Parameters | Context | Architecture | License |
|-------|------------|---------|--------------|---------|
| GPT-J | 6B | 2048 | GPT-3 | Apache 2.0 |
| BLOOM | 176B | 2048 | BLOOM | Responsible AI |

The gpt-neox repository provides a complete framework for training large language models, while open-llms serves as a curated list of open-source language models with their specifications. gpt-neox is more suitable for researchers and developers looking to train custom models, whereas open-llms is a valuable resource for those seeking information about existing open-source models.


Inference code for Llama models

Pros of Llama

  • Official repository for Meta's Llama model, providing direct access to the latest updates and resources
  • Includes comprehensive documentation and examples for using Llama in various applications
  • Offers pre-trained models and fine-tuning scripts for specific tasks

Cons of Llama

  • Limited to Llama models only, while open-llms covers a wider range of open-source language models
  • Requires approval and licensing for access, unlike the open-llms repository which is freely accessible
  • May have stricter usage restrictions compared to the models listed in open-llms

Code Comparison

Llama example:

from llama import Llama

# Sketch based on the repo's example scripts: build a generator from downloaded
# checkpoint/tokenizer files (paths here are placeholders), then request a completion.
generator = Llama.build(ckpt_dir="llama-2-7b/", tokenizer_path="tokenizer.model",
                        max_seq_len=128, max_batch_size=1)
results = generator.text_completion(["Hello, how are you?"], max_gen_len=64)
print(results[0]["generation"])

open-llms doesn't provide direct code examples, as it's a curated list of open-source LLMs. Users would need to refer to the specific model repositories for implementation details.

Summary

Llama is the official repository for Meta's Llama model, offering direct access to resources and updates. However, it's limited to Llama models and requires approval for access. open-llms, on the other hand, provides a comprehensive list of various open-source language models, offering more flexibility but without direct implementation examples.


TensorFlow code and pre-trained models for BERT

Pros of BERT

  • Developed by Google Research, offering high credibility and extensive documentation
  • Focuses on a specific NLP model (BERT), providing in-depth implementation details
  • Includes pre-trained models and fine-tuning scripts for various tasks

Cons of BERT

  • Limited to BERT architecture, not covering other LLM types
  • Less frequently updated compared to the open-llms repository
  • Primarily research-oriented, which may be less accessible for practical applications

Code Comparison

BERT example (model initialization):

from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
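
The same snippet extends naturally to producing sentence embeddings; a minimal sketch (assumes PyTorch, and the mean-pooling choice is just illustrative):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("Open LLMs are listed in eugeneyan/open-llms.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the final hidden states into a single sentence vector (illustrative choice).
sentence_embedding = outputs.last_hidden_state.mean(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])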

open-llms doesn't provide code examples as it's a curated list of open-source LLMs.

Summary

BERT is a focused repository for the BERT model, offering detailed implementation and pre-trained models. open-llms serves as a comprehensive list of various open-source LLMs, providing a broader overview of available models without specific implementations. BERT is ideal for those working specifically with BERT, while open-llms is better for exploring different LLM options.


README

Open LLMs

These LLMs (Large Language Models) are all licensed for commercial use (e.g., Apache 2.0, MIT, OpenRAIL-M). Contributions welcome!

| Language Model | Release Date | Checkpoints | Paper/Blog | Params (B) | Context Length | Licence | Try it |
|----------------|--------------|-------------|------------|------------|----------------|---------|--------|
| T5 | 2019/10 | T5 & Flan-T5, Flan-T5-xxl (HF) | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 0.06 - 11 | 512 | Apache 2.0 | T5-Large |
| RWKV 4 | 2021/08 | RWKV, ChatRWKV | The RWKV Language Model (and my LM tricks) | 0.1 - 14 | infinity (RNN) | Apache 2.0 | |
| GPT-NeoX-20B | 2022/04 | GPT-NEOX-20B | GPT-NeoX-20B: An Open-Source Autoregressive Language Model | 20 | 2048 | Apache 2.0 | |
| YaLM-100B | 2022/06 | yalm-100b | Yandex publishes YaLM 100B, the largest GPT-like neural network in open source | 100 | 1024 | Apache 2.0 | |
| UL2 | 2022/10 | UL2 & Flan-UL2, Flan-UL2 (HF) | UL2 20B: An Open Source Unified Language Learner | 20 | 512, 2048 | Apache 2.0 | |
| Bloom | 2022/11 | Bloom | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | 176 | 2048 | OpenRAIL-M v1 | |
| ChatGLM | 2023/03 | chatglm-6b | ChatGLM, Github | 6 | 2048 | Custom Free with some usage restriction (might require registration) | |
| Cerebras-GPT | 2023/03 | Cerebras-GPT | Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models (Paper) | 0.111 - 13 | 2048 | Apache 2.0 | Cerebras-GPT-1.3B |
| Open Assistant (Pythia family) | 2023/03 | OA-Pythia-12B-SFT-8, OA-Pythia-12B-SFT-4, OA-Pythia-12B-SFT-1 | Democratizing Large Language Model Alignment | 12 | 2048 | Apache 2.0 | Pythia-2.8B |
| Pythia | 2023/04 | pythia 70M - 12B | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 0.07 - 12 | 2048 | Apache 2.0 | |
| Dolly | 2023/04 | dolly-v2-12b | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM | 3, 7, 12 | 2048 | MIT | |
| StableLM-Alpha | 2023/04 | StableLM-Alpha | Stability AI Launches the First of its StableLM Suite of Language Models | 3 - 65 | 4096 | CC BY-SA-4.0 | |
| FastChat-T5 | 2023/04 | fastchat-t5-3b-v1.0 | We are excited to release FastChat-T5: our compact and commercial-friendly chatbot! | 3 | 512 | Apache 2.0 | |
| DLite | 2023/05 | dlite-v2-1_5b | Announcing DLite V2: Lightweight, Open LLMs That Can Run Anywhere | 0.124 - 1.5 | 1024 | Apache 2.0 | DLite-v2-1.5B |
| h2oGPT | 2023/05 | h2oGPT | Building the World’s Best Open-Source Large Language Model: H2O.ai’s Journey | 12 - 20 | 256 - 2048 | Apache 2.0 | |
| MPT-7B | 2023/05 | MPT-7B, MPT-7B-Instruct | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs | 7 | 84k (ALiBi) | Apache 2.0, CC BY-SA-3.0 | |
| RedPajama-INCITE | 2023/05 | RedPajama-INCITE | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models | 3 - 7 | 2048 | Apache 2.0 | RedPajama-INCITE-Instruct-3B-v1 |
| OpenLLaMA | 2023/05 | open_llama_3b, open_llama_7b, open_llama_13b | OpenLLaMA: An Open Reproduction of LLaMA | 3, 7 | 2048 | Apache 2.0 | OpenLLaMA-7B-Preview_200bt |
| Falcon | 2023/05 | Falcon-180B, Falcon-40B, Falcon-7B | The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only | 180, 40, 7 | 2048 | Apache 2.0 | |
| GPT-J-6B | 2023/06 | GPT-J-6B, GPT4All-J | GPT-J-6B: 6B JAX-Based Transformer | 6 | 2048 | Apache 2.0 | |
| MPT-30B | 2023/06 | MPT-30B, MPT-30B-instruct | MPT-30B: Raising the bar for open-source foundation models | 30 | 8192 | Apache 2.0, CC BY-SA-3.0 | MPT 30B inference code using CPU |
| LLaMA 2 | 2023/06 | LLaMA 2 Weights | Llama 2: Open Foundation and Fine-Tuned Chat Models | 7 - 70 | 4096 | Custom Free if you have under 700M users and you cannot use LLaMA outputs to train other LLMs besides LLaMA and its derivatives | HuggingChat |
| ChatGLM2 | 2023/06 | chatglm2-6b | ChatGLM2-6B, Github | 6 | 32k | Custom Free with some usage restriction (might require registration) | |
| XGen-7B | 2023/06 | xgen-7b-4k-base, xgen-7b-8k-base | Long Sequence Modeling with XGen | 7 | 4096, 8192 | Apache 2.0 | |
| Jais-13b | 2023/08 | jais-13b, jais-13b-chat | Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models | 13 | 2048 | Apache 2.0 | |
| OpenHermes | 2023/09 | OpenHermes-7B, OpenHermes-13B | Nous Research | 7, 13 | 4096 | MIT | OpenHermes-V2 Finetuned on Mistral 7B |
| OpenLM | 2023/09 | OpenLM 1B, OpenLM 7B | Open LM: a minimal but performative language modeling (LM) repository | 1, 7 | 2048 | MIT | |
| Mistral 7B | 2023/09 | Mistral-7B-v0.1, Mistral-7B-Instruct-v0.1 | Mistral 7B | 7 | 4096-16K with Sliding Windows | Apache 2.0 | Mistral Transformer |
| ChatGLM3 | 2023/10 | chatglm3-6b, chatglm3-6b-base, chatglm3-6b-32k, chatglm3-6b-128k | ChatGLM3 | 6 | 8192, 32k, 128k | Custom Free with some usage restriction (might require registration) | |
| Skywork | 2023/10 | Skywork-13B-Base, Skywork-13B-Math | Skywork | 13 | 4096 | Custom Free with usage restriction and models trained on Skywork outputs become Skywork derivatives, subject to this license | |
| Jais-30b | 2023/11 | jais-30b-v1, jais-30b-chat-v1 | Jais-30B: Expanding the Horizon in Open-Source Arabic NLP | 30 | 2048 | Apache 2.0 | |
| Zephyr | 2023/11 | Zephyr 7B | Website | 7 | 8192 | Apache 2.0 | |
| DeepSeek | 2023/11 | deepseek-llm-7b-base, deepseek-llm-7b-chat, deepseek-llm-67b-base, deepseek-llm-67b-chat | Introducing DeepSeek LLM | 7, 67 | 4096 | Custom Free with usage restriction and models trained on DeepSeek outputs become DeepSeek derivatives, subject to this license | |
| Mistral 7B v0.2 | 2023/12 | Mistral-7B-v0.2, Mistral-7B-Instruct-v0.2 | La Plateforme | 7 | 32k | Apache 2.0 | |
| Mixtral 8x7B v0.1 | 2023/12 | Mixtral-8x7B-v0.1, Mixtral-8x7B-Instruct-v0.1 | Mixtral of experts | 46.7 | 32k | Apache 2.0 | |
| LLM360 Amber | 2023/12 | Amber, AmberChat, AmberSafe | Introducing LLM360: Fully Transparent Open-Source LLMs | 6.7 | 2048 | Apache 2.0 | |
| SOLAR | 2023/12 | Solar-10.7B | Upstage | 10.7 | 4096 | Apache 2.0 | |
| phi-2 | 2023/12 | phi-2 2.7B | Microsoft | 2.7 | 2048 | MIT | |
| FLOR | 2023/12 | FLOR-760M, FLOR-1.3B, FLOR-1.3B-Instructed, FLOR-6.3B, FLOR-6.3B-Instructed | FLOR-6.3B: a chinchilla-compliant model for Catalan, Spanish and English | 0.76, 1.3, 6.3 | 2048 | Apache 2.0 with usage restriction inherited from BLOOM | |
| RWKV 5 v2 | 2024/01 | rwkv-5-world-0.4b-2, rwkv-5-world-1.5b-2, rwkv-5-world-3b-2, rwkv-5-world-3b-2(16k), rwkv-5-world-7b-2 | RWKV 5 | 0.4, 1.5, 3, 7 | unlimited (RNN), trained on 4096 (and 16k for 3b) | Apache 2.0 | |
| OLMo | 2024/02 | OLMo 1B, OLMo 7B, OLMo 7B Twin 2T | AI2 | 1, 7 | 2048 | Apache 2.0 | |
| Qwen1.5 | 2024/02 | Qwen1.5-7B, Qwen1.5-7B-Chat, Qwen1.5-14B, Qwen1.5-14B-Chat, Qwen1.5-72B, Qwen1.5-72B-Chat | Introducing Qwen1.5 | 7, 14, 72 | 32k | Custom Free if you have under 100M users and you cannot use Qwen outputs to train other LLMs besides Qwen and its derivatives | |
| LWM | 2024/02 | LWM-Text-Chat-128K, LWM-Text-Chat-256K, LWM-Text-Chat-512K, LWM-Text-Chat-1M, LWM-Text-128K, LWM-Text-256K, LWM-Text-512K, LWM-Text-1M | Large World Model (LWM) | 7 | 128k, 256k, 512k, 1M | LLaMA 2 license | |
| Jais-30b v3 | 2024/03 | jais-30b-v3, jais-30b-chat-v3 | Jais 30b v3 | 30 | 8192 | Apache 2.0 | |
| Gemma | 2024/02 | Gemma 7B, Gemma 7B it, Gemma 2B, Gemma 2B it | Technical report | 2-7 | 8192 | Gemma Terms of Use Free with usage restriction and models trained on Gemma outputs become Gemma derivatives, subject to this license | |
| Grok-1 | 2024/03 | Grok-1 | Open Release of Grok-1 | 314 | 8192 | Apache 2.0 | |
| Qwen1.5 MoE | 2024/03 | Qwen1.5-MoE-A2.7B, Qwen1.5-MoE-A2.7B-Chat | Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters | 14.3 | 8192 | Custom Free if you have under 100M users and you cannot use Qwen outputs to train other LLMs besides Qwen and its derivatives | |
| Jamba 0.1 | 2024/03 | Jamba-v0.1 | Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model | 52 | 256k | Apache 2.0 | |
| Qwen1.5 32B | 2024/04 | Qwen1.5-32B, Qwen1.5-32B-Chat | Qwen1.5-32B: Fitting the Capstone of the Qwen1.5 Language Model Series | 32 | 32k | Custom Free if you have under 100M users and you cannot use Qwen outputs to train other LLMs besides Qwen and its derivatives | |
| Mamba-7B | 2024/04 | mamba-7b-rw | Toyota Research Institute | 7 | unlimited (RNN), trained on 2048 | Apache 2.0 | |
| Mixtral 8x22B v0.1 | 2024/04 | Mixtral-8x22B-v0.1, Mixtral-8x22B-Instruct-v0.1 | Cheaper, Better, Faster, Stronger | 141 | 64k | Apache 2.0 | |
| Llama 3 | 2024/04 | Llama-3-8B, Llama-3-8B-Instruct, Llama-3-70B, Llama-3-70B-Instruct, Llama-Guard-2-8B | Introducing Meta Llama 3, Meta Llama 3 | 8, 70 | 8192 | Meta Llama 3 Community License Agreement Free if you have under 700M users and you cannot use LLaMA 3 outputs to train other LLMs besides LLaMA 3 and its derivatives | |
| Phi-3 Mini | 2024/04 | Phi-3-mini-4k-instruct, Phi-3-mini-128k-instruct | Introducing Phi-3, Technical Report | 3.8 | 4096, 128k | MIT | |
| OpenELM | 2024/04 | OpenELM-270M, OpenELM-270M-Instruct, OpenELM-450M, OpenELM-450M-Instruct, OpenELM-1_1B, OpenELM-1_1B-Instruct, OpenELM-3B, OpenELM-3B-Instruct | OpenELM: An Efficient Language Model Family with Open Training and Inference Framework | 0.27, 0.45, 1.1, 3 | 2048 | Custom open license, no usage or training restrictions | |
| Snowflake Arctic | 2024/04 | snowflake-arctic-base, snowflake-arctic-instruct | Snowflake Arctic: The Best LLM for Enterprise AI — Efficiently Intelligent, Truly Open | 480 | 4096 | Apache 2.0 | |
| Qwen1.5 110B | 2024/04 | Qwen1.5-110B, Qwen1.5-110B-Chat | Qwen1.5-110B: The First 100B+ Model of the Qwen1.5 Series | 110 | 32k | Custom Free if you have under 100M users and you cannot use Qwen outputs to train other LLMs besides Qwen and its derivatives | |
| RWKV 6 v2.1 | 2024/05 | rwkv-6-world-1.6b-2.1, rwkv-6-world-3b-2.1, rwkv-6-world-7b-2.1 | RWKV 6 | 1.6, 3, 7 | unlimited (RNN), trained on 4096 | Apache 2.0 | |
| DeepSeek-V2 | 2024/05 | DeepSeek-V2, DeepSeek-V2-Chat | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | 236 | 128k | Custom Free with usage restriction and models trained on DeepSeek outputs become DeepSeek derivatives, subject to this license | |
| Fugaku-LLM | 2024/05 | Fugaku-LLM-13B, Fugaku-LLM-13B-instruct | Release of "Fugaku-LLM" – a large language model trained on the supercomputer "Fugaku" | 13 | 2048 | Custom Free with usage restrictions | |
| Falcon 2 | 2024/05 | falcon2-11B | Meet Falcon 2: TII Releases New AI Model Series, Outperforming Meta’s New Llama 3 | 11 | 8192 | Custom Apache 2.0 with mild acceptable use policy | |
| Yi-1.5 | 2024/05 | Yi-1.5-6B, Yi-1.5-6B-Chat, Yi-1.5-9B, Yi-1.5-9B-Chat, Yi-1.5-34B, Yi-1.5-34B-Chat | Yi-1.5 | 6, 9, 34 | 4096 | Apache 2.0 | |
| DeepSeek-V2-Lite | 2024/05 | DeepSeek-V2-Lite, DeepSeek-V2-Lite-Chat | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | 16 | 32k | Custom Free with usage restriction and models trained on DeepSeek outputs become DeepSeek derivatives, subject to this license | |
| Phi-3 small/medium | 2024/05 | Phi-3-mini-4k-instruct, Phi-3-mini-128k-instruct, Phi-3-medium-4k-instruct, Phi-3-medium-128k-instruct | New models added to the Phi-3 family, available on Microsoft Azure, Technical Report | 7, 14 | 4096, 128k | MIT | |
| Phi-4 | 2024/12 | Phi-4 | Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning, Technical Report | 14 | 4096 | MIT | |
| YuLan-Mini | 2024/12 | YuLan-Mini | YuLan-Mini: An Open Data-efficient Language Model, GitHub | 14 | 28672 | MIT | YuLan-Mini |
| Selene Mini | 2025/01 | Selene Mini | Atla Selene Mini: A General Purpose Evaluation Model, GitHub | 8 | 128K | Apache 2.0 | Hugging Face Space |

Open LLMs for code

| Language Model | Release Date | Checkpoints | Paper/Blog | Params (B) | Context Length | Licence | Try it |
|----------------|--------------|-------------|------------|------------|----------------|---------|--------|
| SantaCoder | 2023/01 | santacoder | SantaCoder: don't reach for the stars! | 1.1 | 2048 | OpenRAIL-M v1 | SantaCoder |
| CodeGen2 | 2023/04 | codegen2 1B-16B | CodeGen2: Lessons for Training LLMs on Programming and Natural Languages | 1 - 16 | 2048 | Apache 2.0 | |
| StarCoder | 2023/05 | starcoder | StarCoder: A State-of-the-Art LLM for Code, StarCoder: May the source be with you! | 1.1 - 15 | 8192 | OpenRAIL-M v1 | |
| StarChat Alpha | 2023/05 | starchat-alpha | Creating a Coding Assistant with StarCoder | 16 | 8192 | OpenRAIL-M v1 | |
| Replit Code | 2023/05 | replit-code-v1-3b | Training a SOTA Code LLM in 1 week and Quantifying the Vibes — with Reza Shabani of Replit | 2.7 | infinity? (ALiBi) | CC BY-SA-4.0 | Replit-Code-v1-3B |
| CodeT5+ | 2023/05 | CodeT5+ | CodeT5+: Open Code Large Language Models for Code Understanding and Generation | 0.22 - 16 | 512 | BSD-3-Clause | Codet5+-6B |
| XGen-7B | 2023/06 | XGen-7B-8K-Base | Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length | 7 | 8192 | Apache 2.0 | |
| CodeGen2.5 | 2023/07 | CodeGen2.5-7B-multi | CodeGen2.5: Small, but mighty | 7 | 2048 | Apache 2.0 | |
| DeciCoder-1B | 2023/08 | DeciCoder-1B | Introducing DeciCoder: The New Gold Standard in Efficient and Accurate Code Generation | 1.1 | 2048 | Apache 2.0 | DeciCoder Demo |
| Code Llama | 2023/08 | Inference Code for CodeLlama models | Code Llama: Open Foundation Models for Code | 7 - 34 | 4096 | Custom Free if you have under 700M users and you cannot use LLaMA outputs to train other LLMs besides LLaMA and its derivatives | HuggingChat |

Open LLM datasets for pre-training

| Name | Release Date | Paper/Blog | Dataset | Tokens (T) | License |
|------|--------------|------------|---------|------------|---------|
| RedPajama | 2023/04 | RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens | RedPajama-Data | 1.2 | Apache 2.0 |
| starcoderdata | 2023/05 | StarCoder: A State-of-the-Art LLM for Code | starcoderdata | 0.25 | Apache 2.0 |

Open LLM datasets for instruction-tuning

| Name | Release Date | Paper/Blog | Dataset | Samples (K) | License |
|------|--------------|------------|---------|-------------|---------|
| OIG (Open Instruction Generalist) | 2023/03 | THE OIG DATASET | OIG | 44,000 | Apache 2.0 |
| databricks-dolly-15k | 2023/04 | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM | databricks-dolly-15k | 15 | CC BY-SA-3.0 |
| MPT-7B-Instruct | 2023/05 | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs | dolly_hhrlhf | 59 | CC BY-SA-3.0 |

Open LLM datasets for alignment-tuning

| Name | Release Date | Paper/Blog | Dataset | Samples (K) | License |
|------|--------------|------------|---------|-------------|---------|
| OpenAssistant Conversations Dataset | 2023/04 | OpenAssistant Conversations - Democratizing Large Language Model Alignment | oasst1 | 161 | Apache 2.0 |

Evals on open LLMs


What do the licences mean?

  • Apache 2.0: Allows users to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of the software under the terms of the license, without concern for royalties.
  • MIT: Similar to Apache 2.0 but shorter and simpler. Also, in contrast to Apache 2.0, does not require stating any significant changes to the original code.
  • CC BY-SA-4.0: Allows (i) copying and redistributing the material and (ii) remixing, transforming, and building upon the material for any purpose, even commercially. But if you do the latter, you must distribute your contributions under the same license as the original. (Thus, may not be viable for internal teams.)
  • OpenRAIL-M v1: Allows royalty-free access and flexible downstream use and sharing of the model and modifications of it, and comes with a set of use restrictions (see Attachment A).
  • BSD-3-Clause: This version allows unlimited redistribution for any purpose as long as its copyright notices and the license's disclaimers of warranty are maintained.

Disclaimer: The information provided in this repo does not, and is not intended to, constitute legal advice. Maintainers of this repo are not responsible for the actions of third parties who use the models. Please consult an attorney before using models for commercial purposes.


Improvements

  • Complete entries for context length, and check entries with ?
  • Add number of tokens trained? (see considerations)
  • Add (links to) training code?
  • Add (links to) eval benchmarks?