llama-cookbook
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems with the Llama model family and how to run the models on various provider services.
Top Related Projects
- Llama: Inference code for Llama models
- Stanford Alpaca: Code and documentation to train Stanford's Alpaca models, and generate the data
- Alpaca-LoRA: Instruct-tune LLaMA on consumer hardware
- llama.cpp: LLM inference in C/C++
- llama2.c: Inference Llama 2 in one file of pure C
- FastChat: An open platform for training, serving, and evaluating large language models; release repo for Vicuna and Chatbot Arena
Quick Overview
The llama-cookbook repository is a comprehensive guide and collection of resources for working with Meta's LLaMA (Large Language Model Meta AI) models. It provides examples, tutorials, and best practices for fine-tuning, deploying, and using LLaMA models in various applications.
Pros
- Extensive documentation and examples for different use cases
- Regularly updated with new features and improvements
- Supports multiple frameworks and deployment options
- Includes performance optimization techniques
Cons
- Requires access to LLaMA model weights, which may not be available to everyone
- Some advanced topics may be challenging for beginners
- Limited to LLaMA models, not applicable to other language models
Code Examples
- Loading and using a LLaMA model:
# Load the model and tokenizer from a local checkpoint (paths are placeholders)
from transformers import LlamaForCausalLM, LlamaTokenizer
model = LlamaForCausalLM.from_pretrained("path/to/llama/model")
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama/tokenizer")
# Tokenize a prompt, generate up to 50 tokens, and decode the result
input_text = "Hello, how are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=50)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
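The generate call above uses greedy defaults; in practice you would often pass sampling parameters. A minimal variation (the specific values are illustrative, not taken from the cookbook):
output = model.generate(
    input_ids,
    max_new_tokens=64,   # cap the number of newly generated tokens
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))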
- Fine-tuning LLaMA on a custom dataset:
# Assumes `model`, `train_dataset`, and `data_collator` are already defined
from transformers import Trainer, TrainingArguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=5e-5,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)
trainer.train()
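The snippet assumes train_dataset and data_collator already exist; a minimal way to build them with the Hugging Face datasets library (the corpus file and preprocessing below are placeholders, not taken from the cookbook):
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling
raw = load_dataset("text", data_files={"train": "my_corpus.txt"})  # placeholder corpus
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)
train_dataset = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)  # causal LM labels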
- Deploying LLaMA with ONNX Runtime:
# Run an exported LLaMA ONNX graph with ONNX Runtime
# Assumes `tokenizer` and `input_text` are defined as in the first example
import onnxruntime as ort
ort_session = ort.InferenceSession("path/to/llama_model.onnx")
input_name = ort_session.get_inputs()[0].name
output_name = ort_session.get_outputs()[0].name
input_ids = tokenizer(input_text, return_tensors="np").input_ids
outputs = ort_session.run([output_name], {input_name: input_ids})
# Decoding like this assumes the exported graph returns token ids rather than raw logits
generated_text = tokenizer.decode(outputs[0][0], skip_special_tokens=True)
print(generated_text)
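The ONNX example assumes you already have an exported graph at path/to/llama_model.onnx. One common way to produce such an export is the Hugging Face optimum library; a sketch under that assumption (paths are placeholders):
from optimum.onnxruntime import ORTModelForCausalLM
# Export the PyTorch checkpoint to ONNX; the result can be run with ONNX Runtime
ort_model = ORTModelForCausalLM.from_pretrained("path/to/llama/model", export=True)
ort_model.save_pretrained("path/to/llama_onnx")  # writes model.onnx plus config files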
Getting Started
To get started with the llama-cookbook:
- Clone the repository:
  git clone https://github.com/meta-llama/llama-cookbook.git
  cd llama-cookbook
- Install dependencies:
  pip install -r requirements.txt
- Follow the examples and tutorials in the repository's README and Jupyter notebooks to start working with LLaMA models.
Competitor Comparisons
Llama: Inference code for Llama models
Pros of Llama
- Contains the core model implementation and training code
- Provides direct access to the model architecture and parameters
- Allows for fine-tuning and customization of the base model
Cons of Llama
- Requires more technical expertise to use effectively
- Less documentation and examples for quick start and common use cases
- Heavier resource requirements for running and training
Code Comparison
Llama (model definition):
class Transformer(nn.Module):
def __init__(self, params: ModelArgs):
super().__init__()
self.params = params
self.vocab_size = params.vocab_size
self.n_layers = params.n_layers
Llama Cookbook (usage example):
from llama import Llama
llm = Llama.build(
ckpt_dir="llama-2-7b/",
tokenizer_path="tokenizer.model",
max_seq_len=512,
max_batch_size=8,
)
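For context, the object returned by Llama.build is typically used as in the llama repo's text-completion example; a sketch (the prompt and sampling values are illustrative):
prompts = ["Tell me a fun fact about llamas."]
results = llm.text_completion(prompts, max_gen_len=64, temperature=0.6, top_p=0.9)
print(results[0]["generation"])  # generated continuation for the first prompt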
Key Differences
- Llama focuses on the core model implementation, while Llama Cookbook provides practical examples and tutorials
- Llama Cookbook offers more accessible entry points for developers new to LLMs
- Llama is better suited for researchers and advanced users looking to modify the model architecture
- Llama Cookbook emphasizes ease of use and integration into existing projects
Stanford Alpaca: Code and documentation to train Stanford's Alpaca models, and generate the data
Pros of Stanford Alpaca
- Focuses on fine-tuning LLaMA models for instruction-following tasks
- Provides a more specific and targeted approach to model improvement
- Includes a dataset of 52K instruction-following demonstrations
Cons of Stanford Alpaca
- Limited scope compared to the broader LLaMA Cookbook
- Less comprehensive documentation and examples
- Primarily centered around a single fine-tuning technique
Code Comparison
Stanford Alpaca:
def generate_prompt(instruction, input=None):
    if input:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
LLaMA Cookbook:
def format_prompt(prompt, system_prompt=DEFAULT_SYSTEM_PROMPT):
    return f"""[INST] <<SYS>>
{system_prompt}
<</SYS>>
{prompt} [/INST]"""
The Stanford Alpaca code focuses on generating prompts for instruction-following tasks, while the LLaMA Cookbook example demonstrates a more general prompt formatting approach with system prompts.
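To make the difference concrete, here is how each formatter might be called before tokenization (the instruction text and the DEFAULT_SYSTEM_PROMPT value are illustrative, not taken from either repo):
DEFAULT_SYSTEM_PROMPT = "You are a helpful, honest assistant."  # placeholder system prompt
alpaca_prompt = generate_prompt("Summarize the article.", input="Llamas are camelids...")
llama_chat_prompt = format_prompt("Summarize the article.", DEFAULT_SYSTEM_PROMPT)
# Either string is then tokenized and passed to model.generate as in the earlier examples
input_ids = tokenizer(llama_chat_prompt, return_tensors="pt").input_ids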
Alpaca-LoRA: Instruct-tune LLaMA on consumer hardware
Pros of Alpaca-LoRA
- Focuses specifically on fine-tuning LLaMA models using LoRA technique
- Provides a streamlined approach for creating custom language models
- Includes scripts for inference and evaluation of fine-tuned models
Cons of Alpaca-LoRA
- Limited scope compared to the broader LLaMA Cookbook
- May require more technical expertise to implement effectively
- Less comprehensive documentation and examples
Code Comparison
Alpaca-LoRA (fine-tuning script):
# Wrap the base model with LoRA adapter weights (uses the peft library)
lora_model = PeftModel.from_pretrained(
    model,
    lora_weights,
    torch_dtype=torch.float16,
)
lora_model.eval()
LLaMA Cookbook (loading script):
model = LlamaForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
Both repositories provide valuable resources for working with LLaMA models. Alpaca-LoRA offers a more specialized approach to fine-tuning, while the LLaMA Cookbook covers a broader range of topics and use cases. The choice between them depends on the specific needs of the project and the user's level of expertise in working with large language models.
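As a rough illustration of the LoRA technique Alpaca-LoRA builds on, here is a minimal adapter setup with the Hugging Face peft library (the target module names and hyperparameters are assumptions, not values from either repo):
from peft import LoraConfig, get_peft_model
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()    # only the adapter weights are trainable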
llama.cpp: LLM inference in C/C++
Pros of llama.cpp
- Optimized C/C++ implementation for efficient inference on various hardware
- Supports quantization for reduced memory usage and faster inference
- Includes command-line interface for easy model interaction
Cons of llama.cpp
- Focused primarily on inference, less emphasis on training or fine-tuning
- May require more technical expertise to set up and use effectively
- Limited built-in support for higher-level NLP tasks
Code Comparison
llama.cpp:
int main(int argc, char ** argv) {
    gpt_params params;
    if (gpt_params_parse(argc, argv, params) == false) {
        return 1;
    }
    llama_init_backend();
    ...
}
llama-cookbook:
def load_model(model_id, device_map="auto"):
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map=device_map,
        torch_dtype=torch.float16,
        load_in_8bit=True,
    )
    return model
The llama.cpp example shows low-level C++ code for initializing the model, while the llama-cookbook example demonstrates high-level Python code using the Transformers library for model loading.
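Quantization is also available on the Python side; as a rough counterpart to llama.cpp's quantized inference, transformers can load a checkpoint in 4-bit via bitsandbytes (a sketch; the model path is a placeholder):
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16, store weights in 4-bit
)
model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama/model",
    quantization_config=bnb_config,
    device_map="auto",
)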
llama2.c: Inference Llama 2 in one file of pure C
Pros of llama2.c
- Lightweight and minimalistic implementation in C
- Focuses on inference, making it easier to understand and modify
- Designed for running on CPU, suitable for resource-constrained environments
Cons of llama2.c
- Limited features compared to the comprehensive Llama Cookbook
- Less documentation and examples for various use cases
- Primarily targets inference, lacking training and fine-tuning capabilities
Code Comparison
llama2.c:
float* forward(Transformer* transformer, int token, int pos) {
    float* x = transformer->tok_embeddings + token * transformer->dim;
    for (int l = 0; l < transformer->n_layers; l++) {
        // ... (attention and feedforward operations)
    }
    return x;
}
Llama Cookbook (Python example):
def forward(self, tokens: torch.Tensor, start_pos: int):
    _bsz, seqlen = tokens.shape
    h = self.tok_embeddings(tokens)
    for layer in self.layers:
        h = layer(h, start_pos)
    h = self.norm(h)
    return self.output(h[:, -1, :])  # only return the last logits
The code comparison shows that llama2.c implements the forward pass in C, focusing on low-level operations, while the Llama Cookbook provides a higher-level Python implementation using PyTorch, offering more abstraction and flexibility.
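Both forward passes produce next-token logits that a decoding loop consumes; a minimal greedy loop in PyTorch, assuming the model and tokenizer from the earlier loading example:
import torch
tokens = tokenizer("Hello", return_tensors="pt").input_ids
for _ in range(32):                       # generate up to 32 tokens
    with torch.no_grad():
        logits = model(tokens).logits     # shape: [batch, seq_len, vocab]
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
    tokens = torch.cat([tokens, next_token], dim=-1)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))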
FastChat: An open platform for training, serving, and evaluating large language models; release repo for Vicuna and Chatbot Arena
Pros of FastChat
- Focuses on building and serving chatbots, providing a more specialized toolkit for conversational AI
- Includes a web UI for easy interaction with chatbots
- Offers multi-model support, allowing users to work with various language models
Cons of FastChat
- Less comprehensive documentation compared to Llama Cookbook
- Narrower scope, primarily centered on chatbot applications
- May require more setup and configuration for specific use cases
Code Comparison
FastChat example (model loading):
from fastchat.model import load_model
model, tokenizer = load_model("vicuna-7b", device="cuda", num_gpus=1)
Llama Cookbook example (model loading):
from transformers import LlamaForCausalLM, LlamaTokenizer
model = LlamaForCausalLM.from_pretrained("path/to/llama/model")
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama/tokenizer")
Both repositories provide tools for working with large language models, but they serve different purposes. FastChat is more focused on building and deploying chatbots, while Llama Cookbook offers a broader range of examples and tutorials for working with the Llama model family. The choice between the two depends on the specific requirements of your project and the level of customization you need.
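For a flavor of FastChat's chatbot focus, its conversation templates build model-specific chat prompts; a sketch using its public API (the template name and messages are illustrative):
from fastchat.conversation import get_conv_template
conv = get_conv_template("vicuna_v1.1")            # Vicuna-style chat template
conv.append_message(conv.roles[0], "What is a llama?")
conv.append_message(conv.roles[1], None)           # leave the assistant turn open
prompt = conv.get_prompt()                         # string ready for tokenization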
README
Llama Cookbook
Official Guide to building with Llama
Welcome to the official repository for helping you get started with inference, fine-tuning and end-to-end use-cases of building with the Llama Model family.
This repository covers the most popular community approaches, use-cases and the latest recipes for Llama Text and Vision models.
Latest Llama 4 recipes
- Get started with Llama API
- Integrate Llama API with WhatsApp
- 5M long context using Llama 4 Scout
- Analyze research papers with Llama 4 Maverick
- Create a character mind map from a book using Llama 4 Maverick
Repository Structure:
- 3P Integrations: Getting Started Recipes and End to End Use-Cases from various Llama providers
- End to End Use Cases: As the name suggests, spanning various domains and applications
- Getting Started: Reference for inferencing, fine-tuning and RAG examples
- src: Contains the source of the original llama-recipes library along with some FAQs for fine-tuning.
Note: We recently refactored the repo; archive-main is a snapshot branch from before the refactor.
FAQ:
- Q: What happened to llama-recipes? A: We recently renamed llama-recipes to llama-cookbook.
- Q: I have some questions for fine-tuning, is there a section to address these? A: Check out the Fine-Tuning FAQ here.
- Q: Some links are broken / folders are missing. A: We recently did a refactor of the repo; archive-main is a snapshot branch from before the refactor.
- Q: Where can we find details about the latest models? A: Official Llama models website.
Contributing
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
License
See the License file for Meta Llama 4 here and Acceptable Use Policy here
See the License file for Meta Llama 3.3 here and Acceptable Use Policy here
See the License file for Meta Llama 3.2 here and Acceptable Use Policy here
See the License file for Meta Llama 3.1 here and Acceptable Use Policy here
See the License file for Meta Llama 3 here and Acceptable Use Policy here
See the License file for Meta Llama 2 here and Acceptable Use Policy here