meta-llama / llama-cookbook

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems with the Llama model family and how to run these models on various provider services.

Top Related Projects

  • Llama: Inference code for Llama models
  • Stanford Alpaca: Code and documentation to train Stanford's Alpaca models, and generate the data.
  • Alpaca-LoRA: Instruct-tune LLaMA on consumer hardware
  • llama.cpp: LLM inference in C/C++
  • llama2.c: Inference Llama 2 in one file of pure C
  • FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Quick Overview

The llama-cookbook repository is a comprehensive guide and collection of resources for working with Meta's LLaMA (Large Language Model Meta AI) models. It provides examples, tutorials, and best practices for fine-tuning, deploying, and using LLaMA models in various applications.

Pros

  • Extensive documentation and examples for different use cases
  • Regularly updated with new features and improvements
  • Supports multiple frameworks and deployment options
  • Includes performance optimization techniques

Cons

  • Requires access to LLaMA model weights, which may not be available to everyone
  • Some advanced topics may be challenging for beginners
  • Limited to LLaMA models, not applicable to other language models

Code Examples

  1. Loading and using a LLaMA model:
from transformers import LlamaForCausalLM, LlamaTokenizer

model = LlamaForCausalLM.from_pretrained("path/to/llama/model")
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama/tokenizer")

input_text = "Hello, how are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

output = model.generate(input_ids, max_length=50)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
  2. Fine-tuning LLaMA on a custom dataset (reuses the model from the first example; train_dataset and data_collator are assumed to exist, see the data-preparation sketch after this list):
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)

trainer.train()
  3. Deploying LLaMA with ONNX Runtime:
import numpy as np
import onnxruntime as ort

ort_session = ort.InferenceSession("path/to/llama_model.onnx")

# Assumes the exported graph takes a single input_ids tensor.
input_name = ort_session.get_inputs()[0].name
output_name = ort_session.get_outputs()[0].name

# Reuse the tokenizer and input_text from the first example.
input_ids = tokenizer(input_text, return_tensors="np").input_ids
logits = ort_session.run([output_name], {input_name: input_ids})[0]

# The exported model returns logits, so greedily pick the most likely next token.
next_token_id = int(np.argmax(logits[0, -1]))
print(tokenizer.decode([next_token_id], skip_special_tokens=True))
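
The fine-tuning example above assumes a train_dataset and a data_collator are already prepared. Below is a minimal sketch of one way to build them with the Hugging Face datasets library; the JSON file path and the "text" field name are placeholders for your own data.

from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling

# Hypothetical training file; replace with your own data.
dataset = load_dataset("json", data_files="path/to/train.json", split="train")

def tokenize(example):
    # Assumes each record has a "text" field.
    return tokenizer(example["text"], truncation=True, max_length=512)

train_dataset = dataset.map(tokenize, remove_columns=dataset.column_names)
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)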

Getting Started

To get started with the llama-cookbook:

  1. Clone the repository:

    git clone https://github.com/meta-llama/llama-cookbook.git
    cd llama-cookbook
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Follow the examples and tutorials in the repository's README and Jupyter notebooks to start working with LLaMA models.
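
Note that the Llama checkpoints hosted on Hugging Face are gated, so you may also need to accept the model license on the model card and authenticate before the examples can download weights. A minimal sketch using the huggingface_hub library (the token value is a placeholder):

from huggingface_hub import login

# Use an access token from an account that has accepted the Llama license.
login(token="hf_your_token_here")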

Competitor Comparisons

Llama: Inference code for Llama models

Pros of Llama

  • Contains the core model implementation and training code
  • Provides direct access to the model architecture and parameters
  • Allows for fine-tuning and customization of the base model

Cons of Llama

  • Requires more technical expertise to use effectively
  • Less documentation and examples for quick start and common use cases
  • Heavier resource requirements for running and training

Code Comparison

Llama (model definition):

class Transformer(nn.Module):
    def __init__(self, params: ModelArgs):
        super().__init__()
        self.params = params
        self.vocab_size = params.vocab_size
        self.n_layers = params.n_layers

Llama Cookbook (usage example):

from llama import Llama

llm = Llama.build(
    ckpt_dir="llama-2-7b/",
    tokenizer_path="tokenizer.model",
    max_seq_len=512,
    max_batch_size=8,
)
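
For completeness, here is a rough sketch of generating text with the object built above, using the text_completion helper exposed by the reference implementation (the prompt and sampling values are illustrative):

results = llm.text_completion(
    ["I believe the meaning of life is"],
    max_gen_len=64,
    temperature=0.6,
    top_p=0.9,
)
print(results[0]["generation"])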

Key Differences

  • Llama focuses on the core model implementation, while Llama Cookbook provides practical examples and tutorials
  • Llama Cookbook offers more accessible entry points for developers new to LLMs
  • Llama is better suited for researchers and advanced users looking to modify the model architecture
  • Llama Cookbook emphasizes ease of use and integration into existing projects

Stanford Alpaca: Code and documentation to train Stanford's Alpaca models, and generate the data.

Pros of Stanford Alpaca

  • Focuses on fine-tuning LLaMA models for instruction-following tasks
  • Provides a more specific and targeted approach to model improvement
  • Includes a dataset of 52K instruction-following demonstrations

Cons of Stanford Alpaca

  • Limited scope compared to the broader LLaMA Cookbook
  • Less comprehensive documentation and examples
  • Primarily centered around a single fine-tuning technique

Code Comparison

Stanford Alpaca:

def generate_prompt(instruction, input=None):
    if input:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"

LLaMA Cookbook:

def format_prompt(prompt, system_prompt=DEFAULT_SYSTEM_PROMPT):
    return f"""[INST] <<SYS>>
{system_prompt}
<</SYS>>

{prompt} [/INST]"""

The Stanford Alpaca code focuses on generating prompts for instruction-following tasks, while the LLaMA Cookbook example demonstrates a more general prompt formatting approach with system prompts.
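
As a quick illustration, here is how the two helpers above might be called side by side (the instruction text is made up, and DEFAULT_SYSTEM_PROMPT is assumed to be defined next to format_prompt):

instruction = "Summarize the main differences between the two repositories."

# Alpaca-style prompt without additional input context.
print(generate_prompt(instruction))

# Llama 2 chat-style prompt with an explicit system prompt.
print(format_prompt(instruction, system_prompt="You are a helpful assistant."))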

Alpaca-LoRA: Instruct-tune LLaMA on consumer hardware

Pros of Alpaca-LoRA

  • Focuses specifically on fine-tuning LLaMA models using LoRA technique
  • Provides a streamlined approach for creating custom language models
  • Includes scripts for inference and evaluation of fine-tuned models

Cons of Alpaca-LoRA

  • Limited scope compared to the broader LLaMA Cookbook
  • May require more technical expertise to implement effectively
  • Less comprehensive documentation and examples

Code Comparison

Alpaca-LoRA (loading fine-tuned LoRA weights for inference):

import torch
from peft import PeftModel

# Attach the fine-tuned LoRA adapter weights to the base LLaMA model.
lora_model = PeftModel.from_pretrained(
    model,
    lora_weights,
    torch_dtype=torch.float16,
)
lora_model.eval()

LLaMA Cookbook (loading script):

model = LlamaForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
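
To make the LoRA idea more concrete, here is a minimal sketch of wrapping a loaded model with trainable low-rank adapters using the peft library; the rank, alpha, and target module names are illustrative choices rather than values taken from either repository:

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension (illustrative)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections commonly adapted in LLaMA
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()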

Both repositories provide valuable resources for working with LLaMA models. Alpaca-LoRA offers a more specialized approach to fine-tuning, while the LLaMA Cookbook covers a broader range of topics and use cases. The choice between them depends on the specific needs of the project and the user's level of expertise in working with large language models.

llama.cpp: LLM inference in C/C++

Pros of llama.cpp

  • Optimized C/C++ implementation for efficient inference on various hardware
  • Supports quantization for reduced memory usage and faster inference
  • Includes command-line interface for easy model interaction

Cons of llama.cpp

  • Focused primarily on inference, less emphasis on training or fine-tuning
  • May require more technical expertise to set up and use effectively
  • Limited built-in support for higher-level NLP tasks

Code Comparison

llama.cpp:

int main(int argc, char ** argv) {
    gpt_params params;
    if (gpt_params_parse(argc, argv, params) == false) {
        return 1;
    }
    llama_init_backend();
    ...
}

llama-cookbook:

def load_model(model_id, device_map="auto"):
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map=device_map,
        torch_dtype=torch.float16,
        load_in_8bit=True,
    )
    return model

The llama.cpp example shows low-level C++ code for initializing the model, while the llama-cookbook example demonstrates high-level Python code using the Transformers library for model loading.
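
If you prefer to drive a llama.cpp model from Python rather than the C++ command line, the separate llama-cpp-python bindings provide a high-level wrapper. A minimal sketch, assuming a quantized GGUF model file (the path is a placeholder):

from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(model_path="path/to/llama-model-q4_0.gguf")
result = llm("Q: What is a llama? A:", max_tokens=32, stop=["Q:"])
print(result["choices"][0]["text"])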

llama2.c: Inference Llama 2 in one file of pure C

Pros of llama2.c

  • Lightweight and minimalistic implementation in C
  • Focuses on inference, making it easier to understand and modify
  • Designed for running on CPU, suitable for resource-constrained environments

Cons of llama2.c

  • Limited features compared to the comprehensive Llama Cookbook
  • Less documentation and examples for various use cases
  • Primarily targets inference, lacking training and fine-tuning capabilities

Code Comparison

llama2.c:

float* forward(Transformer* transformer, int token, int pos) {
    float* x = transformer->tok_embeddings + token * transformer->dim;
    for (int l = 0; l < transformer->n_layers; l++) {
        // ... (attention and feedforward operations)
    }
    return x;
}

Llama Cookbook (Python example):

def forward(self, tokens: torch.Tensor, start_pos: int):
    _bsz, seqlen = tokens.shape
    h = self.tok_embeddings(tokens)
    for layer in self.layers:
        h = layer(h, start_pos)
    h = self.norm(h)
    return self.output(h[:, -1, :])  # only return the last logits

The code comparison shows that llama2.c implements the forward pass in C, focusing on low-level operations, while the Llama Cookbook provides a higher-level Python implementation using PyTorch, offering more abstraction and flexibility.

FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Pros of FastChat

  • Focuses on building and serving chatbots, providing a more specialized toolkit for conversational AI
  • Includes a web UI for easy interaction with chatbots
  • Offers multi-model support, allowing users to work with various language models

Cons of FastChat

  • Less comprehensive documentation compared to Llama Cookbook
  • Narrower scope, primarily centered on chatbot applications
  • May require more setup and configuration for specific use cases

Code Comparison

FastChat example (model loading):

from fastchat.model import load_model

model, tokenizer = load_model("vicuna-7b", device="cuda", num_gpus=1)

Llama Cookbook example (model loading):

from transformers import LlamaForCausalLM, LlamaTokenizer

model = LlamaForCausalLM.from_pretrained("path/to/llama/model")
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama/tokenizer")

Both repositories provide tools for working with large language models, but they serve different purposes. FastChat is more focused on building and deploying chatbots, while Llama Cookbook offers a broader range of examples and tutorials for working with the Llama model family. The choice between the two depends on the specific requirements of your project and the level of customization you need.

README

Llama Cookbook

Llama Model Cards | Llama Documentation | Hugging Face: meta-llama

Llama Tools: Synthetic Data Kit

Official Guide to building with Llama

Welcome to the official repository for helping you get started with inference, fine-tuning and end-to-end use-cases of building with the Llama Model family.

This repository covers the most popular community approaches, use-cases and the latest recipes for Llama Text and Vision models.

Latest Llama 4 recipes

Repository Structure:

  • 3P Integrations: Getting started recipes and end-to-end use cases from various Llama providers
  • End to End Use Cases: As the name suggests, examples spanning various domains and applications
  • Getting Started: Reference examples for inference, fine-tuning, and RAG
  • src: Contains the source code for the original llama-recipes library along with some FAQs for fine-tuning.

Note: We recently refactored the repo; archive-main is a snapshot branch from before the refactor.

FAQ:

  • Q: What happened to llama-recipes? A: We recently renamed llama-recipes to llama-cookbook.

  • Q: I have some questions for Fine-Tuning, is there a section to address these? A: Check out the Fine-Tuning FAQ here.

  • Q: Some links are broken/folders are missing. A: We recently refactored the repo; archive-main is a snapshot branch from before the refactor.

  • Q: Where can we find details about the latest models? A: Official Llama models website.

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

License

See the License file for Meta Llama 4 here and Acceptable Use Policy here

See the License file for Meta Llama 3.3 here and Acceptable Use Policy here

See the License file for Meta Llama 3.2 here and Acceptable Use Policy here

See the License file for Meta Llama 3.1 here and Acceptable Use Policy here

See the License file for Meta Llama 3 here and Acceptable Use Policy here

See the License file for Meta Llama 2 here and Acceptable Use Policy here