llama-cookbook
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems with the Llama model family and how to run the models on various provider services.
Top Related Projects
- Llama: Inference code for Llama models
- Stanford Alpaca: Code and documentation to train Stanford's Alpaca models, and generate the data
- Alpaca-LoRA: Instruct-tune LLaMA on consumer hardware
- llama.cpp: LLM inference in C/C++
- llama2.c: Inference Llama 2 in one file of pure C
- FastChat: An open platform for training, serving, and evaluating large language models; release repo for Vicuna and Chatbot Arena
Quick Overview
The llama-cookbook repository is a comprehensive guide and collection of resources for working with Meta's LLaMA (Large Language Model Meta AI) models. It provides examples, tutorials, and best practices for fine-tuning, deploying, and using LLaMA models in various applications.
Pros
- Extensive documentation and examples for different use cases
- Regularly updated with new features and improvements
- Supports multiple frameworks and deployment options
- Includes performance optimization techniques
Cons
- Requires access to LLaMA model weights, which may not be available to everyone
- Some advanced topics may be challenging for beginners
- Limited to LLaMA models, not applicable to other language models
Code Examples
- Loading and using a LLaMA model:
# Load the model and tokenizer from a local checkpoint (paths are placeholders)
from transformers import LlamaForCausalLM, LlamaTokenizer
model = LlamaForCausalLM.from_pretrained("path/to/llama/model")
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama/tokenizer")
# Tokenize a prompt, generate up to 50 tokens, and decode the result
input_text = "Hello, how are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=50)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
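The generate call above uses greedy defaults; in practice you would often pass sampling parameters. A minimal variation (the specific values are illustrative, not taken from the cookbook):
output = model.generate(
    input_ids,
    max_new_tokens=64,   # cap the number of newly generated tokens
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))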
- Fine-tuning LLaMA on a custom dataset:
# Assumes `model`, `train_dataset`, and `data_collator` are already defined
from transformers import Trainer, TrainingArguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=5e-5,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)
trainer.train()
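The snippet assumes train_dataset and data_collator already exist; a minimal way to build them with the Hugging Face datasets library (the corpus file and preprocessing below are placeholders, not taken from the cookbook):
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling
raw = load_dataset("text", data_files={"train": "my_corpus.txt"})  # placeholder corpus
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)
train_dataset = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)  # causal LM labels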
- Deploying LLaMA with ONNX Runtime:
# Run an exported LLaMA ONNX graph with ONNX Runtime
# Assumes `tokenizer` and `input_text` are defined as in the first example
import onnxruntime as ort
ort_session = ort.InferenceSession("path/to/llama_model.onnx")
input_name = ort_session.get_inputs()[0].name
output_name = ort_session.get_outputs()[0].name
input_ids = tokenizer(input_text, return_tensors="np").input_ids
outputs = ort_session.run([output_name], {input_name: input_ids})
# Decoding like this assumes the exported graph returns token ids rather than raw logits
generated_text = tokenizer.decode(outputs[0][0], skip_special_tokens=True)
print(generated_text)
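The ONNX example assumes you already have an exported graph at path/to/llama_model.onnx. One common way to produce such an export is the Hugging Face optimum library; a sketch under that assumption (paths are placeholders):
from optimum.onnxruntime import ORTModelForCausalLM
# Export the PyTorch checkpoint to ONNX; the result can be run with ONNX Runtime
ort_model = ORTModelForCausalLM.from_pretrained("path/to/llama/model", export=True)
ort_model.save_pretrained("path/to/llama_onnx")  # writes model.onnx plus config files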
Getting Started
To get started with the llama-cookbook:
- Clone the repository:
  git clone https://github.com/meta-llama/llama-cookbook.git
  cd llama-cookbook
- Install dependencies:
  pip install -r requirements.txt
- Follow the examples and tutorials in the repository's README and Jupyter notebooks to start working with LLaMA models.
Competitor Comparisons
Llama: Inference code for Llama models
Pros of Llama
- Contains the core model implementation and training code
- Provides direct access to the model architecture and parameters
- Allows for fine-tuning and customization of the base model
Cons of Llama
- Requires more technical expertise to use effectively
- Less documentation and examples for quick start and common use cases
- Heavier resource requirements for running and training
Code Comparison
Llama (model definition):
class Transformer(nn.Module):
def __init__(self, params: ModelArgs):
super().__init__()
self.params = params
self.vocab_size = params.vocab_size
self.n_layers = params.n_layers
Llama Cookbook (usage example):
from llama import Llama
llm = Llama.build(
ckpt_dir="llama-2-7b/",
tokenizer_path="tokenizer.model",
max_seq_len=512,
max_batch_size=8,
)
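For context, the object returned by Llama.build is typically used as in the llama repo's text-completion example; a sketch (the prompt and sampling values are illustrative):
prompts = ["Tell me a fun fact about llamas."]
results = llm.text_completion(prompts, max_gen_len=64, temperature=0.6, top_p=0.9)
print(results[0]["generation"])  # generated continuation for the first prompt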
Key Differences
- Llama focuses on the core model implementation, while Llama Cookbook provides practical examples and tutorials
- Llama Cookbook offers more accessible entry points for developers new to LLMs
- Llama is better suited for researchers and advanced users looking to modify the model architecture
- Llama Cookbook emphasizes ease of use and integration into existing projects
Stanford Alpaca: Code and documentation to train Stanford's Alpaca models, and generate the data
Pros of Stanford Alpaca
- Focuses on fine-tuning LLaMA models for instruction-following tasks
- Provides a more specific and targeted approach to model improvement
- Includes a dataset of 52K instruction-following demonstrations
Cons of Stanford Alpaca
- Limited scope compared to the broader LLaMA Cookbook
- Less comprehensive documentation and examples
- Primarily centered around a single fine-tuning technique
Code Comparison
Stanford Alpaca:
def generate_prompt(instruction, input=None):
    if input:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
LLaMA Cookbook:
def format_prompt(prompt, system_prompt=DEFAULT_SYSTEM_PROMPT):
    return f"""[INST] <<SYS>>
{system_prompt}
<</SYS>>
{prompt} [/INST]"""
The Stanford Alpaca code focuses on generating prompts for instruction-following tasks, while the LLaMA Cookbook example demonstrates a more general prompt formatting approach with system prompts.
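To make the difference concrete, here is how each formatter might be called before tokenization (the instruction text and the DEFAULT_SYSTEM_PROMPT value are illustrative, not taken from either repo):
DEFAULT_SYSTEM_PROMPT = "You are a helpful, honest assistant."  # placeholder system prompt
alpaca_prompt = generate_prompt("Summarize the article.", input="Llamas are camelids...")
llama_chat_prompt = format_prompt("Summarize the article.", DEFAULT_SYSTEM_PROMPT)
# Either string is then tokenized and passed to model.generate as in the earlier examples
input_ids = tokenizer(llama_chat_prompt, return_tensors="pt").input_ids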
Alpaca-LoRA: Instruct-tune LLaMA on consumer hardware
Pros of Alpaca-LoRA
- Focuses specifically on fine-tuning LLaMA models using LoRA technique
- Provides a streamlined approach for creating custom language models
- Includes scripts for inference and evaluation of fine-tuned models
Cons of Alpaca-LoRA
- Limited scope compared to the broader LLaMA Cookbook
- May require more technical expertise to implement effectively
- Less comprehensive documentation and examples
Code Comparison
Alpaca-LoRA (fine-tuning script):
# Wrap the base model with LoRA adapter weights (uses the peft library)
lora_model = PeftModel.from_pretrained(
    model,
    lora_weights,
    torch_dtype=torch.float16,
)
lora_model.eval()
LLaMA Cookbook (loading script):
model = LlamaForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
Both repositories provide valuable resources for working with LLaMA models. Alpaca-LoRA offers a more specialized approach to fine-tuning, while the LLaMA Cookbook covers a broader range of topics and use cases. The choice between them depends on the specific needs of the project and the user's level of expertise in working with large language models.
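As a rough illustration of the LoRA technique Alpaca-LoRA builds on, here is a minimal adapter setup with the Hugging Face peft library (the target module names and hyperparameters are assumptions, not values from either repo):
from peft import LoraConfig, get_peft_model
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()    # only the adapter weights are trainable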
llama.cpp: LLM inference in C/C++
Pros of llama.cpp
- Optimized C/C++ implementation for efficient inference on various hardware
- Supports quantization for reduced memory usage and faster inference
- Includes command-line interface for easy model interaction
Cons of llama.cpp
- Focused primarily on inference, less emphasis on training or fine-tuning
- May require more technical expertise to set up and use effectively
- Limited built-in support for higher-level NLP tasks
Code Comparison
llama.cpp:
int main(int argc, char ** argv) {
    gpt_params params;
    if (gpt_params_parse(argc, argv, params) == false) {
        return 1;
    }
    llama_init_backend();
    ...
}
llama-cookbook:
def load_model(model_id, device_map="auto"):
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map=device_map,
        torch_dtype=torch.float16,
        load_in_8bit=True,
    )
    return model
The llama.cpp example shows low-level C++ code for initializing the model, while the llama-cookbook example demonstrates high-level Python code using the Transformers library for model loading.
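Quantization is also available on the Python side; as a rough counterpart to llama.cpp's quantized inference, transformers can load a checkpoint in 4-bit via bitsandbytes (a sketch; the model path is a placeholder):
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16, store weights in 4-bit
)
model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama/model",
    quantization_config=bnb_config,
    device_map="auto",
)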
llama2.c: Inference Llama 2 in one file of pure C
Pros of llama2.c
- Lightweight and minimalistic implementation in C
- Focuses on inference, making it easier to understand and modify
- Designed for running on CPU, suitable for resource-constrained environments
Cons of llama2.c
- Limited features compared to the comprehensive Llama Cookbook
- Less documentation and examples for various use cases
- Primarily targets inference, lacking training and fine-tuning capabilities
Code Comparison
llama2.c:
float* forward(Transformer* transformer, int token, int pos) {
    float* x = transformer->tok_embeddings + token * transformer->dim;
    for (int l = 0; l < transformer->n_layers; l++) {
        // ... (attention and feedforward operations)
    }
    return x;
}
Llama Cookbook (Python example):
def forward(self, tokens: torch.Tensor, start_pos: int):
    _bsz, seqlen = tokens.shape
    h = self.tok_embeddings(tokens)
    for layer in self.layers:
        h = layer(h, start_pos)
    h = self.norm(h)
    return self.output(h[:, -1, :])  # only return the last logits
The code comparison shows that llama2.c implements the forward pass in C, focusing on low-level operations, while the Llama Cookbook provides a higher-level Python implementation using PyTorch, offering more abstraction and flexibility.
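Both forward passes produce next-token logits that a decoding loop consumes; a minimal greedy loop in PyTorch, assuming the model and tokenizer from the earlier loading example:
import torch
tokens = tokenizer("Hello", return_tensors="pt").input_ids
for _ in range(32):                       # generate up to 32 tokens
    with torch.no_grad():
        logits = model(tokens).logits     # shape: [batch, seq_len, vocab]
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
    tokens = torch.cat([tokens, next_token], dim=-1)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))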
FastChat: An open platform for training, serving, and evaluating large language models; release repo for Vicuna and Chatbot Arena
Pros of FastChat
- Focuses on building and serving chatbots, providing a more specialized toolkit for conversational AI
- Includes a web UI for easy interaction with chatbots
- Offers multi-model support, allowing users to work with various language models
Cons of FastChat
- Less comprehensive documentation compared to Llama Cookbook
- Narrower scope, primarily centered on chatbot applications
- May require more setup and configuration for specific use cases
Code Comparison
FastChat example (model loading):
from fastchat.model import load_model
model, tokenizer = load_model("vicuna-7b", device="cuda", num_gpus=1)
Llama Cookbook example (model loading):
from transformers import LlamaForCausalLM, LlamaTokenizer
model = LlamaForCausalLM.from_pretrained("path/to/llama/model")
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama/tokenizer")
Both repositories provide tools for working with large language models, but they serve different purposes. FastChat is more focused on building and deploying chatbots, while Llama Cookbook offers a broader range of examples and tutorials for working with the Llama model family. The choice between the two depends on the specific requirements of your project and the level of customization you need.
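For a flavor of FastChat's chatbot focus, its conversation templates build model-specific chat prompts; a sketch using its public API (the template name and messages are illustrative):
from fastchat.conversation import get_conv_template
conv = get_conv_template("vicuna_v1.1")            # Vicuna-style chat template
conv.append_message(conv.roles[0], "What is a llama?")
conv.append_message(conv.roles[1], None)           # leave the assistant turn open
prompt = conv.get_prompt()                         # string ready for tokenization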
README
Llama Cookbook
Official Guide to building with Llama
Welcome to the official repository for helping you get started with inference, fine-tuning and end-to-end use-cases of building with the Llama Model family.
This repository covers the most popular community approaches, use-cases and the latest recipes for Llama Text and Vision models.
Latest Llama 4 recipes
- Get started with Llama API
- Integrate Llama API with WhatsApp
- 5M long context using Llama 4 Scout
- Analyze research papers with Llama 4 Maverick
- Create a character mind map from a book using Llama 4 Maverick
Repository Structure:
- 3P Integrations: Getting Started Recipes and End to End Use-Cases from various Llama providers
- End to End Use Cases: As the name suggests, spanning various domains and applications
- Getting Started: Reference for inferencing, fine-tuning and RAG examples
- src: Contains the source of the original llama-recipes library along with some FAQs for fine-tuning.
Note: We recently refactored the repo; archive-main is a snapshot branch from before the refactor.
FAQ:
- Q: What happened to llama-recipes? A: We recently renamed llama-recipes to llama-cookbook.
- Q: I have some questions for fine-tuning, is there a section to address these? A: Check out the Fine-Tuning FAQ here.
- Q: Some links are broken / folders are missing. A: We recently did a refactor of the repo; archive-main is a snapshot branch from before the refactor.
- Q: Where can we find details about the latest models? A: Official Llama models website.
Contributing
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
License
See the License file for Meta Llama 4 here and Acceptable Use Policy here
See the License file for Meta Llama 3.3 here and Acceptable Use Policy here
See the License file for Meta Llama 3.2 here and Acceptable Use Policy here
See the License file for Meta Llama 3.1 here and Acceptable Use Policy here
See the License file for Meta Llama 3 here and Acceptable Use Policy here
See the License file for Meta Llama 2 here and Acceptable Use Policy here