
lifting-bits/remill

Library for lifting machine code to LLVM bitcode


Top Related Projects

  • McSema: framework for lifting x86, amd64, aarch64, sparc32, and sparc64 program binaries to LLVM bitcode
  • RetDec: a retargetable machine-code decompiler based on LLVM
  • angr: a powerful and user-friendly binary analysis platform
  • radare2: UNIX-like reverse engineering framework and command-line toolset
  • Capstone: disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86

Quick Overview

Remill is an open-source library for lifting machine code to LLVM bitcode. It supports x86, x86_64 (amd64), AArch64, SPARC32, and SPARC64, and is used as a building block for binary analysis, reverse engineering, and program transformation tools.
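Conceptually, lifted code treats the CPU's registers as fields of a state structure and threads an opaque memory handle through every operation. The following is a hand-written C++ sketch of that shape, not Remill's actual `State` type or generated code: `ToyState`, `ToyMemory`, and `sub_1000` are hypothetical names, and the `Memory *(State &, pc, Memory *)` signature only mimics the form of Remill's lifted functions.

```cpp
#include <cstdint>

// Hypothetical toy register file; Remill's real State structures are per-architecture.
struct ToyState {
    uint32_t eax = 0;
    uint32_t ebx = 0;
    uint32_t eip = 0;
};

// Opaque stand-in for the memory model that lifted code threads through.
struct ToyMemory {};

// Hand-written "lifted" version of:
//   0x1000: mov eax, 42   (5 bytes: B8 2A 00 00 00)
//   0x1005: add eax, 1    (3 bytes: 83 C0 01)
// Each instruction becomes reads and writes of the state structure.
ToyMemory *sub_1000(ToyState &state, uint32_t pc, ToyMemory *memory) {
    state.eax = 42;             // mov eax, 42
    state.eip = pc + 5;
    state.eax = state.eax + 1;  // add eax, 1
    state.eip = pc + 8;
    return memory;              // no loads or stores here, so the handle passes through unchanged
}
```

Keeping registers in an explicit structure is what lets an analysis inspect or rewrite every register access as ordinary loads and stores.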

Pros

  • Supports multiple architectures (x86, x86_64, AArch64, SPARC32, SPARC64)
  • Integrates well with LLVM ecosystem
  • Actively maintained and regularly updated
  • Provides a flexible API for custom use cases

Cons

  • Steep learning curve for beginners
  • Limited documentation for advanced features
  • May require significant computational resources for large binaries
  • Requires LLVM 15 or newer, and changes between LLVM versions can cause compatibility issues

Code Examples

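  1. Lifting x86 machine code to LLVM bitcode (a sketch against recent Remill releases; DecodeInstruction, CreateInitialContext, and DefineLiftedFunction have changed across versions, so check your installed headers). Remill decodes raw instruction bytes, not assembly text:
#include <remill/Arch/Arch.h>
#include <remill/Arch/Instruction.h>
#include <remill/BC/ABI.h>
#include <remill/BC/Util.h>
#include <llvm/IR/LLVMContext.h>
#include <string_view>

int main() {
    llvm::LLVMContext context;
    auto arch = remill::Arch::Get(context, "linux", "x86");
    auto module = remill::LoadArchSemantics(arch.get());
    // mov eax, 42 is B8 2A 00 00 00; pass the length explicitly because of the NUL bytes
    std::string_view bytes("\xB8\x2A\x00\x00\x00", 5);
    remill::Instruction inst;
    if (!arch->DecodeInstruction(0x1000, bytes, inst, arch->CreateInitialContext())) return 1;
    // Lift the decoded instruction into the entry block of a new lifted function
    auto func = arch->DefineLiftedFunction("sub_1000", module.get());
    auto state_ptr = remill::NthArgument(func, remill::kStatePointerArgNum);
    inst.GetLifter()->LiftIntoBlock(inst, &func->getEntryBlock(), state_ptr);
}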
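  2. Analyzing lifted bitcode (a sketch; lifted functions are identified by comparing each function's type with the architecture's lifted-function type, assuming Arch::LiftedFunctionType() is available in your Remill version):
#include <remill/Arch/Arch.h>
#include <llvm/IR/Module.h>

void analyze_bitcode(const remill::Arch *arch, llvm::Module *module) {
    for (auto &function : module->functions()) {
        // Lifted functions share the type Memory *(State *, addr_t pc, Memory *)
        if (function.isDeclaration() || function.getFunctionType() != arch->LiftedFunctionType()) {
            continue;
        }
        for (auto &block : function) {
            (void) block;  // Analyze basic block
        }
    }
}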
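  3. Transforming lifted code (a sketch; lifted functions are identified by comparing types with Arch::LiftedFunctionType(), assumed available in your Remill version):
#include <remill/Arch/Arch.h>
#include <remill/BC/IntrinsicTable.h>
#include <llvm/IR/Module.h>

void transform_lifted_code(const remill::Arch *arch, llvm::Module *module) {
    // Looks up Remill's intrinsics (e.g., __remill_read_memory_32) in the module
    remill::IntrinsicTable intrinsics(module);

    for (auto &function : module->functions()) {
        if (function.isDeclaration() || function.getFunctionType() != arch->LiftedFunctionType()) {
            continue;
        }
        // Apply custom transformations,
        // e.g., replace calls to memory intrinsics, optimize branches, etc.
    }
}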
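Remill routes lifted loads and stores through memory intrinsics such as `__remill_read_memory_32` and `__remill_write_memory_32`, which a consumer of the bitcode implements or replaces. The sketch below models that idea in plain C++ and is not Remill code: `ToyMemory`, `read_memory_32`, `write_memory_32`, and `sub_2000` are hypothetical names, and the sparse little-endian byte map is an assumption chosen to mirror x86.

```cpp
#include <cstdint>
#include <map>

// Hypothetical sparse, byte-addressed memory standing in for Remill's opaque Memory.
struct ToyMemory {
    std::map<uint32_t, uint8_t> bytes;
};

struct ToyState {
    uint32_t eax = 0;
    uint32_t ebx = 0;
};

// Client-provided load hook (little-endian; unmapped bytes read as 0).
uint32_t read_memory_32(ToyMemory *memory, uint32_t addr) {
    uint32_t value = 0;
    for (uint32_t i = 0; i < 4; ++i) {
        auto it = memory->bytes.find(addr + i);
        uint32_t byte = (it == memory->bytes.end()) ? 0 : it->second;
        value |= byte << (8 * i);
    }
    return value;
}

// Client-provided store hook; stores return the (possibly new) memory handle.
ToyMemory *write_memory_32(ToyMemory *memory, uint32_t addr, uint32_t value) {
    for (uint32_t i = 0; i < 4; ++i) {
        memory->bytes[addr + i] = static_cast<uint8_t>(value >> (8 * i));
    }
    return memory;
}

// Toy lifted code for:  mov [ebx], eax ; mov eax, [ebx + 4]
ToyMemory *sub_2000(ToyState &state, uint32_t /*pc*/, ToyMemory *memory) {
    memory = write_memory_32(memory, state.ebx, state.eax);
    state.eax = read_memory_32(memory, state.ebx + 4);
    return memory;
}
```

Because every access goes through two small hooks, swapping the memory model (concrete map, symbolic store, logging wrapper) does not require touching the lifted code.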

Getting Started

To get started with Remill:

  1. Clone the repository:

    git clone https://github.com/lifting-bits/remill.git
    
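  2. Install the build tools listed under Dependencies below (Git, CMake 3.21+, Ninja, Python 3, and a C/C++ toolchain); the superbuild in the next step builds the remaining dependencies, including LLVM. On Ubuntu:

    sudo apt-get install build-essential git cmake ninja-build python3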
    
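  3. Build the dependencies with the superbuild, then build Remill:

    cd remill
    cmake -G Ninja -S dependencies -B dependencies/build
    cmake --build dependencies/build
    cmake -G Ninja -B build -DCMAKE_PREFIX_PATH:PATH=$(pwd)/dependencies/install -DCMAKE_BUILD_TYPE=Release
    cmake --build build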
    
  4. Include Remill in your project's CMakeLists.txt:

    find_package(remill REQUIRED)
    target_link_libraries(your_target remill)
    

Competitor Comparisons

McSema

Framework for lifting x86, amd64, aarch64, sparc32, and sparc64 program binaries to LLVM bitcode

Pros of McSema

  • Lifts whole program binaries rather than individual instructions, using Remill for instruction semantics
  • Provides more comprehensive binary analysis capabilities
  • Offers better integration with other binary analysis tools

Cons of McSema

  • More complex setup and usage compared to Remill
  • Slower lifting process due to its comprehensive nature
  • Requires more system resources for operation

Code Comparison

McSema:

#include <remill/Arch/Arch.h>
#include <remill/BC/Util.h>
#include <mcsema/Arch/Arch.h>
#include <mcsema/BC/Util.h>

void LiftFunction(const mcsema::Arch *arch, llvm::Function *func) {
    // McSema-specific lifting code
}

Remill:

#include <remill/Arch/Arch.h>
#include <remill/BC/Util.h>

void LiftInstruction(const remill::Arch *arch, llvm::BasicBlock *block) {
    // Remill-specific lifting code
}

Both McSema and Remill are part of the lifting-bits project and share common components. McSema builds on Remill, adding whole-binary lifting for the same set of architectures at the cost of increased complexity. Remill focuses on accurately lifting individual instructions, which makes it simpler to embed in other tools but leaves program-level recovery to its clients.
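A program-level lifter has to decide which bytes are code before it can lift them one instruction at a time. The following is a toy recursive-descent traversal over a hypothetical pre-decoded instruction table; `ToyInst`, `discover_code`, and the table format are invented for illustration and are not McSema's or Remill's API.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical pre-decoded instruction: its size and any direct branch/call targets.
struct ToyInst {
    uint32_t size;
    std::vector<uint32_t> targets;  // direct branch or call destinations
    bool falls_through;             // false for unconditional jumps and returns
};

// Recursive-descent discovery: starting at the entry point, follow fall-through
// edges and direct targets, collecting every instruction address reached.
std::set<uint32_t> discover_code(const std::map<uint32_t, ToyInst> &program, uint32_t entry) {
    std::set<uint32_t> seen;
    std::vector<uint32_t> worklist{entry};
    while (!worklist.empty()) {
        uint32_t pc = worklist.back();
        worklist.pop_back();
        auto it = program.find(pc);
        if (it == program.end() || !seen.insert(pc).second) {
            continue;  // nothing decoded at this address, or already visited
        }
        const ToyInst &inst = it->second;
        if (inst.falls_through) {
            worklist.push_back(pc + inst.size);
        }
        for (uint32_t target : inst.targets) {
            worklist.push_back(target);
        }
    }
    return seen;
}
```

Bytes that follow an unconditional jump are never visited unless something else targets them, which is why whole-program lifting depends on good control-flow recovery while single-instruction lifting does not.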

RetDec

RetDec is a retargetable machine-code decompiler based on LLVM.

Pros of RetDec

  • More comprehensive decompilation capabilities, supporting multiple architectures and file formats
  • Has IDA and radare2 plugins that expose its decompiler inside those tools, in addition to its command-line interface
  • Actively maintained with regular updates and community support

Cons of RetDec

  • Slower decompilation process compared to Remill's lifting approach
  • Larger codebase and more complex setup, potentially making it harder to integrate into other projects
  • May produce less accurate results for certain specific use cases

Code Comparison

RetDec (C decompilation output):

int32_t function_401000(int32_t a1) {
    int32_t v1 = a1 * 2;
    return v1 + 5;
}

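Remill (LLVM IR lifting output, heavily simplified; Remill emits functions of the form Memory *(State *, addr_t pc, Memory *) and reaches registers and memory through the emulated state and memory intrinsics rather than through C-style parameters):

define ptr @sub_401000(ptr %state, i32 %pc, ptr %memory) {
    %a1 = call i32 @__remill_read_memory_32(ptr %memory, i32 %arg_addr)  ; %arg_addr: ESP + 4, computation elided
    %v1 = mul i32 %a1, 2
    %result = add i32 %v1, 5
    store i32 %result, ptr %eax_ptr  ; %eax_ptr: EAX field inside %state, address computation elided
    ret ptr %memory
}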

Both projects aim to analyze binary code, but RetDec focuses on full decompilation to high-level languages, while Remill specializes in lifting machine code to LLVM IR. RetDec offers a more user-friendly approach for general reverse engineering tasks, whereas Remill provides a powerful foundation for advanced binary analysis and transformation tools.

angr

A powerful and user-friendly binary analysis platform!

Pros of angr

  • More comprehensive analysis framework with symbolic execution capabilities
  • Larger community and ecosystem of plugins/extensions
  • Supports a wider range of architectures and binary formats

Cons of angr

  • Steeper learning curve due to complexity
  • Can be slower for certain types of analysis
  • Requires more system resources, especially for large binaries

Code Comparison

angr example:

import angr

proj = angr.Project('binary')
state = proj.factory.entry_state()
simgr = proj.factory.simulation_manager(state)
simgr.explore(find=0x400000)

Remill example:

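#include <remill/Arch/Arch.h>
#include <remill/BC/TraceLifter.h>
#include <remill/BC/Util.h>

// `context`, `addr`, and `manager` (a user-defined remill::TraceManager subclass
// that supplies code bytes and receives lifted functions) are assumed here.
auto arch = remill::Arch::Get(context, "linux", "amd64");
auto module = remill::LoadArchSemantics(arch.get());
remill::TraceLifter lifter(arch.get(), manager);
lifter.Lift(addr);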

The angr code demonstrates setting up a project and running symbolic execution, while the Remill code shows lifting binary code to LLVM IR. angr provides higher-level abstractions for program analysis, whereas Remill focuses on instruction lifting and translation to LLVM IR.

radare2

UNIX-like reverse engineering framework and command-line toolset

Pros of radare2

  • Comprehensive reverse engineering framework with a wide range of features
  • Large and active community, extensive documentation, and plugins ecosystem
  • Supports a vast array of architectures and file formats

Cons of radare2

  • Steeper learning curve due to its extensive feature set
  • Can be resource-intensive for large binaries or complex analysis tasks
  • Command-line interface may be less intuitive for some users

Code Comparison

radare2:

r_core_cmd(core, "aaa", 0);
r_core_cmd(core, "pdf @ main", 0);

Remill:

// Load previously lifted bitcode and print it as textual LLVM IR
auto module = remill::LoadModuleFromFile(&context, argv[1]);
module->print(llvm::outs(), nullptr);

Key differences

Radare2 is a full-featured reverse engineering framework, while Remill focuses on lifting binary code to LLVM IR. Radare2 offers a broader set of tools for various reverse engineering tasks, whereas Remill specializes in binary-to-IR translation for further analysis or recompilation.

Radare2 is more suitable for interactive analysis and scripting, while Remill is designed to be integrated into larger binary analysis systems. Radare2 has a larger user base and more extensive documentation, but it models instruction semantics with its own ESIL language rather than LLVM IR, so Remill is the more direct choice when downstream tooling expects LLVM bitcode.

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.

Pros of Capstone

  • Wider architecture support (x86, ARM, MIPS, PowerPC, etc.)
  • More mature and established project with extensive documentation
  • Lightweight and easy to integrate into existing projects

Cons of Capstone

  • Primarily focused on disassembly, not lifting to intermediate representation
  • Less suitable for advanced program analysis tasks
  • May require additional tools for more complex reverse engineering workflows

Code Comparison

Capstone (disassembly example):

cs_insn *insn;
size_t count = cs_disasm(handle, code, code_size, address, 0, &insn);
for (size_t j = 0; j < count; j++) {
    printf("0x%" PRIx64 ":\t%s\t\t%s\n", insn[j].address, insn[j].mnemonic, insn[j].op_str);
}
cs_free(insn, count);

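Remill (decoding example; `arch` is assumed to come from remill::Arch::Get, DecodeInstruction's signature follows recent releases, and each decoded instruction could then be lifted via inst.GetLifter()):

std::string_view bytes(reinterpret_cast<const char *>(code), code_size);
uint64_t pc = address;
remill::Instruction inst;
while (!bytes.empty() && arch->DecodeInstruction(pc, bytes, inst, arch->CreateInitialContext())) {
    std::cout << inst.Serialize() << std::endl;
    pc += inst.bytes.size();
    bytes.remove_prefix(inst.bytes.size());
}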

Remill focuses on lifting machine code to an intermediate representation, which is more suitable for advanced program analysis and transformation tasks. Capstone, on the other hand, excels at disassembly and provides a simpler API for basic instruction decoding across multiple architectures.
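The division of labor is visible on a single instruction: a disassembler stops after decoding, while a lifter also attaches semantics. The sketch below hand-decodes only the x86 `mov r32, imm32` form (opcode `B8+rd` followed by a little-endian 32-bit immediate) and then applies its effect to a toy register file; `ToyDecoded`, `decode_mov_imm32`, and `lift_mov_imm32` are hypothetical names unrelated to Capstone's or Remill's APIs.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Result of decoding: roughly what a disassembler reports.
struct ToyDecoded {
    unsigned reg;        // 0 = eax, 1 = ecx, 2 = edx, 3 = ebx, ..., 7 = edi
    uint32_t imm;
    std::size_t length;  // always 5 for this form
};

// Decode step: recognize only B8+rd id (mov r32, imm32).
std::optional<ToyDecoded> decode_mov_imm32(const uint8_t *code, std::size_t size) {
    if (size < 5 || code[0] < 0xB8 || code[0] > 0xBF) {
        return std::nullopt;
    }
    uint32_t imm = uint32_t(code[1]) | uint32_t(code[2]) << 8 |
                   uint32_t(code[3]) << 16 | uint32_t(code[4]) << 24;
    return ToyDecoded{unsigned(code[0] - 0xB8), imm, 5};
}

// Lift step: give the decoded instruction meaning by updating a register file.
void lift_mov_imm32(const ToyDecoded &inst, std::array<uint32_t, 8> &gprs) {
    gprs[inst.reg] = inst.imm;
}
```

A disassembler only needs the first function; a lifter needs both, for every instruction form, which is where most of Remill's per-architecture semantics live.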


README


Remill is a static binary translator that translates machine code instructions into LLVM bitcode. It translates AArch64 (64-bit ARMv8), SPARC32 (SPARCv8), SPARC64 (SPARCv9), x86 and amd64 machine code (including AVX and AVX512) into LLVM bitcode. AArch32 (32-bit ARMv8 / ARMv7) support is underway.

Remill focuses on accurately lifting instructions. It is meant to be used as a library for other tools, e.g. McSema.


Documentation

To understand how Remill works you can take a look at the following resources:

If you would like to contribute you can check out: How to contribute

API Documentation

Generate detailed API documentation using Doxygen:

# Install Doxygen (macOS)
brew install doxygen graphviz

# Install Doxygen (Ubuntu/Debian)
sudo apt-get install doxygen graphviz

# Generate documentation
doxygen

# Open docs/doxygen/html/index.html in your browser

See docs/DOCUMENTATION.md for more details on documentation style and contributing.

Getting Help

If you are experiencing undocumented problems with Remill then ask for help in the #binary-lifting channel of the Empire Hacking Slack.

Supported Platforms

Remill is supported on Linux platforms and has been tested on Ubuntu 22.04. Remill also works on macOS, and has experimental support for Windows.

Remill's Linux version can also be built via Docker for quicker testing.

Dependencies

Remill uses the following dependencies:

Name            Version
Git             Latest
CMake           3.21+
Ninja           1+
Google Flags    52e94563
Google Log      v0.7.1
Google Test     v1.17.0
LLVM            15+
Clang           15+
Intel XED       v2025.06.08
Python          3+

Getting and Building the Code

We will build the project using the superbuild in dependencies/. For more details on the dependency management system, see Remill Dependency Management.

Clone the repository

git clone https://github.com/lifting-bits/remill
cd remill

Linux/macOS

# Step 1: Build dependencies (including LLVM)
cmake -G Ninja -S dependencies -B dependencies/build
cmake --build dependencies/build

# Step 2: Build remill
cmake -G Ninja -B build -DCMAKE_PREFIX_PATH:PATH=$(pwd)/dependencies/install -DCMAKE_BUILD_TYPE=Release
cmake --build build

Windows (requires clang-cl)

Note: This requires running from a Visual Studio developer prompt.

# Step 1: Build dependencies
cmake -G Ninja -S dependencies -B dependencies/build -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl
cmake --build dependencies/build

# Step 2: Build remill
cmake -G Ninja -B build -DCMAKE_PREFIX_PATH:PATH=%CD%/dependencies/install -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_BUILD_TYPE=Release
cmake --build build

macOS with Homebrew LLVM:

# Install LLVM via Homebrew
brew install llvm@17
LLVM_PREFIX=$(brew --prefix llvm@17)

# Build dependencies with external LLVM
cmake -G Ninja -S dependencies -B dependencies/build -DUSE_EXTERNAL_LLVM=ON "-DCMAKE_PREFIX_PATH:PATH=$LLVM_PREFIX"
cmake --build dependencies/build

# Build remill
cmake -G Ninja -B build "-DCMAKE_PREFIX_PATH:PATH=$(pwd)/dependencies/install" -DCMAKE_BUILD_TYPE=Release
cmake --build build

Linux with system LLVM:

# Build dependencies with external LLVM
cmake -G Ninja -S dependencies -B dependencies/build -DUSE_EXTERNAL_LLVM=ON
cmake --build dependencies/build

# Build remill
cmake -G Ninja -B build "-DCMAKE_PREFIX_PATH:PATH=$(pwd)/dependencies/install" -DCMAKE_BUILD_TYPE=Release
cmake --build build