borb

borb is a library for reading, creating and manipulating PDF files in Python.

3,551

157


Top Related Projects

PyMuPDF

8,415

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

pypdf

9,721

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files


pdfminer.six

6,843

Community maintained fork of pdfminer - we fathom PDF

pikepdf

2,514

A Python library for reading and writing PDF, powered by QPDF

pdfarranger

5,066

Small python-gtk application, which helps the user to merge or split PDF documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface.

Quick Overview

borb is a comprehensive Python library for reading, creating, and manipulating PDF files. It offers a wide range of features, from basic PDF creation to complex operations like digital signatures, form filling, and PDF/A conversion. borb aims to provide a user-friendly API while maintaining powerful functionality for PDF handling.

Pros

Extensive feature set covering most PDF-related operations
User-friendly API with intuitive method names and structure
Actively maintained with regular updates and improvements
Comprehensive documentation and examples available

Cons

Complex operations can take some time to learn
Likely slower than libraries backed by native code, such as PyMuPDF or pikepdf
Large library size due to its comprehensive feature set
Some advanced features may require additional dependencies

Code Examples

Creating a simple PDF:

from borb.pdf import Document, Page, SingleColumnLayout, Paragraph, PDF

# Create an empty document
doc = Document()

# Add a page
page = Page()
doc.append_page(page)

# Add content through a page layout (same API as the README's Hello World)
layout = SingleColumnLayout(page)
layout.append_layout_element(Paragraph("Hello, World!"))

# Save the PDF
PDF.write(what=doc, where_to="output.pdf")

Adding an image to a PDF:

from borb.pdf import Document, Page, SingleColumnLayout, Image, PDF

# Create document and page
doc = Document()
page = Page()
doc.append_page(page)

# Add the image through a page layout.
# Assumption: Image accepts a URL string and fetches the picture itself.
layout = SingleColumnLayout(page)
layout.append_layout_element(Image("https://www.example.com/image.jpg"))

# Save the PDF
PDF.write(what=doc, where_to="image.pdf")

Extracting text from a PDF:

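# Uses the borb 2.x toolkit API; newer releases may expose extraction differently
from borb.pdf import PDF
from borb.toolkit import SimpleTextExtraction

# Attach the extractor as a listener while the PDF is parsed
extraction = SimpleTextExtraction()
with open("input.pdf", "rb") as pdf_file_handle:
    doc = PDF.loads(pdf_file_handle, [extraction])

# Print the extracted text (keyed by page index)
print(extraction.get_text())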

Getting Started

To get started with borb, first install it using pip:

pip install borb

Then, you can create a simple PDF using the following code:

from borb.pdf import Document, Page, SingleColumnLayout, Paragraph, PDF

doc = Document()
page = Page()
doc.append_page(page)

# Content is placed on the page through a layout
layout = SingleColumnLayout(page)
layout.append_layout_element(Paragraph("My first PDF with borb!"))

PDF.write(what=doc, where_to="quickstart.pdf")

This will create a PDF file named "quickstart.pdf" with the text "My first PDF with borb!" on the first page.
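One quick way to confirm that the script produced a PDF is to check the file signature. The sketch below uses only the standard library. `looks_like_pdf` is a hypothetical helper name, not part of borb, and it only checks the leading `%PDF-` marker rather than validating the whole file.

```python
from pathlib import Path

# A PDF file is expected to begin with this marker, followed by a version number.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path):
    """Return True if `path` is an existing file that starts with the PDF signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(len(PDF_MAGIC)) == PDF_MAGIC
```

Calling `looks_like_pdf("quickstart.pdf")` after running the quick-start script should return `True` if the file was written.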

Competitor Comparisons

PyMuPDF

8,415

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Pros of PyMuPDF

Faster performance for large PDF operations
More comprehensive PDF manipulation capabilities
Better support for complex PDF structures and annotations

Cons of PyMuPDF

Steeper learning curve due to more complex API
Larger library size, which may impact deployment in some scenarios
Less focus on PDF creation from scratch compared to borb

Code Comparison

PyMuPDF example:

import fitz
doc = fitz.open("input.pdf")
page = doc[0]
text = page.get_text()
doc.close()

borb example:

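# borb 2.x toolkit API
from borb.pdf import PDF
from borb.toolkit import SimpleTextExtraction

extraction = SimpleTextExtraction()
with open("input.pdf", "rb") as pdf_file_handle:
    doc = PDF.loads(pdf_file_handle, [extraction])
text = extraction.get_text()[0]  # text of the first page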

Both libraries can read and modify PDFs. PyMuPDF, which wraps the MuPDF C library, is generally faster and covers more of the format, while borb is pure Python and puts more emphasis on creating documents from scratch. Which one fits depends on how much low-level manipulation your project needs and how much it relies on document creation.
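Performance claims like these are easy to check on your own documents. Below is a minimal timing sketch using only the standard library. `time_call` is a hypothetical helper, and the extraction functions mentioned in the usage note are placeholders for wrappers you would write around each library.

```python
import time


def time_call(func, *args, repeats=3, **kwargs):
    """Call func(*args, **kwargs) `repeats` times.

    Returns (best_seconds, last_result), where best_seconds is the
    shortest wall-clock duration observed across the runs.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, result
```

For instance, timing `time_call(extract_with_pymupdf, "input.pdf")` against `time_call(extract_with_borb, "input.pdf")`, where both functions are your own wrappers, gives a like-for-like comparison on the files you actually care about.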

pypdf

9,721

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

Pros of pypdf

Lightweight and focused on PDF manipulation tasks
Extensive documentation and community support
Simple API for common PDF operations

Cons of pypdf

Limited PDF creation capabilities
Less comprehensive feature set for advanced PDF operations
Slower performance for large-scale PDF processing

Code Comparison

pypdf:

from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

page = reader.pages[0]
writer.add_page(page)

with open("output.pdf", "wb") as output_file:
    writer.write(output_file)

borb:

from borb.pdf import Document, Page, PDF

# Build a new one-page document instead of copying an existing page
doc = Document()
page = Page()
doc.append_page(page)

PDF.write(what=doc, where_to="output.pdf")

pypdf focuses on manipulating existing PDFs (splitting, merging, cropping, transforming pages) through a straightforward API, while borb also covers creating documents from scratch. The two snippets are not equivalent: the pypdf one copies the first page of an existing file into a new file, whereas the borb one writes a new, empty one-page document.


pdfminer.six

6,843

Community maintained fork of pdfminer - we fathom PDF

Pros of pdfminer.six

More established project with a longer history and larger community
Focuses specifically on PDF text extraction and analysis
Provides lower-level control over PDF parsing process

Cons of pdfminer.six

Limited functionality for PDF creation or modification
Less user-friendly for beginners or those seeking quick results
Slower performance compared to some alternatives

Code Comparison

pdfminer.six:

from pdfminer.high_level import extract_text

text = extract_text('sample.pdf')
print(text)

borb:

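# borb 2.x toolkit API
from borb.pdf import PDF
from borb.toolkit import SimpleTextExtraction

extraction = SimpleTextExtraction()
with open("sample.pdf", "rb") as pdf_file_handle:
    doc = PDF.loads(pdf_file_handle, [extraction])

# get_text() maps each page index to that page's text
text = "".join(extraction.get_text().values())
print(text)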

Both libraries can extract text from PDF files. pdfminer.six concentrates on text extraction and layout analysis, while borb covers a broader range of PDF tasks, including creation and modification.

pdfminer.six is better suited for projects requiring detailed PDF parsing and text analysis, whereas borb is more versatile for general PDF handling, including creation and modification tasks. The choice between the two depends on the specific requirements of your project and your familiarity with PDF structures.
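When choosing between extractors, it can help to compare what each one returns for the same file. The sketch below scores two extracted strings with the standard library's `difflib`. The function names are hypothetical, and whitespace is collapsed first on the assumption that layout differences (line breaks, spacing) matter less than the words themselves.

```python
import difflib


def normalize(text):
    """Collapse every run of whitespace into a single space."""
    return " ".join(text.split())


def text_similarity(a, b):
    """Return a ratio between 0 and 1; 1.0 means identical after normalization."""
    matcher = difflib.SequenceMatcher(None, normalize(a), normalize(b))
    return matcher.ratio()
```

A low ratio on a representative document is a hint to inspect both outputs by hand before settling on a library.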

pikepdf

2,514

A Python library for reading and writing PDF, powered by QPDF

Pros of pikepdf

Faster performance for large PDF operations
More comprehensive low-level PDF manipulation capabilities
Better integration with existing Python PDF libraries

Cons of pikepdf

Steeper learning curve for beginners
Less focus on high-level PDF creation and editing tasks
Requires more code for simple operations compared to borb

Code Comparison

pikepdf example:

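import pikepdf

# Rotate the first page by 90 degrees clockwise, relative to its current rotation
with pikepdf.Pdf.open("input.pdf") as pdf:
    page = pdf.pages[0]
    page.rotate(90, relative=True)
    pdf.save("output.pdf")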

borb example:

from borb.pdf import Document, Page, PDF

# Creates a new one-page document (this snippet does not rotate an existing page)
doc = Document()
page = Page()
doc.append_page(page)
PDF.write(what=doc, where_to="output.pdf")

pikepdf is better suited for complex PDF manipulation tasks, offering more granular control over PDF structures. It excels in scenarios requiring high performance and integration with other PDF libraries. borb, on the other hand, provides a more user-friendly approach for basic PDF creation and editing, making it easier for beginners to get started with PDF operations in Python.
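Page rotation in a PDF is stored as a page attribute (`/Rotate`) whose value must be a multiple of 90 degrees. The sketch below shows only the arithmetic behind turning a page relative to its current orientation. `apply_relative_rotation` is a hypothetical helper and not part of pikepdf or borb.

```python
def apply_relative_rotation(current, delta):
    """Return the new rotation after turning a page by `delta` degrees clockwise.

    Both values must be multiples of 90; the result is normalized to
    one of 0, 90, 180 or 270.
    """
    if current % 90 != 0 or delta % 90 != 0:
        raise ValueError("PDF page rotation must be a multiple of 90 degrees")
    return (current + delta) % 360
```

Negative values turn the page counter-clockwise, so `apply_relative_rotation(0, -90)` lands on 270.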

pdfarranger

5,066

Small python-gtk application, which helps the user to merge or split PDF documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface.

Pros of pdfarranger

User-friendly GUI for PDF manipulation
Lightweight and focused on specific PDF tasks
Cross-platform compatibility (Linux, Windows, macOS)

Cons of pdfarranger

Limited to basic PDF operations (merging, splitting, rotating)
Lacks advanced PDF creation and modification features
Requires external dependencies (Python, GTK)

Code Comparison

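pdfarranger (illustrative pseudocode only; pdfarranger is a GUI application, and the module and method names below are hypothetical, not a documented API):

# Hypothetical API, shown only to mirror the steps a user performs in the GUI
from pdfarranger.core import PdfArrangement

pdf = PdfArrangement()
pdf.import_file("input.pdf")   # open a file
pdf.rotate_page(0, 90)         # rotate the first page by 90 degrees
pdf.export("output.pdf")       # save the result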

borb (Python):

from borb.pdf import Document, Page, SingleColumnLayout, Paragraph, PDF

doc = Document()
page = Page()
doc.append_page(page)
SingleColumnLayout(page).append_layout_element(Paragraph("Hello, World!"))
PDF.write(what=doc, where_to="output.pdf")

Key Differences

pdfarranger focuses on GUI-based PDF manipulation
borb offers programmatic PDF creation and modification
pdfarranger is ideal for quick, visual PDF edits
borb provides more flexibility for complex PDF operations

Both projects serve different use cases, with pdfarranger being more suitable for end-users needing simple PDF edits, while borb caters to developers requiring programmatic PDF handling.


README

borb

borb is a powerful and flexible Python library for creating and manipulating PDF files.

Overview

borb provides a pure Python solution for PDF document management, allowing users to read, write, and manipulate PDFs. It models PDF files in a JSON-like structure, using nested lists, dictionaries, and primitives (numbers, strings, booleans, etc.). Created and maintained as a solo project, borb prioritizes common PDF use cases for practical and straightforward usage.
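To make the "JSON-like structure" concrete, here is a rough sketch of how a tiny document tree could look as nested dictionaries and lists, together with a walker that lists every leaf. The keys mirror names from the PDF specification (`Type`, `Pages`, `Kids`, `MediaBox`), but the layout is an illustration only and is not taken from borb's internal classes.

```python
# Illustrative only: a hand-written tree, not something produced by borb.
document = {
    "Trailer": {
        "Root": {
            "Type": "Catalog",
            "Pages": {
                "Type": "Pages",
                "Count": 1,
                "Kids": [
                    {"Type": "Page", "MediaBox": [0, 0, 595, 842]},
                ],
            },
        },
    },
}


def walk(node, path=""):
    """Yield (path, value) pairs for every primitive leaf in the nested structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, f"{path}[{index}]")
    else:
        yield path, node


for leaf_path, leaf_value in walk(document):
    print(leaf_path, "=", leaf_value)
```

Because the whole tree consists of plain containers and primitives, generic tools such as this walker can inspect it without any PDF-specific code.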

Features

Explore borb's capabilities in the examples repository for practical, real-world applications, including:

PDF Metadata Management (reading, editing)
Text and Image Extraction
Adding Annotations (notes, links)
Content Manipulation (adding text, images, tables, lists)
Page Layout Management with PageLayout

…and much more!

Installation

Install borb directly via pip:

pip install borb

To ensure you have the latest version, consider the following commands:

pip uninstall borb
pip install --no-cache borb
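After reinstalling, you can confirm which version ended up in the environment. This is a small standard-library sketch. `installed_version` is a hypothetical helper name, and it reports the version recorded in the package metadata rather than anything borb-specific.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("borb"))
```

A result of `None` means the package is not installed in the interpreter that ran the script, which often points to a mismatch between the `pip` and `python` executables in use.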

Getting Started: Hello World

Create your first PDF in just a few lines of code with borb:

from pathlib import Path
from borb.pdf import Document, Page, PageLayout, SingleColumnLayout, Paragraph, PDF

# Create an empty Document
d: Document = Document()

# Create an empty Page
p: Page = Page()
d.append_page(p)

# Create a PageLayout
l: PageLayout = SingleColumnLayout(p)

# Add a Paragraph
l.append_layout_element(Paragraph('Hello World!'))

# Write the PDF
PDF.write(what=d, where_to="assets/output.pdf")
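The example writes to `assets/output.pdf`, so the `assets` directory needs to exist when the file is written. Whether `PDF.write` creates missing directories itself is not stated here, so the sketch below assumes it does not and prepares the folder with `pathlib`. `prepare_output_path` is a hypothetical helper, not part of borb.

```python
from pathlib import Path


def prepare_output_path(where_to):
    """Create the parent directory of `where_to` if needed and return it as a Path."""
    path = Path(where_to)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Calling `prepare_output_path("assets/output.pdf")` before `PDF.write` and passing the same location as `where_to` keeps the Hello World example from depending on a pre-existing folder.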

License

borb is dual-licensed under AGPL and a commercial license.

The AGPL (Affero General Public License) is an open-source license, but commercial use cases require a paid license, especially if you intend to:

Offer paid PDF services (e.g., PDF generation in cloud applications)
Use borb in closed-source projects
Distribute borb in any closed-source product

For more information, contact our sales team.

Acknowledgements

Special thanks to:

Aleksander Banasik
Benoît Lagae
Michael Klink

Your contributions and guidance have been invaluable to borb's development.
