weaviate

Weaviate is an open-source vector database that stores both objects and vectors, combining vector search with structured filtering while offering the fault tolerance and scalability of a cloud-native database.

14,996


Top Related Projects

qdrant

27,002

Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

milvus

38,349

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

chroma

24,314

Open-source search and retrieval database for AI applications.

elasticsearch

75,392

Free and Open Source, Distributed, RESTful Search Engine

vespa

6,720

AI + Data, online. https://vespa.ai

Quick Overview

Weaviate is an open-source vector database designed to store both objects and vectors, enabling semantic search, question answering, classification, and other machine learning tasks. It provides a cloud-native database with a GraphQL interface, making it easy to integrate with various AI and ML models.
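To make the idea of storing objects next to their vectors concrete, here is a minimal pure-Python sketch of a filtered similarity lookup. It is only an illustration of the concept, not how Weaviate works internally (a brute-force scan stands in for a real vector index), and the `Item` class, the `search` function, and the toy three-dimensional vectors are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    """A stored object: structured properties plus an embedding vector."""
    title: str
    category: str
    vector: list


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(items, query_vector, category, limit=1):
    """Brute-force scan: apply the structured filter, then rank by similarity."""
    candidates = [item for item in items if item.category == category]
    ranked = sorted(
        candidates,
        key=lambda item: cosine_similarity(item.vector, query_vector),
        reverse=True,
    )
    return ranked[:limit]


# Toy data with made-up 3-dimensional embeddings.
ITEMS = [
    Item("Intro to vector search", "tech", [0.9, 0.1, 0.0]),
    Item("Baking sourdough bread", "food", [0.0, 0.2, 0.9]),
    Item("Scaling databases", "tech", [0.6, 0.7, 0.1]),
]

for hit in search(ITEMS, query_vector=[1.0, 0.0, 0.0], category="tech"):
    print(hit.title)
```

The filter narrows the candidate set by a structured property, and the vector comparison ranks whatever remains; a vector database performs both steps in one query.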

Pros

Scalable and cloud-native architecture
Supports multiple vector index types for different use cases
Provides a GraphQL API for easy integration and querying
Offers multi-modal search capabilities (text, images, audio)

Cons

Steep learning curve for beginners
Limited support for traditional relational database operations
Requires careful consideration of vector embedding choices
Resource-intensive for large-scale deployments

Code Examples

Creating a schema:

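import weaviate

# weaviate.Client is the v3 Python client API; the v4 client connects with weaviate.connect_to_local() instead
client = weaviate.Client("http://localhost:8080")

schema = {
    "classes": [{
        "class": "Article",
        "properties": [
            # "text" replaces the deprecated "string" data type
            {"name": "title", "dataType": ["text"]},
            {"name": "content", "dataType": ["text"]}
        ]
    }]
}

client.schema.create(schema)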

Adding data:

article = {
    "title": "Weaviate: The Vector Database",
    "content": "Weaviate is a powerful vector database..."
}

client.data_object.create(
    data_object=article,
    class_name="Article"
)

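Performing a semantic search (nearText requires a text vectorizer module, such as text2vec-transformers, to be enabled on the server):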

query = "What is a vector database?"

result = (
    client.query
    .get("Article", ["title", "content"])
    .with_near_text({"concepts": [query]})
    .with_limit(5)
    .do()
)

print(result)
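The query above returns a nested dictionary that mirrors the GraphQL response rather than a flat list of objects. The sketch below shows one way to unpack it. The `extract_hits` helper is hypothetical, and `sample_response` is a hand-written stand-in for the response shape, not real query output.

```python
def extract_hits(response, class_name):
    """Return the list of objects for `class_name` from a GraphQL Get response."""
    if "errors" in response:
        raise RuntimeError(f"Query failed: {response['errors']}")
    return response.get("data", {}).get("Get", {}).get(class_name) or []


# Hand-written sample that mimics the response shape; not real query output.
sample_response = {
    "data": {
        "Get": {
            "Article": [
                {"title": "Weaviate: The Vector Database", "content": "Weaviate is ..."},
            ]
        }
    }
}

for hit in extract_hits(sample_response, "Article"):
    print(hit["title"])
```

Checking for an `errors` key first avoids silently treating a failed query as an empty result.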

Getting Started

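Start Weaviate with Docker (this assumes a docker-compose.yml that defines the Weaviate service exists in the current directory):

docker compose up -d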

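Install the Python client (the examples on this page that call weaviate.Client use the v3 client API, so pin a release below 4 to run them unchanged):

pip install "weaviate-client<4"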

Connect to Weaviate:

import weaviate

client = weaviate.Client("http://localhost:8080")

Create a schema, add data, and perform queries as shown in the code examples above.

Competitor Comparisons

qdrant

27,002

Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Pros of Qdrant

Written in Rust, offering high performance and memory safety
Supports filtering during search, allowing for more precise queries
Provides a simple and intuitive API for vector search operations

Cons of Qdrant

Less mature ecosystem compared to Weaviate
Fewer built-in integrations with other tools and services
Limited support for schema management and data validation

Code Comparison

Qdrant (Python client):

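from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(host="localhost", port=6333)
# The configured vector size must match the length of the vectors you upsert (3 here)
client.create_collection("my_collection", vectors_config=VectorParams(size=3, distance=Distance.COSINE))
client.upsert("my_collection", points=[PointStruct(id=1, vector=[0.1, 0.2, 0.3], payload={"name": "John"})])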

Weaviate (Python client):

import weaviate
client = weaviate.Client("http://localhost:8080")
client.schema.create_class({
    "class": "MyClass",
    "vectorizer": "text2vec-transformers"
})
client.data_object.create({"name": "John"}, "MyClass")

Both Qdrant and Weaviate are vector databases, but they have different strengths. Qdrant excels in performance and filtering capabilities, while Weaviate offers a more comprehensive ecosystem and better schema management. The choice between them depends on specific project requirements and use cases.

milvus

38,349

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

Pros of Milvus

Better performance for large-scale vector similarity search
More flexible deployment options (standalone, cluster, cloud-native)
Supports multiple index types for different use cases

Cons of Milvus

Steeper learning curve and more complex setup
Limited support for non-vector data types
Less integrated AI/ML capabilities out of the box

Code Comparison

Weaviate (Python client):

import weaviate

client = weaviate.Client("http://localhost:8080")
client.schema.create_class({
    "class": "Article",
    "vectorizer": "text2vec-transformers"
})

Milvus (Python client):

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType

connections.connect("default", host="localhost", port="19530")
fields = [
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=128)
]
schema = CollectionSchema(fields, "Article")
collection = Collection("Article", schema)

Both repositories offer vector database solutions, but Milvus excels in performance and scalability for large datasets, while Weaviate provides a more integrated approach with built-in AI capabilities. Milvus offers more flexibility in deployment and indexing options, but may require more setup and configuration. Weaviate, on the other hand, offers a simpler setup process and better support for non-vector data types, making it more suitable for smaller-scale applications or those requiring a mix of vector and traditional data storage.

chroma

24,314

Open-source search and retrieval database for AI applications.

Pros of Chroma

Simpler setup and usage, ideal for quick prototyping and small-scale projects
Native Python implementation, making it more accessible for Python developers
Lightweight and easy to integrate into existing Python workflows

Cons of Chroma

Less scalable for large-scale production environments compared to Weaviate
Fewer advanced features and customization options
Limited support for complex query types and data structures

Code Comparison

Chroma:

import chromadb

client = chromadb.Client()
collection = client.create_collection("my_collection")
collection.add(documents=["document1", "document2"], metadatas=[{"source": "web"}, {"source": "book"}], ids=["1", "2"])
results = collection.query(query_texts=["search query"], n_results=2)

Weaviate:

import weaviate

client = weaviate.Client("http://localhost:8080")
client.schema.create_class({
    "class": "Document",
    "properties": [{"name": "content", "dataType": ["text"]}]
})
client.data_object.create({"content": "document1"}, "Document")
result = client.query.get("Document", ["content"]).with_near_text({"concepts": ["search query"]}).do()

elasticsearch

75,392

Free and Open Source, Distributed, RESTful Search Engine

Pros of Elasticsearch

Mature ecosystem with extensive documentation and community support
Powerful full-text search capabilities and advanced querying options
Scalable and distributed architecture for handling large datasets

Cons of Elasticsearch

Higher resource consumption and complexity in setup and maintenance
Steeper learning curve for advanced features and optimizations
Limited vector search capabilities compared to Weaviate's native support

Code Comparison

Elasticsearch query:

{
  "query": {
    "match": {
      "title": "search example"
    }
  }
}

Weaviate query:

{
  Get {
    Article(
      nearText: {
        concepts: ["search example"]
      }
    ) {
      title
    }
  }
}

Both Elasticsearch and Weaviate offer powerful search capabilities, but they differ in their approach and specialization. Elasticsearch excels in traditional full-text search and analytics, while Weaviate focuses on vector search and AI-driven data operations. The choice between them depends on specific use cases and requirements, such as the need for vector search, scalability, and integration with AI models.
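The Weaviate GraphQL query shown above can also be sent as a plain HTTP POST, without a client library. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes an instance at `http://localhost:8080` that serves GraphQL at `/v1/graphql` and has a text vectorizer enabled.

```python
import json
import urllib.request


def build_near_text_payload(class_name, concepts, fields, limit=5):
    """Build the JSON body for a nearText Get query."""
    concept_list = json.dumps(concepts)  # JSON string arrays are valid GraphQL lists
    field_list = " ".join(fields)
    query = (
        f"{{ Get {{ {class_name}("
        f"nearText: {{concepts: {concept_list}}}, limit: {limit}"
        f") {{ {field_list} }} }} }}"
    )
    return {"query": query}


def run_query(payload, base_url="http://localhost:8080"):
    """POST the payload to the GraphQL endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/graphql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


payload = build_near_text_payload("Article", ["search example"], ["title"])
print(payload["query"])
# run_query(payload) would send it; that call needs a running instance.
```

Keeping payload construction separate from the network call makes the query text easy to inspect before anything is sent.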

vespa

6,720

AI + Data, online. https://vespa.ai

Pros of Vespa

More comprehensive feature set for large-scale applications
Better support for real-time updates and complex queries
Stronger focus on scalability and performance optimization

Cons of Vespa

Steeper learning curve and more complex setup
Requires more resources to run effectively
Less user-friendly for smaller projects or beginners

Code Comparison

Weaviate (GraphQL query):

{
  Get {
    Article(
      nearText: {
        concepts: ["news"],
        certainty: 0.7
      }
    ) {
      title
      url
    }
  }
}

Vespa (YQL query):

select title, url from articles where
  {targetHits: 10}nearestNeighbor(embedding, query_embedding)
  and has_embedding = true
limit 10

Both repositories offer vector search capabilities, but Vespa provides a more SQL-like query language (YQL) compared to Weaviate's GraphQL approach. Vespa's query syntax may be more familiar to those with SQL experience, while Weaviate's GraphQL interface might be more intuitive for developers already working with GraphQL APIs.


README

Weaviate

Weaviate is an open-source, cloud-native vector database that stores both objects and vectors, enabling semantic search at scale. It combines vector similarity search with keyword filtering, retrieval-augmented generation (RAG), and reranking in a single query interface. Common use cases include RAG systems, semantic and image search, recommendation engines, chatbots, and content classification.

Weaviate supports two approaches to store vectors: automatic vectorization at import using integrated models (OpenAI, Cohere, HuggingFace, and others) or direct import of pre-computed vector embeddings. Production deployments benefit from built-in multi-tenancy, replication, RBAC authorization, and many other features.

To get started quickly, have a look at one of these tutorials:

Installation

Weaviate offers multiple installation and deployment options:

See the installation docs for more deployment options, such as AWS and GCP.

Getting started

You can easily start Weaviate and a local vector embedding model with Docker. Create a docker-compose.yml file:

services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.32.2
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      ENABLE_MODULES: text2vec-model2vec
      MODEL2VEC_INFERENCE_API: http://text2vec-model2vec:8080

  # A lightweight embedding model that will generate vectors from objects during import
  text2vec-model2vec:
    image: cr.weaviate.io/semitechnologies/model2vec-inference:minishlab-potion-base-32M

Start Weaviate and the embedding service with:

docker compose up -d
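The containers can take a few seconds to come up, so it can help to wait before sending the first request. The sketch below polls the readiness endpoint with the standard library only. The helper names are hypothetical, and it assumes the port mapping from the compose file above (`localhost:8080`) and the `/v1/.well-known/ready` readiness endpoint.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url="http://localhost:8080", timeout=2.0):
    """Return True if the readiness endpoint answers with HTTP 200."""
    url = f"{base_url}/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(base_url="http://localhost:8080", attempts=30, delay=1.0):
    """Poll the readiness endpoint until it succeeds or attempts run out."""
    for _ in range(attempts):
        if is_ready(base_url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_ready()` after `docker compose up -d` blocks until the server answers or the attempts are exhausted.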

Install the Python client (or use another client library):

pip install -U weaviate-client

The following Python example shows how easy it is to populate a Weaviate database with data, create vector embeddings and perform semantic search:

import weaviate
from weaviate.classes.config import Configure, DataType, Property

# Connect to Weaviate
client = weaviate.connect_to_local()

# Create a collection
client.collections.create(
    name="Article",
    properties=[Property(name="content", data_type=DataType.TEXT)],
    vector_config=Configure.Vectors.text2vec_model2vec(),  # Use a vectorizer to generate embeddings during import
    # vector_config=Configure.Vectors.self_provided()  # If you want to import your own pre-generated embeddings
)

# Insert objects and generate embeddings
articles = client.collections.get("Article")
articles.data.insert_many(
    [
        {"content": "Vector databases enable semantic search"},
        {"content": "Machine learning models generate embeddings"},
        {"content": "Weaviate supports hybrid search capabilities"},
    ]
)

# Perform semantic search
results = articles.query.near_text(query="Search objects by meaning", limit=1)
print(results.objects[0])

client.close()

This example uses the Model2Vec vectorizer, but you can choose any other embedding model provider or bring your own pre-generated vectors.

Client libraries and APIs

Weaviate provides client libraries for several programming languages:

Python
JavaScript/TypeScript
Java
Go
C# (🚧 Coming soon 🚧)

There are also additional community-maintained libraries.

Weaviate exposes REST API, gRPC API, and GraphQL API to communicate with the database server.

Weaviate features

These features enable you to build AI-powered applications:

Fast Search Performance: Perform complex semantic searches over billions of vectors in milliseconds. Weaviate's architecture is built in Go for speed and reliability, ensuring your AI applications are highly responsive even under heavy load. See our ANN benchmarks for more info.
Flexible Vectorization: Seamlessly vectorize data at import time with integrated vectorizers from OpenAI, Cohere, HuggingFace, Google, and more. Or you can import your own vector embeddings.
Advanced Hybrid & Image Search: Combine the power of semantic search with traditional keyword (BM25) search, image search and advanced filtering to get the best results with a single API call.
Integrated RAG & Reranking: Go beyond simple retrieval with built-in generative search (RAG) and reranking capabilities. Power sophisticated Q&A systems, chatbots, and summarizers directly from your database without additional tooling.
Production-Ready & Scalable: Weaviate is built for mission-critical applications. Go from rapid prototyping to production at scale with native support for horizontal scaling, multi-tenancy, replication, and fine-grained role-based access control (RBAC).
Cost-Efficient Operations: Radically lower resource consumption and operational costs with built-in vector compression. Vector quantization and multi-vector encoding reduce memory usage with minimal impact on search performance.
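Hybrid search, as described in the list above, blends a keyword (BM25) ranking with a vector-similarity ranking. The sketch below shows one simple way such a blend can work: min-max normalize each score set, then take a weighted sum controlled by `alpha`. It is an illustration only and not Weaviate's actual fusion code; the function names and all scores are made up.

```python
def normalize(scores):
    """Min-max scale scores to [0, 1]; a constant score set maps to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Blend the two rankings: alpha=1 is pure vector, alpha=0 is pure keyword."""
    kw = normalize(keyword_scores)
    vec = normalize(vector_scores)
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


# Made-up scores for three documents.
bm25 = {"doc-a": 2.0, "doc-b": 6.0, "doc-c": 4.0}
similarity = {"doc-a": 0.9, "doc-b": 0.3, "doc-c": 0.7}

for doc_id, score in hybrid_rank(bm25, similarity, alpha=0.75):
    print(doc_id, round(score, 3))
```

Normalizing first matters because BM25 scores and similarity scores live on different scales; without it, one signal would dominate regardless of `alpha`.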

For a complete list of all functionalities, visit the official Weaviate documentation.
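The memory savings from vector quantization mentioned in the feature list can be illustrated with a small scalar-quantization sketch: each 32-bit float is mapped to an 8-bit code. This is a generic illustration and not the compression scheme Weaviate implements; the function names, the assumed value range of [-1, 1], and the sample vector are all hypothetical.

```python
import array


def quantize(vector, low=-1.0, high=1.0):
    """Map each float in [low, high] to an unsigned 8-bit code (0..255)."""
    scale = 255 / (high - low)
    codes = [round((min(max(x, low), high) - low) * scale) for x in vector]
    return array.array("B", codes)


def dequantize(codes, low=-1.0, high=1.0):
    """Recover approximate floats from the 8-bit codes."""
    step = (high - low) / 255
    return [low + code * step for code in codes]


vector = [0.12, -0.5, 0.98, 0.0]
codes = quantize(vector)
restored = dequantize(codes)

float_bytes = array.array("f", vector).itemsize * len(vector)  # 32-bit floats
code_bytes = codes.itemsize * len(codes)                       # 8-bit codes
print(float_bytes, code_bytes)
print([round(x, 3) for x in restored])
```

The codes take a quarter of the space of 32-bit floats, at the cost of a small rounding error per dimension.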

Useful resources

Demo projects & recipes

These demos are working applications that highlight some of Weaviate's capabilities. Their source code is available on GitHub.

Elysia (GitHub): Elysia is a decision tree based agentic system which intelligently decides what tools to use, what results have been obtained, whether it should continue the process or whether its goal has been completed.
Verba (GitHub): A community-driven open-source application designed to offer an end-to-end, streamlined, and user-friendly interface for Retrieval-Augmented Generation (RAG) out of the box.
Healthsearch (GitHub): An open-source project aimed at showcasing the potential of leveraging user-written reviews and queries to retrieve supplement products based on specific health effects.
Awesome-Moviate (GitHub): A movie search and recommendation engine that allows keyword-based (BM25), semantic, and hybrid searches.

We also maintain extensive repositories of Jupyter Notebooks and TypeScript code snippets that cover how to use Weaviate features and integrations:

Blog posts

Integrations

Weaviate integrates with many external services:

Category	Description	Integrations
Cloud Hyperscalers	Large-scale computing and storage	AWS, Google
Compute Infrastructure	Run and scale containerized applications	Modal, Replicate, Replicated
Data Platforms	Data ingestion and web scraping	Airbyte, Aryn, Boomi, Box, Confluent, Astronomer, Context Data, Databricks, Firecrawl, IBM, Unstructured
LLM and Agent Frameworks	Build agents and generative AI applications	Agno, Composio, CrewAI, DSPy, Dynamiq, Haystack, LangChain, LlamaIndex, N8n, Semantic Kernel
Operations	Tools for monitoring and analyzing generative AI workflows	AIMon, Arize, Cleanlab, Comet, DeepEval, Langtrace, LangWatch, Nomic, Patronus AI, Ragas, TruLens, Weights & Biases

Contributing

We welcome and appreciate contributions! Please see our Contributor guide for the development setup, code style guidelines, testing requirements and the pull request process.

Join our Slack community or Community forum to discuss ideas and get help.

License

BSD 3-Clause License. See LICENSE for details.
