A Feature-Rich Lakehouse Catalog

Pangolin is an MIT-licensed, open source lakehouse catalog written in Rust. It is designed for enthusiasts who want multi-table consistency, catalog federation, multi-tenancy, and business metadata features in one catalog. Pangolin is currently in alpha and is not ready for production use. Contributions are welcome!

Passion project by Alex Merced.

Pangolin Dashboard Interface

Why Pangolin?

🚀

High Performance

Built with Rust for fast API responses and a low resource footprint.

☁️

Multi-Cloud Ready

Deploy anywhere. First-class support for AWS S3, Azure Blob Storage, and Google Cloud Storage.

💾

Flexible Backends

Store your metadata in PostgreSQL, MongoDB, or SQLite. Choose what fits your stack.

🔒

Secure & Multi-Tenant

Role-Based Access Control (RBAC) and multi-tenancy are built in.

📋

Business Metadata

Tag, document, and organize your data assets with business metadata that can be validated.

🌿

Git-Like Branching

Experiment safely with Zero-Copy branching for your data catalogs.
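
As a rough illustration of the zero-copy idea, a branch can be modeled as a small map of table names to snapshot IDs, so creating a branch copies pointers rather than data files. This is a toy sketch, not Pangolin's implementation; the `Catalog` class, its methods, and the snapshot names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy model (hypothetical): each branch maps table names to snapshot IDs."""

    branches: dict = field(default_factory=lambda: {"main": {}})

    def create_branch(self, name: str, source: str = "main") -> None:
        # Only the pointer map is copied; the data files behind
        # each snapshot are shared, which is what makes it "zero-copy".
        self.branches[name] = dict(self.branches[source])

    def commit(self, branch: str, table: str, snapshot_id: str) -> None:
        # A commit moves one branch's pointer for one table.
        self.branches[branch][table] = snapshot_id


catalog = Catalog()
catalog.commit("main", "sales.orders", "snap-1")
catalog.create_branch("experiment")
catalog.commit("experiment", "sales.orders", "snap-2")

# main still points at its original snapshot; only the branch moved.
print(catalog.branches["main"]["sales.orders"])
print(catalog.branches["experiment"]["sales.orders"])
```

Because commits on `experiment` only move that branch's pointers, `main` is unaffected until a merge is performed.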

🏷️

Tag-Based Access Control

Granular permission management using tags for easier governance at scale.
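
To make the idea concrete, here is a minimal sketch of a tag-based permission check: access is granted on a tag, and any asset carrying that tag inherits the grant. The `TagGrant` type, the `is_allowed` function, and the role and tag names are assumptions for illustration and do not reflect Pangolin's actual data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagGrant:
    """Hypothetical grant: a role may perform an action on assets with a tag."""

    role: str
    tag: str
    action: str


def is_allowed(role: str, action: str, asset_tags: set, grants: list) -> bool:
    # Allowed if any grant matches the role and action and names
    # a tag that the asset actually carries.
    return any(
        g.role == role and g.action == action and g.tag in asset_tags
        for g in grants
    )


grants = [
    TagGrant(role="analyst", tag="public", action="read"),
    TagGrant(role="finance", tag="pii", action="read"),
]

print(is_allowed("analyst", "read", {"public", "sales"}, grants))
print(is_allowed("analyst", "read", {"pii"}, grants))
```

With this shape, tagging a new table `public` is enough to expose it to analysts; no per-table grant is needed.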

🖥️

Management UI

A beautiful, modern interface to manage your catalogs, users, and permissions.

⌨️

Powerful CLI

Full control of your catalog from the terminal for automation and power users.

🔍

Dataset Discovery

Easily find datasets across all your catalogs with robust search and filtering.

🔑

Credential Vending

Securely vend temporary credentials for S3, Azure, and GCS to compute engines.

🧊

Iceberg REST Catalog

Fully compliant implementation of the Iceberg REST Catalog specification.
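
For orientation, the Iceberg REST Catalog specification addresses tables by path: `/v1/{prefix}/namespaces/{namespace}/tables/{table}`, with the levels of a multi-level namespace joined by the unit separator character (URL-encoded as `%1F`). The sketch below only builds such a URL; the base URL, prefix, and table names are placeholders, not Pangolin defaults.

```python
from urllib.parse import quote


def table_url(base_url: str, prefix: str, namespace: list, table: str) -> str:
    """Build the load-table URL described by the Iceberg REST spec."""
    # Multi-level namespaces are joined with the 0x1F unit separator,
    # which quote() percent-encodes as %1F.
    encoded_ns = quote("\x1f".join(namespace), safe="")
    return (
        f"{base_url}/v1/{quote(prefix, safe='')}"
        f"/namespaces/{encoded_ns}/tables/{quote(table, safe='')}"
    )


# Placeholder values for illustration only.
url = table_url("http://localhost:8181", "analytics", ["sales", "eu"], "orders")
print(url)
```

Any engine or client that speaks the Iceberg REST protocol issues requests against paths of this shape, which is why spec compliance matters for interoperability.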

PyPangolin - Official Python Client

Full-featured Python library for Pangolin with PyIceberg integration, multi-format support, and secure database connection management.
Now available on PyPI: pip install pypangolin

🐍 Complete Python Integration

PyPangolin provides a comprehensive Python interface to all Pangolin features, plus specialized support for popular data formats and database connections.

Table Format Support

🧊

Apache Iceberg

Full PyIceberg integration with read/write operations

✅ Tested

🔺

Delta Lake

Read/write Delta tables with automatic registration

✅ Tested

🏛️

Apache Hudi

Register and manage Hudi tables

Supported

🎯

Apache Paimon

Register and track Paimon tables

Supported

📊

Parquet

Read/write Parquet files with metadata

✅ Tested

📄

CSV & JSON

Read/write structured files

✅ Tested

🎯

Lance

Read/write vector database format

✅ Tested

🌀

Vortex

High-performance columnar format

Supported

Database Connection Management

Securely store and manage database credentials with Fernet encryption

🐘

PostgreSQL

Encrypted credential storage

✅ Tested

🐬

MySQL

Secure connection sharing

✅ Tested

🍃

MongoDB

NoSQL database connections

✅ Tested

❄️

Snowflake

Cloud data warehouse

⚠️ Untested

🔴

Redshift

AWS data warehouse

⚠️ Untested

☁️

BigQuery

Google Cloud analytics

⚠️ Untested

🔷

Azure Synapse

Microsoft analytics service

⚠️ Untested

Dremio

Arrow Flight connections

✅ Tested

Additional Features

Governance & Security

RBAC, permissions, service users, and business metadata

Admin & System

Audit logging, search, token management, and system config

Federated Catalogs

Connect to remote Iceberg catalogs and create SQL views

Git Operations

Branching, merging, tagging with conflict resolution

Looking for Enterprise Scale?

Apache Polaris

The production-grade community-run lakehouse catalog. Best for large-scale open source deployments.

Dremio Cloud

A managed version of Polaris with a built-in semantic layer, federated queries, and AI-powered autonomous optimization.

Join the Community

Discuss Pangolin, Iceframe, and Dremioframe in the #pangolin-catalog channel on the Data Lakehouse Hub Slack.
