At Level 12, we often need to make sense of datasets from a variety of sources. One tool in our data analysis toolkit is DuckDB.
DuckDB's speed, versatility, and unified SQL interface allow us to quickly gain insights from diverse datasets.
What is DuckDB?
DuckDB is an embeddable, columnar database system. What does this mean?
- Embeddable: DuckDB can be used as a library in multiple programming languages, including Python.
- Columnar: DuckDB uses a columnar, vectorized query engine, which lets it process analytical queries far more efficiently than a traditional row-oriented relational database management system (RDBMS).
File Type Support
DuckDB can load these file types, among others:
- CSV
- Parquet
- Excel
- JSON
You can load datasets of different formats into DuckDB tables, then combine those disparate datasets in a single query using SQL joins.
This ability to query and join multiple datasets using a single interface, namely SQL, is immensely valuable.
Data Source Support
DuckDB can read from:
- Local filesystem
- HTTP(S) endpoints
- Cloud storage such as S3
- Database servers such as PostgreSQL and MySQL
This allows you to compare and join local data with files in cloud storage without manually downloading them first.
For example, to view a parquet file from S3 using the DuckDB CLI:
SELECT * FROM 's3://my-bucket/file.parquet';
The full power of SQL is available to read and transform the data.
Also, while DuckDB can run entirely in memory, it can persist multiple tables in a single database file for later use.
Diverse Data and DuckDB
In summary, DuckDB is a powerful and flexible tool for working with diverse datasets. Whether you’re working with local files, cloud storage, or live database connections, DuckDB’s unified SQL interface simplifies the process, allowing you to extract insights quickly and efficiently.
Its ability to efficiently process analytical queries, support multiple file formats, and seamlessly integrate with various data sources makes it an excellent choice for data analysis.