Data processing for AI: Building Daft for blazing fast I/O on structured & unstructured data

I/O is a consistent bottleneck in large-scale data processing workloads, often more painful than the compute itself. Unstructured data adds its own I/O challenges. We present Daft, a data engine purpose-built for processing data of any modality at any scale. Daft is used to query data of many shapes and sizes, from tabular (Parquet, CSV) to semi-structured (JSON) to unstructured (text, images, audio). We'll dive into the technical details that let Daft do this while maximizing I/O throughput, including distributed reads of large files, memory stability via morsel-based execution, and I/O-aware query optimizations.

05 May 2025, 09:00 PM

Startup Stage

09:00 PM - 09:30 PM

About The Speakers

Kevin Wang

Founding Engineer, Eventual Computing


ChanChan Mao

Developer Relations, Eventual Computing

Secoda

The unified data governance platform

Main Sponsor
