Have Your Iceberg Cubed, Not Sorted: Meet Qbeast's OTree Index

Open table formats like Apache Iceberg and Delta Lake have a clustering problem. You partition your data, sort it, maybe throw Z-ordering at it. But partitions get imbalanced, distributions shift, and your layout drifts the moment new data arrives.

Qbeast, a Barcelona-based startup, has a different idea.

It's called the OTree multidimensional index. No fixed partition keys. No sort orders. Data gets organized into adaptive hypercubes that subdivide based on actual data distribution. Each row maps to a point in a multidimensional space defined by your indexed columns, with values normalized to the 0 to 1 range so that rows with similar values sit close together. Cubes split automatically when they fill up. Two indexed columns give you four subcubes per split. Three columns give you eight.
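
To make the splitting rule concrete, here is a minimal sketch of a capacity-based cube tree in Python. It is a toy illustration of the idea described above, not Qbeast's implementation: the names Cube, capacity, MAX_DEPTH, normalize, and fanout are all hypothetical, min-max scaling is an assumed way to reach the 0 to 1 range, and moving every row down into the subcubes on a split is a simplification chosen for clarity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, ...]

MAX_DEPTH = 16  # hypothetical guard so duplicate points cannot split forever


def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max scale one column value into [0, 1] (assumed normalization)."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


def fanout(num_indexed_columns: int) -> int:
    """Each split halves every dimension, so a cube has 2**d possible subcubes."""
    return 2 ** num_indexed_columns


@dataclass
class Cube:
    lower: Point          # lower corner of the cube in normalized space
    size: float           # edge length (1.0 for the root)
    capacity: int         # hypothetical max rows a cube holds before splitting
    depth: int = 0
    points: List[Point] = field(default_factory=list)
    children: Dict[int, "Cube"] = field(default_factory=dict)

    def _child_index(self, p: Point) -> int:
        # One bit per dimension: set when the point falls in the upper half.
        half = self.size / 2
        idx = 0
        for d, x in enumerate(p):
            if x >= self.lower[d] + half:
                idx |= 1 << d
        return idx

    def _child(self, idx: int) -> "Cube":
        # Subcubes are created lazily, only where data actually lands.
        if idx not in self.children:
            half = self.size / 2
            lower = tuple(
                lo + half if (idx >> d) & 1 else lo
                for d, lo in enumerate(self.lower)
            )
            self.children[idx] = Cube(lower, half, self.capacity, self.depth + 1)
        return self.children[idx]

    def insert(self, p: Point) -> None:
        is_leaf = not self.children
        if is_leaf and (len(self.points) < self.capacity or self.depth >= MAX_DEPTH):
            self.points.append(p)
            return
        if is_leaf:
            # The cube is full: split it and push its rows into the subcubes.
            pending, self.points = self.points, []
            for q in pending:
                self._child(self._child_index(q)).insert(q)
        self._child(self._child_index(p)).insert(p)

    def count(self) -> int:
        return len(self.points) + sum(c.count() for c in self.children.values())


# Usage: index two normalized columns with a tiny capacity to force a split.
root = Cube(lower=(0.0, 0.0), size=1.0, capacity=2)
for row in [(0.1, 0.1), (0.9, 0.9), (0.1, 0.9)]:
    root.insert(row)
```

Because subcubes are created only where rows land, dense regions of the space end up with deeper trees while sparse regions stay shallow, which is the sense in which the layout follows the data instead of a fixed key.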

The approach came out of research at the Barcelona Supercomputing Center. Flavio Junqueira, Qbeast's co-founder and CTO, knows distributed systems. He was a founding committer of Apache ZooKeeper at Yahoo! Research and later worked on Apache Kafka at Confluent. The OTree stays compatible with query engines like Apache Spark, which matters for real adoption.

Qbeast challenges two assumptions: that indexes exist mainly to speed up reads, and that open table formats can't use tree-based indexes.

Stop forcing data into rigid structures. Let the index adapt to the data.