Sylvain Lesage
@severo@mastodon.social

Dataviz freelance developer. Part-time 🤗 Hugging Face.

TIL* that you can embed an index of the Parquet row group pages in the file metadata. It gives a much finer granularity when fetching parts of the Parquet file, allowing for smaller requests and faster rendering on the frontend.

It adds some weight to the metadata, which (I guess) is why it's not enabled by default in PyArrow. Note also that the PyArrow reader itself cannot make sense of this index :) I'm not sure about the current support in other clients such as DuckDB or hyparquet.
