
Kotlin DataFrame ❤️ Arrow

Kotlin DataFrame v0.14 improves support for reading the Apache Arrow format, most notably by loading a DataFrame directly from an ArrowReader.
This makes it easy to load results from analytical databases such as DuckDB and ClickHouse straight into a Kotlin DataFrame.
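
To make the ArrowReader entry point concrete, here is a minimal sketch that needs no database: it writes a tiny Arrow stream into memory and reads it back with `DataFrame.readArrow`. It assumes the `dataframe-arrow` module and the Apache Arrow Java libraries are on the classpath; the function name `roundTrip` and the column name `id` are made up for illustration.

```kotlin
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.IntVector
import org.apache.arrow.vector.VectorSchemaRoot
import org.apache.arrow.vector.ipc.ArrowStreamReader
import org.apache.arrow.vector.ipc.ArrowStreamWriter
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readArrow
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream

// Hypothetical helper: builds a one-column Arrow stream in memory,
// then loads it into a DataFrame through an ArrowReader.
fun roundTrip(): DataFrame<*> = RootAllocator().use { allocator ->
    val bytes = ByteArrayOutputStream()

    // Write three integers (1, 2, 3) as a single record batch.
    IntVector("id", allocator).use { ids ->
        ids.allocateNew(3)
        for (i in 0 until 3) ids.setSafe(i, i + 1)
        ids.valueCount = 3
        VectorSchemaRoot.of(ids).use { root ->
            ArrowStreamWriter(root, null, bytes).use { writer ->
                writer.start()
                writer.writeBatch()
                writer.end()
            }
        }
    }

    // ArrowStreamReader is an ArrowReader, so it can be passed to readArrow.
    ArrowStreamReader(ByteArrayInputStream(bytes.toByteArray()), allocator).use { reader ->
        DataFrame.readArrow(reader)
    }
}
```

Any source that can hand over an `ArrowReader` fits the same final step, which is what the database integrations below rely on.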

Here are two examples of integrations that enable smooth data import into Kotlin DataFrames using Apache Arrow.

DuckDB

DuckDB is an analytical database that can be embedded in a Kotlin notebook. It can expose query results as an Arrow stream, which makes loading them into a Kotlin DataFrame straightforward.

Here’s a basic notebook that uses DuckDB to query data from an external Parquet file and then imports the results into a Kotlin DataFrame using an Arrow stream.
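
The core of such an integration can be sketched as follows. This is not the notebook's code, only an approximation under stated assumptions: the DuckDB JDBC driver, Apache Arrow, and the `dataframe-arrow` module are available, the database is in-memory (`jdbc:duckdb:`), and the function name `loadFromDuckDb` and the batch size of 256 are illustrative choices.

```kotlin
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.ipc.ArrowReader
import org.duckdb.DuckDBResultSet
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readArrow
import java.sql.DriverManager

// Hypothetical helper: runs a query against an in-memory DuckDB instance
// and returns the result as a DataFrame via DuckDB's Arrow export.
fun loadFromDuckDb(query: String): DataFrame<*> =
    DriverManager.getConnection("jdbc:duckdb:").use { conn ->
        conn.createStatement().use { stmt ->
            RootAllocator().use { allocator ->
                val resultSet = stmt.executeQuery(query) as DuckDBResultSet
                // arrowExportStream returns Object, so it is cast to ArrowReader.
                // 256 is the Arrow batch size (an arbitrary choice here).
                (resultSet.arrowExportStream(allocator, 256) as ArrowReader).use { reader ->
                    DataFrame.readArrow(reader)
                }
            }
        }
    }
```

A call such as `loadFromDuckDb("SELECT * FROM read_parquet('data.parquet')")` would then query a Parquet file, where `data.parquet` is a placeholder path. The reader is closed before the allocator so that Arrow's buffers are released in order.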

ClickHouse

ClickHouse is a powerful, column-oriented SQL database management system (DBMS) designed for online analytical processing (OLAP).
ClickHouse allows the use of Arrow Stream as an output format.

The following notebook uses the ClickHouse client to query data in the Arrow stream format and loads it into a Kotlin DataFrame.
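
As a rough illustration of the same idea, the sketch below does not use the ClickHouse client from the notebook. It swaps in ClickHouse's HTTP interface through the JDK's `HttpClient`, which keeps the example free of client-specific APIs. It assumes a ClickHouse server listening on `http://localhost:8123/` with the default user and no password; the function names `clickHouseRequest` and `loadFromClickHouse` are made up for this example.

```kotlin
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.ipc.ArrowStreamReader
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readArrow
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Hypothetical helper: builds a POST request that asks ClickHouse
// to return the query result in the ArrowStream output format.
fun clickHouseRequest(sql: String, endpoint: String = "http://localhost:8123/"): HttpRequest =
    HttpRequest.newBuilder(URI.create(endpoint))
        .POST(HttpRequest.BodyPublishers.ofString("$sql FORMAT ArrowStream"))
        .build()

// Hypothetical helper: sends the query and reads the streamed response
// into a DataFrame through an ArrowStreamReader.
fun loadFromClickHouse(sql: String, endpoint: String = "http://localhost:8123/"): DataFrame<*> {
    val response = HttpClient.newHttpClient()
        .send(clickHouseRequest(sql, endpoint), HttpResponse.BodyHandlers.ofInputStream())
    check(response.statusCode() == 200) { "ClickHouse returned HTTP ${response.statusCode()}" }

    return RootAllocator().use { allocator ->
        ArrowStreamReader(response.body(), allocator).use { reader ->
            DataFrame.readArrow(reader)
        }
    }
}
```

With a server running, `loadFromClickHouse("SELECT number FROM system.numbers LIMIT 10")` would return a ten-row DataFrame. The SQL passed in must not already end with a `FORMAT` clause or a semicolon, since the helper appends its own.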

Conclusion

Loading Arrow data into Kotlin DataFrame is simple and efficient, and it brings the full power of Kotlin to data analysis. The integration simplifies the import step and improves performance, making Kotlin DataFrame a strong tool for processing and analyzing large datasets.