tutorial: https://2x2xplz.github.io/Dataframe100Examples/
GitHub repo: https://github.com/2x2xplz/Dataframe100Examples
original SQL source tutorial: https://2x2xplz.github.io/sql-tutorial-100queries/
In early February 2024, a SQL tutorial named "SQL for Data Scientists in 100 Examples" reached the top spot on Hacker News. I was impressed with the project and felt that its examples would be a great starting point for a Kotlin dataframe tutorial.
(Update: the SQL tutorial's authors have since added to and edited much of the original content. It is no longer a simple list of 100 examples, though it remains a great project as it has evolved. If you want to compare the 100 dataframe examples to the 100 SQL examples on which they were based, please view the "original SQL source" linked above.)
About a year ago, I was given the opportunity to speak at KotlinConf about dataframe, after employing it as the core component of a database marketing platform I'd built. It allowed me to build the entire engine purely in Kotlin, without handing off data wrangling to pandas or doing too much in SQL. My talk was titled "Replacing SQL with Kotlin's 'dataframe'", so it was a first attempt at explaining a SQL-to-dataframe migration.
My goal for this project was to clone the 100 SQL queries from the original article (update: the "original" state of the SQL project is linked above) and produce the same results via Kotlin dataframe, providing a resource for data analysts, scientists, programmers and software engineers who are looking to use dataframe in their projects. The original examples were kept relatively simple so as to focus on the core concepts without piling on complexity. Similarly, the dataframe examples are, for the most part, basic. Real-world use of dataframe can quickly get much more complicated, involving a lot of grouping, pivoting, hierarchies, data conversions, object creation, etc.