DistilKaggle: a distilled dataset of Kaggle Jupyter notebooks

Published in Proceedings of the 21st International Conference on Mining Software Repositories, 2024

DistilKaggle is a curated dataset of Kaggle Jupyter notebooks intended to support empirical studies and machine learning research. The distilled corpus gives researchers in mining software repositories real-world data science artifacts to analyze.

Recommended citation: Mostafavi Ghahfarokhi, M., Asgari, A., Abolnejadian, M., & Heydarnoori, A. (2024). "DistilKaggle: a distilled dataset of Kaggle Jupyter notebooks." Proceedings of the 21st International Conference on Mining Software Repositories, 647-651.