PDF data source for Apache Spark that reads PDF files directly into a DataFrame and runs OCR on them
Updated Apr 15, 2025 - Scala
Google BigQuery data source for Apache Spark
Allows reading ROOT TTrees into Apache Spark as DataFrames
Contains the code and examples for my article on Medium, which explains how to create a custom JDBC read-only data source in Apache Spark 3
Scala/Spark data source for reading NetCDF files