PySpark: High-performance data processing without learning Scala

This white paper discusses the advantages of the PySpark API, which lets Python programs interact with the Spark programming model. It starts with a basic description of Spark, then describes PySpark, its benefits, and when it is appropriate to use instead of the open source pandas library. It closes by encouraging the reader to get started with PySpark using IBM Data Science Experience.