Software Prerequisites
IPython is a command shell for interactive computing. Originally developed for Python, it now supports multiple programming languages and offers introspection, rich media, shell syntax, tab completion, and history.
The Spark Python API (PySpark) exposes the Spark programming model to Python.
Spark’s Python API
Key Differences in the Python API
There are a few key differences between the Python and Scala APIs:
- Python is dynamically typed, so RDDs can hold objects of multiple types.
- PySpark does not yet support a few features of the Scala API, such as the lookup call and non-text input files; these are planned for future releases.
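To make the first difference concrete, here is a minimal sketch of an RDD that mixes element types. It assumes `sc` is an existing SparkContext (the PySpark shell defines one under that name); the function name `element_type_names` is hypothetical and chosen only for illustration.

```python
def element_type_names(sc):
    """Build an RDD with mixed element types and report each element's type.

    `sc` is assumed to be a live SparkContext, such as the one the
    PySpark shell provides.
    """
    # One RDD holding an int, a str, a float, and a tuple side by side.
    mixed = sc.parallelize([1, "two", 3.0, (4, "four")])
    # Map each element to the name of its Python type and bring the
    # results back to the driver as a list.
    return mixed.map(lambda x: type(x).__name__).collect()
```

Calling `element_type_names(sc)` in a PySpark session returns one type name per element. In the Scala API the same collection would need a common element type such as `Any`.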
IPython Configuration
ipython profile create pyspark
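The command above creates a profile directory named profile_pyspark inside the IPython directory. The sketch below shows where that directory is likely to be, assuming the common default of ~/.ipython unless the IPYTHONDIR environment variable is set; the helper name `expected_profile_dir` is hypothetical, and some IPython versions or platforms may use a different location. Running `ipython locate profile pyspark` gives the authoritative path.

```python
import os

def expected_profile_dir(name="pyspark", env=None):
    """Guess the directory created by `ipython profile create <name>`.

    Assumption: IPython stores profiles under $IPYTHONDIR if it is set,
    otherwise under ~/.ipython. This is a guess, not a lookup.
    """
    env = os.environ if env is None else env
    ipython_dir = env.get("IPYTHONDIR") or os.path.join(
        os.path.expanduser("~"), ".ipython"
    )
    # Profile directories are named "profile_" followed by the profile name.
    return os.path.join(ipython_dir, "profile_" + name)

if __name__ == "__main__":
    print(expected_profile_dir())
```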
Starting IPython Notebook with PySpark