Distributed Machine Learning and Big Data Analysis with PySpark
Apache Spark is an open-source cluster computing framework. Originally developed at the University of California, Berkeley, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.