NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Spark: In-Memory Cluster...

Big Learning Workshop: Algorithms, Systems, and Tools for Learning at Scale at NIPS 2011

Invited Talk: Spark: In-Memory Cluster Computing for Iterative and Interactive Applications, by Matei Zaharia

Matei Zaharia is a fifth-year graduate student at UC Berkeley, working with Scott Shenker and Ion Stoica on topics in cloud computing, operating systems, and networking. He is also a committer on Apache Hadoop and is funded by a Google PhD fellowship. Before joining Berkeley, Matei received his undergraduate degree from the University of Waterloo in Canada.

Abstract: MapReduce and its variants have been highly successful in supporting large-scale data-intensive cluster applications. However, these systems are inefficient for applications that share data among multiple computation stages, including many machine learning algorithms, because they are based on an acyclic data flow model. We present Spark, a new cluster computing framework that extends the data flow model with a set of in-memory storage abstractions to efficiently support these applications. Spark outperforms Hadoop by up to 30x in iterative machine learning algorithms while retaining MapReduce's scalability and fault tolerance. In addition, Spark makes programming jobs easy by integrating into the Scala programming language. Finally, Spark's ability to load a dataset into memory and query it repeatedly makes it especially suitable for interactive analysis of big data. We have modified the Scala interpreter to make it possible to use Spark interactively as a highly responsive data analytics tool.

At Berkeley, we have used Spark to implement several large-scale machine learning applications, including a Twitter spam classifier and a real-time automobile traffic estimation system based on expectation maximization. We will present lessons learned from these applications and optimizations we added to Spark as a result.
Length: 40:52
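
The abstract's central point, that iterative algorithms benefit when a dataset is loaded into memory once and reused across passes, can be illustrated with a small single-machine sketch. This is plain Python, not Spark's API: the `Dataset` class, its `cache()` method, and the synthetic records are hypothetical stand-ins chosen for illustration, and the logistic regression loop is just one example of an iterative algorithm. The sketch only counts how many times the raw records are re-parsed with and without caching; it says nothing about actual cluster performance.

```python
import math
import random


def make_raw_lines(n, seed=0):
    """Generate synthetic text records 'label x1 x2' standing in for a file on disk."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = 1.0 if label == 1 else -1.0
        lines.append(f"{label} {rng.gauss(center, 1.0)} {rng.gauss(center, 1.0)}")
    return lines


class Dataset:
    """Hypothetical stand-in for a dataset; counts how often the raw data is parsed."""

    def __init__(self, raw_lines):
        self.raw_lines = raw_lines
        self.parse_passes = 0
        self._cached = None

    def _parse(self):
        # Simulates the expensive step: reading and parsing every record.
        self.parse_passes += 1
        points = []
        for line in self.raw_lines:
            label, x1, x2 = line.split()
            points.append((int(label), (float(x1), float(x2))))
        return points

    def cache(self):
        # Parse once and keep the result in memory for later passes.
        self._cached = self._parse()
        return self

    def points(self):
        # Without a cache, every call re-parses the raw records.
        return self._cached if self._cached is not None else self._parse()


def train(dataset, iterations=10, lr=0.1):
    """Batch gradient descent for logistic regression; one data pass per iteration."""
    w = [0.0, 0.0]
    for _ in range(iterations):
        pts = dataset.points()
        grad = [0.0, 0.0]
        for y, x in pts:
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1])))
            for j in range(2):
                grad[j] += (p - y) * x[j]
        for j in range(2):
            w[j] -= lr * grad[j] / len(pts)
    return w


raw = make_raw_lines(1000)
uncached = Dataset(raw)
cached = Dataset(raw).cache()
w_uncached = train(uncached)
w_cached = train(cached)
print("parse passes without cache:", uncached.parse_passes)
print("parse passes with cache:", cached.parse_passes)
```

With ten iterations, the uncached dataset is parsed ten times and the cached one only once, while both runs produce the same weights. In a cluster setting the repeated step would be reading from distributed storage rather than parsing a Python list, which is where the abstract locates the cost for acyclic data flow systems.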
