IBM has announced a major commitment to Apache Spark, potentially the most important new open source project in a decade that is being defined by data.

At the core of this commitment, IBM plans to embed Spark into its industry-leading analytics and commerce platforms, and to offer Spark as a service on IBM Cloud.

IBM will also put more than 3 500 IBM researchers and developers to work on Spark-related projects at more than a dozen labs worldwide; donate its IBM SystemML machine learning technology to the Spark open source ecosystem; and educate more than one million data scientists and data engineers on Spark.

As data and analytics are embedded into the fabric of business and society – from popular apps to the Internet of Things (IoT) – Spark brings essential advances to large-scale data processing. First, it dramatically improves the performance of data-dependent apps. Second, it radically simplifies the process of developing intelligent apps, which are fuelled by data.

To further accelerate open source innovation for the Spark ecosystem, IBM is taking the following actions:

* IBM will build Spark into the core of the company’s analytics and commerce platforms.
* IBM’s Watson Health Cloud will leverage Spark as a key underpinning for its insight platform, helping to deliver faster time to value for medical providers and researchers as they access new analytics around population health data.
* IBM will open source its breakthrough IBM SystemML machine learning technology and collaborate with Databricks to advance Spark’s machine learning capabilities.
* IBM will offer Spark as a Cloud service on IBM Bluemix to make it possible for app developers to quickly load data, model it, and derive the predictive artefact to use in their app.
* IBM will commit more than 3 500 researchers and developers to work on Spark-related projects at more than a dozen labs worldwide, and open a Spark Technology Centre in San Francisco for the Data Science and Developer community to foster design-led innovation in intelligent applications.
* IBM will educate more than one million data scientists and data engineers on Spark through extensive partnerships with AMPLab, DataCamp, MetiStream, Galvanize and Big Data University MOOC.

“IBM has been a decades-long leader in open source innovation. We believe strongly in the power of open source as the basis to build value for clients, and are fully committed to Spark as a foundational technology platform for accelerating innovation and driving analytics across every business in a fundamental way,” says Beth Smith, GM: Analytics Platform, IBM Analytics.

“Our clients will benefit as we help them embrace Spark to advance their own data strategies to drive business transformation and competitive differentiation.”

Spark has grown quickly in popularity among developers and data scientists as an essential platform for helping organisations integrate Big Data into applications more easily, and is gaining momentum with IBM clients looking to transform business decision-making.

IBM is one of four founding members of the UC Berkeley AMPLab, where Spark was first invented in 2009, and as a result participates in multi-day research retreats, provides advice and real-world insight, and interacts closely with AMPLab researchers on projects of mutual interest. “As a sponsor of the AMPLab, IBM contributes to the greater Spark community and provides guidance for the continued evolution and improvement of the Berkeley Data Analytics Stack, the open source platform of which Spark is a key component,” says Professor Michael Franklin, director of the UC Berkeley AMPLab.

Spark is agile, fast and easy to use. And because it is open source, it is improved continuously by a worldwide community. Over the course of the next few months, IBM scientists and engineers will work with the Apache Spark open source community to rapidly accelerate access to advanced machine learning capabilities and help drive speed-to-innovation in the development of smart business apps. By contributing SystemML, IBM will help data scientists iterate faster to address the changing needs of business, and will enable a growing ecosystem of app developers to apply deep intelligence to everything.