This white paper discusses the advantages of using the PySpark API, which enables the use of Python to interact with the Spark programming model. It starts with a basic description of Spark and then describes PySpark, its benefits, and when it is appropriate to use instead of "pandas" open source
As a business technology professional, you need to manage your company’s information resources 24/7 while juggling concurrent projects and staying up to speed on changes in the technology and in your chosen field. You’re stretched thin but continue to seek out professional learning opportunities.
Fundamentally, machine learning is a productivity tool for data scientists. At the heart of systems that can learn from data, machine learning allows data scientists to train a model on an example data set and then leverage algorithms that automatically generalize and learn both from that example
Jeff Josten is an IBM Distinguished Engineer for DB2 for z/OS Development, IBM Analytics, Platform Development. In this podcast, he discusses the value of machine learning in enterprise applications of hybrid transaction/analytical processing (HTAP). He will be speaking on this topic on February 15, 2017.
J White Bear is a data scientist and software engineer at IBM. In this podcast, White Bear discusses simultaneous localization and mapping (SLAM), an active research area in robotics for autonomous vehicles that is widely recognized as a nontrivial problem in both industry and research.
Seth Dobrin is Vice President and CDO, IBM Analytics, Platform Development. In this podcast, Dobrin shares his experience using Apache Spark to drive data science transformation, along with his broader vision for transforming data science at scale.
Steven Astorino is Vice President, Development, IBM Private Cloud Analytics Platform. In this podcast, he discusses how machine learning is driving the evolution of data science in strategic business initiatives.
In this white paper, discover how programmers and data scientists can use SparkR to turn R into a tool for big data analytics, taking advantage of parallel processing and near-linear scaling to tackle much larger problems than would be practical with R alone.
Holden Karau is a software engineer at IBM, an active open source contributor, and coauthor of Learning Spark (O'Reilly Media, February 2015) and the soon-to-be-released High Performance Spark (O'Reilly Media, March 2017). In this podcast, Karau examines how to effectively search logs from Apache