Data science is a team sport that involves specialists with complementary skills and aptitudes. Successful data science initiatives leverage high-performance team collaboration. Like the fictional sleuth and his partner, IBM’s customers in the data science community must have the right mix of
Quite often, we see that the need for data security and governance makes some organizations hesitant about migrating to the cloud. This is perfectly understandable given the types of data gathered and used by businesses today, the regulations they must adhere to on both a local and global level,
This white paper discusses the advantages of the PySpark API, which lets Python interact with the Spark programming model. It starts with a basic description of Spark and then describes PySpark, its benefits, and when it is appropriate to use it instead of "pandas" open source
In this white paper, discover how programmers and data scientists can use SparkR to transform R into a tool for big data analytics, taking advantage of parallel processing and near-linear scaling to tackle much larger challenges than would normally be possible with other methods.
Holden Karau is a software engineer at IBM, an active open source contributor and coauthor of Learning Spark (O'Reilly Media, February 2015) and the soon-to-be-released High Performance Spark (O'Reilly Media, March 2017). In this podcast, Karau examines how to effectively search logs from Apache
Nick Pentreath is a principal engineer at IBM, a member of the Apache Spark project management committee (PMC) and author of Machine Learning with Spark (Packt Publishing, December 2014). In this podcast, Pentreath covers the basics of feature hashing and how to use it for all feature types in
Today’s businesses need a culture of collaboration that empowers knowledge workers to glean cognitive insights from data that help transform and modernize operations. See how cloud-based platforms and solutions enable data scientists and other experts to exploit artificial intelligence, machine
Emily Curtin is a software engineer at The Weather Company (now IBM) working on the data engineering platform team. Robbie Strickland is vice president of engines and pipelines for IBM Watson Data Platform. In this podcast, they give a technical overview of how Parquet works and how recent
Now that we’re into the swing of 2017, the time is ripe for the year’s first CrowdChat to explore the goals, challenges and strategies that CDOs and CIOs are focused on for their organizations. Get involved and share your thoughts in this kick-off IBM Big Data CrowdChat.
Businesses have come to expect that smart rivals wielding digital technologies will disrupt their competitive landscapes. How ready is your organization to be a digital disruptor? Take a look at detailed criteria for assessing your organization’s readiness and the strategic steps you can take to
We might not all be nuclear physicists, but some of us are. Take Dr. David Farley, for example, who is a principal member of Sandia’s technical staff, with the Department of Energy. David, who works with some of the world’s brightest minds in the fields of nuclear energy and security, is a
If simplicity can fundamentally accelerate focused action, then you can significantly boost speed, productivity and effectiveness in your enterprise. Take a look at this overview of key announcements unveiled on the first day of IBM Insight at World of Watson 2016.
The combination of Jupyter Notebooks, Apache Hadoop and Apache Spark has become a killer app for data practitioners. It unlocks the ability to explore, visualize and experiment with both structured and unstructured data sets with great ease and efficiency. We spoke recently with Chris Snow at IBM