Apache Spark is arguably surpassing Apache Hadoop as the preferred big data analytics development platform. Yet the specialized algorithm and model libraries expected to emerge from the Spark community raise the specter of platform bloat.
Apache Spark is unfamiliar to many data analytics professionals. A recent post provides high-level guidance on how they might begin to identify the applications for which Spark is well suited. This post expands on that discussion, offering further details to trigger the creative imaginations of those professionals.
Separating good data from bad and taking advantage of the open source ecosystem are key to quality analytics and keen insight from valuable data. And two upcoming events offer great opportunities to learn more.
Get in on the widespread excitement over Apache Spark. Check out the highlights from a recent SparkInsight CrowdChat that tackled six key questions about this next-generation, cluster-computing, runtime processing environment and development framework for in-memory processing of advanced analytics.
An increasing number of big data and analytics use cases fall within Apache Spark's sweet spot. Take a look at several low-latency applications for which Spark is well suited because of its analysis of cached, live data.
The drive toward industry openness continues at full speed, and Apache Spark is expected to become one of the centerpieces of the big data industry fabric. As a technology closely aligned with Apache Hadoop, it stands to benefit from broad adoption of core open data platform technologies.
Apache Hadoop has been around for a decade, but what is it exactly? Get a quick primer on Hadoop’s four key modules and how they enable this open source framework to handle storage for massive volumes of big data that can be used for advanced analytics.
Poised for widespread commercial adoption, Apache Spark is drawing a lot of attention with its ability to perform advanced in-memory analysis of cached, unstructured data in an open source distributed-computing framework.
Advanced analytics often requires knowledge of recent data-related technologies and analytics techniques, and of how to apply them. However, many organizations realize that this skill set doesn’t negate the need for people who can frame a problem and interpret the output of an analysis.
Big data is not restricted to the domain of business users. Discover how a group of women attending a Saudi Arabia university translated big data concepts into innovative solutions for helping business users uncover insights.
The data warehouse has never been more relevant than it is now, and the DW’s role in the big data universe appears likely to grow. What the DW does above all else (and this is far from its only role in many organizations) is serve as a hub for governing your system-of-record data.
The time has come for the analytics industry to work together to harden the Hadoop platform, with a shared vision and a common foundation to build on. Open standards shared by all solution providers are a step in the right direction. Enter the Open Data Platform (ODP) group of Hadoop solution providers.