Big Data in Practice using Spark
If fast prototyping and processing speed are a priority in your Big Data environment, Spark will most likely be the platform of your choice. Apache Spark is an open-source processing engine focusing on low latency, ease of use, and analytics. It's an alternative to the slower MapReduce approach delivered by Hadoop.
This course builds on the foundations laid in the course 'Big Data Concepts'.
In this course you will get hands-on practice on Linux with Spark and its libraries for machine learning and visualisation. You will also learn how to implement robust data processing in Scala with an SQL-style interface, and with the other APIs for Java and Python.
After successful completion of this course, you will have gained sufficient expertise to set up a big data environment, to import data into it, and to interrogate it using Spark. You will also be able to write simple Scala and Spark SQL programs that use the MLlib and GraphX libraries.
This course is also available for exclusive, one-company presentations.
What you will learn
On successful completion of this course you will be able to:
- explain the concepts of Apache Spark and its components
- set up a Big Data environment
- implement data processing in Scala using an SQL-style interface
- implement data processing with other APIs for Java and Python
- write and debug programs for data analytic problems.
Who Should Attend
Whoever wants to start practising "big data": developers, data architects, and anyone who needs to work with big data technology.
Prerequisites
Familiarity with the concepts of data stores and more specifically of "Big Data" is necessary; see our course Big Data Concepts. Additionally, some knowledge of SQL and UNIX is useful. Experience with at least one programming language (Java, PHP, Python, Scala, C++ or C#) is a must.
Duration
2 days
Fee (per attendee)
£1470 (ex VAT)
This includes free online 24/7 access to course notes.
Hard copy course notes are available on request from rsmshop@rsm.co.uk at £50.00 plus carriage per set.
Course Code
BDSA
Contents
Motivation for Spark & Base Concepts
The Apache Spark project and its components; the Spark architecture and programming model.
Data Sources
Learn how to access data residing in Hadoop HDFS, Cassandra, or HBase.
Interfaces
Working with the various programming interfaces and the web interface; writing and debugging programs for simple data analytic problems.
Introduction to Hadoop HDFS, HBase, and Cassandra
Hadoop HDFS; HBase; Cassandra.