Hadoop and Spark Development Course Content
Course Duration: 40 Hours
Module 1 – Linux Prerequisites for Hadoop
- Linux Basics
Module 2 – Introduction to Big data
- What is Big data?
- Sources of Big data
- Categories of Big data
- Characteristics of Big data
- Use-cases of Big data
- Traditional RDBMS vs Hadoop
Module 3 – Introduction to Hadoop
- What is Hadoop?
- History of Hadoop
- Understanding Hadoop Architecture
- Fundamentals of HDFS (Blocks, NameNode, DataNode, Secondary NameNode)
- Block Placement & Rack Awareness
- HDFS Read/Write
- Under/Over Replication
- Types of Scaling (Horizontal/Vertical)
- Drawbacks of Hadoop 1.x
- Introduction to Hadoop 2.x
- High Availability
Module 4 – HDFS
- Understanding Hadoop configuration files
- Hadoop Components: HDFS, MapReduce
- Overview of Hadoop Processes
- Overview of Hadoop Distributed File System
- The building blocks of Hadoop
- Hands-On Exercise: Using HDFS commands
Module 5 – MapReduce 1 (MRv1)
- Introduction to MapReduce
- How MapReduce Works
- Communication between JobTracker and TaskTracker
- Anatomy of a MapReduce Job Submission
Module 6 – MapReduce 2 (YARN)
- Limitations of the MRv1 Architecture
- YARN Architecture
- NodeManager & ResourceManager
Module 7 – Hive
- Introduction to Apache Hive
- Architecture of Hive
- Hive data types
- Exploring Hive Metastore Tables
- Types of Tables in Hive
- Partitions (Static & Dynamic)
- Buckets & Sampling
- Indexes & Views
- Developing Hive Scripts
- Parameter Substitution
- Differences between ORDER BY & SORT BY, CLUSTER BY & DISTRIBUTE BY
- Compression Options in Hive
- File Formats (TextFile, RCFile, ORC, SequenceFile, Parquet)
- Optimization Techniques in Hive
- Creating UDFs
- Hands-On Exercise
- Assignment on Hive
Module 8 – Sqoop
- Introduction to Sqoop & Its Architecture
- Importing Data from RDBMS to HDFS
- Importing Data from RDBMS to Hive
- Exporting Data from Hive to RDBMS
- Handling Incremental Loads Using Sqoop
- Hands-On Exercise
Module 9 – HBase
- Introduction to HBase
- Exploring HBase Master & RegionServer
- Creating Tables
- Listing Tables
- Disabling Tables
- Enabling Tables
- Dropping Tables
- Hands-On Exercise on HBase
Module 10 – Scala Basics
- Introduction to Functional Programming
- Interactive Shell (REPL), Data Types, Variables, Expressions, Conditional Statements, Loops, For Comprehensions
- Pattern Matching in Scala with Match expression
- Simple Functions and Their Variants, Tail Recursion, Functions as Objects (Anonymous Functions), Higher-Order Functions
- Scala Collections and the Use of Higher-Order Methods on Collections
- Classes and Objects, Class Constructors in Scala, Case Classes, Abstract and Generic Classes
- Exception Handling in Scala
- Traits in Scala, Properties of Traits
- Magic Apply method
- Singleton and Companion objects
- Implicits in Scala: Implicit Parameters, Implicit Defs, Implicit Classes
Module 11 – Getting Started with Spark
- What is Apache Spark & Why Spark?
- Spark History
- Unification in Spark
- Spark Ecosystem vs Hadoop
- Spark with Hadoop
- Introduction to Spark’s Python and Scala Shells
- Spark Standalone Cluster Architecture and Its Application Flow
Module 12 – Programming with RDDs
- RDD Basics and its characteristics, Creating RDDs
- RDD Operations
- Transformations
- Actions
- RDD Types
- Lazy Evaluation
- Persistence (Caching)
Module – Advanced Spark Programming
- Accumulators and Fault Tolerance
- Broadcast Variables
- Custom Partitioning
Module 13 – Loading and Saving Your Data
- Dealing with Different File Formats (Text, CSV, JSON, etc.)
- Hadoop Input and Output Formats
- Connecting to Diverse Data Sources (HDFS, Hive, S3, RDBMS, NoSQL, etc.)
Module – Spark SQL
- Linking with Spark SQL
- Initializing Spark SQL
- DataFrames & Caching
- Case Classes, Inferred Schema
- Loading and Saving Data
- Apache Hive
- Data Sources/Parquet
- JSON
- JDBC/ODBC Server
- Spark SQL User Defined Functions (UDFs)
- Hive UDFs
Module 14 – Kafka
- Introduction to Kafka
- Kafka Architecture
- Kafka Fundamentals
- Basic Kafka Operations
Module 15 – Real Time Concepts
- 1 Project
- Roles and Responsibilities
- Real-Time Interview Questions and Answers