Hadoop Installation & Setup
Introduction to Big Data Hadoop, Understanding HDFS & MapReduce
Introducing Big Data Hadoop, what Big Data means and where Hadoop fits into it, the two important Hadoop ecosystem components MapReduce and HDFS, in-depth Hadoop Distributed File System - replicas, block size, Secondary NameNode, high availability, in-depth YARN - ResourceManager, NodeManager.
Hands-on Exercise – Working with HDFS, replicating data, determining the block size, working with the NameNode and DataNode.
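As a quick illustration of the block-size topic above, a plain-Python sketch of the arithmetic HDFS applies when splitting a file into blocks (the function name is illustrative, not a Hadoop API; 128 MB is the common default block size):

```python
# Illustrative sketch only: how many HDFS blocks a file occupies.
# This is plain arithmetic, not the Hadoop API.
import math

def hdfs_block_count(file_size_bytes, block_size_bytes=128 * 1024 * 1024):
    """Number of HDFS blocks a file of the given size occupies."""
    if file_size_bytes == 0:
        return 0
    return math.ceil(file_size_bytes / block_size_bytes)

# A 300 MB file needs 3 blocks: 128 MB + 128 MB + 44 MB.
print(hdfs_block_count(300 * 1024 * 1024))  # → 3
```

Note that the last block is allowed to be smaller than the configured block size; only full files are padded out conceptually, never on disk.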
Deep Dive in MapReduce
Deep understanding of the functioning of MapReduce, the mapping and reducing process, the driver, combiners, partitioners, input formats, output formats, and shuffle and sort.
Hands-on Exercise – Detailed method for writing a word count program in MapReduce, writing custom partitioners, MapReduce with combiners, local job runner mode, unit testing, ToolRunner, map-side join, reduce-side join, using counters, and joining two datasets using a map-side join and a reduce-side join.
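The word-count flow above can be sketched in plain Python as map → shuffle/sort → reduce (the function names here are illustrative stand-ins, not Hadoop's Java API):

```python
# Illustrative sketch of the MapReduce word-count flow, simulated in plain Python.
from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for every word, like a word-count Mapper.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, like the shuffle-and-sort step between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the counts per word, like a word-count Reducer.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big hadoop", "hadoop big"]
pairs = [kv for line in lines for kv in map_phase(line)]
print(reduce_phase(shuffle(pairs)))  # → {'big': 3, 'data': 1, 'hadoop': 2}
```

A combiner would run the same summing logic on each mapper's local output before the shuffle, cutting the data moved across the network.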
Introduction to Hive
Introducing Hadoop Hive, the detailed architecture of Hive, comparing Hive with Pig and RDBMS, working with Hive Query Language, creating a database and tables, Group By and other clauses, the different types of Hive tables, HCatalog, storing Hive results, and Hive partitioning and buckets.
Hands-on Exercise – Creating a Hive database, how to drop a database, changing the database, creating a Hive table, loading data, dropping and altering a table, writing Hive queries to pull data using filter conditions and Group By clauses, and partitioning Hive tables.
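To illustrate what a Hive query with a filter condition and a Group By clause computes, a plain-Python simulation over in-memory rows (the table and column names are made up for illustration; Hive itself would express this in HiveQL):

```python
# Illustrative sketch: what a HiveQL query like
#   SELECT dept, COUNT(*) FROM emp WHERE salary > 90 GROUP BY dept;
# computes, simulated over plain Python dictionaries.
from collections import Counter

rows = [
    {"dept": "sales", "salary": 100},
    {"dept": "hr",    "salary": 80},
    {"dept": "sales", "salary": 120},
]

filtered = [r for r in rows if r["salary"] > 90]   # WHERE salary > 90
counts = Counter(r["dept"] for r in filtered)      # GROUP BY dept, COUNT(*)
print(dict(counts))  # → {'sales': 2}
```

A partitioned Hive table applies the same idea at the storage layer: rows are physically split by the partition column so the filter prunes whole directories instead of scanning every row.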
Advanced Hive & Impala
Ordering in Hive, indexing in Hive, working with complex data types, Hive user-defined functions, an introduction to Impala, comparing Hive with Impala, and the detailed architecture of Impala.
Hands-on Exercise – Working with Hive queries, writing indexes, joining tables, deploying external tables, sequence tables, and storing data in another table.
Introduction to Pig
Apache Pig introduction, its various features, the different data types and schema in Pig, the functions available in Pig, and the Pig bag, tuple, and field.
Hands-on Exercise – Working with Pig in MapReduce and local mode, loading data, limiting data to four rows, storing data in a file, and working with Group By, Filter By, Distinct, Cross, and Split.
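The Pig Latin steps listed in the exercise above (LIMIT, FILTER, GROUP) can be mimicked over plain Python tuples; real Pig expresses each step as a Pig Latin statement, so the code below is only a conceptual stand-in:

```python
# Illustrative sketch: Pig-style LIMIT, FILTER and GROUP over (key, value) tuples.
records = [("a", 1), ("b", 5), ("a", 3), ("c", 7), ("b", 2), ("a", 9)]

limited = records[:4]                         # like: LIMIT data 4;
filtered = [t for t in records if t[1] > 2]   # like: FILTER data BY value > 2;
grouped = {}                                  # like: GROUP data BY key;
for key, value in records:
    grouped.setdefault(key, []).append(value)

print(limited)       # → [('a', 1), ('b', 5), ('a', 3), ('c', 7)]
print(grouped["a"])  # → [1, 3, 9]
```

In Pig terms, each value of `grouped` is a bag of tuples sharing the same group key.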
Flume, Sqoop & HBase
Introduction to Apache Sqoop, a Sqoop overview, basic imports and exports, how to improve Sqoop performance, the Sqoop limitations, an introduction to Flume and its architecture, HBase, and the CAP theorem.
Hands-on Exercise – Working with Flume to generate a sequence number and consume it, using Flume to access Twitter data, using AVRO to create a Hive table, AVRO with Pig, creating a table in HBase, and deploying Disable, Scan, and Enable table.
Writing Spark Applications using Scala
The use of Scala for writing Apache Spark applications, a detailed study of Scala, the need for Scala, the concept of object-oriented programming, executing Scala code, various Scala classes such as getters, setters, constructors, abstract classes, extending objects and overriding methods, Java and Scala interoperability, the concept of functional programming and anonymous functions, comparing packages, and mutable versus immutable collections.
Hands-on Exercise – Understanding the strengths of Scala for Spark real-time analytics operations, and writing Spark applications using Scala.
Detailed Apache Spark and its importance, comparing Spark with Hadoop and HDFS, its various features, Scalding, an introduction to Scala, Scala and RDDs, and the various Spark components.
Hands-on Exercise – The Resilient Distributed Dataset (RDD) in Spark and how it helps to speed up big data processing.
RDD in Spark
The RDD operations in Spark, Spark transformations, actions, data loading, a comparison with MapReduce, and key-value pairs.
Hands-on Exercise – Loading a file into an RDD, using an in-memory dataset, deploying an RDD with HDFS, defining the base RDD from an external file, deploying the RDD through transformations, using the Map and Reduce functions, and working on word count and counting log severity.
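The log-severity exercise above boils down to a map step that extracts the severity tag and a reduce-by-key step that counts it; a plain-Python sketch (the log lines are made-up sample data, and plain Python stands in for Spark's RDD API):

```python
# Illustrative sketch: counting log severity, the map/reduceByKey idea from the
# RDD exercise, simulated without Spark. Sample log lines are invented.
from collections import Counter

logs = [
    "ERROR disk full",
    "INFO job started",
    "WARN low memory",
    "ERROR network down",
    "INFO job finished",
]

# Roughly: rdd.map(lambda line: (line.split()[0], 1)).reduceByKey(add)
severity_counts = Counter(line.split()[0] for line in logs)
print(dict(severity_counts))  # → {'ERROR': 2, 'INFO': 2, 'WARN': 1}
```

Unlike this eager loop, Spark transformations are lazy: nothing runs until an action such as `collect()` or `count()` is called.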
Data Frames and Spark SQL
The detailed Spark SQL, the significance of SQL in Spark for working with structured data processing, Spark SQL JSON support, working with XML data and Parquet files, creating a HiveContext, writing a Data Frame to Hive, reading from JDBC files, the importance of Data Frames in Spark, creating Data Frames, manual schema inference, working with CSV files, reading JDBC tables, converting a Data Frame to JDBC, user-defined functions in Spark SQL, shared variables and accumulators, how to query and transform data in Data Frames, how a Data Frame provides the benefits of both Spark RDD and Spark SQL, and deploying Hive on Spark as the execution engine.
Hands-on Exercise – Data querying and transformation using Data Frames, and finding out the benefits of Data Frames over Spark SQL and Spark RDD.
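To illustrate the manual schema inference topic above, a small plain-Python sketch over CSV rows (the try-int, then-float, else-string rule here is an assumption for illustration; Spark's actual inference is richer and samples many rows):

```python
# Illustrative sketch: inferring a column type per CSV field from its first row.
# The inference rule (int → float → string) is a simplification for illustration.
import csv
import io

def infer_type(value):
    for cast, name in ((int, "int"), (float, "float")):
        try:
            cast(value)
            return name
        except ValueError:
            pass
    return "string"

data = "name,age,score\nalice,30,9.5\nbob,25,8.0\n"
reader = csv.DictReader(io.StringIO(data))
first = next(reader)
schema = {col: infer_type(val) for col, val in first.items()}
print(schema)  # → {'name': 'string', 'age': 'int', 'score': 'float'}
```

Inferring from a single row is fragile (a later row may not parse as an int), which is why manually declaring the schema is often preferred in production pipelines.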
Machine Learning using Spark (MLlib)
The concept of iterative algorithms in Spark, analysis with Spark graph processing, an introduction to K-Means and machine learning, and shared variables such as broadcast variables and accumulators.
Hands-on Exercise – Writing Spark code using MLlib.
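K-Means, named above, is a good example of an iterative algorithm: each pass assigns points to their nearest centroid, then recomputes the centroids. A plain-Python sketch of the assignment step for 1-D points (MLlib provides the full algorithm as a library; the function and data here are illustrative):

```python
# Illustrative sketch: the assignment step of K-Means for 1-D points.
# MLlib implements the full iterative algorithm; this shows only the core idea.
def assign(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            for p in points]

points = [1.0, 2.0, 9.0, 10.0]
centroids = [1.5, 9.5]
print(assign(points, centroids))  # → [0, 0, 1, 1]
```

The update step would then set each centroid to the mean of its assigned points, and the two steps repeat until assignments stop changing.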
Introducing Spark Streaming, the architecture of Spark Streaming, working with the Spark Streaming program, processing data using Spark Streaming, requesting count and DStream, multi-batch and sliding-window operations, and working with advanced data sources.
Hands-on Exercise – Deploying Spark Streaming and checking that the output for data in motion is as per the requirement.
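The sliding-window operation listed above can be sketched over micro-batches in plain Python (the batch contents and window length are made-up illustration values; Spark Streaming exposes this as windowed DStream operations):

```python
# Illustrative sketch: a sliding-window count over micro-batches, the core idea
# behind Spark Streaming's windowed operations. Sample batches are invented.
def sliding_window_counts(batches, window_length):
    """Total events in each window covering the last `window_length` batches."""
    totals = []
    for end in range(1, len(batches) + 1):
        window = batches[max(0, end - window_length):end]
        totals.append(sum(len(batch) for batch in window))
    return totals

batches = [["a", "b"], ["c"], ["d", "e", "f"], []]
print(sliding_window_counts(batches, window_length=2))  # → [2, 3, 4, 3]
```

Each window overlaps the previous one by all but one batch, which is why windowed counts change smoothly as batches arrive.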
Hadoop Administration – Multi Node Cluster Setup using Amazon EC2
Creating a four-node Hadoop cluster setup, running MapReduce jobs on the Hadoop cluster, and running working MapReduce code with the Cloudera Manager setup.
Hands-on Exercise – The method to build a multi-node Hadoop cluster using an Amazon EC2 instance, working with the Cloudera Manager.
Hadoop Administration – Cluster Configuration
An overview of Hadoop configuration, the importance of the Hadoop configuration files, the configuration of various parameters and values, the HDFS parameters and the MapReduce parameters, setting up the Hadoop environment, the Include and Exclude configuration files, administration and maintenance of the NameNode and DataNode directory structures and files, and the File System image and the Edit Log.
Hands-on Exercise – Tuning the performance of MapReduce programs.
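The HDFS parameters covered in this module live in hdfs-site.xml; a minimal sketch, assuming typical values (dfs.replication and dfs.blocksize are standard HDFS property names, but the values shown are purely illustrative):

```xml
<!-- hdfs-site.xml: illustrative values only -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>            <!-- number of replicas kept per block -->
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>    <!-- 128 MB block size, expressed in bytes -->
  </property>
</configuration>
```

Changing these values affects only files written after the change; existing files keep the replication factor and block size they were written with unless explicitly altered.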
Hadoop Administration – Maintenance, Monitoring and Troubleshooting
Introduction to the checkpointing procedure, the NameNode failure and recovery process, safe mode, metadata and data backup, various potential problems and their solutions, what to look for, and how to add and remove nodes.
Hands-on Exercise – How to go about ensuring MapReduce File System recovery for different scenarios, JMX monitoring of the Hadoop cluster, how to use the logs and stack traces for monitoring, using the Job Scheduler for scheduling jobs in the same cluster, the MapReduce job submission flow, the FIFO schedule, and the Fair Scheduler and its configuration.
ETL Connectivity with Hadoop Ecosystem
How ETL tools work in the Big Data industry, an introduction to ETL and data warehousing, working with prominent use cases of Big Data in the ETL industry, and an end-to-end ETL PoC showing Big Data integration with an ETL tool.
Hands-on Exercise – Connecting to HDFS from an ETL tool and moving data from the local system to HDFS, moving data from a DBMS to HDFS, working with Hive with an ETL tool, and creating a MapReduce job in an ETL tool.
Project Solution Discussion and Cloudera Certification Tips & Tricks
Working towards the solution of the Hadoop project, its problem statements and the possible solution outcomes, preparing for the Cloudera certification, aiming for the highest score, and tips for cracking Hadoop interview questions.
Hands-on Exercise – Arriving at the right solution based on a real-world, high-value Big Data Hadoop application and the criteria set by the Intellipaat team.
The following topics are available only in the self-paced mode.
Hadoop Application Testing
Why testing is important, unit testing, integration testing, performance testing, diagnostics, the nightly QA test, benchmark and end-to-end tests, functional testing, release certification testing, security testing, scalability testing, commissioning and decommissioning of data node testing, reliability testing, and release testing.
Roles and Responsibilities of Hadoop Testing Professional
Understanding the requirements, preparing the testing estimation, test cases, test data, the test bed creation, test execution, defect reporting, defect retesting, daily status report delivery, test completion, ETL testing at every stage (HDFS, Hive, HBase) while loading the input (logs, files, records, etc.) using Sqoop or Flume, data verification and reconciliation, user authorization and authentication testing (groups, users, privileges, etc.), reporting defects to the development team or manager and driving them to closure, consolidating all the defects and creating defect reports, and validating new features and issues in core Hadoop.
Framework Called MRUnit for Testing of MapReduce Programs
Reporting defects to the development team or manager and driving them to closure, consolidating all the defects and creating defect reports, and being responsible for creating a testing framework called MRUnit for testing MapReduce programs.
Automation testing using OOZIE, and data validation using the Query Surge tool.
The test plan for HDFS upgrade, and test automation and results.
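MRUnit itself is a Java framework, but the idea it packages is simply unit-testing the mapper and reducer logic in isolation from the cluster. A plain-Python sketch of that idea (the mapper and its test are hypothetical illustration code, not MRUnit's API):

```python
# Illustrative sketch: unit-testing mapper logic in isolation, the idea behind
# MRUnit, shown here in plain Python rather than MRUnit's Java API.
def wordcount_mapper(line):
    # The mapper under test: emit a (word, 1) pair per word.
    return [(word, 1) for word in line.split()]

def test_wordcount_mapper():
    # Feed one input record, assert the exact key-value pairs emitted.
    assert wordcount_mapper("to be or not to be") == [
        ("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)
    ]

test_wordcount_mapper()
print("mapper test passed")
```

Keeping such tests free of any cluster dependency is what makes them fast enough to run on every build.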
IBM Project Solution Discussion and Cloudera Certification Tips & Tricks
Working towards the solution of the Hadoop IBM project, its problem statements and the possible solution outcomes, preparing for the Cloudera certification, aiming for the highest score, and cracking Hadoop interview questions.
Hands-on Exercise – Arriving at the right solution based on a real-world IBM Big Data Hadoop application and the criteria set by the IBM team.