Call:(+91) 8218653603

Apache Spark Scala Training Courses in Delhi NCR | Yami Cosmo

Big Data Hadoop Spark Storm Scala

Join Our Courses To Develop Yourself.


Course Overview

This combo course at Yami Cosmo Services, Green Park, Delhi, is designed to give you an edge in Big Data Hadoop. You will be trained in the Hadoop architecture and its components, such as MapReduce, HDFS, HBase and others, and will gain proficiency in Apache Storm, Apache Spark, and the Scala programming language. It is an all-in-one course that gives a 360-degree overview of the Hadoop architecture through real-time projects, along with real-time processing of unbounded data streams using Apache Storm and building applications in Spark with Scala.

Hadoop Installation & Setup

Introduction to Big Data Hadoop, Understanding HDFS & MapReduce

Introduction to Big Data and Hadoop, what Big Data means and where Hadoop fits in, the two core components of the Hadoop ecosystem (MapReduce and HDFS), the Hadoop Distributed File System in depth (replication, block size, Secondary NameNode, High Availability), YARN in depth (ResourceManager, NodeManager).

Hands-on Exercise – Working with HDFS, replicating data, determining the block size, getting familiar with the NameNode and DataNode
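As a rough illustration of how block size and replication interact, the sketch below computes how many blocks a file occupies and how much raw cluster storage it consumes. The 128 MB block size and the replication factor of 3 are the usual HDFS defaults (`dfs.blocksize`, `dfs.replication`); the function name and the 1000 MB example file are made up for illustration.

```python
MB = 1024 * 1024

def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Return (number of blocks, raw bytes stored across the cluster)."""
    # Integer ceiling division: a partly filled last block still counts as a block.
    blocks = -(-file_size_bytes // block_size_bytes)
    # The last block only uses as much disk as it holds, so raw usage is
    # file size times replication, not blocks times block size.
    raw_bytes = file_size_bytes * replication
    return blocks, raw_bytes

# Hypothetical 1000 MB file with default settings.
blocks, raw = hdfs_footprint(1000 * MB)
print(blocks, raw // MB)
```

Changing `block_size_bytes` or `replication` in the call shows how each setting affects the block count and the storage cost independently.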

Deep Dive in MapReduce

Deep understanding of how MapReduce works, the mapping and reducing stages, the driver, combiners, partitioners, input formats, output formats, shuffle and sort

Hands-on Exercise – Writing a word count program in MapReduce, writing a custom partitioner, MapReduce with combiners, local job runner mode, unit tests, ToolRunner, using counters, joining two datasets using map-side join and reduce-side join
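The word count exercise can be pictured without a cluster. The following is a minimal, single-process Python sketch of the MapReduce data flow (map, combine, partition, shuffle and sort, reduce). It is not Hadoop API code, and every function name in it is made up for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def combine(pairs):
    """Combiner: pre-aggregate one mapper's output before the shuffle."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def partition(word, num_reducers):
    """Custom partitioner: route a key to a reducer by its first letter."""
    return ord(word[0]) % num_reducers

def run_word_count(lines, num_reducers=2):
    # One {key: [values]} bucket per reducer, filled during the shuffle.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line stands in for one mapper's input split
        for word, count in combine(map_phase(line)):
            buckets[partition(word, num_reducers)][word].append(count)
    result = {}
    for bucket in buckets:
        for word in sorted(bucket):           # reducers see keys in sorted order
            result[word] = sum(bucket[word])  # reducer: sum the partial counts
    return result

counts = run_word_count(["big data big cluster", "data node"])
print(counts)
```

The combiner is what keeps the shuffle small: the first line sends one `("big", 2)` pair to its reducer instead of two `("big", 1)` pairs.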

Introduction to Hive

Introduction to Hadoop Hive, details of the Hive architecture, comparing Hive with Pig and RDBMS, working with Hive Query Language, creating databases and tables, GROUP BY and other clauses, the different types of Hive tables, HCatalog, storing Hive results, Hive partitioning and buckets.

Hands-on Exercise – Creating a Hive database, dropping and altering the database, creating a Hive table, loading data, dropping and altering the table, writing Hive queries to pull data using filter conditions and GROUP BY clauses, partitioning Hive tables

Advanced Hive & Impala

Indexing in Hive, map-side joins in Hive, working with complex data types, Hive user-defined functions, introduction to Impala, comparing Hive with Impala, the detailed architecture of Impala

Hands on Exercise - Working with Hive queries, writing indexes, joining tables, deploying external tables and sequence tables, and storing data in a different table.

Introduction to Pig

Introduction to Apache Pig, its various features, the different data types and schemas in Hive, the functions available in Pig, Hive bags, tuples and fields.

Hands on Exercise - Working with Pig in MapReduce and local modes, loading data, limiting data to 4 rows, storing data into files, working with Group By, Filter By, Distinct, Cross and Split in Pig.

Flume, Sqoop & HBase

Introduction to Apache Sqoop, Sqoop overview, basic imports and exports, how to improve Sqoop performance, the limitations of Sqoop, introduction to Flume and its architecture, introduction to HBase, the CAP theorem

Hands on Exercise - Working with Flume to generate a sequence number and consume it, using a Flume agent to consume Twitter data, using Avro to create Hive tables, Avro with Pig, creating a table in HBase, and running Disable, Scan and Enable on a table.

Writing Spark Applications using Scala

Using Scala to write Apache Spark applications, detailed study of Scala, the need for Scala, the concept of object-oriented programming, executing Scala code, class features in Scala such as getters, setters, constructors, abstract classes, extending objects and overriding methods, Java and Scala interoperability, the concept of functional programming and anonymous functions, the Bobsrockets package, comparing mutable and immutable collections.

Hands on exercise - Writing Spark applications using Scala, understanding the robustness of Scala for Spark real-time analytics operations.

Spark Framework

Apache Spark in detail, its various features, comparison with Hadoop, the various Spark components, combining HDFS with Spark, Scalding, introduction to Scala, the importance of Scala and RDD.

Hands on Exercise - The Resilient Distributed Dataset (RDD) in Spark and how it helps speed up big data processing.

RDD in Spark

RDD operations in Spark, Spark transformations and actions, data loading, comparison with MapReduce, key-value pairs.

Hands-on Exercise – Creating an RDD from a file, using an in-memory dataset, using an RDD with HDFS, defining the base RDD from an external file, deriving new RDDs through transformations, using the map and reduce functions, working on word count and counting log severity.
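The log severity exercise follows the usual shape of transformations followed by an action. The sketch below imitates that pipeline in plain Python rather than Spark: `reduce_by_key` is a hypothetical stand-in for what Spark's `reduceByKey` does, and the sample log lines and their layout (date, severity, message) are invented for the example.

```python
def reduce_by_key(pairs, func):
    """Merge the values of each key with func, in the spirit of reduceByKey."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged

# Assumed log layout: "<date> <SEVERITY> <message...>"
log_lines = [
    "2017-01-01 ERROR disk full",
    "2017-01-01 INFO job started",
    "2017-01-02 ERROR node lost",
    "2017-01-02 WARN slow task",
]

# "Transformations": generators only describe the work, nothing runs yet.
fields = (line.split() for line in log_lines)
pairs = ((parts[1], 1) for parts in fields)  # map each line to (severity, 1)

# "Action": consuming the generators triggers the actual computation.
severity_counts = reduce_by_key(pairs, lambda a, b: a + b)
print(severity_counts)
```

Word count has the same structure: split each line into words, map each word to `(word, 1)`, and merge the pairs by key.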

Data Frames and Spark SQL

Spark SQL in detail, the significance of SQL in Spark for structured data processing, Spark SQL JSON support, working with XML data and Parquet files, creating a HiveContext, writing Data Frames to Hive, reading JDBC files, the importance of Data Frames in Spark, creating Data Frames, manual schema inference, working with CSV files, reading JDBC tables, converting from Data Frames to JDBC, user-defined functions in Spark SQL, shared variables and accumulators, how to query and transform data in Data Frames, how Data Frames provide the benefits of both Spark RDD and Spark SQL, deploying Hive on Spark as the execution engine.

Hands-on Exercise – Data querying and transformation using Data Frames, finding out the benefits of Data Frames over Spark SQL and Spark RDD.

Machine Learning using Spark (MLlib)

The concept of iterative algorithms in Spark, analysis with Spark graph processing, introduction to K-Means and machine learning, learning about shared variables such as broadcast variables and accumulators.

Hands on Exercise - Writing Spark code using MLlib.

Spark Streaming

Introduction to Spark Streaming, the architecture of Spark Streaming, working with a Spark Streaming program, processing data using Spark Streaming, requesting count and DStream, multi-batch and sliding window operations, and working with advanced data sources
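A sliding window operation recombines the most recent batches each time the window slides. The plain-Python sketch below illustrates the idea only. It does not use the Spark Streaming API, and the function name, batch contents and window settings are assumptions chosen for the example. Window length and slide interval are both given as a number of batches.

```python
from collections import Counter

def windowed_counts(batches, window_length, slide_interval):
    """Count items over the last window_length batches, once per slide_interval."""
    results = []
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)  # early windows may be shorter
        window = Counter()
        for batch in batches[start:end]:
            window.update(batch)
        results.append(dict(window))
    return results

# Four hypothetical micro-batches of events.
batches = [["a", "b"], ["a"], ["b", "b"], ["c"]]
windows = windowed_counts(batches, window_length=3, slide_interval=2)
print(windows)
```

Because the window (3 batches) is longer than the slide (2 batches), the second batch is counted in both results, which is the overlap that distinguishes a sliding window from simple per-batch processing.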


Hands on exercise - Deploying Spark Streaming for data in motion and checking that the output meets the requirement.

Hadoop Administration – Multi Node Cluster Setup using Amazon EC2

Creating a four-node Hadoop cluster setup, running MapReduce jobs on the Hadoop cluster, successfully running the MapReduce code, working with the Cloudera Manager setup.

Hands-on Exercise – The method to build a multi-node Hadoop cluster using an Amazon EC2 instance, working with the Cloudera Manager.

Hadoop Administration – Cluster Configuration

Overview of Hadoop configuration, the importance of the Hadoop configuration files, the various configuration parameters and values, the HDFS parameters and MapReduce parameters, setting up the Hadoop environment, the Include and Exclude configuration files, administration and maintenance of the NameNode, DataNode directory structures and files, the file system image and edit log

Hands on Exercise - Performance tuning of MapReduce programs

Hadoop Administration – Maintenance, Monitoring and Troubleshooting

Introduction to the checkpoint procedure, NameNode failure and the recovery procedure, Safe Mode, metadata and data backup, the various potential problems and solutions, what to look for, how to add and remove nodes.

Hands on Exercise - Ensuring MapReduce file system recovery for different scenarios, JMX monitoring of the Hadoop cluster, using logs and stack traces for monitoring and troubleshooting, using the Job Scheduler to schedule jobs in the same cluster, tracing the MapReduce job submission flow, the FIFO scheduler, the Fair Scheduler and its configuration

ETL Connectivity with Hadoop Ecosystem

How ETL tools work in the Big Data industry, introduction to ETL and data warehousing, working with prominent use cases of Big Data in the ETL industry, an end-to-end ETL PoC showing Big Data integration with an ETL tool.

Hands on Exercise - Connecting to HDFS from an ETL tool and moving data from the local system to HDFS, moving data from a DBMS to HDFS, working with Hive from an ETL tool, creating a MapReduce job in an ETL tool

Project Solution Discussion and Cloudera Certification Tips & Tricks

Working towards the solution of the Hadoop project, its problem statement and the possible solution outcomes, preparing for the Cloudera certification, points to focus on for scoring the highest marks, tips for cracking Hadoop interview questions.

Hands on Exercise - A real-world, high-value Big Data Hadoop application project, and getting the right solution based on the criteria set by the Intellipaat team.

The following topics are available only in self-paced mode.

Hadoop Application Testing

Why Testing is Important, Unit Testing, Integration Testing, Performance Testing, Diagnostics, Nightly QA Test, Benchmark and End-to-End Tests, Functional Testing, Release Certification Testing, Security Testing, Scalability Testing, Commissioning and Decommissioning of Data Nodes Testing, Reliability Testing, Release Testing

Roles and Responsibilities of Hadoop Testing Professional

Understanding the requirements, preparing the test estimation, test cases, test data and test bed, test execution, defect reporting, defect retesting, daily status report delivery, test completion, ETL testing at every stage (HDFS, Hive, HBase) while loading the input (logs, files, records, etc.) using Sqoop/Flume, including data verification, reconciliation, and user authorization and authentication testing (groups, users, privileges, etc.), reporting defects to the development team or manager and driving them to closure, consolidating all defects and creating defect reports, validating new features and issues in core Hadoop.

Framework Called MRUnit for Testing of MapReduce Programs

Reporting defects to the development team or manager and driving them to closure, consolidating all defects and creating defect reports, responsibility for creating a testing framework called MRUnit for testing MapReduce programs.

Unit Testing

Automation testing using Oozie, data validation using the QuerySurge tool.

Test Execution

Test plan for HDFS upgrade, test automation and results


IBM Project Solution Discussion and Cloudera Certification Tips & Tricks

Working towards the solution of the Hadoop IBM project, its problem statement and the possible solution outcomes, preparing for the Cloudera certification, points to focus on for scoring the highest marks, tips for cracking Hadoop interview questions.

Hands on Exercise - A real-world IBM Big Data Hadoop application project, and getting the right solution based on the standards set by the IBM team.


  • Duration: 40 Days
  • Services

    Technical Support Project, Consultancy Monitoring and Control Smart Metering Data Logging, Dedicated Graphical Interface

    Corporate Training, Industrial Training, Campus Training, Classroom Training, Bootcamp Training, Online Training

    Data Science, Machine Learning, Robotics, Business Intelligence, Finance Controlling, Water Treatment and Power Plants

    Domestic Tech. / Non Tech. and International - Tech. only


    ISO 9001:2015 Yami Cosmo Services Pvt. Ltd. Copyright © 2017. TeghDeveloperTechnlogies. All rights reserved.