All you need to know about Big Data: learn Hadoop, HDFS, MapReduce, Hive & Pig by designing a data pipeline.
The main objective of this course is to help you understand the complex architecture of Hadoop and its components, and to point you in the right direction so you can start working with Hadoop and its components quickly.
It covers everything you need as a Big Data beginner: the Big Data market, different job roles, technology trends, the history of Hadoop, HDFS, the Hadoop ecosystem, Hive, and Pig. In this course, we will see how a beginner should get started with Hadoop. The course is accompanied by a number of practical examples that will help you learn Hadoop quickly.
The course consists of 5 sections and focuses on the following topics:
Big Data at a glance: Discover Big Data and the various roles required in the Big Data market. Learn about salary trends for Big Data roles around the world. Discover the most popular technologies and their trends in the market.
Getting started with Hadoop: Understand Hadoop and its complex architecture. Learn the Hadoop ecosystem with simple examples. Know the different versions of Hadoop (Hadoop 1.x vs. Hadoop 2.x), the different Hadoop vendors on the market, and Hadoop on the cloud. Understand how Hadoop uses the ELT approach. Learn how to install Hadoop on your machine. We will run HDFS commands from the command line to work with HDFS.
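As a taste of what working with HDFS from the command line looks like, here are a few common file operations. This is an illustrative sketch that assumes a working Hadoop installation with the `hdfs` binary on the PATH; all paths and file names are hypothetical.

```shell
# Create a directory in HDFS (path is illustrative)
hdfs dfs -mkdir -p /user/demo/input

# Copy a file from the local filesystem into HDFS
hdfs dfs -put sales.csv /user/demo/input/

# List the contents of the HDFS directory
hdfs dfs -ls /user/demo/input

# Print the file's contents to the console
hdfs dfs -cat /user/demo/input/sales.csv

# Remove the file from HDFS
hdfs dfs -rm /user/demo/input/sales.csv
```

Note that these commands manipulate the distributed filesystem, not your local disk; the syntax deliberately mirrors familiar Unix commands like `mkdir`, `ls`, and `cat`.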
First steps with Hive: Understand what kind of problems Hive solves in Big Data. Learn about its architectural design and how it works. Know the data model in Hive, the different file formats Hive supports, Hive queries, and more. We will run queries in Hive.
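To give a sense of how Hive lets you query data in HDFS with SQL-like statements, here is a small hypothetical example: an external table defined over CSV files, followed by an aggregation query. Table name, columns, and the HDFS location are all assumptions for illustration.

```sql
-- Hypothetical example: an external Hive table over CSV files in HDFS
CREATE EXTERNAL TABLE sales (
  order_id INT,
  product  STRING,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/demo/sales';

-- A familiar SQL-style query; Hive compiles it into distributed jobs
SELECT product, SUM(amount) AS total
FROM sales
GROUP BY product;
```

The point is that analysts can keep writing SQL while Hive translates it into jobs that run across the cluster.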
Getting started with Pig: Understand what kind of problems Pig solves in Big Data. Learn about its architectural design and how it works. Understand how Pig Latin works in Pig. You will learn the differences between SQL and Pig Latin. Demos of running different queries in Pig.
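To illustrate the contrast with SQL, here is the same kind of aggregation sketched in Pig Latin. Where SQL expresses the result in a single declarative statement, Pig Latin describes a step-by-step data flow; the dataset path and field names below are hypothetical.

```pig
-- Hypothetical example: load, group, and aggregate as a data flow
sales   = LOAD '/user/demo/sales' USING PigStorage(',')
          AS (order_id:int, product:chararray, amount:double);
grouped = GROUP sales BY product;
totals  = FOREACH grouped GENERATE group AS product, SUM(sales.amount) AS total;
DUMP totals;
```

Each line produces a named relation that the next step consumes, which makes long transformation chains easier to build up and debug incrementally.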
Use cases: Real-world applications of Hadoop are very important for understanding Hadoop and its components, so we will learn by designing a sample data pipeline in Hadoop to process big data. In addition, you will understand how companies adopt a modern data architecture, namely the Data Lake, in their data infrastructure.
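As a rough sketch of how the pieces covered above can fit together in a pipeline, the outline below chains ingestion, transformation, and querying. All paths, script names, and table names are hypothetical, and a real pipeline would typically be driven by a scheduler rather than run by hand.

```shell
# Illustrative pipeline sketch (paths and script names are hypothetical)

# 1. Ingest raw files into the Data Lake's landing zone in HDFS
hdfs dfs -put raw_logs/*.csv /datalake/landing/logs/

# 2. Clean and transform the raw data with a Pig script
pig -f clean_logs.pig

# 3. Expose the transformed data to analysts through a Hive query
hive -e "SELECT product, SUM(amount) FROM sales GROUP BY product;"
```

This load-first, transform-later flow is the ELT approach mentioned earlier: raw data lands in HDFS as-is, and transformations happen inside the cluster.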