This course teaches you to use Pig and Hive to prepare and analyze large data sets on Hadoop so that you can make more informed and timely business decisions. You will learn to increase productivity by avoiding the low-level Java coding characteristic of MapReduce, and to begin extracting business value quickly for competitive advantage. In this Pig & Hive for Big Data training course, you will learn to gain access to previously inaccessible data, gather and feed data into Hadoop for storage, transform and filter data using Pig, and extract value using Hive and Spark SQL.
TRAINING AT YOUR SITE
Our FlexVouchers help you lock in your training budgets without having to commit to a traditional 1 voucher = 1 course classroom-only attendance. FlexVouchers expand your purchasing power to modern blended solutions and services that are completely customizable. For details, please call 888-843-8733 or chat live.
Mar 24 - 27 9:00 AM - 4:30 PM EDT Herndon, VA / Online (AnyWare)
May 5 - 8 9:00 AM - 4:30 PM EDT Greenbelt, MD / Online (AnyWare)
Jun 16 - 19 9:00 AM - 4:30 PM EDT New York / Online (AnyWare)
Sep 1 - 4 9:00 AM - 4:30 PM EDT Greenbelt, MD / Online (AnyWare)
Sep 22 - 25 9:00 AM - 4:30 PM EDT Herndon, VA / Online (AnyWare)
Nov 10 - 13 9:00 AM - 4:30 PM EST Greenbelt, MD / Online (AnyWare)
Dec 15 - 18 9:00 AM - 4:30 PM EST New York / Online (AnyWare)
Guaranteed to Run
When you see the "Guaranteed to Run" icon next to a course event, you can rest assured that your course event (date, time, location) will run. Guaranteed.
Storing data in HDFS
Parallel processing with MapReduce
Automating data transfer
Describing characteristics of Apache Pig
Structuring unstructured data
Transforming data with Relational Operators
Filtering data with Pig
Leveraging business advantages of Hive
Organizing data in Hive Data Warehouse
Designing data layout for maximum performance
Performing joins on unstructured data
Pushing HiveQL to the limit
Deploying Hive in production
Streamlining storage management with HCatalog
Low-level Hadoop programming is done in Java. Pig and Hive make programming easier by letting the programmer write scripts in a simpler language, Pig Latin or HiveQL. Those scripts are compiled and optimized internally, and the equivalent Java MapReduce jobs are generated and executed without the programmer having to write any Java code.
Apache Pig is a platform for analyzing large data sets. Programs are written in a high-level language, Pig Latin. Pig's infrastructure converts them into sequences of Java MapReduce programs, which are then executed on Hadoop. Without writing Java, one can use Pig to leverage Hadoop's ability to process data in parallel.
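To make the idea of a MapReduce program more concrete, here is a minimal pure-Python sketch of the map, shuffle, and reduce phases for a word count. It is an illustration only: it runs in a single process rather than on Hadoop, the function names are hypothetical, and it is not the code Pig generates.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases into one job (hypothetical helper)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["pig and hive", "hive on hadoop", "pig on hadoop"]
    print(word_count(sample))
```

In Pig Latin, the same logic is typically expressed in a handful of statements built from operators such as LOAD, FOREACH, GROUP, and COUNT, with no hand-written mapper or reducer.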
Apache Hive is data warehouse software that translates commands written in a SQL-like language, HiveQL, into MapReduce jobs that are then executed on Hadoop. Without writing Java, one can use Hive to leverage Hadoop's ability to process data in parallel.
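As a rough picture of what that translation means, the sketch below mimics how a query of the form SELECT region, SUM(amount) FROM sales GROUP BY region could be broken into map, sort, and reduce steps. It is a conceptual illustration in plain Python, not Hive's actual execution plan. The `sales` rows, the column names, and the `group_by_sum` helper are all assumptions made up for the example.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Mapping

# Hypothetical sample rows standing in for a Hive table named "sales".
SALES = [
    {"region": "east", "amount": 120},
    {"region": "west", "amount": 80},
    {"region": "east", "amount": 30},
    {"region": "west", "amount": 20},
]


def group_by_sum(rows: Iterable[Mapping], key_col: str, value_col: str) -> Dict:
    """Mimic SELECT key_col, SUM(value_col) ... GROUP BY key_col."""
    # Map: project each row to a (key, value) pair.
    pairs = [(row[key_col], row[value_col]) for row in rows]
    # Shuffle/sort: bring pairs that share a key next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce: sum the values for each run of identical keys.
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(pairs, key=itemgetter(0))
    }


if __name__ == "__main__":
    print(group_by_sum(SALES, "region", "amount"))
```

With Hive, the analyst writes only the one-line query; the grouping and summing steps are planned and run by the system.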
Pig is typically used early in the data pipeline to clean and structure data. Hive is typically used later, once the data has structure and well-defined fields. Because Hive has the concepts of tables, rows, and columns, it integrates easily with BI tools.
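The division of labor described above can be sketched in plain Python: a cleaning step that turns raw text lines into structured records (the kind of work usually given to Pig), followed by a table-style query over those records (the kind of work usually given to Hive). The log format, the `Visit` fields, and the function names are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, NamedTuple, Optional


class Visit(NamedTuple):
    """One structured row: who requested which page, and how long it took."""
    user: str
    page: str
    millis: int


def parse_line(line: str) -> Optional[Visit]:
    """Return a Visit for a well-formed 'user page millis' line, else None."""
    parts = line.strip().split()
    if len(parts) != 3:
        return None
    try:
        millis = int(parts[2])
    except ValueError:
        return None
    return Visit(parts[0], parts[1], millis)


def clean(lines: Iterable[str]) -> List[Visit]:
    """Early pipeline stage: keep only the lines that parse into records."""
    return [visit for visit in map(parse_line, lines) if visit is not None]


def slow_pages(visits: Iterable[Visit], threshold_ms: int) -> List[str]:
    """Later stage: a table-style query over rows with well-defined fields."""
    return sorted({v.page for v in visits if v.millis > threshold_ms})


if __name__ == "__main__":
    raw = ["alice /home 120", "garbage", "bob /cart 950", "carol /cart n/a"]
    print(slow_pages(clean(raw), threshold_ms=100))
```

Once the records have named, typed fields, the second stage reads much like a SQL query, which is why the structured side of the pipeline fits Hive and BI tools well.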
Yes! We know your busy work schedule may prevent you from getting to one of our classrooms, which is why we offer convenient online training to meet your needs wherever you are.