Programming Hive
Hive, with its Hive Query Language (HiveQL), provides an SQL-like interface for accessing data in Hadoop.
This book is simple and easy to learn from. Each topic is explained separately and extensively with examples. Hive makes it easier for developers to port SQL-based applications to Hadoop.
This book covers:
--> An overview of Hadoop and MapReduce.
--> The basic differences between Hive and other tools such as the Pig programming language and HBase.
--> How to install Hive and how to configure it with Hadoop.
--> How to start Hive.
--> The commands Hive provides and how to run them on your system.
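To give a flavor of the commands the book walks through, here is a minimal HiveQL session. The table name, columns, and file path are illustrative, not taken from the book:

```sql
-- Create a managed table (names here are made up for illustration)
CREATE TABLE IF NOT EXISTS employees (
  name   STRING,
  salary FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

-- Load a local CSV file into the table
LOAD DATA LOCAL INPATH '/tmp/employees.csv'
OVERWRITE INTO TABLE employees;

-- Query it with familiar SQL-like syntax
SELECT name, salary
FROM employees
WHERE salary > 50000;
```

Anyone who has written SQL will recognize the shape of these statements, which is exactly the point of HiveQL.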
In this post I'm reviewing the most comprehensive and detailed guide to using Hive available today: "Programming Hive" by Capriolo, Wampler, and Rutherglen.
- Title: Programming Hive
- Authors: Capriolo, Wampler & Rutherglen
- Publisher: O’Reilly Media
- Edition: 1st
- Publication date: October 2012
- Hive versions: Up to version 0.9.0
The authors clearly have a lot of real-world experience working with Hive. Edward Capriolo is a committer on the Hive project. The other authors, Dean Wampler and Jason Rutherglen, both work for Think Big Analytics where they have supported numerous big data projects.
The authors also provide an introduction to several of the other key software products in the Hadoop ecosystem. This helps the reader determine whether Hive is the best tool for a given task.
The book shines in its treatment of advanced topics such as:
--> View creation, table design, and their relationship to physical storage options.
--> Working with the Hadoop streaming environment.
--> Setting up the Hive web interface and using the Hive Thrift service for remote access to Hive from other processes, including JDBC and ODBC clients.
--> Integration with Amazon Web Services.
--> Use of HCatalog to make Hive metadata available to users.
HCatalog is a table and storage management layer that enables users of different processing tools, such as MapReduce and Pig, to read and write data on the grid. HCatalog presents a relational view of data in the Hadoop Distributed File System: users don't need to worry about where or in what format their data is stored, whether RCFile, text files, sequence files, or ORC files.
HCatalog allows users to read and write files in any format for which a SerDe (Serializer-Deserializer) can be written.
By default, HCatalog supports RCFile, CSV, JSON, and sequence files. To use a custom format, you need to provide an InputFormat, an OutputFormat, and a SerDe.
HCatalog uses the Hive Command Line Interface (CLI) for issuing data definition and metadata exploration commands.
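To sketch how formats and SerDes surface in practice, a Hive table definition can name them explicitly. The table names below are made up for illustration; the JSON SerDe class shown is the one that ships with HCatalog, but treat the exact class path as an assumption to verify against your Hive version:

```sql
-- Store a table as RCFile, one of the formats HCatalog supports by default
CREATE TABLE logs_rc (line STRING)
STORED AS RCFILE;

-- Name an explicit SerDe for JSON-encoded rows
-- (class path assumed from the HCatalog distribution)
CREATE TABLE events_json (
  ts  STRING,
  msg STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;
```

The same pattern extends to custom formats: supply your own InputFormat, OutputFormat, and SerDe class names in the table definition.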