Hadoop Use in Industries







Data generation has grown exponentially in recent years. Enterprises find it increasingly difficult to gather and manage such large volumes of data, and existing centralized architectures have proved inadequate for processing it. A centralized environment is no longer suitable because of time constraints, poor efficiency and performance, and rising infrastructure costs.


With the advent of distributed architectures, organizations can now process huge data sets and extract relevant information from them. One of the most successful open source frameworks in this regard is Apache Hadoop.



Apache Hadoop is a Java-based software framework for processing large data sets in a distributed computing environment. It has its own file system, HDFS (the Hadoop Distributed File System), and uses the MapReduce programming model, in which a map step turns input records into key-value pairs and a reduce step combines the values for each key. Hadoop is by far the most popular framework of its kind, with companies like Google, Yahoo and IBM making use of the software for applications such as search, advertising, and information gathering and processing.
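To make the MapReduce idea more concrete, here is a minimal single-machine sketch of a word count in Python. It is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. On a real cluster, Hadoop would run the map and reduce steps in parallel on many machines and do the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (the framework's job on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling word_count(["big data", "Big clusters"]) counts "big" twice and the other two words once each. The point of splitting the work this way is that each map call only needs one line and each reduce call only needs one key, so the steps can be spread across machines.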


Hadoop in industry

 


At Facebook, Hadoop has conventionally been used together with Hive for the storage and analysis of huge data sets. The main focus of this analysis is to maximize the throughput and efficiency of the system. The analysis runs as offline batch jobs, and the workloads read and write large amounts of data from disk sequentially.


Yelp is a multinational corporation based in San Francisco, California that operates an "Online Urban Guide" and business review site. It initially depended on RAIDs (Redundant Arrays of Independent Disks) for storing its logs, along with a single local instance of Hadoop.

Yelp then replaced the RAIDs with Amazon Simple Storage Service (Amazon S3) and transferred all of its Hadoop jobs to Amazon Elastic MapReduce.


Yelp uses Amazon S3 for storing daily logs (around 100 GB of logs are created per day) and photos, and Amazon Elastic MapReduce to power around 20 separate batch scripts, most of which process the logs. Many other companies, such as Yahoo and Cloudera, also use Hadoop.
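As a rough idea of what a log-processing batch script does, the Python sketch below counts requests per day from a list of log lines. It is an assumption-heavy illustration only: the log format ("<date> <status> <path>") and the function names parse_line and requests_per_day are hypothetical, and nothing here reflects Yelp's actual scripts or log layout.

```python
from collections import Counter


def parse_line(line):
    """Parse one line in the hypothetical format '<date> <status> <path>'.

    Returns a (date, status, path) tuple, or None if the line is malformed.
    """
    parts = line.split()
    if len(parts) != 3:
        return None
    date, status, path = parts
    return date, status, path


def requests_per_day(lines):
    """Count well-formed log lines per date, skipping malformed ones."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record[0]] += 1
    return dict(counts)
```

In a MapReduce job the parsing would be the map step (emitting the date as the key) and the counting would be the reduce step, which is what lets the same logic scale from one file to a full day of logs.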
 

