Overview of Big Data

We live in the data age. It is not easy to measure the total volume of data stored electronically, but an International Data Corporation (IDC) estimate put the size of the "digital universe" at 0.18 zettabytes in 2006 and forecast tenfold growth to 1.8 zettabytes by 2011. The volume of data being made publicly available increases every year, too. Organizations no longer have to manage only their own data: increasingly they can extract value from data published by others.

1 zettabyte = 1,000 exabytes = 1 million petabytes = 1 billion terabytes
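The conversion above can be checked with a few lines of Python. This is only a small sketch: it assumes decimal (SI) units, where each step is a factor of 1,000, and the names UNITS and convert are made up for this illustration.

```python
# Decimal (SI) storage units expressed in bytes.
# Assumption: powers of 1,000, not the binary powers of 1,024.
UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount between two units listed in UNITS."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


if __name__ == "__main__":
    for target in ("exabyte", "petabyte", "terabyte"):
        print("1 zettabyte =", convert(1, "zettabyte", target), target + "s")
```

With the same helper, the 1.8 zettabyte figure quoted earlier works out to roughly 1.8 billion terabytes.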

The trend is for every individual's data footprint to grow, but perhaps more important is that the amount of data generated by machines will be even greater than that generated by people.

Initiatives such as Public Data Sets on Amazon Web Services (http://aws.amazon.com/publicdataset/) and InfoChimps.org (http://infochimps.org/) are places where data can be freely shared and is available for anyone to download and analyse.
It is said that "more data usually beats better algorithms".


Big Data


Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured.
Big data may be as important to business and society as the Internet has become, because more data may lead to more accurate analysis. It has the potential to help companies improve operations and make faster, more intelligent decisions.



[Figure: bar graph of big data growth]

Is Big Data a Volume or a Technology?

The term does not always refer to the volume of data. Big data, especially when used by vendors, may refer to the technology (including tools and processes) that an organization requires to handle large amounts of data and storage facilities. The term is believed to have originated with web search companies that needed to query very large distributed aggregations of loosely structured data.


Example of Big Data:


Big data might be petabytes (1,024 terabytes, counting in binary units) or exabytes (1,024 petabytes) of data consisting of billions to trillions of records about millions of people, all from different sources (e.g. web, sales, customer contact centers, social media, mobile data and so on). The data is typically loosely structured and is often incomplete and inaccessible.


Sources of Big Data


Various distributed file systems make it possible to combine data from multiple sources, but doing this correctly is very challenging. MapReduce provides a programming model that abstracts the problem away from disk reads and writes, transforming it into a computation over sets of keys and values.
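To make the idea of "a computation over sets of keys and values" concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the job. It is plain Python and not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, run_job) are invented for this illustration, and a real cluster would spread these steps across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    lines = ["more data usually beats better algorithms", "more data more insight"]
    print(run_job(lines))
```

The job author writes only the map and reduce functions, which see nothing but keys and values. Reading the input, grouping by key and storing the output are left to the framework.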

Hadoop provides a reliable shared storage and analysis system. The storage is provided by the Hadoop Distributed File System (HDFS) and the analysis by MapReduce. MapReduce is a batch query processor, and the ability to run an ad hoc query against a whole dataset and get the results in a reasonable time is transformative.

[Figure: Hadoop]

