Hadoop Distributed File System (HDFS)

When a data set outgrows the storage capacity of a single machine, it becomes necessary to partition it across a number of separate machines. File systems that manage storage across a network of machines are referred to as distributed file systems.

Hadoop comes with a file system called the Hadoop Distributed File System (HDFS). HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.

Very large files means files that are hundreds of megabytes, gigabytes, or terabytes in size.

Streaming data access - HDFS is built around the idea of write once, read many times: a data set is typically written or copied once and then read many times for analysis (a short Java read example follows these points).

Commodity hardware means Hadoop does not require expensive, highly reliable hardware. It is designed to run on clusters of commodity hardware, for which the chance of node failure across the cluster is high, at least for large clusters.
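
To make the streaming access pattern concrete, here is a minimal Java sketch that reads a file out of HDFS using Hadoop's FileSystem API. The class name and the command-line path are hypothetical, and it assumes a Hadoop client configuration (core-site.xml) is available on the classpath:

import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Streams the contents of an HDFS file to standard output.
// Example usage (hypothetical path): hadoop HdfsStreamRead /user/demo/input.txt
public class HdfsStreamRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml etc. from the classpath
        FileSystem fs = FileSystem.get(conf);     // the configured default file system (HDFS here)
        InputStream in = null;
        try {
            in = fs.open(new Path(args[0]));                // open the file for a streaming read
            IOUtils.copyBytes(in, System.out, 4096, false); // copy to stdout in 4 KB buffers
        } finally {
            IOUtils.closeStream(in);
        }
    }
}

Notice that the program never seeks backwards or rewrites data; it opens the file and streams it from start to finish, which is exactly the access pattern HDFS is optimized for.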


What is a Block?

A disk has a block size, which is the minimum amount of data that it can read or write. File systems for a single disk build on this by dealing with data in blocks, which are an integral multiple of the disk block size. HDFS also has the concept of a block, but it is a much larger unit: 64 MB by default in older releases and 128 MB from Hadoop 2 onwards. Files in HDFS are broken into block-sized chunks, which are stored as independent units.
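
As a small sketch of working with block metadata (the class name and path argument are again hypothetical), the block size and length of an HDFS file can be inspected through the same FileSystem API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Prints the block size and total length of a file stored in HDFS.
public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FileStatus status = fs.getFileStatus(new Path(args[0])); // metadata for the file
        System.out.println("Block size: " + status.getBlockSize() + " bytes");
        System.out.println("Length    : " + status.getLen() + " bytes");
    }
}

Unlike a file system for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block's worth of underlying storage; only the actual file data is stored.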
