These are some of the most important technologies and terms you will meet when getting into Big Data. For clarity, it helps to know the specific terms and technologies used in the IT industry.
Cloud Computing
Cloud computing, or computing in the cloud, means that software applications run or data is processed on remote servers rather than locally. Cloud computing is the delivery of on-demand computing resources: everything from applications to data centres. Applications can be deployed on the cloud in different ways: Public, Private and Hybrid.
To know more visit: http://en.wikipedia.org/wiki/Cloud_computing
Distributed File System
A Distributed File System (DFS) is a file system that holds data in a distributed fashion. When you store files in a DFS, the data is split into fragments and each fragment is stored on a separate machine.
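The splitting described above can be sketched in plain Python. This is a toy illustration, not a real DFS: the block size, node names and round-robin placement policy are all made up for the example.

```python
# A minimal sketch of how a distributed file system splits a file:
# the data is cut into fixed-size fragments, and each fragment is
# assigned to a (hypothetical) machine in the cluster.

BLOCK_SIZE = 8  # real systems use much larger blocks, e.g. 128 MB in HDFS

def split_into_fragments(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut the raw bytes into fixed-size fragments."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def assign_to_machines(fragments, machines):
    """Round-robin each fragment onto a machine (a toy placement policy)."""
    return {i: machines[i % len(machines)] for i in range(len(fragments))}

data = b"hello distributed file system!"
fragments = split_into_fragments(data)
placement = assign_to_machines(fragments, ["node-1", "node-2", "node-3"])
print(len(fragments), placement)
```

Real systems such as HDFS also replicate each fragment onto several machines for fault tolerance.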
NoSQL
NoSQL refers to a new generation of databases that address points such as horizontal scalability, open-source licensing, non-relational data models, flexible schemas, distribution, automatic sharding and easy replication. NoSQL databases provide document, graph, key-value and wide-column stores.
To know more visit: http://nosql-database.org/
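Automatic sharding, one of the features listed above, can be sketched with a toy key-value store: a hash of the key decides which shard holds the value, so data spreads across machines without manual placement. The class and shard counts here are invented for illustration.

```python
# Toy key-value store that spreads keys across a fixed set of shards
# by hashing the key. In a real NoSQL database each shard would live
# on a different machine.

import hashlib

class ShardedKeyValueStore:
    def __init__(self, num_shards: int = 4):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key: str) -> dict:
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

store = ShardedKeyValueStore()
store.put("user:1", {"name": "Ada"})
print(store.get("user:1"))
```

Because the hash of a key is stable, reads always go to the same shard that received the write.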
Analytics
Analytics is the discovery of meaningful patterns in data. It is a multidimensional discipline that makes extensive use of mathematics and statistics, predictive models and descriptive techniques to gain valuable knowledge from data. The resulting insights are used to recommend actions or to support decision making.
To know more visit: http://en.wikipedia.org/wiki/Analytics
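The descriptive and predictive sides mentioned above can be shown in a few lines of standard-library Python: summarise some past values, then fit an ordinary least-squares line to forecast the next one. The sales numbers are made up for the example.

```python
# Descriptive analytics: summarise what happened.
# Predictive analytics: fit y = a + b*x by least squares and forecast.

from statistics import mean, stdev

sales = [10.0, 12.0, 13.0, 15.0, 16.0]   # hypothetical monthly sales

print("mean:", mean(sales), "stdev:", round(stdev(sales), 2))

xs = list(range(len(sales)))
b = sum((x - mean(xs)) * (y - mean(sales)) for x, y in zip(xs, sales)) \
    / sum((x - mean(xs)) ** 2 for x in xs)
a = mean(sales) - b * mean(xs)
print("next month forecast:", round(a + b * len(sales), 2))  # → 17.7
```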
Hadoop
Hadoop is an open-source distributed computing platform that provides storage spread across the machines of a cluster, along with a computational layer for processing the datasets stored there. Hadoop is written in Java and developed by the Apache Software Foundation. It is fault-tolerant, horizontally scalable, robust and accessible, provides a configurable replication factor and delivers high throughput.
Open-source Tools
There is tremendous insight hiding in your existing data. The Apache Software Foundation provides several open-source tools for the integration, analysis and visualization of Big Data. Using these tools, we can aggregate data, organize it and extract useful insights.
MongoDB
MongoDB is a NoSQL database that provides a document-oriented data model. MongoDB's document model makes it easy to store data of any structure and to modify the schema dynamically. It scales out horizontally across machines. MongoDB's query language provides field-level operators, rich data types and in-place updates.
To know more visit: https://www.mongodb.org/
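The flavour of the document model and field-level operators can be sketched in plain Python. The `$gt` operator and the `insert_one`/`find` names mirror real MongoDB, but this toy collection is only an illustration, not the pymongo driver.

```python
# Toy document collection: documents are schemaless dicts, and queries
# use field-level operators such as $gt ("greater than").

class ToyCollection:
    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)

    def find(self, query):
        def matches(doc):
            for field, cond in query.items():
                if isinstance(cond, dict):   # operator form, e.g. {"$gt": 30}
                    if "$gt" in cond and not (field in doc and doc[field] > cond["$gt"]):
                        return False
                elif doc.get(field) != cond:  # plain equality
                    return False
            return True
        return [d for d in self.docs if matches(d)]

users = ToyCollection()
users.insert_one({"name": "Ada", "age": 36})
users.insert_one({"name": "Alan", "age": 24, "city": "London"})  # different shape
print([d["name"] for d in users.find({"age": {"$gt": 30}})])  # → ['Ada']
```

Note that the two documents have different fields; nothing forces a fixed schema.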
R
R is open-source software for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software.
Apache Spark
Apache Spark is an open-source cluster computing engine for large-scale data processing that allows data to be loaded into the cluster's memory. It is written in Scala and provides APIs in Java, Python and Scala. Three notable use cases of Apache Spark are fog computing, cloud computing and streaming data analysis.
To know more visit: https://spark.apache.org/
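The pattern Spark exposes, chaining transformations over in-memory partitions and then running an action, can be mimicked locally in plain Python. This toy RDD is an illustration of the idea only, not the real pyspark API.

```python
# Toy "RDD": data held in memory as partitions; map/filter build new
# datasets, and reduce combines per-partition results into one value.

from functools import reduce

class ToyRDD:
    def __init__(self, partitions):
        self.partitions = partitions               # in-memory partitions

    def map(self, fn):
        return ToyRDD([[fn(x) for x in part] for part in self.partitions])

    def filter(self, pred):
        return ToyRDD([[x for x in part if pred(x)] for part in self.partitions])

    def reduce(self, fn):
        partials = [reduce(fn, part) for part in self.partitions if part]
        return reduce(fn, partials)                # combine partial results

rdd = ToyRDD([[1, 2, 3], [4, 5, 6]])
total = rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 0).reduce(lambda a, b: a + b)
print(total)  # → 56 (4 + 16 + 36)
```

In real Spark the partitions live on different machines and transformations are evaluated lazily.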
MapReduce
MapReduce is a data processing framework in the Hadoop ecosystem for analysing large datasets stored in Hadoop's storage layer. The MapReduce model splits a job into fragments, and each task executes on a block of the data.
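The split-and-process model described above is easiest to see in the classic word-count example, sketched here locally: the map phase emits (word, 1) pairs from each input fragment, a shuffle groups the pairs by word, and the reduce phase sums the counts.

```python
# Local sketch of MapReduce word count: map -> shuffle -> reduce.

from collections import defaultdict

def map_phase(fragment):
    """Emit a (word, 1) pair for every word in an input fragment."""
    return [(word, 1) for word in fragment.split()]

def shuffle(pairs):
    """Group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

fragments = ["big data big insight", "data is big"]   # two input splits
pairs = [p for frag in fragments for p in map_phase(frag)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # → 3
```

In a real Hadoop job, each fragment would be a block of a file in HDFS and the map and reduce tasks would run on different machines.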
HANA
SAP HANA is an in-memory, column-oriented,
relational database management system. HANA is designed to handle both high
transaction rates and complex query processing on the same platform.
To know more visit: http://hana.sap.com/platform.html
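Why a column-oriented layout suits analytic queries can be shown with a tiny sketch: the values of one column sit together, so an aggregate touches only that column instead of every full row. This is a toy illustration of the layout idea, not SAP HANA itself, and the sample rows are invented.

```python
# Row-oriented layout: one record at a time.
rows = [
    {"id": 1, "region": "EU", "amount": 100},
    {"id": 2, "region": "US", "amount": 250},
    {"id": 3, "region": "EU", "amount": 175},
]

# Column-oriented layout: one list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Aggregating a single column scans one contiguous list.
print(sum(columns["amount"]))  # → 525
```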
Amazon Web Services
Amazon Web Services (AWS) provides remote computing services that make up a cloud computing platform offered by Amazon. The most central and well-known of these services are Amazon EC2 and S3. These products are marketed to companies as a way to obtain large computing capacity much faster and more cheaply than building and running physical servers themselves.
Machine Learning Algorithms
Machine learning is the science of getting computers to act without being explicitly programmed. Machine learning has given us self-driving cars, speech recognition, effective web search and much more.
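"Acting without being explicitly programmed" can be made concrete with one of the simplest learning algorithms, a one-nearest-neighbour classifier: no rules are written down, the program just copies the label of the closest training example. The points and labels here are made up.

```python
# One-nearest-neighbour classifier: label a new point by finding the
# closest training example and reusing its label.

import math

training = [            # (feature vector, label)
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

def predict(point):
    nearest = min(training, key=lambda example: math.dist(point, example[0]))
    return nearest[1]

print(predict((1.1, 0.9)))  # → small
print(predict((8.5, 9.2)))  # → large
```

Real systems use far richer models, but the principle is the same: behaviour comes from the data, not from hand-written rules.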
Natural Language Processing
Natural language processing refers to software algorithms designed to allow computers to understand everyday human speech more accurately, letting us interact with them more naturally and efficiently.
To know more visit: http://en.wikipedia.org/wiki/Natural_language_processing
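A first step most NLP pipelines share can be sketched with the standard library: normalise the text, split it into tokens, drop common stop words, and count what remains. The stop-word list here is a tiny illustrative one, not a standard resource.

```python
# Minimal text preprocessing: lowercase, tokenize, remove stop words.

import re
from collections import Counter

STOP_WORDS = {"the", "a", "to", "and", "of"}   # tiny illustrative list

def tokenize(text):
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

text = "The computer learns to understand the language of everyday speech."
tokens = tokenize(text)
print(tokens[:3])  # → ['computer', 'learns', 'understand']
print(Counter(tokens))
```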