Web Integration blog
Distributed Data Processing Using Apache Software Foundation Tools
This article focuses on Apache Software Foundation technologies that make processing large amounts of data easier. I will describe the Apache Hadoop ecosystem, the MapReduce model for distributed processing, and its implementation in Hadoop. Furthermore, I will introduce the HDFS file system, the HBase column-oriented database, the Hive data warehouse, and ZooKeeper, a centralized service for configuration management, distributed synchronization, and group services.