Reposted from:
As for Hadoop, you have to understand that the name can mean two things, depending on the context. It is a bit like the term “Linux”, if you're familiar with that.
only the core: Strictly speaking, “Hadoop” is only a distributed file system for storing very large files across many machines, plus a framework for processing those files via Map/Reduce.
the whole ecosystem: This includes the core and all the other tools that have been built on top of Hadoop for data analytics. Flume, HBase, Hive, Kafka, Spark and ZooKeeper are terms belonging to this category. Flink might be as well, I am not sure.
Cassandra might also belong to the second category, because “Hadoop integration was added way back in version 0.6 of Cassandra”.
To understand the whole ecosystem better, you have to understand how this is all structured:
From bottom to top:
bottom layer: Here you have your distributed file system and the Map/Reduce processing framework. HDFS is the name of the file system; you will see this term a lot. On top of HDFS you can run HBase, which is a column-oriented database ¹.
middle layer, execution engines: In the middle we have several different engines that can query the Hadoop file system for information. Some people actually put Map/Reduce on this second layer, because the Hadoop environment now also includes Tez and Spark. Tez speeds up queries by running a job as a single graph (DAG) of tasks instead of a chain of separate map/reduce jobs, I think. Spark is an in-memory engine.
top layer, user abstractions: On top of the execution engines you have the user APIs/abstractions. This includes Apache Hive (SQL-like queries) and Pig (in my eyes a mixture of SQL and a programming language). But there are also more specialized abstractions like MLlib, which is a library for machine learning on a Hadoop system, using Spark as the middle layer.
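To make the Map/Reduce idea from the bottom layer a bit more concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the usual word-count task is assumed as the example. On a real cluster the map and reduce steps would run in parallel on many machines, with the input read from HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of text records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Two tiny "files" stand in for the very large files stored in HDFS.
counts = word_count(["hadoop stores big files", "spark reads hadoop files"])
print(counts)
```

The abstractions on the top layer, such as a Hive query that counts rows per group, are conceptually translated into steps of this shape by the execution engine underneath.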
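Somewhere off to the side, we also have management tools for this whole ecosystem: managing servers, managing the task execution order (job scheduling) and so on. This is where ZooKeeper belongs. Kafka is often listed here as well, although it is strictly a distributed messaging system for moving data between components rather than a management tool.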
¹ I currently do not understand the relationship between HBase and ORC or Parquet files.