Hadoop Interview and Study Notes
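As big data has taken off, Hadoop has become popular too. I have interviewed with a number of companies, including ones that build Hadoop distributions and products, such as Cloudera, Hortonworks, MapR, Teradata, Greenplum, and Amazon EMR. Google aside, the companies that use Hadoop are too numerous to count.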
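The move to Hadoop 2.0 is all but unstoppable. It is due for official release in the second half of this year, and its arrival means everyone has to update what they know. Hadoop 1.0 took 8 years to be released, while 2.0 arrived in under 2 years. The core of 2.0 is YARN, and the story of how it came about is an interesting one.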
Introduction to YARN
Yarn from Hortonworks
Yarn from IBM developerworks
Hadoop Ecosystem at a Glance
SQL is what’s next for Hadoop: Here’s who’s doing it
All SQL-on-Hadoop Solutions are missing the point of Hadoop
Hadoop Summit, San Jose
Hadoop: The Definitive Guide, 3rd Edition
tomwhite/hadoop-book · GitHub
Big Data beyond MapReduce: Google’s Big Data papers
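Know the overall MapReduce flow: Map, Shuffle, Reduce
Know what the Combiner and the partitioner do, and how to configure compression
Set up a Hadoop cluster, and know which services run on the master and slave nodes: NameNode, DataNode, JobTracker, TaskTracker
Basic Pig and Hive syntax, and how to write UDFs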
When to use Pig Latin versus Hive SQL?
Online Feedback Publishing System
Introduction to Apache Hive Online Training
What is new in Hadoop 2.0: HDFS2 HA, Snapshot, ResourceManager, ApplicationsManager, NodeManager
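To make the Map, Shuffle, Reduce flow and the roles of the Combiner and the partitioner from the list above more concrete, here is a minimal in-memory word-count sketch. It is plain Python, not Hadoop code: a real job would be written against Hadoop's Java Mapper/Reducer classes, and every function name here (`map_phase`, `combine`, `partition`, `shuffle`, `reduce_phase`, `word_count`) is made up for illustration. The CRC32-based partitioner is also just an assumption, standing in for the general idea of "hash the key, take it modulo the number of reducers".

```python
from collections import defaultdict
import zlib


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output locally to cut shuffle traffic."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    """Partitioner: pick the reducer for a key (stable hash modulo reducer count)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Shuffle: route each pair to its reducer and group values by key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_phase(bucket):
    """Reduce: sum the grouped values for each key, visiting keys in sorted order."""
    return {key: sum(bucket[key]) for key in sorted(bucket)}


def word_count(splits, num_reducers=2):
    """Run the whole pipeline; each element of `splits` plays one mapper's input."""
    mapper_outputs = []
    for split in splits:
        pairs = [pair for line in split for pair in map_phase(line)]
        mapper_outputs.append(combine(pairs))
    result = {}
    for bucket in shuffle(mapper_outputs, num_reducers):
        result.update(reduce_phase(bucket))
    return result


if __name__ == "__main__":
    # Two input splits, so two simulated mappers.
    splits = [["hadoop yarn hadoop"], ["yarn hdfs", "hadoop hdfs hdfs"]]
    print(word_count(splits))
```

The point worth noticing is that the combiner runs per mapper, before any data is routed, while the partitioner guarantees that all values for a given key end up at the same reducer. That is why the final counts stay correct no matter how many reducers are used.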
Hadoop Ecosystem
SQL on Hadoop
Hadoop Summit
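Books and Papers
"Hadoop: The Definitive Guide": the content is excellent, offering both a high-level view and a firm grasp of the details, and it mostly applies to the 1.x releases. For example, it covers each sub-phase of MapReduce and includes code implementing joins. Third edition.
Google's original troika: GFS, MapReduce, BigTable. Google's new troika: Caffeine, Pregel, Dremel
SIGMOD and VLDB: the top database conferences
Getting started: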