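Concepts:
Hadoop is reliable, scalable, open-source software for distributed computing.
It is a framework for the distributed processing of large data sets across clusters of computers using a simple programming model (MapReduce). It scales from a single server to thousands of machines, each offering local computation and storage. Rather than relying on hardware for high availability (HA), it detects and handles failures at the application layer.
Big data is commonly characterized by the 4 Vs:
- Volume: large amounts of data
- Velocity: data arrives and must be processed quickly
- Variety: many different formats
- Value: low value density
Modules:
- Hadoop Common: common libraries that support the other modules
- HDFS (Hadoop Distributed File System): the Hadoop distributed file system
- Hadoop YARN: a framework for job scheduling and cluster resource management
- Hadoop MapReduce: a YARN-based system for parallel processing of large data sets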
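To make the MapReduce model concrete, here is a single-machine illustration using a shell pipeline. It only mimics the three phases (map emits key/value pairs, shuffle/sort groups equal keys, reduce aggregates each group); it is not how Hadoop actually executes a job, and the function names are hypothetical.

```shell
# Map: split input into words and emit "word<TAB>1" for each one.
mapper()  { tr -s '[:space:]' '\n' | grep -v '^$' | awk '{print $1 "\t1"}'; }

# Shuffle/sort: bring identical keys next to each other.
shuffle() { sort; }

# Reduce: sum the values of each run of identical keys.
reducer() {
  awk -F'\t' '
    $1 != key { if (NR > 1) print key "\t" sum; key = $1; sum = 0 }
    { sum += $2 }
    END { if (NR > 0) print key "\t" sum }'
}

# Word count = map | shuffle | reduce
wordcount() { mapper | shuffle | reducer; }
```

For example, `printf 'a b a\n' | wordcount` prints `a` with count 2 and `b` with count 1, which is the same result the bundled MapReduce wordcount example computes at cluster scale.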
| Hostname | IP address | Installed components |
|---|---|---|
| hadoop-1 | 172.20.2.203 | NameNode / DataNode / NodeManager |
| hadoop-2 | 172.20.2.204 | SecondaryNameNode / DataNode / NodeManager |
| hadoop-3 | 172.20.2.205 | ResourceManager / DataNode / NodeManager |
a. Configure the Java environment
yum install -y java-1.8.0-openjdk.x86_64 java-1.8.0-openjdk-devel
cat >/etc/profile.d/java.sh<
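The body of java.sh is not preserved above. As a hedged sketch, assuming the file only needs to export JAVA_HOME and put `$JAVA_HOME/bin` on PATH, the hypothetical helper `write_java_profile` below writes such a file. If no JDK directory is passed, it derives one by resolving the `javac` symlink installed by java-1.8.0-openjdk-devel.

```shell
# Hypothetical helper: write a profile script exporting JAVA_HOME.
# Usage: write_java_profile TARGET_FILE [JDK_DIR]
write_java_profile() {
  local target="$1" java_home="${2:-}" javac_path
  if [ -z "$java_home" ]; then
    # Assumption: javac is on PATH (provided by the -devel package).
    javac_path="$(command -v javac)" || return 1
    # Resolve the alternatives symlinks, then strip the trailing /bin/javac.
    java_home="$(dirname "$(dirname "$(readlink -f "$javac_path")")")"
  fi
  # $java_home is expanded now; \$JAVA_HOME and \$PATH stay literal so they
  # are evaluated when the profile script is sourced at login.
  cat >"$target" <<EOF
export JAVA_HOME=$java_home
export PATH=\$JAVA_HOME/bin:\$PATH
EOF
}
```

As root: `write_java_profile /etc/profile.d/java.sh && . /etc/profile.d/java.sh`.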
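b. Set the hostname (hadoop-1 here; hadoop-2 and hadoop-3 on the other nodes) and add the hosts entries
hostname hadoop-1
cat >>/etc/hosts <<EOF
172.20.2.203 hadoop-1
172.20.2.204 hadoop-2
172.20.2.205 hadoop-3
EOF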
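Because `cat >>/etc/hosts` appends on every run, repeating this step duplicates entries. A minimal sketch of an idempotent alternative, using a hypothetical helper `add_host_entry` that appends an `IP HOSTNAME` line only when no non-comment line already lists that hostname:

```shell
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is
# already present as a hostname field (column 2 onward) on a non-comment line.
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  if ! awk -v n="$name" '
        /^[[:space:]]*#/ { next }                      # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >>"$file"
  fi
}
```

Usage, as root on each node: `add_host_entry /etc/hosts 172.20.2.203 hadoop-1`, and likewise for hadoop-2 and hadoop-3. Comparing whole fields, rather than substrings, keeps `hadoop-1` from matching a line that only contains `hadoop-10`.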
c. Create the user and directories
useradd hadoop
echo "hadoopwd" | passwd hadoop --stdin
mkdir -pv /data/hadoop/hdfs/{nn,snn,dn}
chown -R hadoop:hadoop /data/hadoop/hdfs/
mkdir -p /var/log/hadoop/yarn
mkdir -p /dbapps/hadoop/logs
chmod g+w /dbapps/hadoop/logs/
chown -R hadoop:hadoop /dbapps/hadoop/
d. Configure the Hadoop environment variables
cat>/etc/profile.d/hadoop.sh<
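The contents of hadoop.sh are not preserved above. As a hedged sketch, assuming the install prefix is the `/usr/local/hadoop` symlink created in step e, a hypothetical helper `write_hadoop_profile` could export HADOOP_PREFIX and HADOOP_CONF_DIR (both honored by the Hadoop 2.x scripts) and add the `bin` and `sbin` directories to PATH so that `hdfs`, `yarn`, and the start scripts can be found:

```shell
# Hypothetical helper: write a profile script for the Hadoop environment.
# Usage: write_hadoop_profile TARGET_FILE [PREFIX]   (PREFIX defaults to /usr/local/hadoop)
write_hadoop_profile() {
  local target="$1" prefix="${2:-/usr/local/hadoop}"
  # Escaped \$ keeps the references literal until the file is sourced.
  cat >"$target" <<EOF
export HADOOP_PREFIX=$prefix
export HADOOP_CONF_DIR=\$HADOOP_PREFIX/etc/hadoop
export PATH=\$PATH:\$HADOOP_PREFIX/bin:\$HADOOP_PREFIX/sbin
EOF
}
```

As root: `write_hadoop_profile /etc/profile.d/hadoop.sh`, then log in again (or source the file) as the hadoop user.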
e. Download and extract the package
mkdir /software
cd /software
wget -c https://archive.apache.org/dist/hadoop/common/hadoop-2.6.5/hadoop-2.6.5.tar.gz
tar -zxf hadoop-2.6.5.tar.gz -C /usr/local
ln -sv /usr/local/hadoop-2.6.5/ /usr/local/hadoop
chown -R hadoop:hadoop /usr/local/hadoop-2.6.5/
f. Configure passwordless SSH for the hadoop user
su - hadoop
ssh-keygen -t rsa
for num in $(seq 1 3); do ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop-$num; done
Configure the master node
hadoop-1 runs the NameNode, DataNode and NodeManager. Edit the Hadoop configuration files on hadoop-1:
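core-site.xml
(defines the NameNode)
cat >/usr/local/hadoop/etc/hadoop/core-site.xml <<EOF
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://hadoop-1:8020</value><final>true</final></property>
</configuration>
EOF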
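Hand-typed XML is easy to get wrong (a misspelled property name is silently ignored). As an optional sketch, a hypothetical helper `write_site_xml` can generate a `*-site.xml` file from `name=value` arguments. It assumes values contain no XML special characters (`&`, `<`, `>`) and does not emit `<final>` flags.

```shell
# Hypothetical helper: write a Hadoop *-site.xml from name=value arguments.
# Usage: write_site_xml FILE name1=value1 [name2=value2 ...]
write_site_xml() {
  local file="$1" pair
  shift
  {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<configuration>'
    for pair in "$@"; do
      # Split on the first "=" only: name is before it, value is everything after.
      printf '  <property><name>%s</name><value>%s</value></property>\n' \
        "${pair%%=*}" "${pair#*=}"
    done
    echo '</configuration>'
  } >"$file"
}
```

For example, `write_site_xml /usr/local/hadoop/etc/hadoop/mapred-site.xml mapreduce.framework.name=yarn` produces a mapred-site.xml with a single property.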
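hdfs-site.xml
(defines the SecondaryNameNode; dfs.replication must not exceed the number of DataNodes, and is set to 2 here)
cat >/usr/local/hadoop/etc/hadoop/hdfs-site.xml <<EOF
<configuration>
  <property><name>dfs.namenode.secondary.http-address</name><value>hadoop-2:50090</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.namenode.name.dir</name><value>file:///data/hadoop/hdfs/nn</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:///data/hadoop/hdfs/dn</value></property>
  <property><name>fs.checkpoint.dir</name><value>file:///data/hadoop/hdfs/snn</value></property>
  <property><name>fs.checkpoint.edits.dir</name><value>file:///data/hadoop/hdfs/snn</value></property>
</configuration>
EOF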
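If dfs.replication is larger than the number of DataNodes, blocks can never reach their target replica count and stay under-replicated. A small sanity check, sketched as a hypothetical helper `check_replication` that compares the replication factor with the number of hostnames in the slaves file (blank and comment lines are ignored):

```shell
# Hypothetical check: fail if the replication factor exceeds the DataNode count.
# Usage: check_replication REPLICATION SLAVES_FILE
check_replication() {
  local replication="$1" slaves_file="$2" datanodes
  # Count lines that are neither blank nor comments.
  datanodes="$(grep -cv -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$slaves_file")"
  if [ "$replication" -gt "$datanodes" ]; then
    echo "dfs.replication=$replication exceeds $datanodes DataNode(s)" >&2
    return 1
  fi
}
```

Once the slaves file is in place, this can be run as `check_replication "$(hdfs getconf -confKey dfs.replication)" /usr/local/hadoop/etc/hadoop/slaves`.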
Add mapred-site.xml
cat >/usr/local/hadoop/etc/hadoop/mapred-site.xml <<EOF
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
EOF
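yarn-site.xml
(defines the ResourceManager; set the values to the hostname of the ResourceManager node, hadoop-3)
cat >/usr/local/hadoop/etc/hadoop/yarn-site.xml <<EOF
<configuration>
  <property><name>yarn.resourcemanager.address</name><value>hadoop-3:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>hadoop-3:8030</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.address</name><value>hadoop-3:8031</value></property>
  <property><name>yarn.resourcemanager.admin.address</name><value>hadoop-3:8033</value></property>
  <property><name>yarn.resourcemanager.webapp.address</name><value>hadoop-3:8088</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
  <property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value></property>
</configuration>
EOF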
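slaves
(defines the DataNode hosts; the NodeManagers are also started on these hosts)
cat >/usr/local/hadoop/etc/hadoop/slaves <<EOF
hadoop-1
hadoop-2
hadoop-3
EOF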
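Repeat the same steps on hadoop-2 and hadoop-3. The simplest approach is to copy the configuration files from hadoop-1 directly to hadoop-2 and hadoop-3.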
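Copying the configuration can be scripted. A minimal sketch, assuming it is run as the hadoop user on hadoop-1 after the passwordless SSH setup in step f, and that /usr/local/hadoop already exists on every node (step e). The helper name `distribute_conf` and the `DRY_RUN` switch are hypothetical; `DRY_RUN=1` prints the `scp` commands instead of running them.

```shell
# Hypothetical helper: copy hadoop-1's Hadoop configuration to other nodes.
# Usage: distribute_conf HOST [HOST ...]
distribute_conf() {
  local conf_dir=/usr/local/hadoop/etc/hadoop host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "scp -r $conf_dir/* hadoop@$host:$conf_dir/"
    else
      # The glob expands locally, so every config file is sent to the same path remotely.
      scp -r "$conf_dir"/* "hadoop@$host:$conf_dir/" || return 1
    fi
  done
}
```

For example: `distribute_conf hadoop-2 hadoop-3`.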
Format HDFS on the NameNode host (hadoop-1):
hdfs namenode -format
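`hdfs namenode -format` creates a new, empty namespace, so running it again on a NameNode that already holds metadata throws that metadata away. A formatted name directory contains a `current/VERSION` file. A minimal guard built on that, using a hypothetical helper `namenode_formatted`:

```shell
# Hypothetical guard: succeed if the NameNode name directory is already formatted.
# Usage: namenode_formatted NAME_DIR
namenode_formatted() {
  [ -f "$1/current/VERSION" ]
}
```

For example, on hadoop-1: `namenode_formatted /data/hadoop/hdfs/nn || hdfs namenode -format`.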
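Start the services
On the NameNode hadoop-1, run start-dfs.sh to start HDFS. Then run start-yarn.sh on hadoop-3. start-yarn.sh starts the ResourceManager on the host where it is run, so running start-all.sh (which just calls start-dfs.sh and start-yarn.sh) on hadoop-1 would try to start the ResourceManager on hadoop-1 instead of hadoop-3.
Run the bundled pi example to verify the cluster:
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar pi 2 10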
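To confirm that each node is running the daemons listed in the cluster table, you can check the output of `jps` (shipped with the OpenJDK devel package). A minimal sketch with a hypothetical helper `missing_daemons`: it reads `jps` output on stdin, prints each expected daemon that is absent, and returns non-zero if any are missing.

```shell
# Hypothetical helper: report expected Java daemons missing from jps output.
# Usage: jps | missing_daemons Daemon1 [Daemon2 ...]
missing_daemons() {
  local running name rc=0
  # jps prints "PID ClassName"; keep only the class names.
  running="$(awk '{print $2}')"
  for name in "$@"; do
    # -x requires a whole-line match, so NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$running" | grep -qx -- "$name"; then
      echo "missing: $name"
      rc=1
    fi
  done
  return "$rc"
}
```

Based on the cluster table, run `jps | missing_daemons NameNode DataNode NodeManager` on hadoop-1, `jps | missing_daemons SecondaryNameNode DataNode NodeManager` on hadoop-2, and `jps | missing_daemons ResourceManager DataNode NodeManager` on hadoop-3.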
HDFS NameNode web UI
URL: http://hadoop-1:50070
YARN ResourceManager web UI
URL: http://hadoop-3:8088
Reposted from: https://blog.51cto.com/kaliarch/2119093