CentOS 6.4 + Hadoop 2.2.0: Spark Pseudo-Distributed Installation

Hadoop version: the 2.2.0 stable release (download link).
Spark version: spark-0.9.1-bin-hadoop2 (download link).
The Spark download page offers three builds:

    For Hadoop 1 (HDP1, CDH3): find an Apache mirror or direct file download
    For CDH4: find an Apache mirror or direct file download
    For Hadoop 2 (HDP2, CDH5): find an Apache mirror or direct file download
My Hadoop version is 2.2.0, so I downloaded the "for Hadoop 2" build.
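If you are not sure which package matches your cluster, you can confirm the installed Hadoop version first; a quick check:

hadoop version    # reports "Hadoop 2.2.0" here, so the "for Hadoop 2" build is the right one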

For an introduction to Spark, see the project page, which describes it as: "Apache Spark is a fast and general engine for large-scale data processing."

Spark needs a Scala environment at runtime, so download the latest Scala release here.

Scala is a scalable, multi-paradigm programming language similar to Java, designed from the outset to integrate the features of object-oriented and functional programming. Scala runs on the JVM; it is a pure object-oriented language that at the same time seamlessly combines imperative and functional programming styles.

OK, let's start configuring Spark:

I am installing under the same user that Hadoop was installed with, so I simply edit /home/hadoop/.bashrc:

[hadoop@localhost ~]$ cat .bashrc
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi

# User specific aliases and functions
export HADOOP_HOME=/home/hadoop/hadoop
export HBASE_HOME=/home/hadoop/hbase
export HIVE_HOME=/home/hadoop/hive
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=/home/hadoop/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SCALA_HOME=/home/hadoop/scala
export SPARK_HOME=/home/hadoop/spark

export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$HBASE_HOME/lib
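After editing, the new variables can be applied to the current shell and spot-checked; a minimal sanity check:

source ~/.bashrc
echo $SCALA_HOME $SPARK_HOME    # both paths should be printed
which hadoop                    # should resolve to $HADOOP_HOME/bin/hadoop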

1. Installing Scala:
Extract Scala into the hadoop user's home directory.
ln -s scala-2.11.0 scala    # create a symlink
lrwxrwxrwx.  1 hadoop hadoop        12 May 21 09:15 scala ->
scala-2.11.0
drwxrwxr-x.  6 hadoop hadoop      4096 Apr 17 16:10 scala-2.11.0

Edit .bashrc and add:
export SCALA_HOME=/home/hadoop/scala
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
Save the file and make the variables take effect: source .bashrc
Verify the installation:
[hadoop@localhost ~]$ scala -version
Scala code runner version 2.11.0 -- Copyright 2002-2013, LAMP/EPFL
If the version number is displayed correctly, the installation succeeded.

2. Configuring Spark:
tar -xzvf  spark-0.9.1-bin-hadoop2.tgz
ln -s spark-0.9.1-bin-hadoop2 spark
Then configure .bashrc:
export SPARK_HOME=/home/hadoop/spark
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin

After editing, run source .bashrc to make the variables take effect.

Configuring spark-env.sh:
spark-env.sh does not exist by default; generate it from the template: cat spark-env.sh.template >> spark-env.sh
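The template lives under $SPARK_HOME/conf; an equivalent way to create the file (just a sketch of the same step):

cd $SPARK_HOME/conf
cp spark-env.sh.template spark-env.sh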

Then edit spark-env.sh and add the following:
export SCALA_HOME=/home/hadoop/scala
export JAVA_HOME=/usr/java/jdk
export SPARK_MASTER_IP=localhost
export SPARK_LOCAL_IP=localhost
export HADOOP_HOME=/home/hadoop/hadoop
export SPARK_HOME=/home/hadoop/spark
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

Save and exit.

3. Starting Spark
Like Hadoop's layout, the start and stop shell scripts are kept in Spark's sbin directory:
-rwxrwxr-x. 1 hadoop hadoop 2504 Mar 27 13:44 slaves.sh
-rwxrwxr-x. 1 hadoop hadoop 1403 Mar 27 13:44 spark-config.sh
-rwxrwxr-x. 1 hadoop hadoop 4503 Mar 27 13:44 spark-daemon.sh
-rwxrwxr-x. 1 hadoop hadoop 1176 Mar 27 13:44 spark-daemons.sh
-rwxrwxr-x. 1 hadoop hadoop  965 Mar 27 13:44 spark-executor
-rwxrwxr-x. 1 hadoop hadoop 1263 Mar 27 13:44 start-all.sh
-rwxrwxr-x. 1 hadoop hadoop 2384 Mar 27 13:44 start-master.sh
-rwxrwxr-x. 1 hadoop hadoop 1520 Mar 27 13:44 start-slave.sh
-rwxrwxr-x. 1 hadoop hadoop 2258 Mar 27 13:44 start-slaves.sh
-rwxrwxr-x. 1 hadoop hadoop 1047 Mar 27 13:44 stop-all.sh
-rwxrwxr-x. 1 hadoop hadoop 1124 Mar 27 13:44 stop-master.sh
-rwxrwxr-x. 1 hadoop hadoop 1427 Mar 27 13:44 stop-slaves.sh
[hadoop@localhost sbin]$ pwd
/home/hadoop/spark/sbin

Here you only need to run start-all.sh:
[hadoop@localhost sbin]$ ./start-all.sh
rsync from localhost
rsync: change_dir "/home/hadoop/spark-0.9.1-bin-hadoop2/sbin/localhost"
failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors)
(code 23) at main.c(1039) [sender=3.0.6]
starting org.apache.spark.deploy.master.Master, logging to
/home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-localhost.out
localhost: rsync from localhost
localhost: rsync: change_dir
"/home/hadoop/spark-0.9.1-bin-hadoop2/localhost" failed: No such file or
directory (2)
localhost: rsync error: some files/attrs were not transferred (see
previous errors) (code 23) at main.c(1039) [sender=3.0.6]
localhost: starting org.apache.spark.deploy.worker.Worker, logging to
/home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-localhost.out
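The rsync messages above look alarming, but they appear to be harmless here: the Master and Worker still start, as the jps output below confirms. If startup does fail, the log files named in the output can be inspected, for example:

tail -n 20 /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-localhost.out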

Use jps to check whether the startup succeeded:
[hadoop@localhost sbin]$ jps
4706 Jps
3692 DataNode
3876 SecondaryNameNode
4637 Worker
4137 NodeManager
4517 Master
4026 ResourceManager
3587 NameNode

A Master and a Worker process are both present, which means the startup succeeded.
The status can also be checked through the Spark web UI.
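By default the standalone Master serves a web UI on port 8080 and the Worker on port 8081; a quick way to confirm both are up (these ports are the Spark defaults and may differ if you changed them):

curl -s http://localhost:8080 | grep -i "<title>"    # Master web UI
curl -s http://localhost:8081 | grep -i "<title>"    # Worker web UI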

4. Running the example programs that ship with Spark
First go to the bin directory under Spark:
[hadoop@localhost sbin]$ ll ../bin/
total 56
-rw-rw-r--. 1 hadoop hadoop 2601 Mar 27 13:44 compute-classpath.cmd
-rwxrwxr-x. 1 hadoop hadoop 3330 Mar 27 13:44 compute-classpath.sh
-rwxrwxr-x. 1 hadoop hadoop 2070 Mar 27 13:44 pyspark
-rw-rw-r--. 1 hadoop hadoop 1827 Mar 27 13:44 pyspark2.cmd
-rw-rw-r--. 1 hadoop hadoop 1000 Mar 27 13:44 pyspark.cmd
-rwxrwxr-x. 1 hadoop hadoop 3055 Mar 27 13:44 run-example
-rw-rw-r--. 1 hadoop hadoop 2046 Mar 27 13:44 run-example2.cmd
-rw-rw-r--. 1 hadoop hadoop 1012 Mar 27 13:44 run-example.cmd
-rwxrwxr-x. 1 hadoop hadoop 5151 Mar 27 13:44 spark-class
-rwxrwxr-x. 1 hadoop hadoop 3212 Mar 27 13:44 spark-class2.cmd
-rw-rw-r--. 1 hadoop hadoop 1010 Mar 27 13:44 spark-class.cmd
-rwxrwxr-x. 1 hadoop hadoop 3184 Mar 27 13:44 spark-shell
-rwxrwxr-x. 1 hadoop hadoop  941 Mar 27 13:44 spark-shell.cmd

run-example org.apache.spark.examples.SparkLR spark://localhost:7077

run-example org.apache.spark.examples.SparkPi spark://localhost:7077
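SparkPi prints a line of the form "Pi is roughly 3.1…" among its log output; one way to pick it out (a sketch, run from the Spark home directory):

cd $SPARK_HOME
./bin/run-example org.apache.spark.examples.SparkPi spark://localhost:7077 2>&1 | grep "Pi is roughly"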


(PS: With Spark 1.6, the usable Scala version is 2.10.x.)

Installing the Spark environment

Download link

Upload spark-2.0.0-bin-hadoop2.6.tgz to the installer directory of the hadoop user on master.

Extract it:

[hadoop@master installer]$ tar -zxvf spark-2.0.0-bin-hadoop2.6.tgz

[hadoop@master installer]$ mv spark-2.0.0-bin-hadoop2.6 spark2

[hadoop@master installer]$ cd spark2/

[hadoop@master spark2]$ ls

bin  conf  data  examples  jars  LICENSE  licenses  NOTICE  python  R  README.md  RELEASE  sbin  yarn

[hadoop@master spark2]$ pwd

/home/hadoop/installer/spark2

 

[hadoop@master ~]$ vim .bashrc

 

# .bashrc

 

# Source global definitions

if [ -f /etc/bashrc ]; then

        . /etc/bashrc

fi

 

# User specific aliases and functions

export JAVA_HOME=/usr/java/jdk1.7.0_79

export HADOOP_HOME=/home/hadoop/installer/hadoop2

export SCALA_HOME=/home/hadoop/installer/scala

export SPARK_HOME=/home/hadoop/installer/spark2

export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$JAVA_HOME/lib:$SCALA_HOME/lib

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin

 

[hadoop@master ~]$ . .bashrc

 

[hadoop@master ~]$ scp .bashrc slave:~

.bashrc                                            100%  621     0.6KB/s   00:00

Run on the slave machine:

[hadoop@slave ~]$ . .bashrc

 

My local environment is the one configured above.

Spark wordcount
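Before launching spark-shell, the input for the word count needs to exist in HDFS. A minimal preparation sketch, assuming Spark's own README.md is used as the sample input (which matches the words that appear in the result below):

[hadoop@master ~]$ hdfs dfs -mkdir -p /data/wordcount
[hadoop@master ~]$ hdfs dfs -put ~/installer/spark2/README.md /data/wordcount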

 

[hadoop@master ~]$ spark-shell

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel).

16/11/04 11:05:07 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable

16/11/04 11:05:09 WARN spark.SparkContext: Use an existing SparkContext,
some configuration may not take effect.

Spark context Web UI available at

Spark context available as 'sc' (master = local[*], app id =
local-1478228709028).

Spark session available as 'spark'.

Welcome to

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.7.0_79)

Type in expressions to have them evaluated.

Type :help for more information.

 

scala> val file = sc.textFile("hdfs://master:9000/data/wordcount")

16/11/04 11:05:14 WARN util.SizeEstimator: Failed to check whether
UseCompressedOops is set; assuming yes

file: org.apache.spark.rdd.RDD[String] =
hdfs://master:9000/data/input/wordcount MapPartitionsRDD[1] at
textFile at <console>:24

 

scala> val count=file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)

count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at
reduceByKey at <console>:26

scala> count.collect()

res0: Array[(String, Int)] = Array((package,1), (this,1),
(Version"](),
(Because,1), (Python,2), (cluster.,1), (its,1), ([run,1), (general,2),
(have,1), (pre-built,1), (YARN,,1), (locally,2), (changed,1),
(locally.,1), (sc.parallelize(1,1), (only,1), (Configuration,1),
(This,2), (basic,1), (first,1), (learning,,1),
([Eclipse](),
(documentation,3), (graph,1), (Hive,2), (several,1), (["Specifying,1),
("yarn",1), (page](),
([params]`.,1), ([project,2), (prefer,1), (SparkPi,2),
(<), (engine,1), (version,1), (file,1),
(documentation...

scala>

 

1. Installing the JDK

Extract the JDK archive to any directory:

$ cd /home/yfl/Spark

$ tar -xzvf jdk-8u111-linux-x64.tar.gz

$ sudo vim /etc/profile

Edit /etc/profile and append the Java environment variables at the end:

export JAVA_HOME=/home/yfl/Spark/jdk1.8.0_111/

export JRE_HOME=/home/yfl/Spark/jdk1.8.0_111/jre

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

Save and reload /etc/profile:

$ source /etc/profile

Check that it worked:

$ java -version


Installing the Scala environment

Download link

Upload scala-2.10.5.tgz to the installer directory of the hadoop user on both the master and slave machines.

The following steps must be done on both machines.

 

[hadoop@master installer]$ ls

hadoop2  hadoop-2.6.0.tar.gz  scala-2.10.5.tgz

Extract:

[hadoop@master installer]$ tar -zxvf scala-2.10.5.tgz

[hadoop@master installer]$ mv scala-2.10.5 scala

[hadoop@master installer]$ cd scala

[hadoop@master scala]$ pwd

/home/hadoop/installer/scala

 

Configure the environment variables:

[hadoop@master ~]$ vim .bashrc

 

# .bashrc

 

# Source global definitions

if [ -f /etc/bashrc ]; then

        . /etc/bashrc

fi

 

# User specific aliases and functions

export JAVA_HOME=/usr/java/jdk1.7.0_79

export HADOOP_HOME=/home/hadoop/installer/hadoop2

export SCALA_HOME=/home/hadoop/installer/scala

export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$JAVA_HOME/lib:$SCALA_HOME/lib

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin

[hadoop@master ~]$ . .bashrc
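Since Scala is needed on both nodes, it is worth verifying it on each; a quick check (assuming the slave's .bashrc has been set up the same way):

[hadoop@master ~]$ scala -version
[hadoop@master ~]$ ssh slave 'source ~/.bashrc; scala -version'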

 

2. Configuring ssh localhost

Make sure ssh is installed:

$ sudo apt-get update

$ sudo apt-get install openssh-server

$ sudo /etc/init.d/ssh start

Generate the key and add it to the authorized keys:

$ ssh-keygen -t rsa

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

$ chmod 0600 ~/.ssh/authorized_keys

If a key has already been generated, only the last two commands are needed. Test ssh to localhost:

$ ssh localhost

$ exit
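To confirm the login really is key-based (and not silently falling back to a password prompt), a non-interactive check helps; a small sketch:

$ ssh -o BatchMode=yes localhost 'echo passwordless ssh OK'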


Installing gcc

[root@master ~]# mkdir /RHEL5U4

[root@master ~]# mount /dev/cdrom /media/

[root@master media]# cp -r * /RHEL5U4/

[root@master ~]# vim /etc/yum.repos.d/iso.repo

 

[rhel-Server]
name=5u4_Server
baseurl=file:///RHEL5U4/Server
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

 

yum clean all

yum install gcc
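After the install finishes, a quick sanity check that the compiler actually works (a sketch):

gcc --version
echo 'int main(void){return 0;}' > /tmp/t.c && gcc /tmp/t.c -o /tmp/t && /tmp/t && echo "gcc OK"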

 
