Embedded Development in Docker Containers: Basic Steps for Configuring a Spark Environment on Ubuntu

源代码杀手 · 2024-06-27 13:37:02


I. Environment Setup

The basic steps for configuring a Spark environment on Ubuntu are as follows:

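Download Spark:

Download the Spark archive from the link below, for example with wget from the command line. Note that dlcdn.apache.org only serves current releases; once 3.5.1 has been superseded, the same file is available under https://archive.apache.org/dist/spark/spark-3.5.1/ instead.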

wget https://dlcdn.apache.org/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz

Extract Spark:

Extract the downloaded archive with:

tar -xvf spark-3.5.1-bin-hadoop3.tgz

Move the Spark directory:

Move the extracted Spark directory to where you want to install it, for example /opt:

sudo mv spark-3.5.1-bin-hadoop3 /opt/spark-3.5.1
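The download, extract and move steps can also be scripted. The sketch below assumes bash, wget, and write access to the target prefix (run it as root or with sudo, as in the session log). The helper names spark_url and install_spark are hypothetical; the fallback to archive.apache.org reflects that the dlcdn mirror only keeps current releases.

```shell
#!/usr/bin/env bash
# Sketch only: scripted version of the download / extract / move steps.
# Hypothetical helper names: spark_url, install_spark.
# Assumption: releases no longer on dlcdn.apache.org are on archive.apache.org.

# Print the download URL of a Spark (Hadoop 3 build) release on a base mirror.
spark_url() {
  local version="$1"
  local base="${2:-https://dlcdn.apache.org/spark}"
  echo "${base}/spark-${version}/spark-${version}-bin-hadoop3.tgz"
}

# Download (with fallback), extract, and move the release to PREFIX/spark-VERSION.
install_spark() {
  local version="$1"
  local prefix="${2:-/opt}"
  local tarball="spark-${version}-bin-hadoop3.tgz"
  # Try the current-release CDN first, then the long-term archive.
  wget -q -O "$tarball" "$(spark_url "$version")" ||
    wget -q -O "$tarball" "$(spark_url "$version" https://archive.apache.org/dist/spark)" ||
    return 1
  tar -xzf "$tarball" || return 1
  # Same result as the manual mv step, e.g. /opt/spark-3.5.1
  mv "spark-${version}-bin-hadoop3" "${prefix}/spark-${version}"
}
```

For example, install_spark 3.5.1 /opt should leave Spark in /opt/spark-3.5.1, the location used in the rest of this guide.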

Configure environment variables:

Open ~/.bashrc to edit the environment variable configuration:

vim ~/.bashrc

Append the following lines to the end of the file to set Spark's environment variables:

export SPARK_HOME=/opt/spark-3.5.1

export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Save and close the file, then run the following command to apply the changes to the current shell:

source ~/.bashrc
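If the setup is repeated (for example after rebuilding the container), appending to ~/.bashrc by hand can leave duplicate lines. A minimal idempotent sketch, assuming bash; the function name add_spark_env is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: append the Spark variables to a shell profile exactly once.
# Hypothetical helper name: add_spark_env.

add_spark_env() {
  local rcfile="$1"
  local spark_home="${2:-/opt/spark-3.5.1}"
  touch "$rcfile"
  # Skip if a SPARK_HOME export is already present.
  if grep -q '^export SPARK_HOME=' "$rcfile"; then
    return 0
  fi
  {
    echo "export SPARK_HOME=${spark_home}"
    # Single quotes keep $PATH and $SPARK_HOME unexpanded in the file.
    echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin'
  } >> "$rcfile"
}
```

Usage: add_spark_env ~/.bashrc /opt/spark-3.5.1 && source ~/.bashrc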

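Configure Spark's default log level (optional):

You can optionally change Spark's default log level. Spark 3.3 and later use Log4j 2, so for Spark 3.5.1 the file to edit is $SPARK_HOME/conf/log4j2.properties (create it by copying log4j2.properties.template in the same directory). Change the level on the following line to the one you want:

rootLogger.level = info

For example, change it to error to show only error messages.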
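The log-level change can be scripted as well. This sketch assumes the Log4j 2 layout of Spark 3.3 and later (conf/log4j2.properties, created from log4j2.properties.template) and GNU sed for in-place editing; set_spark_log_level is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: set Spark's root log level in conf/log4j2.properties.
# Hypothetical helper name: set_spark_log_level. Requires GNU sed (-i).

set_spark_log_level() {
  local conf_dir="$1"
  local level="$2"
  local props="${conf_dir}/log4j2.properties"
  # Create the real config from the bundled template on first use.
  if [ ! -f "$props" ]; then
    cp "${conf_dir}/log4j2.properties.template" "$props" || return 1
  fi
  # Rewrite the root logger line, e.g. "rootLogger.level = info" -> "rootLogger.level = error".
  sed -i "s/^rootLogger\.level *=.*/rootLogger.level = ${level}/" "$props"
}
```

Usage: set_spark_log_level "$SPARK_HOME/conf" error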

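Start the Spark cluster:

Start the standalone cluster with:

start-all.sh

This starts a Spark Master on the local machine and a Worker on every host listed in $SPARK_HOME/conf/workers (only localhost if that file does not exist). The script reaches each worker host over SSH, even when it is localhost, so an SSH server must be running. Hadoop ships a script with the same name; if both are on your PATH, call $SPARK_HOME/sbin/start-all.sh explicitly.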
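On a single machine, SSH can be avoided entirely by starting the two daemons with the per-daemon scripts in $SPARK_HOME/sbin (start-master.sh and, in Spark 3.1 and later, start-worker.sh). A sketch, assuming SPARK_HOME is set as above, the Master uses its default port 7077, and it advertises the name returned by hostname -f, which is what start-master.sh uses when SPARK_MASTER_HOST is unset. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: start a one-machine standalone cluster without SSH.
# Hypothetical helper names: spark_master_url, start_local_cluster.
# Assumes SPARK_HOME is set and the Master uses the default port 7077.

# Build a standalone master URL; defaults mirror start-master.sh (hostname -f, 7077).
spark_master_url() {
  local host="${1:-$(hostname -f)}"
  local port="${2:-7077}"
  echo "spark://${host}:${port}"
}

start_local_cluster() {
  # Start the Master daemon (web UI on port 8080 by default).
  "$SPARK_HOME/sbin/start-master.sh" || return 1
  # Start one Worker on this machine and register it with that Master.
  "$SPARK_HOME/sbin/start-worker.sh" "$(spark_master_url)"
}
```

After start_local_cluster, the Master web UI at http://localhost:8080 should list one ALIVE worker; stop-worker.sh and stop-master.sh stop the daemons again.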


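Verify the Spark installation:

Open a new terminal window and run the following command to check that Spark is installed correctly:

spark-shell

If everything works, Spark prints its startup messages and drops you into the interactive Scala shell, where you can run simple Spark commands to test it. Without a --master option, spark-shell runs in local mode (master = local[*] in the log below) and does not use the Master and Worker started above; to connect to the standalone cluster, run spark-shell --master spark://<hostname>:7077, where 7077 is the default Master port.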
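To confirm that jobs really run on the standalone cluster rather than in local mode, one option is to submit the SparkPi example that ships with the binary distribution. A sketch, assuming the examples jar sits under $SPARK_HOME/examples/jars (as in the spark-3.5.1-bin-hadoop3 package) and that you pass the Master URL yourself; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: submit the bundled SparkPi example to a standalone Master.
# Hypothetical helper names: find_examples_jar, verify_cluster.

# Print the path of the examples jar under SPARK_HOME, or fail if none exists.
find_examples_jar() {
  local spark_home="$1" jar
  for jar in "$spark_home"/examples/jars/spark-examples_*.jar; do
    [ -f "$jar" ] && { echo "$jar"; return 0; }
  done
  return 1
}

verify_cluster() {
  local master_url="$1" jar
  jar="$(find_examples_jar "$SPARK_HOME")" || return 1
  # SparkPi prints a line starting with "Pi is roughly" on stdout; logs go to stderr.
  "$SPARK_HOME/bin/spark-submit" --master "$master_url" \
    --class org.apache.spark.examples.SparkPi "$jar" 10 2>/dev/null |
    grep "Pi is roughly"
}
```

Usage: verify_cluster "spark://$(hostname -f):7077". While the job runs, it should also appear under Running Applications in the Master web UI.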


Session log (the file list at the start is the tail of the tar -xvf output):

spark-3.5.1-bin-hadoop3/R/lib/SparkR/Meta/features.rds
spark-3.5.1-bin-hadoop3/R/lib/SparkR/doc/
spark-3.5.1-bin-hadoop3/R/lib/SparkR/doc/index.html
spark-3.5.1-bin-hadoop3/R/lib/SparkR/doc/sparkr-vignettes.html
spark-3.5.1-bin-hadoop3/R/lib/SparkR/doc/sparkr-vignettes.Rmd
spark-3.5.1-bin-hadoop3/R/lib/SparkR/doc/sparkr-vignettes.R
spark-3.5.1-bin-hadoop3/R/lib/SparkR/help/
spark-3.5.1-bin-hadoop3/R/lib/SparkR/help/SparkR.rdx
spark-3.5.1-bin-hadoop3/R/lib/SparkR/help/paths.rds
spark-3.5.1-bin-hadoop3/R/lib/SparkR/help/SparkR.rdb
spark-3.5.1-bin-hadoop3/R/lib/SparkR/help/AnIndex
spark-3.5.1-bin-hadoop3/R/lib/SparkR/help/aliases.rds
spark-3.5.1-bin-hadoop3/R/lib/SparkR/NAMESPACE
spark-3.5.1-bin-hadoop3/R/lib/SparkR/tests/
spark-3.5.1-bin-hadoop3/R/lib/SparkR/tests/testthat/
spark-3.5.1-bin-hadoop3/R/lib/SparkR/tests/testthat/test_basic.R
spark-3.5.1-bin-hadoop3/R/lib/SparkR/INDEX
spark-3.5.1-bin-hadoop3/R/lib/SparkR/DESCRIPTION
spark-3.5.1-bin-hadoop3/R/lib/SparkR/profile/
spark-3.5.1-bin-hadoop3/R/lib/SparkR/profile/general.R
spark-3.5.1-bin-hadoop3/R/lib/SparkR/profile/shell.R
spark-3.5.1-bin-hadoop3/R/lib/SparkR/worker/
spark-3.5.1-bin-hadoop3/R/lib/SparkR/worker/daemon.R
spark-3.5.1-bin-hadoop3/R/lib/SparkR/worker/worker.R
spark-3.5.1-bin-hadoop3/R/lib/SparkR/html/
spark-3.5.1-bin-hadoop3/R/lib/SparkR/html/00Index.html
spark-3.5.1-bin-hadoop3/R/lib/SparkR/html/R.css
spark-3.5.1-bin-hadoop3/R/lib/SparkR/R/
spark-3.5.1-bin-hadoop3/R/lib/SparkR/R/SparkR.rdx
spark-3.5.1-bin-hadoop3/R/lib/SparkR/R/SparkR.rdb
spark-3.5.1-bin-hadoop3/R/lib/SparkR/R/SparkR
spark-3.5.1-bin-hadoop3/R/lib/sparkr.zip

(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# sudo mv spark-3.5.1-bin-hadoop3 /opt/spark-3.5.1
(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# geidt ~/.bashrc
bash: geidt: command not found
(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# gedit ~/.bashrc
(gedit:47848): dbind-WARNING **: 12:36:11.367: Couldn't connect to accessibility bus: Failed to connect to socket /run/user/1000/at-spi/bus_1: No such file or directory
(gedit:47848): GLib-GIO-CRITICAL **: 12:36:12.611: g_dbus_proxy_new_sync: assertion 'G_IS_DBUS_CONNECTION (connection)' failed
(gedit:47848): dconf-WARNING **: 12:36:13.481: failed to commit changes to dconf: Failed to execute child process “dbus-launch” (No such file or directory)
(gedit:47848): dconf-WARNING **: 12:36:13.513: failed to commit changes to dconf:
** (gedit:47848): WARNING **: 12:36:32.706: Set document metadata failed: Setting attribute metadata::gedit-spell-language not supported
** (gedit:47848): WARNING **: 12:36:32.706: Set document metadata failed: Setting attribute metadata::gedit-encoding not supported
** (gedit:47848): WARNING **: 12:36:33.149: Set document metadata failed: Setting attribute metadata::gedit-spell-language not supported
** (gedit:47848): WARNING **: 12:36:33.149: Set document metadata failed: Setting attribute metadata::gedit-encoding not supported
** (gedit:47848): WARNING **: 12:36:33.159: Could not load theme icon text-x-generic: Icon 'text-x-generic' not present in theme Yaru-magenta
** (gedit:47848): WARNING **: 12:36:33.209: Set document metadata failed: Setting attribute metadata::gedit-spell-language not supported
** (gedit:47848): WARNING **: 12:36:33.209: Set document metadata failed: Setting attribute metadata::gedit-encoding not supported
^C
(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# ^C

(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# source ~/.bashrc
(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark-3.5.1/logs/spark--org.apache.spark.deploy.master.Master-1-98031e181845.out
localhost: ssh: connect to host localhost port 22: Connection refused
(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# sudo service ssh status
* sshd is not running
(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# sudo service ssh start
* Starting OpenBSD Secure Shell server sshd [ OK ]

(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# start-all.sh
org.apache.spark.deploy.master.Master running as process 48691. Stop it first.
localhost: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
localhost: @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
localhost: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
localhost: IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
localhost: Someone could be eavesdropping on you right now (man-in-the-middle attack)!
localhost: It is also possible that a host key has just been changed.
localhost: The fingerprint for the ED25519 key sent by the remote host is
localhost: SHA256:rqXRWozShnlcYdzcZVX/VKCYTRWRbpW66Bb/7bTci0U.
localhost: Please contact your system administrator.
localhost: Add correct host key in /root/.ssh/known_hosts to get rid of this message.
localhost: Offending ED25519 key in /root/.ssh/known_hosts:1
localhost: remove with:
localhost: ssh-keygen -f "/root/.ssh/known_hosts" -R "localhost"
localhost: Password authentication is disabled to avoid man-in-the-middle attacks.
localhost: Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
localhost: root@localhost: Permission denied (publickey,password).

(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# ssh-keygen -f "/root/.ssh/known_hosts" -R "localhost"
# Host localhost found: line 1
/root/.ssh/known_hosts updated.
Original contents retained as /root/.ssh/known_hosts.old
(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# ssh-keygen -f "/root/.ssh/known_hosts" -R "localhost"
Host localhost not found in /root/.ssh/known_hosts
(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# ssh-keygen -f "/root/.ssh/known_hosts" -R "localhost"
Host localhost not found in /root/.ssh/known_hosts
(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# start-all.sh
org.apache.spark.deploy.master.Master running as process 48691. Stop it first.
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
root@localhost's password:
localhost: Permission denied, please try again.
root@localhost's password:
localhost: Permission denied, please try again.
root@localhost's password:
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-3.5.1/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-98031e181845.out

(tfv1) root@98031e181845:/data/myapp2024/myBigDataApplicationDevelopment# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/04/09 04:40:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://98031e181845:4040
Spark context available as 'sc' (master = local[*], app id = local-1712637656564).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.1
      /_/

Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 11.0.19)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

II. Troubleshooting

start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark-3.5.1/logs/spark--org.apache.spark.deploy.master.Master-1-98031e181845.out
localhost: ssh: connect to host localhost port 22: Connection refused

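This error means that start-all.sh could not open an SSH connection to localhost: the Master started, but the Worker, which the script launches over SSH, did not. It is usually caused by the SSH server not running (common in Docker containers, which do not start sshd automatically) or by a firewall. Possible fixes: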

Make sure the SSH service is running:

Check the status of the SSH service with:

sudo service ssh status

If it is not running, start it with:

sudo service ssh start
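In the session log, sshd was simply not running, and once it was started, start-all.sh still prompted for root's password. A sketch for a Debian/Ubuntu-based container running as root that installs and starts sshd if necessary and sets up key-based login to localhost; the function names are hypothetical, and ed25519 is just one reasonable key type.

```shell
#!/usr/bin/env bash
# Sketch only, for a Debian/Ubuntu-based container running as root.
# Hypothetical helper names: ensure_sshd, authorize_pubkey, setup_passwordless_ssh.

# Install the OpenSSH server if missing, then start it if it is not running.
ensure_sshd() {
  command -v sshd >/dev/null ||
    { apt-get update && apt-get install -y openssh-server; } || return 1
  service ssh status >/dev/null || service ssh start
}

# Add a public key to SSH_DIR/authorized_keys once, with the permissions sshd expects.
authorize_pubkey() {
  local pubkey="$1"
  local ssh_dir="${2:-$HOME/.ssh}"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  touch "$ssh_dir/authorized_keys" && chmod 600 "$ssh_dir/authorized_keys" || return 1
  grep -qxF -e "$(cat "$pubkey")" "$ssh_dir/authorized_keys" ||
    cat "$pubkey" >> "$ssh_dir/authorized_keys"
}

# Create a passphrase-less key (if absent) and trust it for logins to this machine.
setup_passwordless_ssh() {
  local key="$HOME/.ssh/id_ed25519"
  mkdir -p "$HOME/.ssh"
  [ -f "$key" ] || ssh-keygen -t ed25519 -N "" -f "$key" || return 1
  authorize_pubkey "$key.pub"
}
```

After ensure_sshd && setup_passwordless_ssh, ssh localhost true should return without asking for a password. If SSH reports REMOTE HOST IDENTIFICATION HAS CHANGED, remove the stale entry with ssh-keygen -f "/root/.ssh/known_hosts" -R "localhost", as in the log. Because the first start-all.sh run had already started the Master, later runs print "Master running as process 48691. Stop it first."; run stop-all.sh before retrying, or start only the workers with start-workers.sh.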

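Check the firewall settings:

If a firewall is enabled on your system, it may block SSH connections. You can temporarily disable it, restart Spark, and check whether the connection problem persists. Keep in mind, however, that security is critical in production, and disabling the firewall carries risks. Also note that "Connection refused" more often means no process is listening on port 22; a firewall that silently drops packets usually produces a timeout instead.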

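Check the SSH port:

By default the SSH server listens on port 22. Make sure sshd on your system is actually listening on port 22 (the Port setting in /etc/ssh/sshd_config) and that no other process occupies that port. If sshd uses a different port, Spark's launch scripts can be told about it through SPARK_SSH_OPTS (for example "-p 2222") in $SPARK_HOME/conf/spark-env.sh.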
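A quick way to check whether anything accepts connections on a port is bash's /dev/tcp redirection, which needs no netcat; if iproute2 is installed, ss -tlnp additionally shows which process owns the port. A sketch, assuming bash and coreutils timeout are available; port_open is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: test whether HOST:PORT accepts TCP connections.
# Hypothetical helper name: port_open. Uses bash's /dev/tcp and coreutils timeout.

port_open() {
  local host="$1"
  local port="$2"
  # Give up after 2 seconds; exit status 0 means the connection succeeded.
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

Usage: port_open 127.0.0.1 22 && echo "port 22 is open" || echo "nothing listening on port 22"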

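Check the hostname:

Make sure your hostname is configured correctly so that SSH can resolve and connect to localhost; both localhost and the machine's own hostname should resolve, usually through /etc/hosts.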
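Docker normally writes the container's hostname (98031e181845 in the log) into /etc/hosts. A sketch that checks whether a hosts file maps a given name, assuming the usual "address name [aliases...]" format; hosts_has_name is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: succeed if NAME appears as a host name or alias in a hosts file.
# Hypothetical helper name: hosts_has_name.

hosts_has_name() {
  local file="$1"
  local name="$2"
  awk -v n="$name" '
    /^[ \t]*#/ { next }            # skip comment lines
    {
      # Field 1 is the address; fields 2..NF are names until a trailing comment.
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break
        if ($i == n) found = 1
      }
    }
    END { exit !found }
  ' "$file"
}
```

Usage: hosts_has_name /etc/hosts localhost && hosts_has_name /etc/hosts "$(hostname)"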

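Check the Spark configuration:

start-all.sh starts Workers on the hosts listed in $SPARK_HOME/conf/workers (localhost when the file is missing), and the Master's address can be set with SPARK_MASTER_HOST in $SPARK_HOME/conf/spark-env.sh; make sure both point where you expect. The spark.master setting in $SPARK_HOME/conf/spark-defaults.conf only sets the default master URL that applications such as spark-shell connect to; it does not change which hosts start-all.sh contacts over SSH.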
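These two files can be written with a small helper. A sketch, assuming CONF_DIR is $SPARK_HOME/conf and GNU sed; write_spark_conf is hypothetical, and calling it with no worker hosts reproduces the localhost-only default.

```shell
#!/usr/bin/env bash
# Sketch only: write conf/workers and set SPARK_MASTER_HOST in conf/spark-env.sh.
# Hypothetical helper name: write_spark_conf. Requires GNU sed (-i).

write_spark_conf() {
  local conf_dir="$1"
  local master_host="$2"
  shift 2
  # Any remaining arguments are worker hosts; default to this machine only.
  [ "$#" -gt 0 ] || set -- localhost
  printf '%s\n' "$@" > "$conf_dir/workers"
  # Replace any previous SPARK_MASTER_HOST line instead of appending duplicates.
  touch "$conf_dir/spark-env.sh"
  sed -i '/^SPARK_MASTER_HOST=/d' "$conf_dir/spark-env.sh"
  echo "SPARK_MASTER_HOST=${master_host}" >> "$conf_dir/spark-env.sh"
}
```

Usage: write_spark_conf "$SPARK_HOME/conf" "$(hostname -f)" for a single machine, or append worker host names for a multi-node cluster.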

Try one or more of the fixes above, then restart Spark to see whether the connection problem is resolved.

start-all.sh