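In this tutorial, we will go through the steps of installing HDFS (the Hadoop Distributed File System) on Manjaro. HDFS is the storage layer of the Apache Hadoop project and is designed to hold large datasets across one or more machines. Follow the instructions below to get started.
Before we begin, ensure that you have the following prerequisites: a Java Development Kit (the configuration below assumes OpenJDK 8), sudo privileges, and an SSH server that accepts key-based logins to localhost, which start-dfs.sh uses to launch the daemons.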
Follow the steps below to install Hadoop on your Manjaro system.
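Download a stable release of Hadoop from the Apache Hadoop website. This tutorial uses version 3.3.1; if you pick a different release, adjust the version number in the commands and paths that follow. If the download server no longer carries 3.3.1, older releases are kept under https://archive.apache.org/dist/hadoop/common/. You can download it using the following command in your terminal.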
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
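Before extracting the archive, it can be worth checking it against the SHA-512 digest that Apache publishes next to each release. The helper below is a minimal sketch of that check; the function name verify_sha512 is hypothetical, and the expected digest is assumed to be copied by hand from the .sha512 file on the download page.

```shell
# Compare a file's SHA-512 digest with an expected hex string.
# verify_sha512 is a hypothetical helper name, not part of Hadoop.
verify_sha512() {
  local file expected actual
  file="$1"
  # normalize to lower case, since published digests may be upper-case
  expected=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  # sha512sum prints "<digest>  <filename>"; keep only the digest
  actual=$(sha512sum "$file" | awk '{print $1}')
  if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}
```

A call such as verify_sha512 hadoop-3.3.1.tar.gz followed by the published digest returns a non-zero status on a mismatch, so it can gate the extraction step in a script.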
Extract the downloaded archive using the following command.
tar -xzf hadoop-3.3.1.tar.gz
Move the extracted directory to /opt/ using the following command.
sudo mv hadoop-3.3.1 /opt/
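Set the HADOOP_HOME environment variable and put Hadoop's bin and sbin directories on your PATH by adding the following lines to your .bashrc file. The PATH entry is what lets you run hdfs and start-dfs.sh later without typing their full paths.
export HADOOP_HOME=/opt/hadoop-3.3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin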
You can open .bashrc using the following command.
nano ~/.bashrc
Refresh your environment variables using the following command.
source ~/.bashrc
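After reloading the shell configuration, a quick sanity check can confirm that the variable points at a real installation. This is a minimal sketch; check_hadoop_home is a hypothetical helper name, and the only assumption is that a Hadoop installation ships an executable hdfs launcher in its bin directory.

```shell
# Report whether HADOOP_HOME points at a usable Hadoop installation.
# check_hadoop_home is a hypothetical helper name.
check_hadoop_home() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set" >&2
    return 1
  fi
  # a valid installation ships the hdfs launcher in its bin directory
  if [ ! -x "$HADOOP_HOME/bin/hdfs" ]; then
    echo "no executable found at $HADOOP_HOME/bin/hdfs" >&2
    return 1
  fi
  echo "HADOOP_HOME looks valid: $HADOOP_HOME"
}
```

Running check_hadoop_home in a new terminal is a simple way to see whether the .bashrc change took effect.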
Follow the steps below to configure Hadoop.
Create the directories where the NameNode and DataNode will store their data using the following commands.
mkdir -p /opt/hadoop-3.3.1/data/hdfs/namenode
mkdir -p /opt/hadoop-3.3.1/data/hdfs/datanode
Edit the hadoop-env.sh file using the following command.
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
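Add the following line at the end of the file and save it. The path shown is where Manjaro's jdk8-openjdk package installs Java 8; if you use a different JDK, point JAVA_HOME at its directory under /usr/lib/jvm instead.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk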
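The right value for JAVA_HOME depends on which JDK package is installed. If you are unsure, one way to find a candidate is to resolve the java executable on your PATH and strip the trailing bin/java. The sketch below does that; java_home_from_bin is a hypothetical helper name, and it assumes a readlink that supports -f.

```shell
# Derive a JAVA_HOME candidate from the path of a java executable.
# java_home_from_bin is a hypothetical helper name.
java_home_from_bin() {
  local java_bin resolved
  java_bin="$1"
  # follow symlinks such as /usr/bin/java to the real binary
  resolved=$(readlink -f "$java_bin") || return 1
  # strip the trailing /bin/java to get the installation directory
  dirname "$(dirname "$resolved")"
}
```

A typical call is java_home_from_bin "$(command -v java)". With some Java 8 packages the binary may resolve into a jre subdirectory, in which case the parent of the printed directory is likely the JDK home.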
Edit the core-site.xml file using the following command.
nano $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following lines between the configuration tags and save the file.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
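To make the placement of the property concrete, the sketch below writes a complete core-site.xml with the property nested inside the configuration element. write_core_site is a hypothetical helper name. Note that it overwrites the existing file, including the license header that ships with Hadoop, so treat it as an illustration of the final layout, not a required step.

```shell
# Write a complete core-site.xml into the given configuration directory.
# write_core_site is a hypothetical helper name; the property value
# matches the one used in this tutorial.
write_core_site() {
  local conf_dir
  conf_dir="$1"
  # the quoted 'EOF' keeps the shell from expanding anything in the XML
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}
```

It would be called as write_core_site "$HADOOP_HOME/etc/hadoop". The same layout applies to the other site files edited in the following steps.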
Edit the hdfs-site.xml file using the following command.
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following lines between the configuration tags and save the file.
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/opt/hadoop-3.3.1/data/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/opt/hadoop-3.3.1/data/hdfs/datanode</value>
</property>
Edit the mapred-site.xml file using the following command.
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following lines between the configuration tags and save the file.
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>/opt/hadoop-3.3.1/share/hadoop/mapreduce/*:/opt/hadoop-3.3.1/share/hadoop/mapreduce/lib/*</value>
</property>
Edit the yarn-site.xml file using the following command.
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following lines between the configuration tags and save the file.
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
Before starting HDFS for the first time, you need to format the NameNode. This initializes the metadata directory configured in hdfs-site.xml, so it only needs to be done once.
Run the following command in your terminal.
hdfs namenode -format
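Formatting a NameNode directory that already holds metadata discards the existing file system namespace, so a setup script may want to guard the command. The sketch below relies on the fact that a formatted NameNode directory contains a current/VERSION file; namenode_is_formatted and format_namenode_once are hypothetical helper names.

```shell
# Return success if the given NameNode directory has already been formatted.
# Both function names here are hypothetical helpers, not Hadoop commands.
namenode_is_formatted() {
  # formatting creates current/VERSION inside the NameNode directory
  [ -f "$1/current/VERSION" ]
}

# Format the NameNode only when its directory has not been initialized yet.
format_namenode_once() {
  local name_dir
  name_dir="$1"
  if namenode_is_formatted "$name_dir"; then
    echo "NameNode already formatted, skipping: $name_dir"
  else
    hdfs namenode -format
  fi
}
```

With the paths used in this tutorial, the call would be format_namenode_once /opt/hadoop-3.3.1/data/hdfs/namenode.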
Follow the steps below to start and stop HDFS.
To start HDFS, run the following command in your terminal.
start-dfs.sh
To stop HDFS, run the following command in your terminal.
stop-dfs.sh
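After start-dfs.sh returns, one way to confirm that the daemons came up is to inspect the output of jps, the JDK tool that lists running Java processes. The sketch below reads jps-style output on standard input and prints any of the three daemons a single-node setup normally runs that are absent; missing_hdfs_daemons is a hypothetical helper name.

```shell
# Print the HDFS daemons that are absent from jps-style input on stdin.
# missing_hdfs_daemons is a hypothetical helper name.
missing_hdfs_daemons() {
  local listing daemon
  listing=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode; do
    # jps prints "<pid> <main class>"; match the class name exactly
    if ! printf '%s\n' "$listing" |
        awk -v d="$daemon" '$2 == d {found=1} END {exit !found}'; then
      echo "$daemon"
    fi
  done
}
```

Running jps | missing_hdfs_daemons prints nothing when all three daemons are up. If a name is printed, the log files under $HADOOP_HOME/logs are the usual place to look for the cause.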
Congratulations! You have successfully installed HDFS on your Manjaro system. You can now start using HDFS to store large datasets.