Boost Your Big Data Skills: Hadoop Installation on Ubuntu



Ubuntu is a powerful Linux-based operating system known for its stability, security, and user-friendliness. Whether you are a beginner or an advanced user, Ubuntu offers a smooth experience for personal computing, development, and even gaming. With regular updates and a large, active community, Ubuntu continues to grow in popularity among users worldwide.

One of the main reasons people prefer Ubuntu is its flexibility: it runs on everything from low-cost machines to top-of-the-line servers, making it a good fit for a wide range of use cases. If you want to run a game or another resource-heavy application, Gaming RDP on Ubuntu lets you do so seamlessly. Users can access their machines over remote desktop protocols and run their favorite games without worrying about hardware limitations. Ubuntu also supports multiple gaming platforms, including Steam, giving users access to a broad spectrum of games.

To install Hadoop on Ubuntu, follow these steps:

  1. Prepare your environment
  2. Install Java
  3. Create a Hadoop user
  4. Install SSH
  5. Set up SSH key-based authentication
  6. Install Hadoop
  7. Configure environment variables
  8. Initialize the Hadoop Distributed File System (HDFS)

Requirements for Hadoop Installation on Ubuntu

  1. Java Installation: Install Java (JDK 8 or higher), as Hadoop requires Java to run.
  2. SSH Setup: Configure passwordless SSH between nodes for Hadoop's distributed operation.
  3. Sufficient Resources: Ensure adequate disk space for Hadoop data and enough RAM for processing tasks (see the quick check below).
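
You can quickly sanity-check these prerequisites from a terminal; a minimal sketch (what counts as "sufficient" depends on your workload):

    java -version   # JDK present?
    ssh -V          # SSH client installed?
    df -h /         # free disk space
    free -h         # available RAM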

Hadoop Installation  

To install Hadoop on Ubuntu, install Java, set up passwordless SSH, download Hadoop, configure environment variables (HADOOP_HOME), format the HDFS namenode, and start Hadoop’s services using start-dfs.sh and start-yarn.sh.
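
At a glance, the whole process boils down to the following command flow; a condensed sketch, assuming Hadoop lands in /usr/local/hadoop and using the release placeholder hadoop-x.y.z (each step is detailed below):

    sudo apt update && sudo apt install default-jdk           # Java
    ssh-keygen -t rsa -b 4096                                  # passwordless SSH
    wget https://downloads.apache.org/hadoop/common/stable/hadoop-x.y.z.tar.gz
    tar -xzvf hadoop-x.y.z.tar.gz && sudo mv hadoop-x.y.z /usr/local/hadoop
    export HADOOP_HOME=/usr/local/hadoop                       # plus PATH, JAVA_HOME
    hdfs namenode -format                                      # initialize HDFS
    start-dfs.sh && start-yarn.sh                              # start the daemons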

Step 1: Prepare Your Environment

To prepare Ubuntu, update packages, install essential tools, configure settings, and set up development or system-specific requirements.

    sudo apt update
    sudo apt upgrade

Step 2: Install Java

To install Java on Ubuntu, update the package index, install the default JDK, and verify the installation:

    sudo apt update
    sudo apt install default-jdk
    java -version

Step 3: Create a Hadoop User

  1. Create the user:

    sudo adduser hadoopuser

  2. Grant user privileges: Add the user to the sudo group (optional, if you need admin privileges):

    sudo usermod -aG sudo hadoopuser

  3. Switch to the new user:

    su - hadoopuser

  4. Set up the environment for Hadoop: Add Hadoop-related environment variables (e.g., HADOOP_HOME, JAVA_HOME) to the user's ~/.bashrc file. For example:

    export HADOOP_HOME=/opt/hadoop
    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
    export PATH=$PATH:$HADOOP_HOME/bin

  5. Apply the changes:

    source ~/.bashrc

This sets up the Hadoop user. You can now install and configure Hadoop as needed.

Alternatively, you can create the Hadoop user with useradd:

  1. Create a new user:

    sudo useradd -m -s /bin/bash hadoop

  2. Set a password for the user:

    sudo passwd hadoop

  3. Add the user to the sudo group (optional, if needed for administrative tasks):

    sudo usermod -aG sudo hadoop

  4. Switch to the new user:

    su - hadoop

  5. Set up the Hadoop directory and permissions (if you need to give the user specific access):

    sudo mkdir -p /usr/local/hadoop
    sudo chown -R hadoop:hadoop /usr/local/hadoop

Now, the hadoop user is created and ready for setting up Hadoop-related tasks.
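
Next comes SSH (step 4 in the overview). If the OpenSSH server is not installed yet, add it first; a minimal sketch:

    sudo apt update
    sudo apt install openssh-server   # provides the SSH daemon needed for key-based logins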

To set up SSH key-based authentication for a user on Ubuntu, follow these steps:

1. Generate SSH Key Pair on Local Machine

  • On the machine you want to connect from (e.g., your local machine), generate the SSH key pair:

    ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

  • This will create a public and private key pair in the ~/.ssh/ directory (by default).
  • Press Enter to accept the default file location (~/.ssh/id_rsa).

2. Copy Public Key to Remote Machine

  • On the local machine, copy the public key to the remote machine (where you want to set up SSH key-based authentication):

    ssh-copy-id hadoop@remote-server-ip

  • Replace hadoop with the actual username on the remote machine and remote-server-ip with the IP address or hostname of the remote machine.
  • If ssh-copy-id is unavailable, you can manually copy the key:

    cat ~/.ssh/id_rsa.pub | ssh hadoop@remote-server-ip "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"


3. Verify Key-Based Authentication

  • Now, you should be able to log into the remote machine without a password:

    ssh hadoop@remote-server-ip

  • If everything is configured correctly, you will log in directly without being prompted for a password.

4. Optional: Disable Password Authentication

  • To increase security, you can disable password authentication on the remote machine:
    1. Open the SSH configuration file:

      sudo nano /etc/ssh/sshd_config

    2. Find and change (or add) the following lines:

      PasswordAuthentication no
      ChallengeResponseAuthentication no

    3. Restart the SSH service:

      sudo systemctl restart ssh
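
Note that for a single-node Hadoop setup, the "remote" machine is the same machine, so the hadoop user needs passwordless SSH to localhost. A minimal sketch:

    # as the hadoop user, authorize the user's own public key locally
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    ssh localhost   # should log in without a password prompt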

Installing Hadoop on Ubuntu

1. Install Java

Hadoop requires Java to run. If you haven’t installed Java, you can install it using the following command:

    sudo apt update
    sudo apt install openjdk-8-jdk

Check Java version to verify the installation:

    java -version

2. Create a Hadoop User (Optional)

Create a dedicated Hadoop user (if you haven’t already):

    sudo useradd -m -s /bin/bash hadoop
    sudo passwd hadoop

Switch to the hadoop user:

    su - hadoop

3. Download Hadoop

Go to the Apache Hadoop releases page and copy the link for the latest stable version. Alternatively, you can download Hadoop using wget:

    wget https://downloads.apache.org/hadoop/common/stable/hadoop-x.y.z.tar.gz

Replace x.y.z with the version number you want to download.

4. Extract Hadoop Files

Extract the downloaded tarball:

    tar -xzvf hadoop-x.y.z.tar.gz

Move it to /usr/local/hadoop (or another directory if you prefer):

    sudo mv hadoop-x.y.z /usr/local/hadoop

5. Set Up Hadoop Environment Variables

Edit the .bashrc file to include Hadoop environment variables:

    nano ~/.bashrc

Add the following lines at the end of the file:

    # Hadoop Environment Variables
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_INSTALL=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

Source the .bashrc file to apply changes:

    source ~/.bashrc

6. Configure Hadoop

Edit the Hadoop configuration files located in the $HADOOP_HOME/etc/hadoop/ directory.

  1. Edit hadoop-env.sh: Open the hadoop-env.sh file to set the Java home:

    nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

  2. Add the following line to specify the Java path:

    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

  3. Edit core-site.xml: Set the Hadoop filesystem URI. Open core-site.xml:

    nano $HADOOP_HOME/etc/hadoop/core-site.xml

  4. Add the following configuration:

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
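
For a single-node setup you will usually also set the replication factor and storage directories in hdfs-site.xml. A minimal sketch, assuming the data directories live under /usr/local/hadoop (these paths are an assumption; adjust them to your layout):

    <configuration>
        <property>
            <!-- one copy of each block is enough on a single node -->
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <!-- assumed path for NameNode metadata -->
            <name>dfs.namenode.name.dir</name>
            <value>file:///usr/local/hadoop/hdfs/namenode</value>
        </property>
        <property>
            <!-- assumed path for DataNode blocks -->
            <name>dfs.datanode.data.dir</name>
            <value>file:///usr/local/hadoop/hdfs/datanode</value>
        </property>
    </configuration>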


To configure environment variables for Hadoop on Ubuntu, follow these steps:

1. Edit the .bashrc File

The .bashrc file contains environment settings that apply to your user session. Open the .bashrc file for editing:

    nano ~/.bashrc

2. Add Hadoop Environment Variables

At the end of the .bashrc file, add the following lines to configure Hadoop-related environment variables:

    # Hadoop Environment Variables
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_INSTALL=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

Make sure to replace /usr/local/hadoop with the actual path where you installed Hadoop (if different).

3. Set Java Home

Hadoop requires Java, so you need to define the JAVA_HOME variable. Add the following line (adjust the path if necessary):

    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

This is assuming you are using OpenJDK 8. If you’re using a different Java version, adjust the path accordingly.
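
If you are not sure where Java lives on your system, you can look it up; two standard ways on Ubuntu:

    update-alternatives --list java   # lists installed Java binaries
    readlink -f $(which java)         # resolves the real path; drop the trailing /bin/java for JAVA_HOME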

4. Apply the Changes

After saving the .bashrc file, apply the changes:

    source ~/.bashrc

5. Verify the Configuration

To ensure that the environment variables are correctly configured, check the variables with the following commands:

  • Hadoop version:

    hadoop version

  • Java version:

    java -version

You should see the correct Hadoop and Java versions printed on the terminal.

To initialize the Hadoop Distributed File System (HDFS) on your Ubuntu system, follow these steps:

1. Format the Namenode

Before you start HDFS, you need to format the Namenode. This step initializes the HDFS filesystem. You should only do this once after installing Hadoop and setting up the configuration files.

Run the following command to format the Namenode:

    hdfs namenode -format

This will format the Namenode and prepare HDFS for use. The output should indicate that the format was successful, with a line similar to:

    Storage directory ... has been successfully formatted.

2. Start HDFS Daemons

Now that the Namenode has been formatted, you need to start the HDFS daemons: the Namenode and Datanode.

Run the following command to start the daemons:

    start-dfs.sh

This will start both the NameNode and DataNode.

You should see output indicating that the daemons are starting:

    starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-<hostname>.log
    starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-<hostname>.log

3. Verify the Daemons Are Running

To check if the HDFS daemons are running correctly, use the jps command to see the list of running Java processes:

    jps

You should see the following processes:

  • NameNode
  • DataNode
  • SecondaryNameNode (optional, depending on your configuration)

Example output:

    12345 NameNode
    23456 DataNode
    34567 SecondaryNameNode

4. Access the NameNode Web UI

To verify that HDFS is running correctly, you can check the NameNode Web UI. Open a browser and go to:

    http://localhost:50070

On Hadoop 3.x, the NameNode web UI runs on port 9870 instead, i.e., http://localhost:9870.

This page provides information about the HDFS filesystem, including storage usage, active and dead DataNodes, and other useful information.

5. Create Directories in HDFS

Once the HDFS daemons are running, you can start interacting with HDFS. For example, you can create directories in HDFS using the hdfs dfs -mkdir command:

    hdfs dfs -mkdir -p /user/hadoop

(The -p flag also creates the parent /user directory if it does not already exist.)

This will create a directory /user/hadoop in the HDFS filesystem.
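
From here you can also copy files into HDFS; a small usage sketch (the file names are arbitrary examples):

    hdfs dfs -put /etc/hosts /user/hadoop/   # upload a local file
    hdfs dfs -cat /user/hadoop/hosts         # read it back from HDFS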

6. Verify Directory Creation

To verify that the directory was created successfully, list the contents of the HDFS root directory:

    hdfs dfs -ls /

You should see the newly created /user/hadoop directory listed.

7. Stop HDFS Daemons

Once you’re done working with HDFS, you can stop the HDFS daemons using the following command:

    stop-dfs.sh

This will stop the NameNode and DataNode daemons.
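
The overview at the top also mentions start-yarn.sh. If you want to run MapReduce jobs on YARN, start and stop its daemons the same way; a brief sketch:

    start-yarn.sh   # starts the ResourceManager and NodeManager
    jps             # should now also list ResourceManager and NodeManager
    stop-yarn.sh    # stops the YARN daemons when you are done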

Ubuntu is free, user-friendly, and backed by strong community support. It is solid with respect to stability, security, and flexibility for both personal computing and server environments. Ubuntu is a Linux-based, open-source operating system with access to a very large repository of software and tools, all of which install easily through its package manager, APT.

Ubuntu is also compatible with a wide range of applications, giving you easy access to streaming services. Whether you want to watch movies, play games, or simply attend virtual meetings, Ubuntu provides a complete environment. Using Linux RDP for remote desktop connections, users can connect to remote servers or machines for efficient management and remote work. There is also Bluestacks RDP, which lets you run Android applications on a remote desktop, a powerful tool for app testing and development. Options such as Germany RDP provide hassle-free remote desktop access over a secure, high-performance connection, so a user can work or play from virtually anywhere. Ubuntu's flexibility makes it usable not only by developers and enterprises but also by casual users.
