
Simplest HDFS Operations in 5 minutes

Jayden Chua
3 min read · Feb 13, 2018

After spending some time over the weekend installing Hadoop locally on macOS, I wanted to understand a little more about how to perform simple file operations on the Hadoop Distributed File System (HDFS).

Let’s jump straight into performing some operations first, after which we will take a step back to understand briefly what is happening behind the scenes. The operations we are interested in for the next 5 minutes are:

  1. Creating new directories
  2. Listing files and directories
  3. Copying files between the local file system and HDFS
  4. Removing files and directories

HDFS Commands

Before moving forward, I assume you have installed Hadoop 2.8.2 on macOS using Homebrew. With that, let’s proceed.

Most of the HDFS commands are located in the bin directory of the Hadoop installation. In my case, my Hadoop installation is found in /usr/local/Cellar/hadoop/2.8.2/; alternatively, you can reach the same folder through the symlink /usr/local/opt/hadoop/ .

To keep things simple, we will use /usr/local/opt/hadoop/ as the Hadoop installation folder for the rest of this article.

You can see the list of HDFS commands available by looking inside the bin directory of /usr/local/opt/hadoop/ .

$ cd /usr/local/opt/hadoop/bin
$ ls -la

Here you can see the available commands.

[Screenshot: list of commands in the bin directory of Hadoop]

Mainly, hadoop fs and hdfs dfs are the commands that allow us to work with the file system. hadoop fs lets us work with file systems other than HDFS, while hdfs dfs works only on the HDFS file system. In this article it is fine to use the two interchangeably, but to keep things simple we will only be using hdfs dfs for now.
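To illustrate the difference, hadoop fs accepts a full file system URI, so it can address the local file system too. A quick sketch (the paths here are illustrative):

```shell
# hadoop fs understands generic file system URIs, so it can
# list a local directory via the file:// scheme
$ bin/hadoop fs -ls file:///tmp

# hdfs dfs addresses HDFS only; this lists the HDFS root
$ bin/hdfs dfs -ls /
```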

$ cd /usr/local/opt/hadoop
# To list all the commands we can use
$ bin/hdfs dfs
# Help on the commands
$ bin/hdfs dfs -help

To use hdfs dfs from anywhere, you can add the following to your shell profile (for example /etc/profile):

export HADOOP_HOME="/usr/local/opt/hadoop"
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Create new directories

To create new directories, we can use hdfs dfs -mkdir <path>:

# To create a directory named 'test_new_directory'
$ hdfs dfs -mkdir /test_new_directory
# To create a directory named 'new_sub_dir' under 'test_new_directory'
$ hdfs dfs -mkdir /test_new_directory/new_sub_dir
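As an aside, if the parent directories do not exist yet, -mkdir supports a -p flag (just like the Unix mkdir) that creates them along the way:

```shell
# Create nested directories in one go; -p creates any missing parents
$ hdfs dfs -mkdir -p /test_new_directory/a/b/c
```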

Listing directories

To list directories, we can use hdfs dfs -ls <path>:

# To list the directories in root
$ hdfs dfs -ls /
# To list the directories in /test_new_directory
$ hdfs dfs -ls /test_new_directory
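To see everything under a directory at once, -ls also takes an -R flag for a recursive listing:

```shell
# Recursively list all files and directories under /test_new_directory
$ hdfs dfs -ls -R /test_new_directory
```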

Copying files between local and HDFS

We can use hdfs dfs -copyFromLocal <path_on_local> <path_on_hdfs> or hdfs dfs -put <path_on_local> <path_on_hdfs> to copy local files to HDFS.

# Create a new test file
$ touch test.txt
# To copy files from local to hdfs using copyFromLocal command
$ hdfs dfs -copyFromLocal test.txt /test_new_directory
# Alternatively, using the put command achieves the same thing
$ hdfs dfs -put test.txt /test_new_directory

Something interesting appears when you list the contents of the directory. Apart from the usual columns such as permissions, owner/group and the date-time the file was last updated, you can see an extra column showing the number of file replicas. In this case, the replication factor is 1.

[Screenshot: number of file replicas]
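The replication factor is not fixed, either; for an existing file you can change it with -setrep. A quick sketch (note that on a single-node setup, a value higher than 1 will simply leave the file under-replicated):

```shell
# Set the replication factor of test.txt to 2
$ hdfs dfs -setrep 2 /test_new_directory/test.txt
```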

To copy files from HDFS to local, we can use hdfs dfs -copyToLocal <path_on_hdfs> <path_on_local> or hdfs dfs -get <path_on_hdfs> <path_on_local>.

# Using copyToLocal command
$ hdfs dfs -copyToLocal /test_new_directory/test.txt ~/my_folder
# Using get command
$ hdfs dfs -get /test_new_directory/test.txt ~/my_folder

To copy files between folders in HDFS, we can use hdfs dfs -cp <path_from> <path_to>:

# Copy contents from test1 to test2 directory
$ hdfs dfs -cp /test1/test.txt /test2

Removing Files and Directories

Finally, let’s clean up after ourselves. To remove empty directories we can use hdfs dfs -rmdir <path_to_empty_directory>, and to remove files, hdfs dfs -rm <path_to_file>.

# Removes all files in test directory, recursively with force
$ hdfs dfs -rm -r -f /test/
# Removes empty directory
$ hdfs dfs -rmdir /test/
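One caveat worth knowing: depending on configuration, HDFS may move removed files into a .Trash directory rather than deleting them immediately. -rm offers a -skipTrash flag to bypass this, and a final -ls confirms the cleanup (a sketch, assuming the directories from the earlier steps):

```shell
# Delete permanently, bypassing the trash (if trash is enabled)
$ hdfs dfs -rm -r -f -skipTrash /test_new_directory
# Verify nothing is left behind
$ hdfs dfs -ls /
```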
