jump to navigation

Working with Hadoop command lines May 30, 2015

Posted by Mich Talebzadeh in Uncategorized.

In my earlier blog I covered installation and configuration of Hadoop. We will now focus with working with Hadoop.

When I say working with Hadoop, I infer working with HDFS and MapReduce Engine. If you are familiar with Unix/Linux commands then it will be easier to find your way through hadoop commands.

Let us start first and look at the root directory of HDFS. Hadoop commands have notation as follows:

hdfs dfs

To see the files under “/” we do

hdfs dfs -ls /
Found 5 items
drwxr-xr-x   - hduser supergroup          0 2015-04-26 18:35 /abc
drwxr-xr-x   - hduser supergroup          0 2015-05-03 09:08 /system
drwxrwx---   - hduser supergroup          0 2015-04-14 06:46 /tmp
drwxr-xr-x   - hduser supergroup          0 2015-04-14 09:51 /user
drwxr-xr-x   - hduser supergroup          0 2015-04-24 20:42 /xyz

If you want to go one level further, you can do

hdfs -ls /user
Found 2 items
drwxr-xr-x   - hduser supergroup          0 2015-05-16 08:41 /user/hduser
drwxr-xr-x   - hduser supergroup          0 2015-04-22 16:25 /user/hive

The complete list of hadoop file system shell commands are given here.

You can run the same command from a remote host as long as you have hadoop software installed on remote host. You qualify the target by specifying the hostname and port number on which hadoop is running

hdfs dfs -ls hdfs://rhes564:9000/user
Found 2 items
drwxr-xr-x - hduser supergroup 0 2015-05-16 08:41 hdfs://rhes564:9000/user/hduser
drwxr-xr-x - hduser supergroup 0 2015-04-22 16:25 hdfs://rhes564:9000/user/hive

You can put a remote file in hdfs as follows. As an example, first create a simple text file with remote hostname in it

echo hostname > hostname.txt

Now put that file in hadoop on rhes564

hdfs dfs -put hostname.txt hdfs://rhes564:9000/user/hduser

Check that the file is stored in HDFS OK

hdfs dfs -ls hdfs://rhes564:9000/user/hduser
-rw-r--r--   2 hduser supergroup          9 2015-05-30 19:49 hdfs://rhes564:9000/user/hduser/hostname.txt

Note the full pathname of the file in HDFS. We can go ahead and delete that txt file.

hdfs dfs -rm /user/hduser/hostname.txt
15/05/30 21:10:23 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hduser/hostname.txt

To remove a full directory you can perform

hdfs dfs -rm -r /abc
Deleted /abc

To see the sizes of files and directories under / you can do

 hdfs dfs -du /
0            /system
4033091893   /tmp
22039558766  /user


No comments yet — be the first.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: