java.lang.IllegalArgumentException: Wrong FS: expected: file:/// – Hadoop

Today, while writing a Java program to interact with the Hadoop Distributed File System (HDFS), I was getting the exception mentioned below (Wrong FS: expected: file:///).

Note: My Hadoop cluster is running on Cloudera CDH 5.

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://ibc.techidiocy.com:8020/user/saurav/input.txt, expected: file:///

And below is the command that I was trying to execute on my local system to read a file from HDFS.

java -jar HDFSInteraction.jar hdfs://ibc.techidiocy.com:8020/user/saurav/input.txt

Here is the complete stack trace.

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://ibc.techidiocy.com:8020/user/saurav/input.txt, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:644)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:79)
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:506)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:722)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:137)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:765)

From the stack trace it was clear to me that the Configuration object was not being initialized properly, and because of that my FileSystem was pointing to the local file system instead of the Hadoop file system (HDFS). In the majority of cases this happens because the configuration is unable to find the config files core-site.xml and hdfs-site.xml.
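A quick way to confirm this diagnosis (a sketch, assuming hadoop-common is on the classpath) is to print what the Configuration actually resolved for fs.defaultFS, which typically falls back to file:/// when core-site.xml was not found:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CheckDefaultFs {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // If core-site.xml was not loaded, this typically prints "file:///"
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
        // ...and FileSystem.get() then returns the local file system, not HDFS.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("FileSystem URI = " + fs.getUri());
    }
}
```

If the second line prints a file:/// URI, any hdfs:// path you pass to it will fail with exactly the "Wrong FS" check seen above.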

Below is my Java code, where I was trying to add these two files as resources so that they would be picked up when my Configuration object is initialized.

Configuration configuration = new Configuration();
configuration.addResource("/etc/hadoop/conf/core-site.xml");
configuration.addResource("/etc/hadoop/conf/hdfs-site.xml");
FileSystem hdfsFileSystem = FileSystem.get(configuration);
FSDataInputStream fsDataInputStream = hdfsFileSystem.open(new Path("/user/saurav/input.txt"));
InputStreamReader inputStreamReader = new InputStreamReader(fsDataInputStream);
BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
String line;
while ((line = bufferedReader.readLine()) != null) {
    // some file processing code here.
}
bufferedReader.close();
// TODO: exception handling

My assumption was that these files would be picked up from the HADOOP_CLASSPATH that was already set on my local system. I tried multiple options, but I was not able to make it work.

Finally I looked at the JavaDoc for the org.apache.hadoop.conf.Configuration and org.apache.hadoop.fs.FileSystem classes, and interestingly I found two solutions to the above-mentioned problem.

Solution 1: I found this solution while looking at the JavaDoc for the Configuration class, where other overloaded versions of the addResource() method are listed. One of them takes a Path object that refers to the absolute location of a file on the local file system. I gave that a try, and this time I was able to read the file successfully.
Below are the changes that I made to use the other overloaded version.

Configuration configuration = new Configuration();
configuration.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
configuration.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
FileSystem hdfsFileSystem = FileSystem.get(configuration);

Observe the difference carefully: in the first case I was passing the core-site.xml and hdfs-site.xml locations as Strings, and in the second case I am passing them as Path objects. The distinction matters because addResource(String) interprets its argument as the name of a resource to be looked up on the classpath, whereas addResource(Path) reads the file from the local file system using its absolute path.

Solution 2: I found this solution while looking at the FileSystem class JavaDoc, which provides two different ways of creating a FileSystem instance. One we have already seen above, where we pass the Configuration object to the static get() method of the FileSystem class. The other overloaded version additionally lets you pass the URI of the underlying HDFS that you want your FileSystem instance to point to.

With this solution you can get rid of adding resources to the Configuration object altogether. After these changes, my code looks like this.

Configuration configuration = new Configuration();
FileSystem hdfsFileSystem = FileSystem.get(new URI("hdfs://ibc.techidiocy.com:8020"), configuration);
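Put together, a fuller sketch of Solution 2 (using the same host and file path as above, and try-with-resources for cleanup) might look like this:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // The explicit URI pins the FileSystem to the NameNode,
        // so no *-site.xml resources are needed.
        FileSystem hdfsFileSystem =
                FileSystem.get(new URI("hdfs://ibc.techidiocy.com:8020"), configuration);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(hdfsFileSystem.open(new Path("/user/saurav/input.txt"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // some file processing code here
            }
        }
    }
}
```

The trade-off is that the NameNode address is now hard-coded; passing it in as a program argument keeps the code portable across clusters.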

I have tested both of these solutions and both worked fine in my case. Meanwhile I kept looking into why the first attempt, initializing the Configuration from config files on the classpath, failed. One likely culprit: running the program with java -jar ignores both the CLASSPATH environment variable and any -cp option, so a config directory listed there (for example via HADOOP_CLASSPATH) is never seen; only the Class-Path entry in the jar's manifest is honored. Till now, no luck making that route work :( .
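For completeness, here is a hedged sketch of how the String-based addResource() calls could be made to work: since addResource(String) resolves names via the classpath, the config directory itself must be on the classpath and the arguments should be bare file names, not absolute paths. (The launch command and main class name below are hypothetical illustrations.)

```java
// Assumes the program is launched with the config directory on the classpath, e.g.:
//   java -cp HDFSInteraction.jar:/etc/hadoop/conf com.example.HDFSInteraction <file>
// Note: "java -jar" would NOT work here, because it ignores -cp and CLASSPATH.
Configuration configuration = new Configuration();
// Bare resource names are resolved via the classpath lookup:
configuration.addResource("core-site.xml");
configuration.addResource("hdfs-site.xml");
FileSystem hdfsFileSystem = FileSystem.get(configuration);
```

In fact, core-site.xml is one of Configuration's default resources, so with /etc/hadoop/conf on the classpath even a plain new Configuration() should pick it up without any addResource() call.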

The solutions presented above will also work when you are getting the exception below and you are sure that the file is present in your HDFS.

java.io.FileNotFoundException: File /user/saurav/input.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:722)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:137)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:765)

Reason: Your Configuration object is not initialized properly, and because of that it is trying to find this file on your local file system, where it doesn't exist; the file "/user/saurav/input.txt" exists on HDFS.

Hope this helps other Hadoop beginners like me.

Cheers :)


Saurabh Jain

A developer working on enterprise applications, distributed systems, Hadoop and Big Data. This blog is about my experience working mostly on Java technologies, NoSQL, Git, Maven and the Hadoop ecosystem.
