hadoop-user mailing list archives

From Jeffrey Denton <den...@clemson.edu>
Subject Re: TestDFSIO with FS other than defaultFS
Date Fri, 03 Oct 2014 14:43:19 GMT
Jay,

I have not tried the bigtop hcfs tests. Any tips on how to get started with
those?

Our configuration looks similar, except for the Gluster-specific options and
the *fs.default.name* (and *fs.defaultFS*) settings, as we don't want
OrangeFS to be the default fs for this Hadoop cluster. I don't think the
problem is caused by a configuration issue, as the tera* suite works.

The problem is with how TestDFSIO determines the "fs" instance:

FileSystem fs = FileSystem.get(config);

This effectively forces fs to be the file system named by fs.defaultFS.
Shouldn't TestDFSIO be capable of handling a non-default URI, such as one
set via the following option?
 -Dtest.build.data=ofs://test/user/$USER/TestDFSIO
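
To illustrate why an instance bound to the defaultFS rejects an ofs:// path, here is a simplified sketch of a scheme/authority check using only java.net.URI. The class WrongFsSketch and its checkPath method are hypothetical stand-ins written for this message; they are not Hadoop's actual FileSystem.checkPath, which handles more cases.

```java
import java.net.URI;

/** Hypothetical, simplified stand-in for the check that reports "Wrong FS". */
final class WrongFsSketch {

    /**
     * Throws if a fully qualified path names a different scheme or
     * authority than the file system instance it was handed to.
     */
    static void checkPath(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return; // unqualified path: belongs to whatever fs is in use
        }
        boolean sameScheme = scheme.equalsIgnoreCase(fsUri.getScheme());
        String fsAuthority = fsUri.getAuthority();
        String pathAuthority = path.getAuthority();
        boolean sameAuthority = (fsAuthority == null)
                ? pathAuthority == null
                : fsAuthority.equalsIgnoreCase(pathAuthority);
        if (!(sameScheme && sameAuthority)) {
            throw new IllegalArgumentException(
                    "Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://dsci");
        URI control = URI.create("ofs://test/user/denton/TestDFSIO/io_control");
        try {
            checkPath(defaultFs, control);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, any path whose scheme differs from that of the instance is rejected, regardless of how test.build.data was set.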

I think TestDFSIO should use:

FileSystem get(URI uri, Configuration conf)

with *uri* being the test.build.data property, if specified, or a sensible
default based on the defaultFS scheme and authority as well as the rest of
the desired URI.

This means test.build.data should always be treated as a *URI* rather than a
*String*, so that the default value returned by the method getBaseDir, in
class TestDFSIO, can be derived from the defaultFS. Currently, this isn't
the case:

private static String getBaseDir(Configuration conf) {
    return conf.get("test.build.data","/benchmarks/TestDFSIO");
}
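
As a rough sketch of what a URI-based getBaseDir might look like, the following uses only the JDK. A plain Map stands in for Hadoop's Configuration, and the class name BaseDirSketch is hypothetical. It assumes test.build.data, when it carries no scheme, is an absolute path. In TestDFSIO itself, the resulting URI would then be passed to FileSystem.get(URI, Configuration).

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; a Map stands in for Hadoop's Configuration. */
final class BaseDirSketch {

    static final String DEFAULT_PATH = "/benchmarks/TestDFSIO";

    /** Returns the base directory as a fully qualified URI. */
    static URI getBaseDir(Map<String, String> conf) {
        URI defaultFs = URI.create(conf.getOrDefault("fs.defaultFS", "file:///"));
        URI requested = URI.create(conf.getOrDefault("test.build.data", DEFAULT_PATH));
        if (requested.getScheme() != null) {
            return requested; // already names its own file system
        }
        // Borrow scheme and authority from the defaultFS.
        // Assumes an absolute path such as /benchmarks/TestDFSIO.
        return defaultFs.resolve(requested);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.defaultFS", "hdfs://dsci");
        System.out.println(getBaseDir(conf));

        conf.put("test.build.data", "ofs://test/user/denton/TestDFSIO");
        System.out.println(getBaseDir(conf));
    }
}
```

With fs.defaultFS set to hdfs://dsci, the unset case resolves against the defaultFS, while an explicit ofs:// value is returned unchanged.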

Thoughts?

Thanks,
Jeff


On Thu, Oct 2, 2014 at 4:02 PM, Jay Vyas <jayunit100.apache@gmail.com>
wrote:

> Hi jeff.  Wrong fs means that your configuration doesn't know how to bind
> ofs to the OrangeFS file system class.
>
> You can debug the configuration using fs.dumpConfiguration(....), and you
> will likely see references to hdfs in there.
>
> By the way, have you tried our bigtop hcfs tests yet? We now support over
> 100 Hadoop file system compatibility tests...
>
> You can see a good sample of what parameters should be set for a hcfs
> implementation here:
> https://github.com/gluster/glusterfs-hadoop/blob/master/conf/core-site.xml
>
> On Oct 2, 2014, at 12:42 PM, Jeffrey Denton <denton@clemson.edu> wrote:
>
> Hello all,
>
> I'm trying to run TestDFSIO using a different file system other than the
> configured defaultFS and it doesn't work for me:
>
> $ hadoop org.apache.hadoop.fs.TestDFSIO
> -Dtest.build.data=ofs://test/user/$USER/TestDFSIO -write -nrFiles 1
> -fileSize 10240
>
> 14/10/02 11:24:19 INFO fs.TestDFSIO: TestDFSIO.1.7
> 14/10/02 11:24:19 INFO fs.TestDFSIO: nrFiles = 1
> 14/10/02 11:24:19 INFO fs.TestDFSIO: nrBytes (MB) = 10240.0
> 14/10/02 11:24:19 INFO fs.TestDFSIO: bufferSize = 1000000
> 14/10/02 11:24:19 INFO fs.TestDFSIO: baseDir = ofs://test/user/denton/TestDFSIO
> 14/10/02 11:24:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 14/10/02 11:24:20 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> 14/10/02 11:24:20 INFO fs.TestDFSIO: creating control file: 10737418240 bytes, 1 files
> *java.lang.IllegalArgumentException: Wrong FS: ofs://test/user/denton/TestDFSIO/io_control, expected: hdfs://dsci*
>     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:191)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:102)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:595)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:591)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:591)
>     at org.apache.hadoop.fs.TestDFSIO.createControlFile(TestDFSIO.java:290)
>     at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:751)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:650)
>
> At Clemson University, we're running HDP-2.1 (Hadoop 2.4.0.2.1) on 16
> data nodes and 3 separate master nodes for the resource manager and two
> namenodes; however, for this test, the data nodes are really being used
> to run the map tasks with job output being written to 16 separate OrangeFS
> servers.
>
> Ideally, we would like the 16 HDFS data nodes and two namenodes to be the
> defaultFS, but would also like the capability to run jobs using other
> OrangeFS installations.
>
> The above error does not occur when OrangeFS is configured to be the
> defaultFS. Also, we have no problems running teragen/terasort/teravalidate
> when OrangeFS IS NOT the defaultFS.
>
> So, is it possible to run TestDFSIO using a FS other than the defaultFS?
>
> If you're interested in the OrangeFS classes, they can be found here
> <http://www.orangefs.org/svn/orangefs/branches/denton.hadoop2.trunk/src/client/hadoop/orangefs-hadoop2/src/main/java/org/apache/hadoop/fs/ofs/>
> :
>
> I have not yet run any of the FS tests
> <http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/testing.html>
> released with 2.5.1 but hope to soon.
>
> Regards,
>
> Jeff Denton
> OrangeFS Developer
> Clemson University
> denton@clemson.edu