From: Jeffrey Denton
To: user@hadoop.apache.org
Date: Fri, 3 Oct 2014 10:43:19 -0400
Subject: Re: TestDFSIO with FS other than defaultFS

Jay,

I have not tried the Bigtop HCFS tests. Any tips on how to get started with those?

Our configuration looks similar except for the Gluster-specific options and both *fs.default.name* (and *fs.defaultFS*), as we don't want OrangeFS to be the default FS for this Hadoop cluster. I don't think the problem is caused by a configuration issue, as the tera* suite works.

The problem is with how TestDFSIO determines the "fs" instance:

    FileSystem fs = FileSystem.get(config);

This forces the fs to be fs.defaultFS. Shouldn't TestDFSIO be capable of handling a non-default URI set via:

    -Dtest.build.data=ofs://test/user/$USER/TestDFSIO

I think TestDFSIO should instead use:

    FileSystem get(URI uri, Configuration conf)

with *uri* being the test.build.data property, if specified, or a sensible default based on the defaultFS scheme and authority as well as the rest of the desired URI. This means test.build.data should always be treated as a *URI* rather than a *String*, so that the default value returned by the method getBaseDir, in class TestDFSIO, can be based on the defaultFS.
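To make the idea concrete, here is a rough sketch of the resolution I have in mind. The class and method names (BaseDirResolver, resolveBaseUri) are made up, and it uses only java.net.URI rather than Hadoop's Path/FileSystem, so it is an illustration of the fallback logic, not a patch against TestDFSIO:

```java
import java.net.URI;

public class BaseDirResolver {
    // Hypothetical helper: treat test.build.data as a URI. If it carries its
    // own scheme (e.g. ofs://...), use it as-is; if it is a bare path, qualify
    // it against the defaultFS scheme and authority.
    static URI resolveBaseUri(String testBuildData, URI defaultFs) {
        URI candidate = URI.create(
            testBuildData != null ? testBuildData : "/benchmarks/TestDFSIO");
        if (candidate.getScheme() != null) {
            // Fully qualified, e.g. ofs://test/user/denton/TestDFSIO
            return candidate;
        }
        // Bare path: resolve against the defaultFS, e.g. hdfs://dsci/
        return defaultFs.resolve(candidate);
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://dsci/");
        // prints ofs://test/user/denton/TestDFSIO
        System.out.println(resolveBaseUri("ofs://test/user/denton/TestDFSIO", defaultFs));
        // prints hdfs://dsci/benchmarks/TestDFSIO
        System.out.println(resolveBaseUri(null, defaultFs));
    }
}
```

With that in place, the resulting URI could be handed to FileSystem.get(URI, Configuration) so the benchmark would run against whatever FS the URI names.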
Currently, this isn't the case:

    private static String getBaseDir(Configuration conf) {
      return conf.get("test.build.data","/benchmarks/TestDFSIO");
    }

Thoughts?

Thanks,
Jeff

On Thu, Oct 2, 2014 at 4:02 PM, Jay Vyas wrote:

> Hi Jeff. "Wrong FS" means that your configuration doesn't know how to bind
> ofs to the OrangeFS file system class.
>
> You can debug the configuration using fs.dumpConfiguration(....), and you
> will likely see references to hdfs in there.
>
> By the way, have you tried our Bigtop HCFS tests yet? We now support over
> 100 Hadoop file system compatibility tests...
>
> You can see a good sample of what parameters should be set for an HCFS
> implementation here:
> https://github.com/gluster/glusterfs-hadoop/blob/master/conf/core-site.xml
>
> On Oct 2, 2014, at 12:42 PM, Jeffrey Denton wrote:
>
> Hello all,
>
> I'm trying to run TestDFSIO using a file system other than the
> configured defaultFS, and it doesn't work for me:
>
> $ hadoop org.apache.hadoop.fs.TestDFSIO -Dtest.build.data=ofs://test/user/$USER/TestDFSIO -write -nrFiles 1 -fileSize 10240
>
> 14/10/02 11:24:19 INFO fs.TestDFSIO: TestDFSIO.1.7
> 14/10/02 11:24:19 INFO fs.TestDFSIO: nrFiles = 1
> 14/10/02 11:24:19 INFO fs.TestDFSIO: nrBytes (MB) = 10240.0
> 14/10/02 11:24:19 INFO fs.TestDFSIO: bufferSize = 1000000
> 14/10/02 11:24:19 INFO fs.TestDFSIO: baseDir = ofs://test/user/denton/TestDFSIO
> 14/10/02 11:24:19 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/10/02 11:24:20 WARN hdfs.BlockReaderLocal: The short-circuit local
> reads feature cannot be used because libhadoop cannot be loaded.
> 14/10/02 11:24:20 INFO fs.TestDFSIO: creating control file: 10737418240 bytes, 1 files
>
> java.lang.IllegalArgumentException: Wrong FS: ofs://test/user/denton/TestDFSIO/io_control, expected: hdfs://dsci
>     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:191)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:102)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:595)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:591)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:591)
>     at org.apache.hadoop.fs.TestDFSIO.createControlFile(TestDFSIO.java:290)
>     at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:751)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:650)
>
> At Clemson University, we're running HDP-2.1 (Hadoop 2.4.0.2.1) on 16
> data nodes and 3 separate master nodes for the resource manager and two
> namenodes; however, for this test, the data nodes are really being used
> to run the map tasks, with job output being written to 16 separate OrangeFS
> servers.
>
> Ideally, we would like the 16 HDFS data nodes and two namenodes to be the
> defaultFS, but would also like the capability to run jobs using other
> OrangeFS installations.
>
> The above error does not occur when OrangeFS is configured to be the
> defaultFS. Also, we have no problems running teragen/terasort/teravalidate
> when OrangeFS is NOT the defaultFS.
>
> So, is it possible to run TestDFSIO using a FS other than the defaultFS?
> If you're interested in the OrangeFS classes, they can be found here:
>
> I have not yet run any of the FS tests released with 2.5.1 but hope to soon.
>
> Regards,
>
> Jeff Denton
> OrangeFS Developer
> Clemson University
> denton@clemson.edu