incubator-crunch-user mailing list archives

From Christian Tzolov <christian.tzo...@gmail.com>
Subject Re: writeTextFile gives back exception
Date Fri, 22 Jun 2012 14:15:23 GMT
I'm on holiday and cannot contribute to what looks like active
refactoring in the project :)
Just wanted to mention that the inconsistency between the Mem and the MR
pipeline implementations spreads beyond the writeTextFile method. There
are a couple of tests marked as ignored because of this problem.
The question is: do we need the MemPipeline implementation? Are there
any non-test-related use cases for the in-memory implementation?
Cheers,
Chris
On Jun 21, 2012 6:33 PM, "Josh Wills" <jwills@cloudera.com> wrote:

> The inconsistency in file writes between MRPipeline and MemPipeline should
> now be fixed. Thanks for the report, Rahul.
>
> On Thu, Jun 21, 2012 at 7:31 AM, Josh Wills <jwills@cloudera.com> wrote:
> > Yes, they really should. I'll fix the MemPipeline one to be able to
> > correctly write output to directories.
> >
> > On Thu, Jun 21, 2012 at 3:23 AM, Rahul Sharma <rahul0208@gmail.com>
> wrote:
> >> Hi Everyone,
> >>
> >> I believe the Pipeline types are not completely interchangeable. I
> >> wrote test cases for MRPipeline, but then I changed the type to
> >> MemPipeline. Everything went fine except creating the output file with
> >> writeTextFile, which failed with the following stacktrace:
> >>
> >> 1    [main] ERROR com.cloudera.crunch.impl.mem.MemPipeline  -
> >> Exception writing target: Text(/home/rahul/crunchOut)
> >> java.io.FileNotFoundException: /home/rahul/crunchOut (Is a directory)
> >>        at java.io.FileOutputStream.open(Native Method)
> >>        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
> >>        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:189)
> >>        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:185)
> >>        at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:256)
> >>        at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:237)
> >>        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:336)
> >>        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:382)
> >>        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:365)
> >>        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:584)
> >>        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:565)
> >>        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:472)
> >>        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:464)
> >>        at com.cloudera.crunch.impl.mem.MemPipeline.write(MemPipeline.java:148)
> >>        at com.cloudera.crunch.impl.mem.MemPipeline.writeTextFile(MemPipeline.java:178)
> >>
> >>
> >> Now, when I looked into it, the code in the writeTextFile function
> >> expects a file, while I was passing a folder, which is what
> >> MRPipeline requires. If I pass a file location instead, MemPipeline
> >> works but MRPipeline breaks with the following exception:
> >>
> >> 1 job failure(s) occurred:
> >> com.mylearning.crunch.FirstTest: SeqFile(/tmp/crunch1711673673/p1)+top1map+GBK+combine+top1reduce+asText+Text(/home/rahul/crunchOut/sample.txt)(class com.mylearning.crunch.FirstTest0):
> >> java.io.IOException: Mkdirs failed to create /home/rahul/crunchOut/sample.txt
> >>        at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:253)
> >>        at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:237)
> >>        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:565)
> >>        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:472)
> >>        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:223)
> >>        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
> >>        at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:287)
> >>        at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:429)
> >>
> >> Basically, internally the getDestFile(Path src, Path dir, int index)
> >> method in the CrunchJob class expects the path to be a directory and
> >> not a file.
> >>
> >> Shouldn't the two implementations of writeTextFile be in sync?
> >>
> >> regards
> >> Rahul
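
[Editor's note] The mismatch Rahul describes bottoms out in plain JDK behavior: java.io.FileOutputStream cannot open a path that is an existing directory, which is exactly the FileNotFoundException at the top of the MemPipeline stack trace. A minimal, self-contained sketch of that failure mode (the temp-directory name is hypothetical and stands in for /home/rahul/crunchOut; no Crunch code is involved):

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.nio.file.Files;

public class WriteTargetDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical output location standing in for /home/rahul/crunchOut
        File outDir = Files.createTempDirectory("crunchOut").toFile();

        try (FileOutputStream out = new FileOutputStream(outDir)) {
            // Never reached: a directory cannot be opened as a plain file
            System.out.println("unexpected: opened a directory for writing");
        } catch (FileNotFoundException e) {
            // MemPipeline.writeTextFile hit this path when handed a directory
            System.out.println("cannot open directory for writing: "
                    + outDir.getName());
        }

        outDir.delete();
    }
}
```

MRPipeline has the opposite expectation: like a normal MapReduce job, it treats the target path as an output directory and writes part files underneath it, so a single path cannot satisfy both implementations as they stood.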
> >
> >
> >
> > --
> > Director of Data Science
> > Cloudera
> > Twitter: @josh_wills
>
>
>
> --
> Director of Data Science
> Cloudera
> Twitter: @josh_wills
>
