crunch-user mailing list archives

From Everett Anderson <ever...@nuna.com>
Subject Re: CrunchJobHooks.CompletionHook Inefficiency on S3NativeFileSystem
Date Tue, 24 Nov 2015 02:42:59 GMT
Josh, not to steal the thread, but I'm quite curious -- did something drive
you to using S3 instead of HDFS?

For me, I've been surprised by how brittle HDFS seems out of the box in the
face of even mild load. :( We've spent a lot of time turning knobs to make
our data nodes stay responsive.


On Mon, Nov 23, 2015 at 5:45 PM, Josh Wills <josh.wills@gmail.com> wrote:

> (I don't know the answer to this, but as I also now run Crunch on top of
> S3, I'm interested in a solution.)
>
> On Mon, Nov 23, 2015 at 5:22 PM, Jeff Quinn <jeff@nuna.com> wrote:
>
>> Hey All,
>>
>> We have run into a pretty frustrating inefficiency inside
>> CrunchJobHooks.CompletionHook#handleMultiPaths.
>>
>> This method loops over all of the partial output files and moves them to
>> their ultimate destination directories,
>> calling org.apache.hadoop.fs.FileSystem#rename(org.apache.hadoop.fs.Path,
>> org.apache.hadoop.fs.Path) on each partial output in a loop.
>>
>> This is no problem when the org.apache.hadoop.fs.FileSystem in question
>> is HDFS, where #rename is a cheap operation. With an implementation such
>> as S3NativeFileSystem, however, it is extremely inefficient: each
>> iteration through the loop makes a single blocking S3 API call, and the
>> loop can be extremely long when there are many thousands of partial output
>> files.
>>
>> Has anyone dealt with this before, or have any ideas for a workaround?
>>
>> Thanks!
>>
>> Jeff
>>
>>
>>
>> *DISCLAIMER:* The contents of this email, including any attachments, may
>> contain information that is confidential, proprietary in nature, protected
>> health information (PHI), or otherwise protected by law from disclosure,
>> and is solely for the use of the intended recipient(s). If you are not the
>> intended recipient, you are hereby notified that any use, disclosure or
>> copying of this email, including any attachments, is unauthorized and
>> strictly prohibited. If you have received this email in error, please
>> notify the sender of this email. Please delete this and all copies of this
>> email from your system. Any opinions either expressed or implied in this
>> email and all attachments, are those of its author only, and do not
>> necessarily reflect those of Nuna Health, Inc.
>
>
>
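
One possible direction for the rename loop described in the quoted message is to issue the renames concurrently instead of one blocking call at a time. The following is only a minimal sketch under stated assumptions, not something proposed in this thread and not part of Crunch: the ParallelRename class, the Renamer interface, and renameAll are hypothetical names, and Renamer merely stands in for FileSystem#rename(Path, Path). It also assumes the underlying FileSystem implementation tolerates concurrent rename calls, which would need to be checked for the implementation in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRename {

    /**
     * Hypothetical stand-in for FileSystem#rename(Path, Path).
     * Returns true when the rename succeeded.
     */
    public interface Renamer {
        boolean rename(String src, String dst) throws Exception;
    }

    /**
     * Submits every rename to a fixed-size thread pool, waits for all of
     * them, and returns the source paths whose rename reported failure.
     * An exception thrown by a rename surfaces as an ExecutionException.
     */
    public static List<String> renameAll(Map<String, String> moves,
                                         Renamer renamer,
                                         int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> sources = new ArrayList<>();
            List<Future<Boolean>> results = new ArrayList<>();
            for (Map.Entry<String, String> move : moves.entrySet()) {
                final String src = move.getKey();
                final String dst = move.getValue();
                // Each task performs one (potentially slow) rename call.
                Callable<Boolean> task = () -> renamer.rename(src, dst);
                sources.add(src);
                results.add(pool.submit(task));
            }
            // Collect results in submission order; get() blocks until done.
            List<String> failed = new ArrayList<>();
            for (int i = 0; i < results.size(); i++) {
                if (!results.get(i).get()) {
                    failed.add(sources.get(i));
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }
}
```

In real use the Renamer would wrap something like fs.rename(new Path(src), new Path(dst)). This only overlaps the latency of the individual calls; it does not reduce the number of S3 requests, so the pool size would need tuning against whatever request limits apply.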

