hadoop-common-user mailing list archives

From Dieter Plaetinck <dieter.plaeti...@intec.ugent.be>
Subject Re: Are hadoop fs commands serial or parallel
Date Mon, 23 May 2011 10:12:32 GMT
On Fri, 20 May 2011 10:11:13 -0500
Brian Bockelman <bbockelm@cse.unl.edu> wrote:

> 
> On May 20, 2011, at 6:10 AM, Dieter Plaetinck wrote:
> 
> > What do you mean clunky?
> > IMHO this is a quite elegant, simple, working solution.
> 
> Try giving it to a user; watch them feed it a list of 10,000 files;
> watch the machine swap to death and the disks uselessly thrash.
> 
> > Sure this spawns multiple processes, but it beats any
> > api-overcomplications, imho.
> > 
> 
> Simple doesn't imply scalable, unfortunately.
> 
> Brian

True, I assumed if anyone wants this, he knows what he's doing (i.e.
the files could be small and already in the Linux block cache).
After all, why would anyone read files in parallel if that causes disk
seeks all over the place? Ideally, you should tune for one sequential read
per disk at a time. In that respect, I definitely agree that some
clever logic in userspace to optimize disk reads (across a bunch of
different possible hardware setups) would be beneficial.
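
To make the "one sequential read per disk" idea concrete, here is a minimal Python sketch. It is not part of Hadoop or of the command discussed in this thread, and all function names are hypothetical. It assumes local files and uses the filesystem device ID (st_dev) as a stand-in for "disk", which is only an approximation: RAID, LVM, or several partitions on one spindle would need smarter mapping.

```python
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_device(paths):
    """Group paths by the device ID of the filesystem that holds them.

    Assumption: st_dev approximates "one physical disk".
    """
    groups = defaultdict(list)
    for path in paths:
        groups[os.stat(path).st_dev].append(path)
    return groups


def read_sequentially(paths, chunk_size=1 << 20):
    """Read each file in turn, front to back; return total bytes read."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total


def read_one_stream_per_device(paths):
    """Run one sequential reader per device, all devices in parallel."""
    groups = group_by_device(paths)
    if not groups:
        return {}
    # One worker per device: parallel across devices, serial within one.
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        futures = {dev: pool.submit(read_sequentially, files)
                   for dev, files in groups.items()}
        return {dev: fut.result() for dev, fut in futures.items()}
```

With a list of 10,000 files this starts at most one reader per device rather than one process per file, so the concurrency is bounded by the hardware and not by the size of the input.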

Dieter
