hadoop-common-user mailing list archives

From Matt Bowyer <mattbowy...@googlemail.com>
Subject Re: sub 60 second performance
Date Sun, 10 May 2009 23:53:19 GMT
Thanks Jason. How can I get access to the particular block?

Do you mean creating a static map inside the task, adding the values to it,
and checking whether it is already populated on the next run?

Or is there a more elegant, tried-and-tested solution?

Thanks again
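
For reference, here is a minimal sketch of the "static map inside the task" idea, written as plain Java so it runs without a cluster. The class name StaticBlockCache, the loader argument, and the loadCount counter are all hypothetical names made up for illustration; they are not part of Hadoop. The assumption is that a mapper would call get() from its setup code and pass a loader that reads the data from HDFS.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not a Hadoop class): keeps data in a static field so
 * that it survives for as long as this class stays loaded in the JVM.
 */
final class StaticBlockCache {
    // Null until the first task in this JVM loads the data.
    private static Map<String, String> cache;
    // Counts how often the expensive load actually ran (for illustration).
    private static int loadCount = 0;

    private StaticBlockCache() {}

    /**
     * Returns the cached data, running the loader only if nothing has been
     * cached yet in this JVM. Synchronized so concurrent callers load once.
     */
    static synchronized Map<String, String> get(Supplier<Map<String, String>> loader) {
        if (cache == null) {
            cache = loader.get();   // expensive read, e.g. from HDFS (assumed)
            loadCount++;
        }
        return cache;
    }

    static synchronized int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        // Simulate two tasks running one after another in the same JVM.
        Supplier<Map<String, String>> loader = () -> {
            Map<String, String> data = new HashMap<>();
            data.put("key", "value");
            return data;
        };
        Map<String, String> firstTask = get(loader);
        Map<String, String> secondTask = get(loader);
        System.out.println("same instance: " + (firstTask == secondTask));
        System.out.println("loads performed: " + loadCount());
    }
}
```

The cache only helps tasks that land in a JVM where the data was already loaded, so it depends on JVM reuse being enabled (for example via setNumTasksToExecutePerJvm) and on the JVM still being alive when the next task starts. Whether that holds across separate jobs is something to verify on your own cluster.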

On Mon, May 11, 2009 at 12:41 AM, jason hadoop <jason.hadoop@gmail.com> wrote:

> You can cache the block in your task, in a pinned static variable, when you
> are reusing the JVMs.
>
> On Sun, May 10, 2009 at 2:30 PM, Matt Bowyer <mattbowyers@googlemail.com> wrote:
>
> > Hi,
> >
> > I am trying to do 'on demand map reduce' - something which will return in
> > reasonable time (a few seconds).
> >
> > My dataset is relatively small and can fit into my datanode's memory. Is it
> > possible to keep a block in the datanode's memory so on the next job the
> > response will be much quicker? The majority of the time spent during the job
> > run appears to be during the 'HDFS_BYTES_READ' part of the job. I have tried
> > using the setNumTasksToExecutePerJvm but the block still seems to be cleared
> > from memory after the job.
> >
> > thanks!
> >
>
>
>
> --
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422
> www.prohadoopbook.com a community for Hadoop Professionals
>
