Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of mattbowyers@googlemail.com
 designates 209.85.200.174 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=googlemail.com; s=gamma;
        h=mime-version:date:message-id:subject:from:to:content-type;
        b=ivZ50eLWPaKS38BENlYTl5pt4IN67rg6jG0e7HRetZYqxgC0qBqKPdE05TqDvuEBFG
         g2NGVkYx3/0yKdYqnJVImACPjUiTJaqLqUuAw1gioUbTujIOEZdiMJRaxAIzMALYPYer
         JHJikYCk95XmVODNZDrvR1CwZiPs+6TOFHUik=
MIME-Version: 1.0
Date: Sun, 10 May 2009 22:30:10 +0100
Message-ID: <e1429e600905101430ied7860w5d2a3010c8f0f248@mail.gmail.com>
Subject: sub 60 second performance
From: Matt Bowyer <mattbowyers@googlemail.com>
To: core-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=000e0cd32c48fce45704699590f9

--000e0cd32c48fce45704699590f9
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Hi,

I am trying to do 'on demand map reduce': a job that returns in a
reasonable time (a few seconds).

My dataset is relatively small and fits into my datanode's memory. Is it
possible to keep a block in the datanode's memory so that the next job
responds much more quickly? Most of the job's run time appears to be spent
reading the input from HDFS (the data counted under 'HDFS_BYTES_READ'). I
have tried using setNumTasksToExecutePerJvm, but the block still seems to
be cleared from memory after the job.
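
To make the question concrete, here is a rough sketch of the behaviour I am
after. It is not working Hadoop code: InMemoryBlockCache, get and loadCount
are hypothetical names, the path is made up, and the loader is only a
stand-in for the real HDFS read. It also assumes that successive tasks land
in the same JVM, which is what I hoped setNumTasksToExecutePerJvm would
give me.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical per-JVM cache; not part of the Hadoop API. */
public class InMemoryBlockCache {
    // Static state lives exactly as long as the JVM does, so it can only
    // be shared by tasks that run inside the same (reused) JVM.
    private static final Map<String, byte[]> CACHE = new ConcurrentHashMap<>();
    private static final AtomicInteger LOADS = new AtomicInteger();

    /** Returns the cached bytes for path, calling loader only on a miss. */
    public static byte[] get(String path, Function<String, byte[]> loader) {
        return CACHE.computeIfAbsent(path, p -> {
            LOADS.incrementAndGet();   // count how often we really read
            return loader.apply(p);
        });
    }

    /** Number of times the loader was actually invoked. */
    public static int loadCount() {
        return LOADS.get();
    }

    public static void main(String[] args) {
        // Stand-in for the expensive HDFS read (assumption, not real I/O).
        Function<String, byte[]> slowRead = p -> new byte[] {1, 2, 3};

        byte[] first = get("/data/part-00000", slowRead);
        byte[] second = get("/data/part-00000", slowRead);

        System.out.println(first == second); // same array, served from memory
        System.out.println(loadCount());     // loader ran once for this path
    }
}
```

In other words, the second lookup should cost nothing, and I would like
that to hold for the next job as well as for the next task.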

Thanks!

--000e0cd32c48fce45704699590f9--