Date: Tue, 26 Apr 2011 11:49:57 -0700
Subject: Reading from File
From: Mark question
To: common-user@hadoop.apache.org

Hi,

My mapper opens a file and reads records using next(). However, I want to stop reading if there is no memory available. What confuses me here is that even though I'm reading record by record with next(), Hadoop actually reads the underlying data in units of dfs.block.size. So I have two questions:

1. Is it true that even if I set dfs.block.size to 512 MB, at least one block is loaded into memory for the mapper to process (as part of the InputSplit)?

2. How can I read multiple records from a SequenceFile at once, and would it make a difference?

Thanks,
Mark
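For context, a minimal sketch of the record-by-record loop described above, using Hadoop's SequenceFile.Reader API (the key/value types and the input path are assumptions; substitute whatever types your file was actually written with):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(args[0]); // hypothetical path to a SequenceFile

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            // Assumed key/value types for illustration only.
            LongWritable key = new LongWritable();
            Text value = new Text();

            // next() deserializes exactly one record per call, but the
            // underlying DFS input stream still fetches data from the
            // datanodes in larger chunks regardless of how the records
            // are consumed here.
            while (reader.next(key, value)) {
                // process one record at a time
            }
        } finally {
            reader.close();
        }
    }
}
```

Note that next() only controls deserialization granularity, not I/O granularity, which is why record-by-record reading does not by itself bound the buffering done below it.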