hadoop-mapreduce-user mailing list archives

From Brian Stempin <bstem...@rightaction.com>
Subject Re: Question about the usage of Seekable within the LineRecordReader
Date Wed, 19 Feb 2014 20:06:15 GMT
Hi Yong,
The *LineRecordReader* has a *FSDataInputStream* named *fileIn.*  It then
has a separate *Seekable* named *filePosition*, which is set equal to
*fileIn.*  *filePosition.seek()* is never called.  In the constructor,
*fileIn.seek()* is called, but never again.  For the rest of the class, the
only call made to *filePosition* is *getPos()*.  As I mentioned in the
first email, this seems redundant.

The question comes from this bit of code:

>   private long getFilePosition() throws IOException {
>     long retVal;
>     if (isCompressedInput() && null != filePosition) {
>       retVal = filePosition.getPos();
>     } else {
>       retVal = pos;
>     }
>     return retVal;
>   }


That's the only place *filePosition* is used.  If there's also a field named
*pos* that tracks the same thing, then why use *filePosition* at all?
Isn't that just duplicate work?
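
To make the question concrete, here is a minimal, self-contained sketch (plain
java.util.zip, not Hadoop code) of how a count of decompressed bytes and a
position in the underlying compressed stream can diverge.  The names
PositionSketch, CountingInputStream, readAll and gzip are hypothetical, and
whether this is the reason *LineRecordReader* keeps *filePosition* is an
assumption, not something confirmed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class PositionSketch {

    /** Hypothetical stand-in for a raw file stream that can report its position. */
    static final class CountingInputStream extends InputStream {
        private final InputStream in;
        private long pos; // bytes consumed from the underlying (compressed) source

        CountingInputStream(InputStream in) { this.in = in; }

        long getPos() { return pos; }

        @Override
        public int read() throws IOException {
            int b = in.read();
            if (b >= 0) pos++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n > 0) pos += n;
            return n;
        }
    }

    /** Returns { decompressed bytes read, bytes consumed from the compressed source }. */
    static long[] readAll(byte[] gzipped) throws IOException {
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(gzipped));
        long pos = 0; // plays the role of a reader-side "pos" counter (assumption)
        try (InputStream in = new GZIPInputStream(raw)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                pos += n;
            }
        }
        return new long[] { pos, raw.getPos() };
    }

    /** Gzips a byte array in memory. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) sb.append("the same line repeated\n");
        byte[] plain = sb.toString().getBytes(StandardCharsets.UTF_8);
        long[] r = readAll(gzip(plain));
        System.out.println("decompressed bytes read:   " + r[0]);
        System.out.println("compressed bytes consumed: " + r[1]);
    }
}
```

In this sketch the two counters only agree when the input is not compressed,
which would match the *isCompressedInput()* check in *getFilePosition()*.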

Thanks for giving me time,
Brian


On Wed, Feb 19, 2014 at 2:55 PM, java8964 <java8964@hotmail.com> wrote:

> Hi, Brian:
>
> I hope I understand your question correctly. Here is my view of what the
> Seekable interface provides.
>
> The Seekable interface also defines the "seek(long pos)" method, which
> allows the client to seek to a specified position in the underlying
> InputStream.
>
> The RecordReader gets the start position and an instance of the
> inputSplit, but the underlying input stream is not open or available yet.
>
> The RecordReader finds the correct start position of the stream, uses
> the Seekable interface to "seek" to that position, and starts reading
> bytes from there, translating the bytes that follow into <K, V>
> pairs.
>
> Without the Seekable interface, there is no way to "seek" to the correct
> starting position.
>
> Yong
>
> ------------------------------
> Date: Wed, 19 Feb 2014 14:39:00 -0500
> Subject: Question about the usage of Seekable within the LineRecordReader
> From: bstempin@rightaction.com
> To: user@hadoop.apache.org
>
>
> Hi List,
> In order to write my own record reader, I'm taking a look at the
> *LineRecordReader* in v 2.2.0.  I notice that it uses *Seekable* in order
> to tell where it is in the file when using something other than an
> *InputStream*.  As far as I can see, the only reason it's used is to get
> the current position within the file (within *getFilePosition()* ).
>
> My question is:  Why?  It looks like the file position is already tracked
> by the *pos* field.  Is there a reason to use *Seekable.getPos()* instead
> of looking at *pos*?
>
> Thanks for the help,
> Brian
>
