apex-dev mailing list archives

From Mohit Jotwani <mo...@datatorrent.com>
Subject Re: Reading large HDFS files record by record
Date Fri, 29 Apr 2016 09:57:44 GMT
+1

Regards,
Mohit

On Thu, Apr 28, 2016 at 4:29 PM, Yogi Devendra <devendra.vyavahare@gmail.com
> wrote:

> Hi,
>
> My use case involves reading files from HDFS and emitting each record as a
> separate tuple. Records can be either fixed-length or separator-based
> (e.g. newline-delimited). The expected output is a byte[] per record.
>
> I am planning to solve this as follows:
> - New operator which extends BlockReader.
> - It will have a configuration option to select the mode: FIXED_LENGTH or
> SEPARATOR_BASED.
> - Use appropriate ReaderContext based on mode.
>
> The reason for having an operator separate from BlockReader is that its
> output port signature differs from BlockReader's. This new operator can be
> used in conjunction with FileSplitter.
>
> Any feedback?
>
> ~ Yogi
>
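The two modes described above amount to two ways of cutting a block of bytes into records. The sketch below illustrates that splitting logic in plain Java; the class and method names are hypothetical, and a real implementation would live inside an operator extending Apex's BlockReader, selecting the appropriate ReaderContext per mode rather than operating on an in-memory array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the proposed operator's two record-splitting
// modes. Not the actual Apex API; it only demonstrates the byte-level logic.
public class RecordSplitter {

    // FIXED_LENGTH mode: cut the block into records of `length` bytes.
    // A trailing partial record is dropped in this sketch; a real block
    // reader would carry it over to the next block.
    static List<byte[]> splitFixedLength(byte[] block, int length) {
        List<byte[]> records = new ArrayList<>();
        for (int start = 0; start + length <= block.length; start += length) {
            records.add(Arrays.copyOfRange(block, start, start + length));
        }
        return records;
    }

    // SEPARATOR_BASED mode: cut the block at each occurrence of a
    // single-byte separator (e.g. '\n'); the separator byte is not emitted.
    static List<byte[]> splitBySeparator(byte[] block, byte separator) {
        List<byte[]> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < block.length; i++) {
            if (block[i] == separator) {
                records.add(Arrays.copyOfRange(block, start, i));
                start = i + 1;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "ab\ncd\n".getBytes();

        List<byte[]> bySep = splitBySeparator(data, (byte) '\n');
        System.out.println("separator records: " + bySep.size()); // 2: "ab", "cd"

        List<byte[]> fixed = splitFixedLength(data, 3);
        System.out.println("fixed records: " + fixed.size()); // 2: "ab\n", "cd\n"
    }
}
```

Keeping the splitting strategy behind a single mode switch is what lets one operator serve both record formats while reusing the block-reading machinery.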
