hadoop-common-dev mailing list archives

From Devaraj Das <d...@yahoo-inc.com>
Subject Re: [VOTE] port HADOOP-6218 (Split TFile by Record Sequence Number) to hadoop 0.20/0.21
Date Mon, 12 Oct 2009 22:57:36 GMT
After an offline discussion with Hong and others on this subject, the backport seems to make sense. +1

On 10/12/09 3:55 PM, "Hong Tang" <htang@yahoo-inc.com> wrote:

HADOOP-6218 exposed the internal "Location" object as a global Record
Sequence Number (RecNum). The feature is useful in a number of ways:
(1) support progress reporting for upper layers (object file, zebra);
(2) let a secondary index use RecNum as a cursor; (3) support aligned
splits across multiple parallel TFiles. Given that TFile is still at
an early stage of adoption, I suggest that we backport the patch
to Hadoop 0.20/0.21 now.

