hadoop-common-user mailing list archives

From "Ankur C. Goel" <gan...@yahoo-inc.com>
Subject Re: How to maintain record boundaries
Date Fri, 11 May 2012 22:34:41 GMT
Record reader implementations are typically written to honor record
boundaries. This means that while reading a split's data, they will continue
reading past the end of the split if the end of the current record has not
yet been reached.
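
To make the idea concrete, here is a minimal sketch in plain Python. It is
not Hadoop code; it only simulates the convention that line-oriented readers
commonly follow. The assumptions are that records are newline-terminated,
that a record belongs to the split in which it starts, and that a reader
whose split does not begin at offset 0 discards everything up to the first
newline (the previous reader has already consumed that record). The names
read_split and make_splits are hypothetical helpers invented for this
illustration.

```python
def make_splits(total_len: int, split_size: int) -> list[tuple[int, int]]:
    """Cut a byte range into contiguous (start, length) splits, ignoring records."""
    return [(start, min(split_size, total_len - start))
            for start in range(0, total_len, split_size)]


def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the newline-terminated records owned by one split.

    Hypothetical simulation: 'data' stands in for the whole file, which a
    real reader would access as a stream that may reach into the next block.
    """
    end = start + length
    pos = start
    if start != 0:
        # The record that straddles (or ends at) our start offset belongs
        # to the previous split's reader, so skip past the first newline.
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []  # no record starts inside this split
        pos = newline + 1
    records = []
    # '<=' on purpose: a record starting exactly at 'end' is ours, because
    # the next reader will skip its own first line.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        # Keep reading past 'end' if needed to finish the current record.
        records.append(data[pos:stop])
        pos = stop + 1
    return records


if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"
    for start, length in make_splits(len(data), 8):
        print((start, length), read_split(data, start, length))
```

With 8-byte splits, "bravo" and "charlie" both straddle a split boundary,
yet each is returned whole by exactly one reader. The physical block layout
never has to line up with record boundaries for this to work.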


On 5/11/12 5:15 AM, "Shreya.Pal@cognizant.com" <Shreya.Pal@cognizant.com> wrote:

>When we store data in HDFS, it gets broken into small pieces and
>distributed across the cluster based on the block size for the file.
>While processing the data using an MR program, I want a particular record
>as a whole without it being split across nodes, but the data has already
>been split and stored in HDFS when I loaded it.
>How do I make sure that my record doesn't get split, and how would my
>InputFormat make a difference now?
