Subject: Re: Custom InputFormat errer
From: Chen He
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Date: Wed, 29 Aug 2012 02:57:18 -0500

Hi Harsh

Thank you for your reply. Do you mean I need to change the FileSplit to avoid the errors I mentioned?

Regards!

Chen

On Wed, Aug 29, 2012 at 2:46 AM, Harsh J <harsh@cloudera.com> wrote:
Hi Chen,

Do your record reader and mapper handle the case where a map split
may not contain a whole record? Your case is not very different
from the newlines logic presented here:
http://wiki.apache.org/hadoop/HadoopMapReduce
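
A minimal sketch of the split-boundary convention referred to above, applied to #Header#/#Trailer# records. It is plain Java that simulates splits over an in-memory string, not Hadoop code: the class SplitBoundarySketch and the methods readSplit and readAll are hypothetical names, character offsets stand in for byte offsets (ASCII input assumed), and the file is cut into equal ranges rather than by block size. The assumed rule is that a record belongs to the split in which its #Header# line starts, and the reader keeps reading past the end of its split until it reaches the matching #Trailer#.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitBoundarySketch {
    static final String HEADER = "#Header#";
    static final String TRAILER = "#Trailer#";

    /** Returns the records owned by the split [start, end) of data. */
    static List<String> readSplit(String data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        // Align to a line start: if the split begins mid-line, skip the
        // partial line; the previous split is responsible for it.
        if (pos > 0 && data.charAt(pos - 1) != '\n') {
            int nl = data.indexOf('\n', pos);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        StringBuilder current = null; // non-null while inside a record
        while (pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? data.length() : nl;
            String line = data.substring(pos, lineEnd);
            if (current == null) {
                // Between records: stop once we are past our own split.
                if (pos >= end) {
                    break;
                }
                // Lines before the first header belong to a record that
                // started in an earlier split, so they are skipped.
                if (line.equals(HEADER)) {
                    current = new StringBuilder();
                }
            }
            if (current != null) {
                // Inside a record: keep reading, even past 'end'.
                current.append(line).append('\n');
                if (line.equals(TRAILER)) {
                    records.add(current.toString());
                    current = null;
                }
            }
            pos = lineEnd + 1;
        }
        return records;
    }

    /** Cuts data into numSplits equal ranges and reads each one in turn. */
    static List<String> readAll(String data, int numSplits) {
        List<String> all = new ArrayList<>();
        int size = data.length();
        for (int i = 0; i < numSplits; i++) {
            int start = (int) ((long) size * i / numSplits);
            int end = (int) ((long) size * (i + 1) / numSplits);
            all.addAll(readSplit(data, start, end));
        }
        return all;
    }

    public static void main(String[] args) {
        String data = "#Header#\na\nb\n#Trailer#\n"
                + "#Header#\n#Trailer#\n"
                + "#Header#\nc\n#Trailer#\n";
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " split(s): " + readAll(data, n).size() + " record(s)");
        }
    }
}
```

Under this rule the splits themselves stay plain byte ranges and only the reader changes. In a real RecordReader the same two steps (skip to the first header at or after the split start, then read past the split end to finish the last record) would presumably go into the reader's initialization and next-record logic.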

On Wed, Aug 29, 2012 at 11:13 AM, Chen He <airbots@gmail.com> wrote:
> Hi guys
>
> I ran into an interesting problem while implementing my own custom InputFormat,
> which extends FileInputFormat. (I rewrote the RecordReader class but not the
> InputSplit class.)
>
> My RecordReader treats the following format as a basic record. (It extends
> LineRecordReader and returns a record when it reaches #Trailer# and the
> record contains #Header#. I have only one input file, which is composed of
> many such basic records.)
>
> #Header#
> .....(many lines, may be 0 lines or 1000 lines, it varies)
> #Trailer#
>
> Everything works fine if the number of basic records in the file is an integer
> multiple of the number of mappers. For example, I use 2 mappers and there are
> 2 basic records in my input file, or I use 3 mappers and there are 6 basic
> records in the input file.
>
> However, if I use 4 mappers but there are 3 basic records in the input file
> (not an integer multiple), the final output is incorrect. The "Map Input
> Bytes" counter for the job is also less than the input file size. How can I
> fix it? Do I need to rewrite the InputSplit?
>
> Any reply will be appreciated!
>
> Regards!
>
> Chen



--
Harsh J
