From: Harsh J
Date: Wed, 29 Aug 2012 13:16:06 +0530
Subject: Re: Custom InputFormat errer
To: user@hadoop.apache.org

Hi Chen,

Do your record reader and mapper handle the case where a map split does not contain a whole record, that is, where a record straddles a split boundary? Your case is not very different from the newline-handling logic presented here: http://wiki.apache.org/hadoop/HadoopMapReduce

On Wed, Aug 29, 2012 at 11:13 AM, Chen He wrote:
> Hi guys,
>
> I ran into an interesting problem while implementing my own custom
> InputFormat, which extends FileInputFormat. (I rewrote the RecordReader
> class but not the InputSplit class.)
>
> My RecordReader extends LineRecordReader and treats the following format
> as a basic record. It returns a record when it reaches #Trailer# and the
> record contains #Header#. I have only one input file, which is composed
> of many such basic records:
>
> #Header#
> .....(many lines, may be 0 lines or 1000 lines, it varies)
> #Trailer#
>
> Everything works fine if the number of basic records in the file is an
> integer multiple of the number of mappers. For example, I use 2 mappers
> and there are 2 basic records in my input file, or I use 3 mappers and
> there are 6 basic records in the input file.
>
> However, if I use 4 mappers and there are 3 basic records in the input
> file (not an integer multiple), the final output is incorrect. The
> "Map Input Bytes" job counter is also less than the input file size.
> How can I fix it? Do I need to rewrite the InputSplit?
>
> Any reply will be appreciated!
>
> Regards!
>
> Chen

--
Harsh J
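
To illustrate the split-boundary question above, here is a minimal, Hadoop-free sketch in plain Java. It is not the LineRecordReader implementation and uses no Hadoop API; the class `SplitBoundaryDemo` and its methods are hypothetical names made up for this example. It assumes one ownership rule: a record belongs to the split in which its #Header# line begins, and the reader for that split keeps reading past the split end until it reaches #Trailer#. It also assumes records are well formed and that body lines never equal the markers.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of split-boundary handling; names are illustrative only.
class SplitBoundaryDemo {
    static final String HEADER = "#Header#";
    static final String TRAILER = "#Trailer#";

    // Offset of the first byte after the line containing pos (file.length at EOF).
    static int nextLineStart(byte[] file, int pos) {
        while (pos < file.length && file[pos] != '\n') {
            pos++;
        }
        return Math.min(pos + 1, file.length);
    }

    // Records owned by split [start, end): those whose #Header# line begins inside it.
    static List<String> readSplit(byte[] file, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = Math.min(start, file.length);
        // A split that starts mid-line skips that partial line: the line began
        // in an earlier split, so the earlier split's reader is responsible for it.
        if (pos > 0 && file[pos - 1] != '\n') {
            pos = nextLineStart(file, pos);
        }
        StringBuilder current = null; // non-null while inside a record
        while (pos < file.length) {
            int next = nextLineStart(file, pos);
            String line = new String(file, pos, next - pos, StandardCharsets.UTF_8).trim();
            if (current == null) {
                if (pos >= end) {
                    break; // no open record and past the split end: done
                }
                if (line.equals(HEADER)) {
                    current = new StringBuilder();
                }
            }
            if (current != null) {
                // Keep reading past 'end' if needed until the trailer closes the record.
                current.append(line).append('\n');
                if (line.equals(TRAILER)) {
                    records.add(current.toString());
                    current = null;
                }
            }
            pos = next;
        }
        return records;
    }

    // Cuts the file into numSplits byte ranges of equal size and reads each one.
    static List<String> readAll(byte[] file, int numSplits) {
        int size = (file.length + numSplits - 1) / numSplits;
        List<String> all = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            int start = Math.min(i * size, file.length);
            int end = Math.min(start + size, file.length);
            all.addAll(readSplit(file, start, end));
        }
        return all;
    }

    public static void main(String[] args) {
        String text = "#Header#\na\nb\n#Trailer#\n"
                + "#Header#\n#Trailer#\n"
                + "#Header#\nc\n#Trailer#\n";
        byte[] file = text.getBytes(StandardCharsets.UTF_8);
        // 3 records, 4 splits: the case described in the quoted message.
        System.out.println(readAll(file, 4).size() + " records from 4 splits");
    }
}
```

With this rule, each record is emitted by exactly one split regardless of how the byte ranges fall, so three records cut into four splits still yield three records in total; one split simply produces nothing. In a real RecordReader the same idea would have to be expressed against the split's start offset and length, which this sketch only imitates with plain integers.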