From: rahul p
To: mapreduce-user@hadoop.apache.org
Date: Mon, 6 Aug 2012 23:45:08 +0800
Subject: Re: Handling files with unclear boundaries

Hi Tariq,
Can you accept my gtalk request?

On Mon, Aug 6, 2012 at 11:30 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
> Hello list,
>
>      I need some guidance on how to handle files where we don't have
> any proper delimiters or record boundaries. Actually, I am trying to
> process a set of files that are totally alien to me (SAS XPT files)
> through MR. But one thing that is always fixed is that each time I
> have to read 107 bytes from the line. Is it possible to use this
> length as a delimiter for creating splits somehow? And if so, which
> InputFormat would be appropriate? Many thanks.
>
> Regards,
>     Mohammad Tariq
