Mailing-List: contact users-help@apex.apache.org; run by ezmlm
Precedence: bulk
Reply-To: users@apex.apache.org
Delivered-To: mailing list users@apex.apache.org
MIME-Version: 1.0
From: chiranjeevi vasupilli
Date: Mon, 3 Oct 2016 16:16:50 +0530
Subject: Re: Reading compressed file using FileSplitter
To: users@apex.apache.org
Content-Type: multipart/alternative; boundary=001a1142bbc25e03cd053df3a9e9
archived-at: Mon, 03 Oct 2016 10:46:56 -0000

--001a1142bbc25e03cd053df3a9e9
Content-Type: text/plain; charset=UTF-8

Thank you Priyanka,

We are not using any Snappy library for decompression yet. Can you please
suggest a library and version, so that we can try to implement this?

On Mon, Oct 3, 2016 at 4:06 PM, Priyanka Gugale <priyag@apache.org> wrote:

> Hi Chiranjeevi,
>
> There is no direct support in the current operators for decompressing data
> read from a file, but you can do it in one of the following ways:
> 1. Extend AbstractBlockReader to use the right STREAM type by implementing
> the `setupStream` function to initialize the right stream reader class,
> e.g. GZIPInputStream if your input is in gzip format, or in your case
> "SnappyInputStream".
> 2. Override `readBlock` from AbstractBlockReader, call decompress on the
> input data using the Snappy Java API, and then emit the data.
>
> I would suggest option one, but what is achievable depends on which
> Snappy Java library you use. Can you tell us which library you are using?
>
> -Priyanka
>
> On Mon, Oct 3, 2016 at 2:42 PM, chiranjeevi vasupilli
> <chiru.vcj@gmail.com> wrote:
>
>> Hi Priyanka,
>>
>> We are getting compressed files from the source, which we need to read
>> and decompress so that we can process the actual data.
>>
>> Can you please point us to any reader/operator that is readily available
>> to decompress the data while reading it in DataTorrent?
>>
>> On Mon, Oct 3, 2016 at 1:07 PM, Priyanka Gugale <priyag@apache.org>
>> wrote:
>>
>>> Hi,
>>>
>>> Do you want to read the files in compressed form only, or do you want
>>> your program to decompress them while reading?
>>> If you want to read them in compressed form, you can use FSInputModule
>>> (which uses FileSplitter and a block reader) directly to read your files.
>>> If you want to decompress while reading, there are other options you can
>>> choose. I will explain in detail once you confirm this is what you are
>>> trying to achieve.
>>>
>>> -Priyanka
>>>
>>> On Mon, Oct 3, 2016 at 12:38 PM, chiranjeevi vasupilli
>>> <chiru.vcj@gmail.com> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> Can you please point us to any reader/operator that is capable of
>>>> reading compressed data in DataTorrent?
>>>>
>>>> I have a requirement to read .snappy files with a Ctrl+A separator
>>>> using FileSplitter. Can you please let me know how to do it?
>>>>
>>>> --
>>>> thanks
>>>> chiru
>>
>> --
>> ur's
>> chiru

--
ur's
chiru

--001a1142bbc25e03cd053df3a9e9
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a1142bbc25e03cd053df3a9e9--
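
The first option in Priyanka's reply (set up a decompressing stream, then read records from it) can be sketched outside Apex with only the JDK. This is a minimal illustration under stated assumptions, not the Apex API: the class `DecompressingReaderSketch` and the methods `openDecompressed`, `readRecords`, and `gzip` are hypothetical names. `GZIPInputStream` stands in for a Snappy stream because the JDK ships no Snappy codec; with a Snappy Java library you would presumably swap in that library's input stream class. Whether such a class can read your .snappy files depends on how they were written. The sketch also reads the compressed stream from the start rather than from a block offset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DecompressingReaderSketch {

    // Ctrl+A (byte 0x01), the field separator mentioned in the thread.
    static final String FIELD_SEPARATOR = "\u0001";

    // Hypothetical stand-in for the stream setup step: wrap the raw file
    // stream in a decompressing stream. Assumption: gzip is used here only
    // because it is in the JDK; replace with your Snappy stream class.
    public static InputStream openDecompressed(InputStream raw) throws IOException {
        return new GZIPInputStream(raw);
    }

    // Decompress the whole stream, split it into lines, then into fields.
    public static List<String[]> readRecords(InputStream raw) throws IOException {
        List<String[]> records = new ArrayList<>();
        try (InputStream in = openDecompressed(raw)) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            String text = new String(buf.toByteArray(), StandardCharsets.UTF_8);
            for (String line : text.split("\n")) {
                if (!line.isEmpty()) {
                    // Limit -1 keeps trailing empty fields.
                    records.add(line.split(FIELD_SEPARATOR, -1));
                }
            }
        }
        return records;
    }

    // Demo helper only: gzip a string in memory to simulate a compressed file.
    public static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("a\u0001b\u0001c\nd\u0001e\u0001f\n");
        for (String[] fields : readRecords(new ByteArrayInputStream(compressed))) {
            System.out.println(Arrays.toString(fields));
        }
    }
}
```

In an Apex application the equivalent of `openDecompressed` would live in the `setupStream` override of an AbstractBlockReader subclass, as described in the reply above; the exact signature should be checked against the Malhar version in use.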