From: Vinod Singh
Date: Wed, 6 Jun 2012 23:37:51 +0530
Subject: Re: Compressed data storage in HDFS - Error
To: user@hive.apache.org

But it may pay off by saving network IO while copying data during the reduce phase, though the benefit varies from case to case. We had good results using the Snappy codec to compress map output; Snappy provides reasonably good compression at a fast rate.

Thanks,
Vinod
http://blog.vinodsingh.com/

On Wed, Jun 6, 2012 at 4:03 PM, Debarshi Basak wrote:
> Compression is an overhead when you have a CPU-intensive job.
>
> Debarshi Basak
> Tata Consultancy Services
> Mailto: debarshi.basak@tcs.com
> Website: http://www.tcs.com
> ____________________________________________
> Experience certainty. IT Services
> Business Solutions
> Outsourcing
> ____________________________________________
>
> -----Bejoy Ks wrote: -----
>
> To: "user@hive.apache.org"
> From: Bejoy Ks
> Date: 06/06/2012 03:37PM
> Subject: Re: Compressed data storage in HDFS - Error
>
> Hi Sreenath
>
> Output compression is more useful at the storage level: when a large
> file is compressed it occupies fewer HDFS blocks, and the cluster
> thereby becomes more scalable in terms of the number of files it holds.
>
> Yes, the LZO libraries need to be present on all TaskTracker nodes as
> well as on the node that hosts the Hive client.
>
> Regards
> Bejoy KS
>
> ------------------------------
> From: Sreenath Menon
> To: user@hive.apache.org; Bejoy Ks
> Sent: Wednesday, June 6, 2012 3:25 PM
> Subject: Re: Compressed data storage in HDFS - Error
>
> Hi Bejoy
> I would like to make this clear:
> There is no gain in processing throughput/time from compressing the
> data stored in HDFS (not talking about intermediate compression)...
> right?
> And do I need to add the LZO libraries to HADOOP_HOME/lib/native on
> all the nodes (including the slave nodes)?
>
> =====-----=====-----=====
> Notice: The information contained in this e-mail
> message and/or attachments to it may contain
> confidential or privileged information. If you are
> not the intended recipient, any dissemination, use,
> review, distribution, printing or copying of the
> information contained in this e-mail message
> and/or attachments to it are strictly prohibited. If
> you have received this communication in error,
> please notify us by reply e-mail or telephone and
> immediately and permanently delete the message
> and any attachments. Thank you
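[Editor's sketch] For reference, the settings the thread is discussing can be toggled per-session in the Hive CLI. The property names below are the Hadoop 1.x-era (`mapred.*`) ones in use when this thread was written; the LZO codec class comes from the separately installed hadoop-lzo package, which is why the native libraries must exist on every node, as Bejoy notes:

```sql
-- Compress intermediate (map) output with Snappy, as Vinod suggests.
SET hive.exec.compress.intermediate=true;
SET mapred.compress.map.output=true;
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Compress the final job output written to HDFS (the storage-level
-- saving Bejoy describes). LzoCodec requires the native hadoop-lzo
-- libraries on all TaskTracker nodes and on the Hive client node.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
```

Intermediate compression trades CPU for network IO during the shuffle; output compression trades CPU for HDFS storage and block count.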
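[Editor's sketch] The network-IO tradeoff Vinod describes can be illustrated outside Hadoop. Snappy is not in the Python standard library, so this sketch uses zlib at its fastest level as a stand-in for a "fast" codec; the point is only that repetitive map-output-like data shrinks substantially, which is what reduces shuffle traffic at the cost of some CPU:

```python
# Illustrate the shuffle-compression tradeoff: compressing intermediate
# (map-output-like) data reduces the bytes shipped to reducers at the
# cost of CPU time. zlib level 1 stands in for a fast codec like Snappy.
import time
import zlib

# Fake "map output": repetitive key/value text, like typical log data.
raw = b"\n".join(b"key%06d\tsome repeated value payload" % (i % 1000)
                 for i in range(50000))

start = time.perf_counter()
compressed = zlib.compress(raw, 1)  # level 1 = fastest, lowest ratio
elapsed_ms = (time.perf_counter() - start) * 1000

ratio = len(compressed) / len(raw)
print("raw: %d bytes, compressed: %d bytes (ratio %.1f%%, %.1f ms)"
      % (len(raw), len(compressed), ratio * 100, elapsed_ms))
```

Real map output is less compressible than this synthetic sample, but for text-heavy workloads the shuffle savings are usually still worthwhile, which matches the experience reported in the thread.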