From: Vinod Singh
Date: Wed, 6 Jun 2012 23:37:51 +0530
Subject: Re: Compressed data storage in HDFS - Error
To: user@hive.apache.org

But it may pay off by saving network IO while copying data during the reduce phase, though the benefit varies from case to case. We had good results using the Snappy codec to compress map output; Snappy provides reasonably good compression at a fast rate.

Thanks,
Vinod
http://blog.vinodsingh.com/

On Wed, Jun 6, 2012 at 4:03 PM, Debarshi Basak wrote:
> Compression is an overhead when you have a CPU-intensive job.
>
> Debarshi Basak
> Tata Consultancy Services
> Mailto: debarshi.basak@tcs.com
> Website: http://www.tcs.com
> ____________________________________________
> Experience certainty. IT Services
> Business Solutions
> Outsourcing
> ____________________________________________
>
> -----Bejoy Ks wrote: -----
>
> To: "user@hive.apache.org"
> From: Bejoy Ks
> Date: 06/06/2012 03:37PM
> Subject: Re: Compressed data storage in HDFS - Error
>
> Hi Sreenath
>
> Output compression is more useful at the storage level: when a large
> file is compressed it occupies fewer HDFS blocks, and the cluster
> thereby becomes more scalable in terms of the number of files it holds.
>
> Yes, the LZO libraries need to be present on all TaskTracker nodes as
> well as on the node that hosts the Hive client.
>
> Regards
> Bejoy KS
>
> ------------------------------
> From: Sreenath Menon
> To: user@hive.apache.org; Bejoy Ks
> Sent: Wednesday, June 6, 2012 3:25 PM
> Subject: Re: Compressed data storage in HDFS - Error
>
> Hi Bejoy
> I would like to make this clear:
> There is no gain in processing throughput/time from compressing the
> data stored in HDFS (not talking about intermediate compression)...
> right?
> And do I need to add the LZO libraries to HADOOP_HOME/lib/native on
> all the nodes (including the slave nodes)?
>
> =====-----=====-----=====
> Notice: The information contained in this e-mail
> message and/or attachments to it may contain
> confidential or privileged information. If you are
> not the intended recipient, any dissemination, use,
> review, distribution, printing or copying of the
> information contained in this e-mail message
> and/or attachments to it are strictly prohibited. If
> you have received this communication in error,
> please notify us by reply e-mail or telephone and
> immediately and permanently delete the message
> and any attachments. Thank you
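[Editor's sketch] For reference, the settings the thread is discussing can be toggled per-session in the Hive CLI. The property names below are the Hadoop 1.x-era (`mapred.*`) ones in use when this thread was written; the LZO codec class comes from the separately installed hadoop-lzo package, which is why the native libraries must exist on every node, as Bejoy notes:

```sql
-- Compress intermediate (map) output with Snappy, as Vinod suggests.
SET hive.exec.compress.intermediate=true;
SET mapred.compress.map.output=true;
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Compress the final job output written to HDFS (the storage-level
-- saving Bejoy describes). LzoCodec requires the native hadoop-lzo
-- libraries on all TaskTracker nodes and on the Hive client node.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
```

Intermediate compression trades CPU for network IO during the shuffle; output compression trades CPU for HDFS storage and block count.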
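[Editor's sketch] The network-IO tradeoff Vinod describes can be illustrated outside Hadoop. Snappy is not in the Python standard library, so this sketch uses zlib at its fastest level as a stand-in for a "fast" codec; the point is only that repetitive map-output-like data shrinks substantially, which is what reduces shuffle traffic at the cost of some CPU:

```python
# Illustrate the shuffle-compression tradeoff: compressing intermediate
# (map-output-like) data reduces the bytes shipped to reducers at the
# cost of CPU time. zlib level 1 stands in for a fast codec like Snappy.
import time
import zlib

# Fake "map output": repetitive key/value text, like typical log data.
raw = b"\n".join(b"key%06d\tsome repeated value payload" % (i % 1000)
                 for i in range(50000))

start = time.perf_counter()
compressed = zlib.compress(raw, 1)  # level 1 = fastest, lowest ratio
elapsed_ms = (time.perf_counter() - start) * 1000

ratio = len(compressed) / len(raw)
print("raw: %d bytes, compressed: %d bytes (ratio %.1f%%, %.1f ms)"
      % (len(raw), len(compressed), ratio * 100, elapsed_ms))
```

Real map output is less compressible than this synthetic sample, but for text-heavy workloads the shuffle savings are usually still worthwhile, which matches the experience reported in the thread.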