incubator-hcatalog-user mailing list archives

From Rajesh Balamohan <rajesh.balamo...@gmail.com>
Subject Re: HCataStorer performance
Date Sat, 02 Jun 2012 01:12:22 GMT
Hi Sushanth,

Thanks for the reply.

I am looking at around 270 GB of data stored using PigStorage/HCatStorer. I
use around 300 reducers for this. It's close to 1-1.5 GB per reducer (1.5 GB
for the skewed reducer).

I am using RCFile as the underlying format, and it is compressed.

I don't have any bags (I believe HCatStorer tries to unwrap those when they
are stored). It's a flat record of 256 columns.

There are no other variables between the PigStorage and HCatStorer cases;
the data being stored is the same.

~Rajesh.B

On Sat, Jun 2, 2012 at 2:06 AM, Sushanth Sowmyan <khorgath@gmail.com> wrote:

> Hi Rajesh,
>
> I'm afraid we haven't done a performance analysis for a while now (the
> last time we did so was around the HCat 0.1 timeframe).
>
> What we noticed when we did that was that I/O was always the biggest
> bottleneck, and so things like which underlying format (say, RCFile
> versus text) was used and whether or not compression was on were the
> relevant performance predictors. A 4x slowdown is not expected.
>
> What data sizes are you looking at, and are there any other variables
> between your HCatStorer() and PigStorage() cases?
>
> Thanks,
> -Sushanth
>
> On Tue, May 22, 2012 at 5:06 PM, Rajesh Balamohan
> <rajesh.balamohan@gmail.com> wrote:
> > Hi All,
> >
> > Currently I am using HCat 0.4 & Pig 0.9.3.
> >
> > While running jobs, I observed that HCatStorer() is a lot slower than
> > PigStorage() (approximately 4x).
> >
> > Is this a known issue? Any pointers would be of great help.
> >
> > --
> > ~Rajesh.B
>



-- 
~Rajesh.B
