hadoop-pig-dev mailing list archives

From "Vincent BARAT (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1029) HBaseStorage is way too slow to be usable
Date Thu, 29 Oct 2009 15:15:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771444#action_12771444 ]

Vincent BARAT commented on PIG-1029:

I have a small cluster of 1 master node + 3 MR nodes (virtual nodes) connected through a gigabit
switch (the network connection is fast).
Each MR node also runs an HBase region server and ZooKeeper.

What I have noticed is that the HBase data is not always read from the local node (according
to the Hadoop web frontend). Most of the time the data is read from another node.

Anyway, I don't think that the slowness comes from this. I suspect 2 things:

1) reading from HBase is just far slower than reading from a Hadoop file
2) converting HBase records to Pig tuples (which is done in the HBaseSlice object) is slow (it
involves a lot of object instantiation)

Unfortunately, I have not performed additional tests to determine the exact reason.
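
One way to tell the two suspected causes apart would be to time the read step and the conversion step separately. The following is only a minimal sketch of that idea: PhaseTimer, measure, reader and converter are hypothetical names, and the in-memory iterator and list-building lambda are stand-ins for the real HBase scan and the record-to-tuple conversion. No Pig or HBase API is used or implied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

public class PhaseTimer {
    /** Accumulated time spent in each phase, plus the number of rows seen. */
    public static final class Result {
        public final long readNanos;
        public final long convertNanos;
        public final int rows;

        Result(long readNanos, long convertNanos, int rows) {
            this.readNanos = readNanos;
            this.convertNanos = convertNanos;
            this.rows = rows;
        }
    }

    /**
     * Pulls rows from the reader until it returns null and converts each one,
     * timing the read and the conversion separately.
     */
    public static <R, T> Result measure(Supplier<R> reader, Function<R, T> converter) {
        long readNanos = 0;
        long convertNanos = 0;
        int rows = 0;
        while (true) {
            long t0 = System.nanoTime();
            R row = reader.get();            // phase 1: fetch one raw record
            long t1 = System.nanoTime();
            readNanos += t1 - t0;
            if (row == null) {
                break;                       // null marks the end of the input
            }
            T tuple = converter.apply(row);  // phase 2: build the tuple
            convertNanos += System.nanoTime() - t1;
            if (tuple != null) {
                rows++;
            }
        }
        return new Result(readNanos, convertNanos, rows);
    }

    public static void main(String[] args) {
        // Stand-in data source: an in-memory iterator instead of an HBase scan.
        Iterator<String[]> source = Arrays.asList(
                new String[] {"row1", "a"},
                new String[] {"row2", "b"}).iterator();
        Supplier<String[]> reader = () -> source.hasNext() ? source.next() : null;
        // Stand-in conversion: copy the fields into a freshly allocated list.
        Function<String[], List<String>> converter =
                row -> new ArrayList<>(Arrays.asList(row));

        Result r = measure(reader, converter);
        System.out.println(r.rows + " rows, read " + r.readNanos
                + " ns, convert " + r.convertNanos + " ns");
    }
}
```

If the read time dominates on real data, the first hypothesis is the more likely one; if the conversion time dominates, the second is. Timings over only a few rows are too noisy to be meaningful, so a real test would need a large table and several runs.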

> HBaseStorage is way too slow to be usable
> -----------------------------------------
>                 Key: PIG-1029
>                 URL: https://issues.apache.org/jira/browse/PIG-1029
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Vincent BARAT
> I have performed a set of benchmarks on the HBaseStorage loader, using Pig 0.4.0 and HBase
> 0.20.0 (using the patch referred to in https://issues.apache.org/jira/browse/PIG-970) and Hadoop
> The HBaseStorage loader is basically 10x slower than the PigStorage loader.
> To bypass this limitation, I had to read my HBase tables, write them to a Hadoop file
> and then use this file as input for my subsequent computations.
> I am reporting this bug for the record; I will try to see if I can optimise this a bit.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
