hadoop-pig-dev mailing list archives

From "Vincent BARAT (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1029) HBaseStorage is way too slow to be usable
Date Thu, 11 Feb 2010 22:32:30 GMT

    [ https://issues.apache.org/jira/browse/PIG-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832707#action_12832707 ]

Vincent BARAT commented on PIG-1029:

OK, I've found the parameter. Nevertheless, as I previously stated, "even if this cache size
can be configured globally using configuration files, I think HBaseStorage() should take
an additional (perhaps optional) parameter allowing the cache size to be set for the scanned table."

Don't you think so? Do you think it's worth having it available in the HBaseStorage() call?
My point is that some tables have very large rows while others have very small ones, which
makes a single hbase.client.scanner.caching value at the config-file level unusable, and a
way to set it at the Pig level very useful.
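For illustration, a script-wide override can be sketched with Pig's `set` command, which places the value in the job configuration (a sketch only; whether the HBase client picks this up for HBaseStorage scans depends on the job configuration being propagated, and the table name and columns below are hypothetical):

```
-- Sketch: raise the scanner cache for this whole script via the job configuration.
-- The proposal above would instead make this a per-table argument to HBaseStorage().
set hbase.client.scanner.caching 500;

-- 'mytable' and the column list are placeholders for illustration.
raw = LOAD 'mytable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('col1 col2');
```

A per-table argument would avoid the main drawback of this approach: one script-wide value cannot suit both wide-row and narrow-row tables scanned in the same job.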

> HBaseStorage is way too slow to be usable
> -----------------------------------------
>                 Key: PIG-1029
>                 URL: https://issues.apache.org/jira/browse/PIG-1029
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Vincent BARAT
> I have performed a set of benchmarks on the HBaseStorage loader, using PIG 0.4.0 and HBase
> 0.20.0 (using the patch referred to in https://issues.apache.org/jira/browse/PIG-970) and Hadoop.
> The HBaseStorage loader is basically 10x slower than the PigStorage loader.
> To bypass this limitation, I had to read my HBase tables, write them to a Hadoop file,
> and then use this file as input for my subsequent computations.
> I report this bug for the record; I will try to see if I can optimize this a bit.
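The workaround described in the issue (dump the HBase table once, then reuse the flat copy) might look like this in Pig Latin (a hedged sketch; the loader class path, table name, columns, and output path are all hypothetical placeholders):

```
-- One-time pass: dump the slow HBase table to a flat HDFS file.
raw = LOAD 'mytable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('col1 col2');
STORE raw INTO '/tmp/mytable_flat';

-- Subsequent jobs read the flat copy with the much faster PigStorage loader.
data = LOAD '/tmp/mytable_flat' USING PigStorage();
```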

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
