hbase-issues mailing list archives

From "Jonathan Hsieh (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
Date Fri, 28 Oct 2011 20:39:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138729#comment-13138729 ]

Jonathan Hsieh commented on HBASE-4532:
---------------------------------------

This seems to have been checked into trunk, and there appears to be an extraneous System.out.println
that causes some of my tests to "fail" when run from Maven (apparently Maven buffers test output in
memory instead of writing it out while a test is executing).

Here's the OOME that Maven reports:

Exception in thread "ThreadedStreamConsumer" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
    at java.lang.StringBuffer.append(StringBuffer.java:224)
    at org.apache.maven.surefire.report.ConsoleOutputFileReporter.writeMessage(ConsoleOutputFileReporter.java:115)
    at org.apache.maven.surefire.report.MulticastingReporter.writeMessage(MulticastingReporter.java:101)
    at org.apache.maven.surefire.report.TestSetRunListener.writeTestOutput(TestSetRunListener.java:99)
    at org.apache.maven.plugin.surefire.booterclient.output.ForkClient.consumeLine(ForkClient.java:132)
    at org.apache.maven.plugin.surefire.booterclient.output.ThreadedStreamConsumer$Pumper.run(ThreadedStreamConsumer.java:67)
    at java.lang.Thread.run(Thread.java:662)

I've attached a patch that eliminates this issue.

                
> Avoid top row seek by dedicated bloom filter for delete family bloom filter
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-4532
>                 URL: https://issues.apache.org/jira/browse/HBASE-4532
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>         Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, hbase-4532-89-fb.patch
>
>
> The previous JIRA, HBASE-4469, avoids the top row seek operation only if the row-col bloom filter is enabled.
> This JIRA tries to avoid the top row seek in all cases by creating a dedicated bloom filter used only for delete family markers.
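To make the idea concrete, here is a minimal Java sketch of a per-file bloom filter that records only the rows carrying a delete family marker. The class and method names (DeleteFamilyBloomSketch, addDeleteFamilyRow, mightHaveDeleteFamily) are hypothetical and are not HBase's bloom filter classes; the double-hashing scheme is an illustrative assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: a bloom filter that only tracks rows with a delete family marker.
// Not HBase's implementation; hashing is simple double hashing chosen for illustration.
class DeleteFamilyBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    DeleteFamilyBloomSketch(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Second, independent hash of the row bytes (32-bit FNV-1a).
    private static int fnv1a(byte[] row) {
        int h = 0x811c9dc5;
        for (byte b : row) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // i-th bit position for this row, using h1 + i * h2.
    private int index(byte[] row, int i) {
        int h1 = Arrays.hashCode(row);
        int h2 = fnv1a(row);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Called while writing a file, for each row that has a delete family marker.
    void addDeleteFamilyRow(byte[] row) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(row, i));
        }
    }

    // false: the row definitely has no delete family marker in this file.
    // true: the row might have one (false positives are possible).
    boolean mightHaveDeleteFamily(byte[] row) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(row, i))) {
                return false;
            }
        }
        return true;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

The property that matters here is the one-sided error: a `false` answer is definitive, so the top row seek can be skipped safely, while a `true` answer only means the seek must still be performed.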
> The only subtle case is when we are interested in the top row with an empty column.
> For example, suppose we are interested in row1/cf1:/1/put.
> We seek to the top row, row1/cf1:/MAX_TS/MAXIMUM, and the delete family bloom filter says there is NO delete family.
> So the scanner skips the top row seek and returns a fake kv, which is the last kv for this row (createLastOnRowCol).
> By that point we have already passed the real kv we are interested in.
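The miss comes purely from key ordering. The Java sketch below uses a simplified, hypothetical model of HBase key ordering (row, family and qualifier ascending; timestamp and type code descending) with type codes assumed to mirror KeyValue.Type (Minimum = 0, Put = 4, DeleteFamily = 14, Maximum = 255). It shows that row1/cf1:/1/put sorts between the top row seek key and the fake last-on-row/col key for the empty qualifier, while a cell with a non-empty qualifier sorts after the fake key and is unaffected.

```java
import java.util.Comparator;

// Simplified model of HBase key ordering, for illustration only (not the real KeyValue API).
final class SimpleKey {
    // Type codes as assumed to mirror HBase's KeyValue.Type.
    static final int MINIMUM = 0, PUT = 4, DELETE_FAMILY = 14, MAXIMUM = 255;

    final String row, family, qualifier;
    final long ts;
    final int type;

    SimpleKey(String row, String family, String qualifier, long ts, int type) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.ts = ts;
        this.type = type;
    }

    // Row, family, qualifier ascending; timestamp and type code descending.
    static final Comparator<SimpleKey> ORDER =
        Comparator.comparing((SimpleKey k) -> k.row)
            .thenComparing((SimpleKey k) -> k.family)
            .thenComparing((SimpleKey k) -> k.qualifier)
            .thenComparing(Comparator.comparingLong((SimpleKey k) -> k.ts).reversed())
            .thenComparing(Comparator.comparingInt((SimpleKey k) -> k.type).reversed());

    // Analogous to the top row seek key: row/family:/MAX_TS/Maximum.
    static SimpleKey firstOnRow(String row, String family) {
        return new SimpleKey(row, family, "", Long.MAX_VALUE, MAXIMUM);
    }

    // Analogous to the fake key returned instead of seeking: row/family:qualifier/MIN_TS/Minimum.
    static SimpleKey lastOnRowCol(String row, String family, String qualifier) {
        return new SimpleKey(row, family, qualifier, Long.MIN_VALUE, MINIMUM);
    }
}
```

With `top = firstOnRow("row1", "cf1")`, `wanted = row1/cf1:/1/put` and `fake = lastOnRowCol("row1", "cf1", "")`, the order is top < wanted < fake, so jumping straight to `fake` skips `wanted`; a key such as row1/cf1:q1/1/put sorts after `fake` and is still found.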
> The solution to the above problem is to disable this optimization when we GET/SCAN a row with an empty column.
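The resulting decision can be sketched as a single predicate. This is a hypothetical illustration of the rule described above, not the code in the attached patches: the class name TopRowSeekDecision is invented, and the bloom filter lookup is abstracted as a `Predicate<byte[]>` that returns `false` only when the row definitely has no delete family marker.

```java
import java.util.function.Predicate;

// Hypothetical sketch of the rule described in HBASE-4532; names are illustrative.
final class TopRowSeekDecision {
    private TopRowSeekDecision() {}

    // Returns true if the seek to the top of the row can be replaced by a fake
    // "last on row/col" key. qualifier is the requested column qualifier;
    // mightHaveDeleteFamily is the delete family bloom filter check for the row.
    static boolean canSkipTopRowSeek(byte[] row, byte[] qualifier,
                                     Predicate<byte[]> mightHaveDeleteFamily) {
        if (qualifier.length == 0) {
            // The empty column sorts before the fake key, so it could be skipped: always seek.
            return false;
        }
        // Skip only when the bloom filter guarantees there is no delete family for this row.
        return !mightHaveDeleteFamily.test(row);
    }
}
```

Keeping the empty-qualifier guard ahead of the bloom filter check means the optimization is simply turned off for that case, regardless of what the filter says.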
> Evaluation from TestSeekOptimization:
> Previously:
> For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
> For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
> So with HBASE-4469 we get about 10 percentage points more seek savings (41.82% vs. 31.60%), but ONLY if the ROWCOL bloom filter is enabled.
> ================================================
> After this change:
> For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
> For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
> So we now get about 10 percentage points more seek savings (41.82% vs. 31.60%) for ALL kinds of bloom filter.
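The percentages in both runs follow directly from the seek counts: the ratio is seeks with optimization divided by seeks without, and the savings is 100% minus that ratio. The small Java helper below (hypothetical names, not part of TestSeekOptimization) reproduces the quoted figures and shows that the roughly 10% gain is a difference of about 10.2 percentage points, i.e. 256 fewer seeks out of 2506.

```java
import java.util.Locale;

// Reproduces the quoted percentages from the raw seek counts (illustrative helper).
final class SeekSavings {
    private SeekSavings() {}

    // Percentage of the unoptimized seeks that are still performed.
    static double ratioPercent(int withoutOpt, int withOpt) {
        return 100.0 * withOpt / withoutOpt;
    }

    // Percentage of seeks avoided by the optimization.
    static double savingsPercent(int withoutOpt, int withOpt) {
        return 100.0 - ratioPercent(withoutOpt, withOpt);
    }

    // Two-decimal formatting matching the figures in the report.
    static String fmt(double v) {
        return String.format(Locale.ROOT, "%.2f", v);
    }
}
```

For example, `fmt(savingsPercent(2506, 1458))` gives "41.82", and subtracting `savingsPercent(2506, 1714)` from it gives about 10.22 percentage points.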

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
