hbase-dev mailing list archives

From Doğacan Güney (JIRA) <j...@apache.org>
Subject [jira] Commented: (HBASE-1647) Filter#filterRow is called too often, filters rows it shouldn't have
Date Fri, 17 Jul 2009 11:19:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732469#action_12732469 ]

Doğacan Güney commented on HBASE-1647:

>  # results is now a field for no reason. This reduces GC efficiency and performance.

I explained why in my previous comment, though I am not sure it is a valid reason to worry.
It seems results is always cleared in internal HBase usage, so my extra safeguard there may
be pointless.

> RegionScanner#next is a mess now. Too many boolean flags, I don't detect a sense of
> clear-minded purpose.
> Unbalanced and uncertain flags and filter.reset calls make me concerned about bugs.

I see your point, but in other ways it is also clearer now. All the extra logic that was outside
the while loop has moved into the loop, and the stop-row comparison code is now in one place.

I reduced the boolean flags to one (filterCurrentRow). It is an optimization flag, like
stickyNextRow in the underlying scanners.

I also refactored the code a bit. Let me know if it is clearer now.
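
To make the idea concrete, here is a rough sketch of what a loop with a single filterCurrentRow flag and one stop-row check could look like. It is not the code from the patch: SingleFlagScanSketch, RowFilter, dropRow and the String-based cells are all hypothetical stand-ins, and the real RegionScanner#next works on KeyValues and store scanners.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; none of these types are HBase classes.
class SingleFlagScanSketch {

    // Assumed filter shape: dropRow returns true once the current row must be excluded.
    interface RowFilter {
        void reset();
        boolean dropRow(String row, String column);
    }

    // cells are {row, column} pairs sorted by row; stopRow is exclusive.
    static List<String> scan(List<String[]> cells, String stopRow, RowFilter filter) {
        List<String> out = new ArrayList<>();
        List<String> rowBuffer = new ArrayList<>();
        String currentRow = null;
        boolean filterCurrentRow = false; // the only flag

        for (String[] cell : cells) {
            String row = cell[0];
            // The stop-row comparison lives in exactly one place, inside the loop.
            if (stopRow != null && row.compareTo(stopRow) >= 0) {
                break;
            }
            if (!row.equals(currentRow)) {
                // Row boundary: emit the finished row unless it was filtered.
                if (!filterCurrentRow) {
                    out.addAll(rowBuffer);
                }
                rowBuffer.clear();
                currentRow = row;
                filterCurrentRow = false;
                filter.reset();
            }
            if (filterCurrentRow) {
                continue; // skip the rest of a filtered row without asking the filter
            }
            if (filter.dropRow(row, cell[1])) {
                filterCurrentRow = true;
                rowBuffer.clear();
                continue;
            }
            rowBuffer.add(row + "/" + cell[1]);
        }
        if (!filterCurrentRow) {
            out.addAll(rowBuffer); // emit the last row
        }
        return out;
    }

    // Small demo: drop row "b" when its column "c2" is seen, stop before row "d".
    static List<String> demo() {
        List<String[]> cells = Arrays.asList(
            new String[] {"a", "c1"}, new String[] {"a", "c2"},
            new String[] {"b", "c1"}, new String[] {"b", "c2"},
            new String[] {"c", "c1"}, new String[] {"d", "c1"});
        RowFilter filter = new RowFilter() {
            @Override public void reset() { }
            @Override public boolean dropRow(String row, String column) {
                return row.equals("b") && column.equals("c2");
            }
        };
        return scan(cells, "d", filter);
    }
}
```

In this sketch the flag is reset only at a row boundary, paired with the filter reset, so the two always stay balanced.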

> # The last bug one is tests were deleted, instead of migrated. We lose test coverage
> with this patch.

I added tests to TestScanner.

> Filter#filterRow is called too often, filters rows it shouldn't have
> --------------------------------------------------------------------
>                 Key: HBASE-1647
>                 URL: https://issues.apache.org/jira/browse/HBASE-1647
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Doğacan Güney
>             Fix For: 0.20.0
>         Attachments: HBASE-1647-v2.patch, HBASE-1647-v3.patch, HBASE-1647-v4.patch, HBASE-1647-v5.patch, ScanBug.java, scanfilter.patch
> Filter#filterRow is called from ScanQueryMatcher#filterEntireRow, which is called from
> StoreScanner.next. However, if I understood the code correctly, StoreScanner processes KeyValues
> in a column-oriented order (i.e. after row1-col1 comes row2-col1, not row1-col2). Thus, when
> filterEntireRow is called, the filter has in reality processed (via filterKeyValue) only
> one column of a row.
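
The effect described above can be reproduced with a small toy model. This is only a sketch under stated assumptions: FilterRowSketch, Cell and RequireColumnFilter are hypothetical stand-ins rather than HBase classes, and the cell order simply mimics the column-oriented order given in the description (row1-col1, row2-col1, row1-col2).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; none of these types are HBase classes.
class FilterRowSketch {

    // Minimal stand-in for a KeyValue: row key plus column name.
    static final class Cell {
        final String row;
        final String column;
        Cell(String row, String column) { this.row = row; this.column = column; }
    }

    // Row-level filter that keeps a row only if it contains the required column.
    static final class RequireColumnFilter {
        private final String required;
        private boolean seen;
        RequireColumnFilter(String required) { this.required = required; }
        void reset() { seen = false; }
        void filterKeyValue(Cell c) { if (c.column.equals(required)) seen = true; }
        boolean filterRow() { return !seen; } // true means "drop this row"
    }

    // Each batch is everything the filter has seen when filterRow() is asked.
    static List<String> droppedRows(List<List<Cell>> batches, RequireColumnFilter f) {
        List<String> dropped = new ArrayList<>();
        for (List<Cell> batch : batches) {
            f.reset();
            for (Cell c : batch) {
                f.filterKeyValue(c);
            }
            if (f.filterRow()) {
                dropped.add(batch.get(0).row);
            }
        }
        return dropped;
    }

    // filterRow() asked after a single column of a row, as in the report.
    static List<String> droppedPerColumn() {
        List<List<Cell>> batches = Arrays.asList(
            Arrays.asList(new Cell("row1", "col1")),
            Arrays.asList(new Cell("row2", "col1")),
            Arrays.asList(new Cell("row1", "col2")));
        return droppedRows(batches, new RequireColumnFilter("col2"));
    }

    // filterRow() asked only after every column of the row has been seen.
    static List<String> droppedPerRow() {
        List<List<Cell>> batches = Arrays.asList(
            Arrays.asList(new Cell("row1", "col1"), new Cell("row1", "col2")),
            Arrays.asList(new Cell("row2", "col1")));
        return droppedRows(batches, new RequireColumnFilter("col2"));
    }

    public static void main(String[] args) {
        System.out.println("per column: " + droppedPerColumn());
        System.out.println("per row:    " + droppedPerRow());
    }
}
```

With the per-column order, row1 is dropped before its col2 cell is ever shown to the filter, while the per-row order drops only row2.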

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
