hbase-issues mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9316) Use JoinedHeap between MUST_PASS_ALL filters to better leverage essential column family feature
Date Fri, 23 Aug 2013 01:42:51 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748192#comment-13748192
] 

James Taylor commented on HBASE-9316:
-------------------------------------

It doesn't help distinguish null columns; it just makes detecting them more efficient. Maybe
there's an existing, better way, or a different JIRA to file, but let me try to explain more
clearly:

Let's say you have a query like this:

SELECT * FROM t WHERE c IS NULL

If you have a regular Phoenix table, then we currently insert an empty key value for each
row. So to satisfy this query, we can
- project our empty KeyValue cf/cq plus the cf/cq for c.
- in our filter, include the row if it doesn't have the c cf/cq. We know we'll get called,
since we know that every row has this empty key value.

Another option in Phoenix is to create a VIEW (a read-only table that maps to an existing
HBase table). In this case, we won't have our empty key value, so we have to project everything
into the scan and do the same as above.
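A toy model of the two cases above may help. This is only a sketch in plain Python, not
HBase or Phoenix code: the scan() helper, the cf/cq names and the sample rows are all
hypothetical, and it assumes a filter is only invoked for rows that have at least one
projected cell.

```python
# Toy model only: none of this is HBase or Phoenix API.
EMPTY_KV = ("0", "_0")  # hypothetical cf/cq of the empty key value
COL_C = ("0", "c")      # hypothetical cf/cq of column c


def scan(rows, projected, row_filter):
    """Return the row keys whose projected cells pass row_filter.

    rows maps row key -> {(cf, cq): value}. projected is a set of
    (cf, cq) pairs, or None to project everything. The filter is only
    called for rows that have at least one projected cell.
    """
    matches = []
    for key in sorted(rows):
        if projected is None:
            visible = dict(rows[key])
        else:
            visible = {col: val for col, val in rows[key].items() if col in projected}
        if not visible:
            continue  # nothing projected for this row, so the filter never runs
        if row_filter(visible):
            matches.append(key)
    return matches


def c_is_null(visible):
    """Include the row when it has no cell for column c."""
    return COL_C not in visible


# Phoenix TABLE: every row carries the empty key value.
table_rows = {
    "r1": {EMPTY_KV: b"", COL_C: b"x"},
    "r2": {EMPTY_KV: b"", ("0", "d"): b"1"},
}
table_result = scan(table_rows, {EMPTY_KV, COL_C}, c_is_null)

# Phoenix VIEW: no empty key value exists.
view_rows = {
    "r1": {COL_C: b"x", ("0", "d"): b"1"},
    "r2": {("0", "d"): b"2"},
}
narrow_result = scan(view_rows, {COL_C}, c_is_null)  # r2 is never seen
view_result = scan(view_rows, None, c_is_null)       # project everything
```

In this model the narrow projection over the VIEW misses r2 entirely, which is why
everything has to be projected when there is no empty key value.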

So the problem stems from the lack of a way to specify a filter that gets invoked
when a KeyValue is *not* present (or maybe there is a way?).

Instead, if this JIRA is implemented, I was thinking that Phoenix could have a MUST_PASS_ALL
filter list with one filter per column family. If the first filter finds the c KeyValue, then
it would filter out the row. Otherwise, any of the subsequent filters would include the row.
This way, you wouldn't necessarily load every store file or need to include an empty key value
(though that still may be a more efficient way to go).
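Here is one way to read that idea, again as a plain Python sketch rather than anything in
HBase. The function name, the nested-dict store layout and the sample data are hypothetical,
and the lazy loading order is my interpretation of the proposal: the family holding c is
consulted first, and the other families are only touched when that is not enough to decide.

```python
# Toy model only: "stores" stands in for per-column-family Stores.
def c_is_null_lazy(row_key, stores, c_family, c_qualifier):
    """Decide whether row_key satisfies "c IS NULL", loading families lazily.

    stores maps family -> {row key -> {qualifier: value}}.
    Returns (include_row, families_loaded).
    """
    loaded = [c_family]
    cells = stores[c_family].get(row_key, {})
    if c_qualifier in cells:
        return False, loaded  # c is present: exclude without touching other families
    if cells:
        return True, loaded   # another cell in the same family shows the row exists
    for family in stores:
        if family == c_family:
            continue
        loaded.append(family)
        if stores[family].get(row_key):
            return True, loaded  # first family with a cell for the row includes it
    return False, loaded         # no cells anywhere: the row does not exist


stores = {
    "a": {"r1": {"c": b"x"}, "r3": {"e": b"1"}},
    "b": {"r1": {"d": b"1"}, "r2": {"d": b"2"}},
    "z": {"r2": {"q": b"1"}},
}
r1 = c_is_null_lazy("r1", stores, "a", "c")
r2 = c_is_null_lazy("r2", stores, "a", "c")
r3 = c_is_null_lazy("r3", stores, "a", "c")
```

In this sample, r1 and r3 are decided from family "a" alone, and r2 needs "b" but never
touches "z".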

Any ideas?
                
> Use JoinedHeap between MUST_PASS_ALL filters to better leverage essential column family
feature 
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9316
>                 URL: https://issues.apache.org/jira/browse/HBASE-9316
>             Project: HBase
>          Issue Type: Bug
>            Reporter: James Taylor
>
> Currently, all column families in a MUST_PASS_ALL filter list are loaded in advance of
filtering. Instead, only the essential column family from the first filter should be loaded
and then its heap joined with subsequent essential column family from the next filter in the
list (probably up to some reasonable/configurable limit).
> One particular Phoenix use case for this is when a SQL query is trying to detect the
absence of a KeyValue (through a <column> IS NULL clause). Our workaround for a Phoenix
TABLE is to insert a known, empty key value with every row, or for a Phoenix VIEW (mapping
to an existing HBase table) to project everything. With this feature, we could instead use
a filter per column family and prevent the loading of the corresponding Store in many cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
