hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
Date Wed, 21 Jun 2017 02:08:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056839#comment-16056839 ]

Duo Zhang commented on HBASE-17125:
-----------------------------------

{quote}
I am still not in favor of asking the user to configure some extra Filter to get the expected
behavior from the system
{quote}

I'd say again that the javadoc never guaranteed the current behavior, and there is no doubt that
the current semantics are broken. As I said in my comment above, I think using another filter to
address a problem introduced by a filter is the right direction. We should not put too much
complexity into our core system.

Also see my comment above about a real use case, which shows that the current approach can solve
the user's problem:

{quote}
Oh, it seems the user calls setMaxVersions(1). I believe the user found that the filter returns
old values, then called setMaxVersions(1) and hoped this would solve the problem.
So it is clear that in this user's mind, setMaxVersions should control the number of versions
passed to the filter. This is exactly what we provide in the latest patch. With the patch in
place, the user does not need to call setMaxVersions(1) anymore.
Thanks.
{quote}

> Inconsistent result when use filter to read data
> ------------------------------------------------
>
>                 Key: HBASE-17125
>                 URL: https://issues.apache.org/jira/browse/HBASE-17125
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: example.diff, HBASE-17125.master.001.patch, HBASE-17125.master.002.patch,
HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, HBASE-17125.master.004.patch,
HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, HBASE-17125.master.007.patch,
HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, HBASE-17125.master.009.patch,
HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column. The oldest
version is not removed immediately, but from the user's view the oldest version is gone. When
the user queries with a filter and the filter skips a newer version, the oldest version is
seen again. After the region is compacted, however, the oldest version will never be seen again.
This is surprising for the user: the query returns inconsistent results before and after region
compaction.
> The reason is the matchColumn method of UserScanQueryMatcher. It first checks the cell with
the filter, then checks the number of versions needed. So if the filter skips a newer version,
the oldest version is seen again as long as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we now have two possible solutions
for this problem. The first idea is to check the number of versions first, then check the cell
with the filter. As the javadoc of setFilter says, the filter is called after all tests for ttl,
column match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. Suppose a column's max versions is 5 and the user's query
only needs 3 versions. The matcher first checks the number of versions, then checks the cell with
the filter. So the result may contain fewer than 3 cells, even though there are 2 more versions
that are never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv with the filter
> 3. check against the number of versions the user needs.
> But this will make the ScanQueryMatcher more complicated, and it will break the javadoc
of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
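
For readers following the thread, the three orderings discussed in the description can be
compared with a small stand-alone simulation. This is only a sketch: it does not use any HBase
classes, and the class name VersionFilterOrderSketch, its method names, and the use of bare
timestamps in place of cells are all hypothetical. It models a single column whose versions are
listed newest first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical stand-alone model, not HBase code: each long is the timestamp of one version.
public class VersionFilterOrderSketch {

    // Behavior described in the issue: run the filter first, then count returned versions.
    public static List<Long> filterThenVersions(List<Long> newestFirst, LongPredicate filter,
                                                int scanVersions) {
        List<Long> out = new ArrayList<>();
        for (long ts : newestFirst) {
            if (out.size() >= scanVersions) break;
            if (filter.test(ts)) out.add(ts);
        }
        return out;
    }

    // First idea: count versions first, then run the filter on the versions that remain.
    public static List<Long> versionsThenFilter(List<Long> newestFirst, LongPredicate filter,
                                                int scanVersions) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            if (++seen > scanVersions) break;
            if (filter.test(ts)) out.add(ts);
        }
        return out;
    }

    // Second idea: column max versions, then the filter, then the versions the scan asked for.
    public static List<Long> threeStep(List<Long> newestFirst, LongPredicate filter,
                                       int columnMaxVersions, int scanVersions) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            if (++seen > columnMaxVersions) break;   // step 1
            if (out.size() >= scanVersions) break;   // step 3
            if (filter.test(ts)) out.add(ts);        // step 2
        }
        return out;
    }

    public static void main(String[] args) {
        // Column max versions is 3 and 4 versions were written; the filter skips the newest.
        List<Long> beforeCompaction = Arrays.asList(4L, 3L, 2L, 1L);
        List<Long> afterCompaction = Arrays.asList(4L, 3L, 2L);
        LongPredicate skipNewest = ts -> ts != 4L;

        System.out.println(filterThenVersions(beforeCompaction, skipNewest, 3)); // [3, 2, 1]
        System.out.println(filterThenVersions(afterCompaction, skipNewest, 3));  // [3, 2]
        System.out.println(threeStep(beforeCompaction, skipNewest, 3, 3));       // [3, 2]
        System.out.println(threeStep(afterCompaction, skipNewest, 3, 3));        // [3, 2]

        // Column max versions is 5 and the scan asks for 3 versions.
        List<Long> fiveVersions = Arrays.asList(5L, 4L, 3L, 2L, 1L);
        LongPredicate skipFive = ts -> ts != 5L;
        System.out.println(versionsThenFilter(fiveVersions, skipFive, 3));       // [4, 3]
        System.out.println(threeStep(fiveVersions, skipFive, 5, 3));             // [4, 3, 2]
    }
}
```

In this model, the filter-first order returns the expired version 1 only before compaction, the
versions-first order returns fewer cells than requested, and the three-step order avoids both
effects at the cost of an extra check.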



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
