hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17678) FilterList with MUST_PASS_ONE may lead to redundant cells returned
Date Wed, 07 Jun 2017 08:36:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040507#comment-16040507
] 

Anoop Sam John commented on HBASE-17678:
----------------------------------------

CellUtil#createCell will again make a new KV which internally having another copy of the already
copied component.  We can avoid that.
Better to use KeyValueUtil#toNewKeyCell(final Cell cell) 

For every return value of the individual filters we need the extra check on what to pass next
to this Cell?  May be some specific return values only. Iff we store?  Other cases just add
a comment why we dont need that and just set as null?  I hope even if prevCell is null the
remaining logic will work.

So only very specific cases we have this prevCell keeping and so the extra copy.  This is
per cell that is why worried much. WDYT [~tedyu]?

> FilterList with MUST_PASS_ONE may lead to redundant cells returned
> ------------------------------------------------------------------
>
>                 Key: HBASE-17678
>                 URL: https://issues.apache.org/jira/browse/HBASE-17678
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 2.0.0, 1.3.0, 1.2.1
>         Environment: RedHat 7.x
>            Reporter: Jason Tokayer
>            Assignee: Zheng Hu
>         Attachments: HBASE-17678.addendum.patch, HBASE-17678.v1.patch, HBASE-17678.v1.rough.patch,
HBASE-17678.v2.patch, HBASE-17678.v3.patch, HBASE-17678.v4.patch, HBASE-17678.v4.patch, HBASE-17678.v5.patch,
HBASE-17678.v6.patch, HBASE-17678.v7.patch, HBASE-17678.v7.patch, TestColumnPaginationFilterDemo.java
>
>
> When combining ColumnPaginationFilter with a single-element filterList, MUST_PASS_ONE
and MUST_PASS_ALL give different results when there are multiple cells with the same timestamp.
This is unexpected since there is only a single filter in the list, and I would believe that
MUST_PASS_ALL and MUST_PASS_ONE should only affect the behavior of the joined filter and not
the behavior of any one of the individual filters. If this is not a bug then it would be nice
if the documentation is updated to explain this nuanced behavior.
> I know that there was a decision made in an earlier Hbase version to keep multiple cells
with the same timestamp. This is generally fine but presents an issue when using the aforementioned
filter combination.
> Steps to reproduce:
> In the shell create a table and insert some data:
> {code:none}
> create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
> put 'ns:tbl','row','family:name','John',1000000000000
> put 'ns:tbl','row','family:name','Jane',1000000000000
> put 'ns:tbl','row','family:name','Gil',1000000000000
> put 'ns:tbl','row','family:name','Jane',1000000000000
> {code}
> Then, use a Scala client as:
> {code:none}
> import org.apache.hadoop.hbase.filter._
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.client._
> import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
> import scala.collection.mutable._
> val config = HBaseConfiguration.create()
> config.set("hbase.zookeeper.quorum", "localhost")
> config.set("hbase.zookeeper.property.clientPort", "2181")
> val connection = ConnectionFactory.createConnection(config)
> val logicalOp = FilterList.Operator.MUST_PASS_ONE
> val limit = 1
> var resultsList = ListBuffer[String]()
> for (offset <- 0 to 20 by limit) {
> 	val table = connection.getTable(TableName.valueOf("ns:tbl"))
> 	val paginationFilter = new ColumnPaginationFilter(limit,offset)
> 	val filterList: FilterList = new FilterList(logicalOp,paginationFilter)
> 	println("@ filterList = "+filterList)
> 	val results = table.get(new Get(Bytes.toBytes("row")).setFilter(filterList))
> 	val cells = results.rawCells()
> 	if (cells != null) {
> 		for (cell <- cells) {
> 		  val value = new String(CellUtil.cloneValue(cell))
> 		  val qualifier = new String(CellUtil.cloneQualifier(cell))
> 		  val family = new String(CellUtil.cloneFamily(cell))
> 		  val result = "OFFSET = "+offset+":"+family + "," + qualifier + "," + value + ","
+ cell.getTimestamp()
> 		  resultsList.append(result)
> 		}
> 	}
> }
> resultsList.foreach(println)
> {code}
> Here are the results for different limit and logicalOp settings:
> {code:none}
> Limit = 1 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> Limit = 1 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> OFFSET = 1:family,name,Gil,1000000000000
> OFFSET = 2:family,name,Jane,1000000000000
> OFFSET = 3:family,name,John,1000000000000
> Limit = 2 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> Limit = 2 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> OFFSET = 2:family,name,Jane,1000000000000
> {code}
> So, it seems that MUST_PASS_ALL gives the expected behavior, but MUST_PASS_ONE does not.
Furthermore, MUST_PASS_ONE seems to give only a single (not-duplicated)  within a page, but
not across pages.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message