hbase-dev mailing list archives

From "Tokayer, Jason M." <Jason.Toka...@capitalone.com>
Subject FilterList with a ColumnPaginationFilter in Java (Scala) Client
Date Sat, 18 Feb 2017 19:35:47 GMT

I am having some difficulty understanding the results when I apply a ColumnPaginationFilter
within a FilterList. I’m not sure whether this is an HBase bug or a gap in my understanding
of how the API works.

Specifically, I’m noticing a difference between using MUST_PASS_ONE and MUST_PASS_ALL in
my FilterList, even when the list contains only a single filter. Below I walk through a full
but simplified example that illustrates the issue (i.e., I removed the other filters from the
list because I have narrowed down the problem, but I still need to use a FilterList):

First, in the shell I create a table and insert multiple values with the same timestamp:
create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
put 'ns:tbl','row','family:name','John',1000000000000
put 'ns:tbl','row','family:name','Jane',1000000000000
put 'ns:tbl','row','family:name','Gil',1000000000000
put 'ns:tbl','row','family:name','Jane',1000000000000

Now, I create a custom client written in Scala that uses the Java APIs:

import org.apache.hadoop.hbase.filter._
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
import scala.collection.mutable._

val config = HBaseConfiguration.create()
config.set("hbase.zookeeper.quorum", "localhost")
config.set("hbase.zookeeper.property.clientPort", "2181")

val connection = ConnectionFactory.createConnection(config)

val logicalOp = FilterList.Operator.MUST_PASS_ALL
val limit = 1
var resultsList = ListBuffer[String]()
for (offset <- 0 to 20 by limit) {
  val table = connection.getTable(TableName.valueOf("ns:tbl"))
  val paginationFilter = new ColumnPaginationFilter(limit, offset)
  val filterList: FilterList = new FilterList(logicalOp, paginationFilter)
  val results = table.get(new Get(Bytes.toBytes("row")).setFilter(filterList))
  val cells = results.rawCells()
  if (cells != null) {
    for (cell <- cells) {
      val value = new String(CellUtil.cloneValue(cell))
      val qualifier = new String(CellUtil.cloneQualifier(cell))
      val family = new String(CellUtil.cloneFamily(cell))
      val result = "OFFSET = " + offset + ":" + family + "," + qualifier + "," +
        value + "," + cell.getTimestamp()
      // Assumed completion: the archived message is truncated after the line above.
      resultsList += result
    }
  }
}

My results look like:
limit = 1 & logicalOp = MUST_PASS_ALL:
scala> resultsList.foreach(println)
OFFSET = 0:family,name,Jane,1000000000000

limit = 1 & logicalOp = MUST_PASS_ONE:
scala> resultsList.foreach(println)
OFFSET = 0:family,name,Jane,1000000000000
OFFSET = 1:family,name,Gil,1000000000000
OFFSET = 2:family,name,Jane,1000000000000
OFFSET = 3:family,name,John,1000000000000

limit = 2 & logicalOp = MUST_PASS_ALL:
scala> resultsList.foreach(println)
OFFSET = 0:family,name,Jane,1000000000000

limit = 2 & logicalOp = MUST_PASS_ONE:
scala> resultsList.foreach(println)
OFFSET = 0:family,name,Jane,1000000000000
OFFSET = 2:family,name,Jane,1000000000000

My main question is: why, when using MUST_PASS_ONE, don’t I get back only the single,
most-recently-inserted value of the cell, as I do when I use MUST_PASS_ALL? Note that if I
don’t use a FilterList at all and instead just set the Get’s filter to the paginationFilter,
I get the result I would expect (i.e., the single OFFSET = 0:family,name,Jane,1000000000000).
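
For comparison, here is a minimal sketch of the variant without a FilterList. It assumes the same local ZooKeeper settings and the 'ns:tbl' table created above; the object name BarePaginationFilter is only illustrative. As described above, this variant returned just the single Jane cell.

```scala
import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter
import org.apache.hadoop.hbase.util.Bytes

// Illustrative name; assumes the 'ns:tbl' table and local cluster from above.
object BarePaginationFilter extends App {
  val config = HBaseConfiguration.create()
  config.set("hbase.zookeeper.quorum", "localhost")
  config.set("hbase.zookeeper.property.clientPort", "2181")

  val connection = ConnectionFactory.createConnection(config)
  val table = connection.getTable(TableName.valueOf("ns:tbl"))
  try {
    // Same limit (1) and offset (0) as the first loop iteration,
    // but the filter is set directly on the Get, with no FilterList wrapper.
    val get = new Get(Bytes.toBytes("row")).setFilter(new ColumnPaginationFilter(1, 0))
    val cells = table.get(get).rawCells()
    if (cells != null) {
      for (cell <- cells) {
        val family = Bytes.toString(CellUtil.cloneFamily(cell))
        val qualifier = Bytes.toString(CellUtil.cloneQualifier(cell))
        val value = Bytes.toString(CellUtil.cloneValue(cell))
        println("OFFSET = 0:" + family + "," + qualifier + "," + value + "," + cell.getTimestamp())
      }
    }
  } finally {
    // Release the table and connection when done.
    table.close()
    connection.close()
  }
}
```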

The documentation isn’t entirely clear about this situation, and I’m hoping someone on
either mailing list may be able to assist.

