hbase-user mailing list archives

From Nicholas Harezga <n.hare...@apexxs.com>
Subject ValueFilter returning old versions
Date Wed, 23 Mar 2016 18:36:04 GMT
I have a table with row keys representing file names, a single column family, and file creation
time as the column qualifier. The value of these columns is a serialized JSON representation
of an object. My program goes through the records, performs an operation on the file, and
modifies the JSON object to indicate that the file has been processed. On each run of the
program I only want to grab up to a specified number of records that have yet to be processed.
Previously I was grabbing all of the records and filtering at the client side. I am now attempting
to move the filtering to the server side to reduce network traffic and hopefully streamline
the process a bit.
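
For reference, here is a minimal sketch of the kind of client-side selection described above. It is plain Java with no HBase calls. The class name, method name, and the in-memory map standing in for scanned rows are all hypothetical, and it assumes the JSON is serialized without whitespace around the colon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: illustrates the client-side approach only, not actual HBase client code.
final class ClientSideJobFilter {
    // Assumption: the serialized JSON contains exactly this text for unprocessed records
    // (no spaces around the colon); a different serializer layout would not match.
    static final String NEW_STATE_MARKER = "\"jobState\":\"new\"";

    // Returns up to `limit` row keys whose JSON value contains the "new" marker,
    // in the iteration order of the supplied map (row key -> serialized JSON).
    static List<String> selectUnprocessed(Map<String, String> rows, int limit) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            if (selected.size() >= limit) {
                break; // stop once the requested number of records is reached
            }
            if (row.getValue().contains(NEW_STATE_MARKER)) {
                selected.add(row.getKey());
            }
        }
        return selected;
    }
}
```

The drawback is that every row still has to cross the network before it can be discarded, which is what I am trying to avoid.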

I am using a ValueFilter with a SubstringComparator to get the rows that meet my conditions:
Scan scan = new Scan();
String filterString = "\"jobState\":\"new\"";

scan.setFilter(new ValueFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator(filterString)));

When records are added they have a jobState of "new". Once they have been processed, the
jobState is set to "processed" and the record in HBase is updated. If I do a scan from the HBase
shell or scan the full table from Java, I get the most recent version (maximum versions
for this table is set to 1). When I scan using the filter, I still get the original version
of the row, and if I change the filter to use "processed" I get the updated version.

The end result is that I process the same files several times. This repeats
until HBase performs a flush or compaction, which I verified by flushing manually from HBase.

I am currently using hbase-shaded-client v1.1.2 for my Java API and I have HBase v1.0.0-cdh5.4.8
running on my cluster under Cloudera Manager v5.4.8. I believe I found a similar issue posted
in December 2013 (http://mail-archives.apache.org/mod_mbox/hbase-user/201312.mbox/%3CCADoiZqpxq64L75v3T3RGsks-82kRYMFMNYnYs-+2u0-f2a0PoA@mail.gmail.com%3E)
but there didn't appear to be any resolution to the issue other than creating a custom filter.

Is there a newer version of HBase that doesn't have this issue? Is there a better way for
me to do the filtering that I need to do?

If there is any further information I can provide please let me know. Any recommendations/help
would be greatly appreciated.
