hbase-user mailing list archives

From ccalugaru <ccalug...@sdl.com>
Subject Hbase update use case
Date Sun, 11 Aug 2013 21:17:24 GMT
Hi all,
I have the following HBase use case:
One HBase table, with a row key (built from a combination of MD5 hashes) and
2 column families. Logically, the table stores sentences. The table has
hundreds of millions of records.

I have a webapp that connects to this HBase table and needs to randomly
export sentences, based on some conditions. Currently, all these conditions
can be looked up using just the rowkey.
Typically, one export would contain just a couple of hundred sentences. The
important restriction is that once some segments are exported, they should
not be present in any subsequent export.

So my question is related to this - how should I make sure the same segments
do not get exported again?

Should I 'mark' the exported segments by updating a flag after each export
happens? This has the drawback that, when looking up which segments meet my
conditions, I could no longer identify those records by rowkey alone; I would
also have to check that flag. Hence, I would need to use filters, which I
know are much slower.
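
To make the flag idea concrete, here is a minimal in-memory sketch of it. It is
not the HBase client API: the class ExportModel, its methods, and the sorted map
standing in for the table are all hypothetical, and the rowkey "conditions" are
assumed to reduce to a key prefix. In a real table, HBase's check-and-put
operation could possibly play the role that compareAndSet plays here, but that
mapping is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical in-memory stand-in for the table: rowkey -> "exported" flag.
// Keys are kept sorted, as HBase keeps rows sorted by rowkey.
class ExportModel {
    private final ConcurrentSkipListMap<String, AtomicBoolean> rows =
            new ConcurrentSkipListMap<>();

    // Store a sentence row that has not been exported yet.
    void addSentence(String rowKey) {
        rows.putIfAbsent(rowKey, new AtomicBoolean(false));
    }

    // Walk the rows whose key starts with the prefix and claim up to
    // `limit` rows that have not been exported before.
    List<String> export(String prefix, int limit) {
        List<String> claimed = new ArrayList<>();
        for (Map.Entry<String, AtomicBoolean> e
                : rows.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix) || claimed.size() >= limit) {
                break;
            }
            // Flip the flag false -> true atomically, so two concurrent
            // exports can never both claim the same row.
            if (e.getValue().compareAndSet(false, true)) {
                claimed.add(e.getKey());
            }
        }
        return claimed;
    }
}
```

The sketch also shows the drawback described above: every export still walks
over the rows that were already marked before it reaches unexported ones.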

Is there a better approach for this? 

View this message in context: http://apache-hbase.679495.n3.nabble.com/Hbase-update-use-case-tp4049091.html
Sent from the HBase User mailing list archive at Nabble.com.
