hbase-user mailing list archives

From <ri...@laposte.net>
Subject RE: Read access pattern
Date Tue, 30 Apr 2013 13:17:24 GMT
Yes, I see, but this is quite expensive as the table is huge.

-----Original Message-----
From: Jean-Marc Spaggiari [mailto:jean-marc@spaggiari.org]
Sent: Monday, 29 April 2013 20:04
To: user@hbase.apache.org; ricla@laposte.net
Subject: Re: Read access pattern

HBASE-4811 is what you should be looking for, but it's not even close to being implemented yet...

One option would be to have 2 tables, each sorted in the reverse order of the other. Scanning
forward in each one gives you the key just after your current key in that table's order,
which in the end gives you both the key before and the key after...
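
A minimal sketch of the two-table idea, using only the Java standard library. A TreeSet stands in for each HBase table's sorted row keys; this is an assumption for illustration, not the HBase client API. The names TwoTableKeys, newestFirstKey, oldestFirstKey and nextInTable are hypothetical, md5Hex is a stand-in for getMD5AsHex, and the zero-padded "%019d" format (without the trailing "\n") is also an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeSet;

final class TwoTableKeys {
    // Hex MD5 of the object id (stand-in for getMD5AsHex, an assumption).
    static String md5Hex(String objectId) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(objectId.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Key for the table sorted newest-first (reversed timestamp).
    static String newestFirstKey(String objectId, long timeMillis) {
        return md5Hex(objectId) + String.format("%019d", Long.MAX_VALUE - timeMillis);
    }

    // Key for the second table, sorted oldest-first (plain timestamp).
    static String oldestFirstKey(String objectId, long timeMillis) {
        return md5Hex(objectId) + String.format("%019d", timeMillis);
    }

    // First key strictly after `current` with the same MD5 prefix, or null.
    // Emulates a forward scan that starts at `current` and skips it.
    static String nextInTable(TreeSet<String> table, String current) {
        String next = table.higher(current);
        boolean sameObject = next != null && next.startsWith(current.substring(0, 32));
        return sameObject ? next : null;
    }
}
```

Calling nextInTable on the newest-first table returns the older neighbour, and calling it on the oldest-first table returns the newer neighbour, so each lookup is a short forward scan. The cost is that every write has to go to both tables.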

2013/4/29  <ricla@laposte.net>:
>
> Thanks for the quick answer.
>
>> For the next key, I think you can simply use your current key as your 
>> scanner first key. You will then find the one which is just after.
>> Then you will have to verify the MD5 hash to make sure it's still for 
>> the same object.
> Right, this is basically easy.
>
>> First, if you know that you are storing data about every 10 seconds,
>> set the startRow with something like
>> getMD5AsHex(Bytes.toBytes(myObjectId)) + String.format("%19d\n",
>> (Long.MAX_VALUE - (changeDate.getTime() + 60000))) then read the few
>> lines you will have until you find your current line, and keep the
>> last one.
>
> Actually, it is impossible to know the time range within which there
> will be a next entry.
>
>>
>> Else, if you don't know, you will have to start with
>> scan.setStartRow(Bytes.toBytes(getMD5AsHex(Bytes.toBytes(myObjectId))));
>> but you might have to skip MANY lines before finding the right one. So I
>> don't really recommend that.
>
> Ouch, obviously not very efficient. Even with a filter, I assume?
>
>> Message of 29/04/13 18:18
>> From: "Jean-Marc Spaggiari"
>> To: user@hbase.apache.org
>> Cc:
>> Subject: Re: Read access pattern
>>
>> Hum.
>>
>> For the next key, I think you can simply use your current key as your 
>> scanner first key. You will then find the one which is just after.
>> Then you will have to verify the MD5 hash to make sure it's still for 
>> the same object.
>>
>> scan.setStartRow(Bytes.toBytes(getMD5AsHex(Bytes.toBytes(myObjectId)) +
>> String.format("%19d\n", (Long.MAX_VALUE - changeDate.getTime()))));
>>
>> If you want to find the one just before, quickly, I see 2 options.
>>
>> First, if you know that you are storing data about every 10 seconds,
>> set the startRow with something like
>> getMD5AsHex(Bytes.toBytes(myObjectId)) + String.format("%19d\n",
>> (Long.MAX_VALUE - (changeDate.getTime() + 60000))) then read the few
>> lines you will have until you find your current line, and keep the
>> last one.
>>
>> Else, if you don't know, you will have to start with
>> scan.setStartRow(Bytes.toBytes(getMD5AsHex(Bytes.toBytes(myObjectId))));
>> but you might have to skip MANY lines before finding the right one. So I
>> don't really recommend that.
>>
>> JM
>>
>> 2013/4/29 Shahab Yunus :
>> > I think you cannot simply use the scanner to do a range scan here,
>> > as your keys are not monotonically increasing. You need to apply
>> > logic to decode/reverse the mechanism that you used to hash your
>> > keys at the time of writing. You might want to check out the
>> > Sematext library, which does distributed scans and seems to handle
>> > the scenarios that you want to implement.
>> > http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
>> >
>> >
>> > On Mon, Apr 29, 2013 at 11:03 AM, wrote:
>> >
>> >> Hi,
>> >>
>> >> I have a rowkey defined by:
>> >> getMD5AsHex(Bytes.toBytes(myObjectId)) + String.format("%19d\n", 
>> >> (Long.MAX_VALUE - changeDate.getTime()));
>> >>
>> >> How can I get the previous and next rows for a given rowkey?
>> >> For instance, I have the following ordered keys:
>> >>
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370673172227807
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468022807
>> >> >00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674984237807
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674987271807
>> >>
>> >> If I choose the rowkey
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807, what would
>> >> be the correct scan to get the previous and next keys?
>> >> The result would be:
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468022807
>> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674984237807
>> >>
>> >> Thank you !
>> >> R.
>> >>
>> >> A free mailbox, guaranteed for life, with extra services:
>> >> interested?
>> >> Create my mailbox at www.laposte.net
>> >>
>>
>
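
The previous/next lookup discussed in this thread can be sketched with the example keys from the original question. This is a rough illustration using only the Java standard library: a TreeSet emulates the sorted rows and tailSet emulates a scan from a start row, so none of it is the HBase client API. The names NeighbourScan, suffix, decodeTime, next and previous are hypothetical, and the trailing "\n" of the key format is left out. Because the timestamp is stored reversed, a start row that sorts before the current key corresponds to a later changeDate, so the sketch adds the window to the time.

```java
import java.util.TreeSet;

final class NeighbourScan {
    static final int PREFIX_LEN = 32;  // length of a hex MD5
    static final int SUFFIX_LEN = 19;  // digits of the reversed timestamp

    // Reversed-timestamp suffix, as in the thread but without the "\n".
    static String suffix(long timeMillis) {
        return String.format("%19d", Long.MAX_VALUE - timeMillis);
    }

    // Recover the original time from the last 19 characters of a key.
    static long decodeTime(String rowKey) {
        String digits = rowKey.substring(rowKey.length() - SUFFIX_LEN).trim();
        return Long.MAX_VALUE - Long.parseLong(digits);
    }

    // Row right after `current`: scan from `current`, skip it, then
    // check that the MD5 prefix still belongs to the same object.
    static String next(TreeSet<String> table, String current) {
        for (String row : table.tailSet(current, true)) {
            if (row.equals(current)) {
                continue;
            }
            return row.startsWith(current.substring(0, PREFIX_LEN)) ? row : null;
        }
        return null;
    }

    // Row right before `current`: start the scan windowMillis later in
    // time (an earlier key), read up to `current`, keep the last row seen.
    // Returns null when the window holds no row; the caller may widen it.
    static String previous(TreeSet<String> table, String current, long windowMillis) {
        String prefix = current.substring(0, current.length() - SUFFIX_LEN);
        String startRow = prefix + suffix(decodeTime(current) + windowMillis);
        String last = null;
        for (String row : table.tailSet(startRow, true)) {
            if (row.compareTo(current) >= 0) {
                break;
            }
            last = row;
        }
        return last;
    }
}
```

With the five keys from the question, the row before the selected key is 840000 ms away, so a 60000 ms window finds nothing and the window has to be widened. This is the cost raised above: when the spacing between entries is unknown, the window is a guess.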

