hbase-user mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: row filter - binary comparator at certain range
Date Mon, 21 Oct 2013 16:36:31 GMT
I advise you to refactor your key.

1. First, use a low-cardinality salt (say, 1 random byte).
2. To improve range queries, add a time bucket to your time dimension:

KEY: 
salt_timebucket_time

timebucket is something similar to: day_hour_min
time       - sec+ms part of timestamp

It will be easier for you to roll up events by timebucket or part of it:
by day, by hour, by min, but you will need your own filter to implement the range query
efficiently.
FuzzyRowFilter will work for small ranges, but for large ranges performance will degrade significantly.
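
The salt_timebucket_time layout above could be built along these lines. This is only a
sketch under assumptions that are not in the original message: the class and method names
(SaltedKey, randomSalt, rowKey) are hypothetical, the time bucket is encoded as a 4-byte
big-endian count of minutes since the epoch (which sorts the same way as day_hour_min),
and the time part is a 2-byte big-endian count of milliseconds within that minute.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of HBase: builds salt_timebucket_time row keys.
final class SaltedKey {
    // 1 byte salt + 4 bytes time bucket + 2 bytes time within the bucket.
    static final int KEY_LENGTH = 1 + 4 + 2;

    // Picks a random salt in [0, buckets); keep buckets small (low cardinality).
    static byte randomSalt(int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in 1..256");
        }
        return (byte) ThreadLocalRandom.current().nextInt(buckets);
    }

    // Assumes a non-negative epoch timestamp in milliseconds.
    static byte[] rowKey(byte salt, long epochMillis) {
        if (epochMillis < 0) {
            throw new IllegalArgumentException("epochMillis must be >= 0");
        }
        int minuteBucket = (int) (epochMillis / 60_000L); // day_hour_min part
        int msInMinute = (int) (epochMillis % 60_000L);   // sec+ms part, 0..59999
        // ByteBuffer writes big-endian by default, so byte order matches time order.
        return ByteBuffer.allocate(KEY_LENGTH)
                .put(salt)
                .putInt(minuteBucket)
                .putShort((short) msInMinute)
                .array();
    }
}
```

With fixed-width big-endian fields, all keys that share a salt and a minute bucket are
contiguous, so a rollup by minute (or by a coarser bucket, if the encoding is adapted) is a
prefix scan within each salt.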

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Vladimir Rodionov
Sent: Monday, October 21, 2013 9:14 AM
To: user@hbase.apache.org
Subject: RE: row filter - binary comparator at certain range

FuzzyRowFilter does not work on sub-key ranges.
Salting is bad for any scan operation, unfortunately. When salt prefix cardinality is small
(1-2 bytes), one can try something similar to FuzzyRowFilter but with additional sub-key
range support. If salt prefix cardinality is high (> 2 bytes), do a full scan with your own
Filter (for timestamp ranges).
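
The sub-key range check that such a custom filter would apply to each row key can be
sketched in plain Java. Assumptions not stated in the thread: the salt has a fixed length,
the timestamp is stored as a fixed-width big-endian value right after it, both bounds are
inclusive, and the names SubKeyRange, key and inRange are hypothetical. The HBase Filter
wiring is left out; this shows only the comparison (Java 9+ for Arrays.compareUnsigned).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: tests the part of a row key after the salt.
final class SubKeyRange {
    // Builds a salt:timestamp key with a 1-byte salt and a 4-byte big-endian timestamp.
    static byte[] key(byte salt, int timestamp) {
        return ByteBuffer.allocate(5).put(salt).putInt(timestamp).array();
    }

    // True if the bytes after the salt prefix fall within [lower, upper],
    // compared as unsigned bytes in lexicographic order.
    static boolean inRange(byte[] rowKey, int saltLength, byte[] lower, byte[] upper) {
        if (rowKey.length < saltLength) {
            return false; // malformed key: shorter than the salt prefix
        }
        int end = rowKey.length;
        return Arrays.compareUnsigned(rowKey, saltLength, end, lower, 0, lower.length) >= 0
            && Arrays.compareUnsigned(rowKey, saltLength, end, upper, 0, upper.length) <= 0;
    }
}
```

For the salt:timestamp rows quoted below and the range 15-25, this check would accept 1:15,
1:16, 1:17, 1:23, 2:15, 2:19 and 2:25, and reject 2:3, 2:5 and 2:12. Applied as a plain row
filter it still visits every row, which is why it suits the full-scan case.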

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Premal Shah [premal.j.shah@gmail.com]
Sent: Sunday, October 20, 2013 10:42 PM
To: user
Subject: Re: row filter - binary comparator at certain range

Have you looked at FuzzyRowFilter? Seems to me that it might satisfy your
use-case.
http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/


On Sun, Oct 20, 2013 at 9:31 PM, Tony Duan <duanjianmin@126.com> wrote:

> Alex Vasilenko <aa.vasilenko@...> writes:
>
> >
> > Lars,
> >
> > But how will it behave when I have a salt at the beginning of the key to
> > properly shard the table across regions? Imagine a row key of the format
> > salt:timestamp, with rows like this:
> > ...
> > 1:15
> > 1:16
> > 1:17
> > 1:23
> > 2:3
> > 2:5
> > 2:12
> > 2:15
> > 2:19
> > 2:25
> > ...
> >
> > And I want to find all rows that have a second part (timestamp) in the range
> > 15-25. What startKey and endKey should be used?
> >
> > Alexandr Vasilenko
> > Web Developer
> > Skype:menterr
> > mob: +38097-611-45-99
> >
> > 2012/2/9 lars hofhansl <lhofhansl@...>
> Hi,
> Alexandr Vasilenko
> Have you ever resolved this issue? I am also facing this issue.
> I also want to implement this functionality.
> Imagine a row key of the format
> salt:timestamp, with rows like this:
> ...
> 1:15
> 1:16
> 1:17
> 1:23
> 2:3
> 2:5
> 2:12
> 2:15
> 2:19
> 2:25
> ...
>
> And I want to find all rows that have a second part (timestamp) in the range
> 15-25.
>
> Could you please tell me how you resolved this?
> Thanks in advance.
>
>
> Tony duan
>
>


--
Regards,
Premal Shah.
