hbase-user mailing list archives

From Angus He <angu...@gmail.com>
Subject Re: HBase schema design and value filtering
Date Thu, 20 May 2010 06:05:42 GMT
Or you can simply set the start key to S and the end key to T.
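
To illustrate why the range S to T picks up every S(X) row: HBase keeps rows sorted by the raw bytes of the row key, and a scan's start row is inclusive while its stop row is exclusive. The sketch below does not use the HBase client at all. It simulates that ordering on an in-memory list, and the class name, method names, and sample keys are all made up for the example. It also shows how the end key T can be derived from the prefix S, which may help once the keys are encoded numerically.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not part of the HBase API.
public class PrefixRangeSketch {

    /** Compares two keys byte by byte as unsigned values (the order rows are sorted in). */
    public static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shorter key sorts before its extensions
    }

    /**
     * Returns the smallest key that sorts after every key starting with prefix:
     * drop trailing 0xFF bytes, then increment the last remaining byte.
     * An empty result means "no upper bound".
     */
    public static byte[] stopKeyForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    /** Simulates a scan over [startKey, stopKey) on an in-memory list of row keys. */
    public static List<String> scan(List<String> rowKeys, String startKey, String stopKey) {
        byte[] start = startKey.getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopKey.getBytes(StandardCharsets.UTF_8);
        List<String> hits = new ArrayList<>();
        for (String key : rowKeys) {
            byte[] k = key.getBytes(StandardCharsets.UTF_8);
            boolean atOrAfterStart = compareKeys(k, start) >= 0;
            boolean beforeStop = stop.length == 0 || compareKeys(k, stop) < 0;
            if (atOrAfterStart && beforeStop) {
                hits.add(key);
            }
        }
        // Return the hits in the same byte order a real scan would.
        hits.sort((x, y) -> compareKeys(
                x.getBytes(StandardCharsets.UTF_8), y.getBytes(StandardCharsets.UTF_8)));
        return hits;
    }

    public static void main(String[] args) {
        // Made-up row keys; only the ones starting with "S" should come back.
        List<String> rows = Arrays.asList("R(X1)", "S(X2)", "S(X10)", "T(X1)", "S(X1)");
        String stop = new String(
                stopKeyForPrefix("S".getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8);
        System.out.println(stop);                  // prints T
        System.out.println(scan(rows, "S", stop)); // prints [S(X1), S(X10), S(X2)]
    }
}
```

Note that S(X10) sorts before S(X2) because the comparison is byte-wise, not numeric. With the real client, the same two keys would be passed as the scan's start row and stop row.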

On Thu, May 20, 2010 at 1:41 PM, Raghava Mutharaju
<m.vijayaraghava@gmail.com> wrote:
> Hi Angus,
>
>      Thank you for the reply. As of now, X in S(X) would be a String. Later
> on, there are plans to encode it in numerical format. So I am looking for
> alternate Filters.
>
> Regards,
> Raghava.
>
> On Thu, May 20, 2010 at 12:23 AM, Angus He <angushe@gmail.com> wrote:
>
>> Suppose the number in S(X) has a maximum length of, e.g., 10 digits.
>> Then you can set the start key to S(0000000000) (ten 0s) and the end
>> key to S(99999999999) (eleven 9s).
>>
>> On Mon, May 17, 2010 at 8:43 AM, Raghava Mutharaju
>> <m.vijayaraghava@gmail.com> wrote:
>> > Hi all,
>> >
>> >    Let a set, S(X) = {a, b, c, d, e, f, .....}. I compute the values of
>> > the set in multiple MR job iterations i.e. multiple MR jobs would be run
>> > one after another several times. In each iteration, a subset of the
>> > values would be computed i.e. the value of the set would be computed
>> > incrementally. I am using HBase to store the result. In this scenario,
>> > my design is as follows
>> >
>> > Schema Design:
>> >
>> >   - S(X) is the row key.
>> >   - Each element would be a column in the column family. The label of the
>> >   column would be the iteration number followed by a number indicating
>> >   the position of the element in the subset.
>> >   Eg: In iteration 1, subset {a,b} has been computed. Then the row would
>> >   be S(X) = {contains: {{1.1: a}, {1.2: b}}}. Here, contains is the name
>> >   of the column family.
>> >
>> > I can add the results of subsequent iterations (other subsets) to S(X) by
>> > adding more columns.
>> > Would this design be appropriate for the above scenario?
>> >
>> > There would be many S(X) - X can be X1, X2, X3, .... and many elements in
>> > the set, S(X).
>> >
>> > Filtering:
>> >
>> > To retrieve all the sets, S(X), a range fetch should be performed. I
>> > wouldn't know the startkey and endkey because the number of S(X) sets is
>> > not known beforehand. Can I use PrefixFilter for this, by setting the
>> > prefix as 'S'?
>> >
>> > Thank you in advance.
>> >
>> > Regards,
>> > Raghava.
>> >
>>
>>
>>
>> --
>> Regards
>> Angus
>>
>



-- 
Regards
Angus
