From Angus He <angu...@gmail.com>
Subject Re: HBase schema design and value filtering
Date Thu, 20 May 2010 04:23:26 GMT
```Suppose the number in S(X) has a known maximum length, e.g. 10 digits, and
is zero-padded to that length. Then you can set the start key to
S(0000000000) (ten 0s) and the end key to S(99999999999) (eleven 9s).

On Mon, May 17, 2010 at 8:43 AM, Raghava Mutharaju
<m.vijayaraghava@gmail.com> wrote:
> Hi all,
>
>    Let a set, S(X) = {a, b, c, d, e, f, .....}. I compute the values of the
> set in multiple MR job iterations i.e. multiple MR jobs would be run one
> after another several times. In each iteration, a subset of the values would
> be computed i.e. the value of the set would be computed incrementally. I am
> using HBase to store the result. In this scenario, my design is as follows
>
> Schema Design:
>
>   - S(X) is the row key.
>   - Each element would be a column in the column family. The label of the
>   column would be the iteration number followed by a number indicating the
>   position of the element in the subset.
>   Eg: In iteration 1, subset {a,b} has been computed. Then the row would be
>   S(X) = {contains: {{1.1: a}, {1.2: b}}}. Here, contains is the name of the
>   column family.
>
> I can add the results of subsequent iterations (other subsets) to S(X) by
> Would this design be appropriate for the above scenario?
>
> There would be many S(X) - X can be X1, X2, X3, .... and many elements in
> the set, S(X).
>
> Filtering:
>
> To retrieve all the sets, S(X), a range fetch should be performed. I
> wouldn't know the startkey and endkey because the number of S(X) sets is not
> known beforehand. Can I use PrefixFilter for this, by setting the prefix to
> 'S'?
>
>
> Regards,
> Raghava.
>

--
Regards
Angus

```
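
The range suggested in the reply can be sketched without a cluster. The snippet below is only an illustration in plain Python, not HBase client code: it assumes that X is a non-negative integer zero-padded to 10 digits and that row keys compare as raw bytes, with the start key inclusive and the end key exclusive. The names row_key, scan_bounds and range_scan are hypothetical helpers invented for this example.

```python
from bisect import bisect_left

WIDTH = 10  # assumed maximum number of digits in X


def row_key(x: int, width: int = WIDTH) -> bytes:
    """Build a zero-padded row key such as b'S(0000000042)'."""
    if x < 0 or len(str(x)) > width:
        raise ValueError("x does not fit in the assumed key width")
    return b"S(" + str(x).zfill(width).encode("ascii") + b")"


def scan_bounds(width: int = WIDTH) -> tuple[bytes, bytes]:
    """Return (start, stop): ten 0s for the start key, eleven 9s for the end key."""
    start = b"S(" + b"0" * width + b")"
    stop = b"S(" + b"9" * (width + 1) + b")"
    return start, stop


def range_scan(sorted_keys: list[bytes], start: bytes, stop: bytes) -> list[bytes]:
    """Emulate a range fetch over byte-sorted keys: start inclusive, stop exclusive."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


if __name__ == "__main__":
    start, stop = scan_bounds()
    # Keys outside the S(...) family are included to show that the range excludes them.
    table = sorted(
        [row_key(x) for x in (0, 7, 42, 9999999999)]
        + [b"R(0000000001)", b"T(0000000001)"]
    )
    for key in range_scan(table, start, stop):
        print(key.decode("ascii"))
```

The extra 9 in the end key matters under the exclusive-end assumption: S(9999999999) sorts before S(99999999999), so the largest possible key still falls inside the range. Zero-padding matters as well, because unpadded keys sort as bytes rather than as numbers (S(10) would come before S(9)).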
