hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: observer coprocessor question regarding puts
Date Fri, 14 Jun 2013 14:45:47 GMT
Not to beat a dead horse... 

I did want to touch a bit more on the schema design issues and considerations. 

If you have a really wide composite key and you're only storing a single cell, you will end
up with a very long (tall) table. 

Does this make sense? 

Would it make more sense to use a smaller key and then store multiple cells, with part
of the rowkey as a column qualifier? 

Using your example... you have [A,B,C] as your rowkey and then Column1 with a value. 

You could make the row key [A, B] with the column qualifier [C] storing the value there. 

Does that make sense? 
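
A rough sketch of the two layouts, using plain Java maps as a stand-in for an HBase table. No HBase API is used here; the class name, the "|" separator, and the sample values are made up for illustration only.

```java
import java.util.Map;
import java.util.TreeMap;

public class SchemaShapes {
    // Hypothetical separator between key parts; a real schema would more
    // likely use fixed-width or byte-encoded components.
    static final String SEP = "|";

    /** Tall layout: one row per (A, B, C), single cell under a fixed qualifier. */
    static void putTall(Map<String, Map<String, String>> table,
                        String a, String b, String c, String value) {
        table.computeIfAbsent(a + SEP + b + SEP + c, k -> new TreeMap<>())
             .put("cq1", value);
    }

    /** Wide layout: one row per (A, B), with C used as the column qualifier. */
    static void putWide(Map<String, Map<String, String>> table,
                        String a, String b, String c, String value) {
        table.computeIfAbsent(a + SEP + b, k -> new TreeMap<>())
             .put(c, value);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> tall = new TreeMap<>();
        Map<String, Map<String, String>> wide = new TreeMap<>();
        String[][] data = {
            {"a1", "b1", "c1", "v1"},
            {"a1", "b1", "c2", "v2"},
            {"a1", "b2", "c1", "v3"},
        };
        for (String[] d : data) {
            putTall(tall, d[0], d[1], d[2], d[3]);
            putWide(wide, d[0], d[1], d[2], d[3]);
        }
        System.out.println("tall rows: " + tall.size()); // tall rows: 3
        System.out.println("wide rows: " + wide.size()); // wide rows: 2
    }
}
```

The same three values end up in three rows in the tall layout and two rows in the wide one; the full key is no longer repeated for every cell, but a lookup by C alone still has to look inside each row.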


On Jun 13, 2013, at 9:51 PM, Michel Segel <michael_segel@hotmail.com> wrote:

> Ok...
> But then you are duplicating the data, so you will have to reconcile the two sets and
> there is a possibility that the data sets are out of sync.
> I don't know your entire Schema, but if the row key is larger than the value, you may
> want to think about changing the Schema.
> Sent from a remote device. Please excuse any typos...
> Mike Segel
> On Jun 13, 2013, at 9:34 PM, rob mancuso <rcuso123@gmail.com> wrote:
>> Thx Mike, for the most part.
>> My key is substantially larger than my value, so I was thinking of leaving
>> the cq->value stuff as is and just inverting the rowkey.
>> So the original table would have
>> [A, B, C] cf1:cq1 val1
>> And the secondary table would have
>> [C, B, A] cf1:cq1 val1
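
For illustration, a minimal sketch of why the inverted key [C, B, A] allows access by C alone: rows are sorted by key, so all rows for one C value are contiguous and can be read with a prefix scan. A `TreeMap` stands in for the sorted secondary table; the class name, the "|" separator, and the sample values are assumptions for the example, not HBase API.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class InvertedKeyIndex {
    // Hypothetical separator between key parts, for illustration only.
    static final String SEP = "|";

    /** Builds the secondary-table key [C, B, A] from the primary key parts [A, B, C]. */
    static String invertedKey(String a, String b, String c) {
        return c + SEP + b + SEP + a;
    }

    /** Prefix scan: returns all index rows whose key starts with the given C value. */
    static SortedMap<String, String> scanByC(TreeMap<String, String> index, String c) {
        String start = c + SEP;
        // Upper bound is the prefix followed by the largest char value.
        return index.subMap(start, start + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        TreeMap<String, String> index = new TreeMap<>();
        index.put(invertedKey("a1", "b1", "c1"), "val1");
        index.put(invertedKey("a2", "b1", "c1"), "val2");
        index.put(invertedKey("a1", "b1", "c2"), "val3");
        System.out.println(scanByC(index, "c1").keySet()); // [c1|b1|a1, c1|b1|a2]
    }
}
```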
>> On Jun 10, 2013 3:42 PM, "Michael Segel" <michael_segel@hotmail.com> wrote:
>>> If I understand you ...
>>> You have the row key = [A,B,C]
>>> You want to create an inverted mapping of  Key [C] => {[A,B,C]}
>>> That is to say that your inverted index would be all of the rows where the
>>> value of C = x, where x is some value.
>>> You shouldn't have to worry about column qualifiers, just the values of A, B
>>> and C.
>>> In this case, the columns in your index will also be the values of the
>>> tuples.
>>> You really don't need C because you already have it, but then you'd need
>>> to remember to add it to the pair (A, B) that you are storing.
>>> I'd say waste the space and store (A,B,C) but that's just me.
>>> Is that what you want to do?
>>> -Mike
>>> On Jun 9, 2013, at 12:16 PM, rob mancuso <rcuso123@gmail.com> wrote:
>>>> Thx Anoop, I believe this is what I'm looking for.
>>>> Regarding my use case, my rowkey is [A,B,C], but I also have a requirement
>>>> to access data by [C] only.  So I'm looking to use a post-put coprocessor
>>>> to maintain one secondary index table where the rowkey starts with [C].  My
>>>> cqs are numerics representing time and can be any number between 1 and 3600
>>>> (i.e. seconds within an hour). Because I won't know the cq value for each
>>>> incoming put (just the cf), I need something to deconstruct the put into a
>>>> list of cqs ...which I believe you've provided with getFamilyMap.
>>>> Thx again!
>>>> On Jun 9, 2013 12:47 AM, "Anoop John" <anoop.hbase@gmail.com> wrote:
>>>>> You want to have an index for every CF+CQ, right?  You want to maintain
>>>>> different tables for different columns?
>>>>> Put has a getFamilyMap method that returns a Map of CF to List of KVs.
>>>>> From this List of KVs you can get all the CQ names, values, etc.
>>>>> -Anoop-
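
As a sketch of the approach described above: walk every cell of one family in the incoming put and emit one index write per qualifier, without knowing the qualifiers in advance. The types below (`Cell`, `IndexPut`) and the `Map<String, List<Cell>>` argument are plain-Java stand-ins for what a Put's family map would provide; they are not HBase classes, and the "|" separator is assumed for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexFanOut {
    /** Stand-in for one cell of an incoming put (not an HBase type). */
    static final class Cell {
        final String qualifier;
        final String value;
        Cell(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Stand-in for one write to the secondary index table (not an HBase type). */
    static final class IndexPut {
        final String rowKey, family, qualifier, value;
        IndexPut(String rowKey, String family, String qualifier, String value) {
            this.rowKey = rowKey;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Emits one index write per cell found under the given family. */
    static List<IndexPut> buildIndexPuts(String a, String b, String c,
                                         Map<String, List<Cell>> familyMap,
                                         String family) {
        List<IndexPut> out = new ArrayList<>();
        List<Cell> cells = familyMap.get(family);
        if (cells == null) {
            return out; // the put did not touch this family
        }
        String indexKey = c + "|" + b + "|" + a; // inverted rowkey [C, B, A]
        for (Cell cell : cells) {
            out.add(new IndexPut(indexKey, family, cell.qualifier, cell.value));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Cell>> familyMap = new HashMap<>();
        familyMap.put("cf1", Arrays.asList(new Cell("17", "val1"), new Cell("3600", "val2")));
        for (IndexPut p : buildIndexPuts("A", "B", "C", familyMap, "cf1")) {
            System.out.println(p.rowKey + " " + p.family + ":" + p.qualifier + " = " + p.value);
        }
    }
}
```

In a real observer the same loop would run over the list returned for the family by the Put, and each result would become a Put against the index table.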
>>>>> On Sat, Jun 8, 2013 at 11:24 PM, rob mancuso <rcuso123@gmail.com> wrote:
>>>>>> Hi,
>>>>>> I'm looking to write a post-put observer coprocessor to maintain a
>>>>>> secondary index.  Basically, my current rowkey design is a composite of
>>>>>> A,B,C and I want to be able to also access data by C.  So all I'm looking
>>>>>> to do is invert the rowkey and apply it for all cf:cq values that come in.
>>>>>> My problem (I think) is that in all the good examples I've seen, they all
>>>>>> deconstruct the Put by calling put.get(<cf>,<cq>)...implying they know the
>>>>>> qualifier ahead of time.  I'm looking to specify the family and generate a
>>>>>> put to the secondary index table for all qualifiers ...not knowing or
>>>>>> caring what the qualifier is.
>>>>>> Any pointers would be appreciated,
>>>>>> Thx - Rob
>>>>>> Is there a way
