hbase-user mailing list archives

From rob mancuso <rcuso...@gmail.com>
Subject Re: observer coprocessor question regarding puts
Date Tue, 18 Jun 2013 02:36:20 GMT
Thx Mike, makes perfect sense.  I'm using OpenTSDB, so my schema is fixed.
Metric is at the front of my key (my [A]) and dataserver is at the end (my
[C]).  I need to be able to query by either one, and simply inverting the
rowkey lets me keep using the OpenTSDB APIs, since the cf:cq and value stay
as is.
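
For illustration, the inversion can be sketched on raw bytes. This is only a
minimal sketch under assumptions that are not in the thread: the key is
treated as three contiguous parts [A][B][C], and the part widths (lenA, lenB),
the class name RowKeyInverter and the sample bytes are hypothetical. A real
OpenTSDB key would need its actual field layout.

```java
import java.util.Arrays;

final class RowKeyInverter {

    /**
     * Reorders a composite key laid out as [A][B][C] into [C][B][A].
     * lenA and lenB are assumed, caller-supplied widths; C is the remainder.
     */
    static byte[] invert(byte[] key, int lenA, int lenB) {
        if (lenA < 0 || lenB < 0 || lenA + lenB > key.length) {
            throw new IllegalArgumentException("part widths do not fit the key");
        }
        int lenC = key.length - lenA - lenB;
        byte[] out = new byte[key.length];
        System.arraycopy(key, lenA + lenB, out, 0, lenC);   // C moves to the front
        System.arraycopy(key, lenA, out, lenC, lenB);       // B stays in the middle
        System.arraycopy(key, 0, out, lenC + lenB, lenA);   // A moves to the end
        return out;
    }

    public static void main(String[] args) {
        byte[] key = {1, 1, 2, 2, 2, 3};                    // A = {1,1}, B = {2,2,2}, C = {3}
        System.out.println(Arrays.toString(invert(key, 2, 3)));
        // prints [3, 2, 2, 2, 1, 1]
    }
}
```

Only the row key changes; the family, qualifier and value would be copied to
the second table untouched.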

My initial attempt works, but I'm getting socket timeouts when I increase
volume.  I have some more debugging to do.

Thx
On Jun 14, 2013 10:46 AM, "Michael Segel" <michael_segel@hotmail.com> wrote:

> Not to beat a dead horse...
>
> I did want to touch a bit more on the schema design issues and
> considerations.
>
> If you have a really wide composite key and you're only storing a single
> cell, you will end up with a very long (tall) table.
>
> Does this make sense?
>
> Would it make more sense to use a smaller key and then store multiple
> cells with part of the rowkey as a column qualifier?
>
> Using your example... you have [A,B,C] as your rowkey and then Column1
> with a value.
>
> You could make the row key [A, B] with the column qualifier [C] storing
> the value there.
>
> Does that make sense?
>
> -Mike
>
> On Jun 13, 2013, at 9:51 PM, Michel Segel <michael_segel@hotmail.com>
> wrote:
>
> > Ok...
> >
> > But then you are duplicating the data, so you will have to reconcile the
> > two sets and there is a possibility that the data sets are out of sync.
> >
> > I don't know your entire Schema, but if the row key is larger than the
> > value, you may want to think about changing the Schema.
> >
> >
> > Sent from a remote device. Please excuse any typos...
> >
> > Mike Segel
> >
> > On Jun 13, 2013, at 9:34 PM, rob mancuso <rcuso123@gmail.com> wrote:
> >
> >> Thx Mike, for the most part.
> >>
> >> My key is substantially larger than my value, so I was thinking of
> >> leaving the cq->value stuff as is and just inverting the rowkey.
> >>
> >> So the original table would have
> >>
> >> [A, B, C] cf1:cq1 val1
> >>
> >> And the secondary table would have
> >>
> >> [C, B, A] cf1:cq1 val1
> >> On Jun 10, 2013 3:42 PM, "Michael Segel" <michael_segel@hotmail.com> wrote:
> >>
> >>>
> >>> If I understand you ...
> >>>
> >>> You have the row key = [A,B,C]
> >>> You want to create an inverted mapping of  Key [C] => {[A,B,C]}
> >>>
> >>> That is to say that your inverted index would be all of the rows where
> >>> the value of C = x, and x is some value.
> >>>
> >>> You shouldn't have to worry about column qualifiers, just the values of
> >>> A, B and C.
> >>>
> >>> In this case, the columns in your index will also be the values of the
> >>> tuples.
> >>> You really don't need C because you already have it, but then you'd need
> >>> to remember to add it to the pair (A, B) that you are storing.
> >>> I'd say waste the space and store (A,B,C) but that's just me.
> >>>
> >>>
> >>> Is that what you want to do?
> >>>
> >>> -Mike
> >>>
> >>> On Jun 9, 2013, at 12:16 PM, rob mancuso <rcuso123@gmail.com> wrote:
> >>>
> >>>> Thx Anoop, I believe this is what I'm looking for.
> >>>>
> >>>> Regarding my use case, my rowkey is [A,B,C], but I also have a
> >>>> requirement to access data by [C] only.  So I'm looking to use a
> >>>> post-put coprocessor to maintain one secondary index table where the
> >>>> rowkey starts with [C].  My cqs are numerics representing time and can
> >>>> be any number between 1 and 3600 (i.e. seconds within an hour).  Because
> >>>> I won't know the cq value for each incoming put (just the cf), I need
> >>>> something to deconstruct the put into a list of cqs, which I believe
> >>>> you've provided with getFamilyMap.
> >>>>
> >>>> Thx again!
> >>>> On Jun 9, 2013 12:47 AM, "Anoop John" <anoop.hbase@gmail.com> wrote:
> >>>>
> >>>>> You want to have an index for every CF+CQ, right?  You want to maintain
> >>>>> different tables for different columns?
> >>>>>
> >>>>> Put has a getFamilyMap method that returns a map of CF to list of KVs.
> >>>>> From this list of KVs you can get all the CQ names, values, etc.
> >>>>>
> >>>>> -Anoop-
> >>>>>
> >>>>> On Sat, Jun 8, 2013 at 11:24 PM, rob mancuso <rcuso123@gmail.com> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm looking to write a post-put observer coprocessor to maintain a
> >>>>>> secondary index.  Basically, my current rowkey design is a composite
> >>>>>> of A,B,C and I want to be able to also access data by C.  So all I'm
> >>>>>> looking to do is invert the rowkey and apply it for all cf:cq values
> >>>>>> that come in.
> >>>>>>
> >>>>>> My problem (I think) is that in all the good examples I've seen, they
> >>>>>> all deconstruct the Put by calling put.get(<cf>,<cq>)...implying they
> >>>>>> know the qualifier ahead of time.  I'm looking to specify the family
> >>>>>> and generate a put to the secondary index table for all qualifiers
> >>>>>> ...not knowing or caring what the qualifier is.
> >>>>>>
> >>>>>> Any pointers would be appreciated,
> >>>>>> Thx - Rob
> >>>>>>
> >>>>>> Is there a way
> >>>
> >>>
> >
>
>
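
The approach discussed in the thread above (walk the Put's family map and emit
one index write per cell, without knowing the qualifiers in advance) can be
sketched roughly as follows. This sketch deliberately does not call HBase: it
models the family map with plain Java collections and strings, and the names
IndexFanOut, Cell, IndexWrite and fanOut are hypothetical. In the HBase
releases current at the time of this thread, Put.getFamilyMap() should return
a Map<byte[], List<KeyValue>>; later releases expose getFamilyCellMap()
instead, so check the API for your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IndexFanOut {

    /** Hypothetical stand-in for an HBase KeyValue: qualifier and value only. */
    static final class Cell {
        final String qualifier;
        final String value;
        Cell(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** One write destined for the secondary index table. */
    static final class IndexWrite {
        final String row, family, qualifier, value;
        IndexWrite(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier + " " + value;
        }
    }

    /**
     * Emits one index write per incoming cell, whatever its qualifier is.
     * invertedRow is the already-inverted row key; familyMap plays the role
     * of the map returned by the Put.
     */
    static List<IndexWrite> fanOut(String invertedRow, Map<String, List<Cell>> familyMap) {
        List<IndexWrite> writes = new ArrayList<>();
        for (Map.Entry<String, List<Cell>> family : familyMap.entrySet()) {
            for (Cell cell : family.getValue()) {
                // Family, qualifier and value are copied as is; only the row differs.
                writes.add(new IndexWrite(invertedRow, family.getKey(), cell.qualifier, cell.value));
            }
        }
        return writes;
    }

    public static void main(String[] args) {
        Map<String, List<Cell>> familyMap = new LinkedHashMap<>();
        familyMap.put("cf1", Arrays.asList(new Cell("17", "val1"), new Cell("3600", "val2")));
        for (IndexWrite w : fanOut("[C, B, A]", familyMap)) {
            System.out.println(w);
        }
    }
}
```

In a real observer the resulting writes would be turned into Puts against the
index table. The sketch leaves out how those Puts are batched or sent.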
