hbase-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Re: Coprocessor Increments
Date Tue, 15 Oct 2013 20:47:49 GMT
On Tue, Oct 15, 2013 at 11:12 AM, Michael Segel <msegel_hadoop@hotmail.com> wrote:

> Anil,
> > Agree with you. But, as per my knowledge and experience with coprocessors,
> > they are meant to be used for operations that are local to RS. Otherwise,
> > you are in danger of running into deadlocks, scalability issues.
>
>
> I also did a quick look at…  HBASE-7474…
>
> You start with the assumption that all of your data is within a single
> region.
>
No, I don't. That sorting CP works even if your scan spans multiple RSs. I
do a merge sort on the client side in that case. Please look at the code
closely. :)
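
For readers following the thread, here is a minimal sketch of the client-side
merge step described above. It is not the HBASE-7474 code. It assumes each
region has already returned its own rows in sorted order, and it models those
per-region results as plain lists. The class name ClientSideMerge and the
method merge are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative k-way merge of per-region sorted results (hypothetical helper, not HBase API). */
final class ClientSideMerge {

    /** One cursor per region: the current head value plus the rest of that region's sorted stream. */
    private static final class Cursor<T> {
        T head;
        final Iterator<T> rest;

        Cursor(T head, Iterator<T> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    /** Merges lists that are each already sorted by {@code order} into one sorted list. */
    static <T> List<T> merge(List<List<T>> sortedPerRegion, Comparator<? super T> order) {
        Comparator<Cursor<T>> byHead = (a, b) -> order.compare(a.head, b.head);
        PriorityQueue<Cursor<T>> heap = new PriorityQueue<>(byHead);

        // Seed the heap with the first element of every non-empty region result.
        for (List<T> region : sortedPerRegion) {
            Iterator<T> it = region.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor<>(it.next(), it));
            }
        }

        // Repeatedly take the smallest head, then advance that region's cursor.
        List<T> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor<T> smallest = heap.poll();
            merged.add(smallest.head);
            if (smallest.rest.hasNext()) {
                smallest.head = smallest.rest.next();
                heap.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three "regions", each sorted locally on its own region server.
        List<List<Integer>> perRegion = Arrays.asList(
                Arrays.asList(1, 4, 9),
                Arrays.asList(2, 3, 10),
                Arrays.asList(5));
        System.out.println(merge(perRegion, Comparator.<Integer>naturalOrder()));
    }
}
```

The heap holds at most one entry per region, so the merge costs O(n log k) for
n rows across k regions, and each region server only ever sorts its own data.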

>
> IMHO, this is a very narrow window for use cases.
>
> Most use cases have data that crosses region boundaries.
>
> From a design perspective… limiting the use case to only within region…
> kinda kills the reason for coprocessors to exist. Even looking back at the
> implementation by Google, they don't appear to have this problem… errr
> limitation.
>
> Sorry… IMHO and YMMV.
>
>
> On Oct 14, 2013, at 3:25 PM, anil gupta <anilgupta84@gmail.com> wrote:
>
> > Inline.
> >
> >
> > On Mon, Oct 14, 2013 at 7:50 AM, Michael Segel <msegel_hadoop@hotmail.com> wrote:
> >
> >> Anil,
> >>
> >> I wasn't suggesting that you can't do what you're doing, but you end up
> >> running in to the risks which coprocessors are supposed to remove. The
> >> standard YMMV always applies.
> >>
> > Agree with you. But, as per my knowledge and experience with coprocessors,
> > they are meant to be used for operations that are local to RS. Otherwise,
> > you are in danger of running into deadlocks, scalability issues.
> >
> >>
> >> You have a cluster… another team in your company wants to use the cluster.
> >> So instead of the cluster being a single resource for your app/team, it now
> >> becomes a shared resource. So now you have people accessing HBase for
> >> multiple apps.
> >>
> > Well, it's a separation of responsibility in this case. We don't want teams
> > to step on each other's toes, and at the same time we want them to work
> > well as an ecosystem.
> > Rule: Other teams can use the same cluster. But they cannot write directly
> > into the tables that we own/control. If they want to write into our tables,
> > then they have to use our HBase client.
> >
> >>
> >> You could then run multiple HBase HMasters with different locations for
> >> files, however… this can get messy.
> >> HOYA seems to suggest this as the future.  If so, then you have to wonder
> >> about data locality.
> >>
> > HOYA is not even in beta at present. So, right now we are not thinking
> > about it.
> >
> >>
> >> Having your app update the primary table and then the secondary index is
> >> always a good fallback, however you need to ensure that you understand the
> >> risks.
> >>
> > Agree, I understand that there is risk. But you have to bite the bullet
> > when you are doing something that is not supported out of the box. We also
> > use CPs wherever they are appropriate (like HBASE-7474).
> >
> >>
> >> With respect to secondary indexes… if you decouple the writes… you can get
> >> better throughput. Note that the code becomes a bit more complex because
> >> you're going to have to introduce a couple of different things.  But that's
> >> something for a different discussion…
> >>
> > Whether to use CP or not depends on the use case. In my opinion, CPs are
> > really powerful and an awesome feature in HBase. But sometimes, if not used
> > properly (like creating a cyclic graph as per Tom's example), they might be
> > problematic.
> >
> >
> >>
> >> On Oct 13, 2013, at 10:15 AM, anil gupta <anilgupta84@gmail.com> wrote:
> >>
> >>> Inline.
> >>>
> >>> On Sun, Oct 13, 2013 at 6:02 AM, Michael Segel <msegel_hadoop@hotmail.com> wrote:
> >>>
> >>>> Ok…
> >>>>
> >>>> Sure you can have your app update the secondary index table.
> >>>> The only issue with that is if someone updates the base table outside of
> >>>> your app,
> >>>> they may or may not increment the secondary index.
> >>>>
> >>> Anil: We don't allow people to write data into HBase from their own HBase
> >>> client. We control the writes into HBase. So, we don't have the problem of
> >>> the secondary index not getting written.
> >>> For example, if you expose a RESTful web service, you can easily control the
> >>> writes to HBase. Even if a user requests to write one row in the "main table",
> >>> your application can have the logic to write to the "secondary index" tables.
> >>> In this way, it is transparent to users also. You can add/remove secondary
> >>> indexes as you want.
> >>>
> >>>> Note that your secondary index doesn't have to be an inverted table, but
> >>>> could be SOLR, LUCENE or something else.
> >>>>
> >>> Anil: As of now, we are happy with inverted tables as they fit our use
> >>> case.
> >>>
> >>>>
> >>>> So you really want to secondary indexes on the server.
> >>>>
> >>>> There are a couple of things that could improve the performance, although
> >>>> the write to the secondary index would most likely lag under heavy load.
> >>>>
> >>>>
> >>>> On Oct 12, 2013, at 11:27 PM, anil gupta <anilgupta84@gmail.com> wrote:
> >>>>
> >>>>> John,
> >>>>>
> >>>>> My 2 cents:
> >>>>> I tried implementing Secondary Index by using Region Observers on Put. It
> >>>>> works well under low load. But under heavy load the RO could not keep up
> >>>>> with the load of cross-region-server writes.
> >>>>> Then, I decided not to use RO as per Andrew's explanation, and I moved all
> >>>>> the logic of building secondary index tables into my HBase client. Since
> >>>>> then, the system has been running fine under heavy load.
> >>>>> IMO, if you use RO and do cross-RS reads/writes, then perhaps this will
> >>>>> become your bottleneck in HBase.
> >>>>> Is it possible for you to avoid RO and control the writes/updates from
> >>>>> the client side?
> >>>>>
> >>>>> Thanks,
> >>>>> Anil Gupta
> >>>>>
> >>>>>
> >>>>> On Fri, Oct 11, 2013 at 6:06 PM, John Weatherford <
> >>>>> john.weatherford@telescope.tv> wrote:
> >>>>>
> >>>>>> OP Here :)
> >>>>>>
> >>>>>> Our current design involves a Region Observer on a table that does
> >>>>>> increments on a second table. We took the approach that Michael said and
> >>>>>> inside the RO, we got a new connection and everything. We believe this is
> >>>>>> causing deadlocks for us. Our next attempt is going to be writing to
> >>>>>> another row in the same table where we will store the increments. If this
> >>>>>> doesn't work, we are going to simply pull the increments out of the RO and
> >>>>>> do them in the application or in Flume.
> >>>>>>
> >>>>>> @Tom Brown
> >>>>>> I would be very interested to hear more about your solution of
> >>>>>> aggregating the increments in another system that is then responsible for
> >>>>>> updating in HBase.
> >>>>>>
> >>>>>> -jW
> >>>>>>
> >>>>>>
> >>>>>> On Fri 11 Oct 2013 10:26:58 AM PDT, Vladimir Rodionov wrote:
> >>>>>>
> >>>>>>>>> With respect to the OP's design… does the deadlock occur because he's
> >>>>>>>>> trying to update a column in a different row within the same table?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>> Because he is trying to update *row* in a different Region (and
> >>>>>>> potentially in different RS).
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Vladimir Rodionov
> >>>>>>> Principal Platform Engineer
> >>>>>>> Carrier IQ, www.carrieriq.com
> >>>>>>> e-mail: vrodionov@carrieriq.com
> >>>>>>>
> >>>>>>> ________________________________________
> >>>>>>> From: Michael Segel [msegel_hadoop@hotmail.com]
> >>>>>>> Sent: Friday, October 11, 2013 9:10 AM
> >>>>>>> To: user@hbase.apache.org
> >>>>>>> Cc: Vladimir Rodionov
> >>>>>>> Subject: Re: Coprocessor Increments
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Thanks & Regards,
> >>>>> Anil Gupta
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Thanks & Regards,
> >>> Anil Gupta
> >>
> >>
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
>
>


-- 
Thanks & Regards,
Anil Gupta
