hbase-user mailing list archives

From Michael Segel <msegel_had...@hotmail.com>
Subject Re: Coprocessor Increments
Date Tue, 15 Oct 2013 18:12:42 GMT
> Agree with you. But, as per my knowledge and experience with coprocessors,
> they are meant to be used for operations that are local to RS. Otherwise,
> you are in danger of running into deadlocks, scalability issues.

I also did a quick look at…  HBASE-7474…

You start with the assumption that all of your data is within a single region. 

IMHO, this is a very narrow window for use cases.

Most use cases have data that crosses region boundaries. 

From a design perspective… limiting the use case to only within region… kinda kills the
reason for coprocessors to exist. Even looking back at the implementation by Google, they
don't appear to have this problem… errr limitation. 

Sorry… IMHO and YMMV.
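[Editor's note: the region-locality constraint discussed above can be made concrete. A RegionObserver can only safely touch rows that sort inside its own region's [startKey, endKey) range; anything else becomes a cross-region (potentially cross-RegionServer) RPC from inside the write path. A minimal sketch, in plain Python standing in for HBase's byte-wise key comparison — all names are illustrative, not HBase API:]

```python
def key_in_region(row_key: bytes, start_key: bytes, end_key: bytes) -> bool:
    """True if row_key falls inside [start_key, end_key).

    An empty start_key means the first region and an empty end_key
    means the last region -- the same convention HBase uses for
    region boundaries.
    """
    after_start = start_key == b"" or row_key >= start_key
    before_end = end_key == b"" or row_key < end_key
    return after_start and before_end

# A derived row (e.g. a counter keyed off the data row) stays
# in-region only if it sorts into the same key range:
assert key_in_region(b"user123", b"user", b"userZ")
assert not key_in_region(b"counters!user123", b"user", b"userZ")
```

A coprocessor write whose target key fails this check has to leave the region, which is exactly where the deadlock and scalability concerns in this thread come from.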

On Oct 14, 2013, at 3:25 PM, anil gupta <anilgupta84@gmail.com> wrote:

> Inline.
> On Mon, Oct 14, 2013 at 7:50 AM, Michael Segel <msegel_hadoop@hotmail.com>wrote:
>> Anil,
>> I wasn't suggesting that you can't do what you're doing, but you end up
>> running in to the risks which coprocessors are supposed to remove. The
>> standard YMMV always applies.
> Agree with you. But, as per my knowledge and experience with coprocessors,
> they are meant to be used for operations that are local to RS. Otherwise,
> you are in danger of running into deadlocks, scalability issues.
>> You have a cluster… another team in your company wants to use the cluster.
>> So instead of the cluster being a single resource for your app/team, it now
>> becomes a shared resource. So now you have people accessing HBase for
>> multiple apps.
> Well, it's a separation of responsibility in this case. We don't want teams
> to step on each other's toes, and at the same time we want to work well as an
> ecosystem. Rule: other teams can use the same cluster, but they cannot write
> directly into the tables that we own/control. If they want to write into our
> tables then they have to use our HBase Client.
>> You could then run multiple HBase HMasters with different locations for
>> files, however… this can get messy.
>> HOYA seems to suggest this as the future.  If so, then you have to wonder
>> about data locality.
> HOYA is not even in beta at present. So, right now we are not thinking
> about it.
>> Having your app update the primary table and then the secondary index is
>> always a good fallback, however you need to ensure that you understand the
>> risks.
> Agreed, I understand that there is risk. But you have to bite the bullet
> when you are doing something that is not supported out of the box. We also
> use CPs wherever they are appropriate (like HBASE-7474).
>> With respect to secondary indexes… if you decouple the writes… you can get
>> better throughput. Note that the code becomes a bit more complex because
>> you're going to have to introduce a couple of different things.  But thats
>> something for a different discussion…
> Whether to use a CP or not depends on the use case. In my opinion, CPs are
> really powerful and an awesome feature in HBase. But if not used properly
> (like creating a cyclic graph, as per Tom's example), they can be
> problematic.
>> On Oct 13, 2013, at 10:15 AM, anil gupta <anilgupta84@gmail.com> wrote:
>>> Inline.
>>> On Sun, Oct 13, 2013 at 6:02 AM, Michael Segel <
>> msegel_hadoop@hotmail.com>wrote:
>>>> Ok…
>>>> Sure you can have your app update the secondary index table.
>>>> The only issue with that is if someone updates the base table outside of
>>>> your app,
>>>> they may or may not increment the secondary index.
>>> Anil: We don't allow people to write data into HBase from their own HBase
>>> client. We control the writes into HBase, so we don't have the problem of
>>> the secondary index not getting written.
>>> For example, if you expose a RESTful web service you can easily control the
>>> writes to HBase. Even if a user requests to write one row in the "main
>>> table", your application can have the logic to write to the "secondary
>>> index" tables. In this way, it is transparent to users as well. You can
>>> add/remove secondary indexes as you want.
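[Editor's note: the dual-write pattern Anil describes — one client entry point that writes both the main table and an inverted-index table — can be sketched as follows. In-memory dicts stand in for the two HBase tables; a real client would issue two Puts through the HBase client API or a REST gateway. All names are illustrative:]

```python
# Dicts standing in for the "main" and "inverted index" HBase tables.
main_table = {}
index_table = {}

def put_with_index(row_key: bytes, column: bytes, value: bytes) -> None:
    """Write the data row, then an inverted-index row mapping
    value -> row_key. Both writes go through this single client
    entry point, so callers cannot bypass the index."""
    main_table[(row_key, column)] = value
    # Inverted-index key: the indexed value plus the main row key as a
    # suffix, to keep index entries unique per row. (A real scheme
    # would also escape the separator if values may contain it.)
    index_key = value + b"|" + row_key
    index_table[index_key] = row_key

def lookup_by_value(value: bytes) -> list:
    """Prefix-scan the index table by value and return main-table keys."""
    prefix = value + b"|"
    return [rk for ik, rk in index_table.items() if ik.startswith(prefix)]
```

Because the index write happens in the client rather than in a coprocessor, a failure there surfaces to the caller instead of stalling a RegionServer handler thread.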
>>>> Note that your secondary index doesn't have to be an inverted table, but
>>>> could be SOLR, LUCENE or something else.
>>> Anil: As of now, we are happy with inverted tables as they fit our use
>>> case.
>>>> So you really want to do secondary indexes on the server.
>>>> There are a couple of things that could improve the performance, although
>>>> the write to the secondary index would most likely lag under heavy load.
>>>> On Oct 12, 2013, at 11:27 PM, anil gupta <anilgupta84@gmail.com> wrote:
>>>>> John,
>>>>> My 2 cents:
>>>>> I tried implementing a secondary index by using Region Observers on Put.
>>>>> It works well under low load. But under heavy load the RO could not keep
>>>>> up with the load of cross-region-server writes.
>>>>> Then, I decided not to use the RO, as per Andrew's explanation, and I
>>>>> moved all the logic of building the secondary index tables into my HBase
>>>>> Client. Since then, the system has been running fine under heavy load.
>>>>> IMO, if you use an RO and do cross-RS reads/writes, then that will likely
>>>>> become your bottleneck in HBase.
>>>>> Is it possible for you to avoid the RO and control the writes/updates
>>>>> from the client side?
>>>>> Thanks,
>>>>> Anil Gupta
>>>>> On Fri, Oct 11, 2013 at 6:06 PM, John Weatherford <
>>>>> john.weatherford@telescope.tv> wrote:
>>>>>> OP Here :)
>>>>>> Our current design involves a Region Observer on a table that does
>>>>>> increments on a second table. We took the approach that Michael said,
>>>>>> and inside the RO we got a new connection and everything. We believe
>>>>>> this is causing deadlocks for us. Our next attempt is going to be
>>>>>> writing another row in the same table where we will store the
>>>>>> increments. If this doesn't work, we are going to simply pull the
>>>>>> increments out of the RO and do them in the application or in Flume.
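[Editor's note: John's "another row in the same table" fallback works because a counter row that shares the data row's key prefix sorts adjacent to it, so the two almost always live in the same region and the coprocessor's increment never leaves the RegionServer. A sketch of one such key scheme — the suffix and key formats are illustrative, not from the thread:]

```python
def counter_key(data_row_key: bytes) -> bytes:
    """Derive a counter row key that sorts immediately after the data
    row it counts. HBase splits regions on whole-row boundaries, so
    adjacent keys land in the same region unless a split falls exactly
    between them -- far less likely than a key in a fixed "counters"
    prefix at the other end of the table's keyspace."""
    # \x00 sorts before every other byte, keeping the counter row
    # ahead of any longer sibling keys that share the prefix.
    return data_row_key + b"\x00ctr"

# The counter row sorts between its data row and the next data row:
k = counter_key(b"event|2013-10-11")
assert b"event|2013-10-11" < k < b"event|2013-10-12"
```

Contrast this with a dedicated second table (or a distant key range), where the target row is routinely in another region and the RO's increment turns into a cross-RS call from inside the write path.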
>>>>>> @Tom Brown
>>>>>> I would be very interested to hear more about your solution of
>>>>>> aggregating the increments in another system that is then responsible
>>>>>> for updating HBase.
>>>>>> -jW
>>>>>> On Fri 11 Oct 2013 10:26:58 AM PDT, Vladimir Rodionov wrote:
>>>>>>>> With respect to the OP's design… does the deadlock occur because he's
>>>>>>>> trying to update a column in a different row within the same table?
>>>>>>> Because he is trying to update a *row* in a different Region (and
>>>>>>> potentially in a different RS).
>>>>>>> Best regards,
>>>>>>> Vladimir Rodionov
>>>>>>> Principal Platform Engineer
>>>>>>> Carrier IQ, www.carrieriq.com
>>>>>>> e-mail: vrodionov@carrieriq.com
>>>>>>> ________________________________________
>>>>>>> From: Michael Segel [msegel_hadoop@hotmail.com]
>>>>>>> Sent: Friday, October 11, 2013 9:10 AM
>>>>>>> To: user@hbase.apache.org
>>>>>>> Cc: Vladimir Rodionov
>>>>>>> Subject: Re: Coprocessor Increments
>>>>> --
>>>>> Thanks & Regards,
>>>>> Anil Gupta
>>> --
>>> Thanks & Regards,
>>> Anil Gupta
> -- 
> Thanks & Regards,
> Anil Gupta
