hbase-user mailing list archives

From John Weatherford <john.weatherf...@telescope.tv>
Subject Re: Coprocessor Increments
Date Mon, 14 Oct 2013 20:39:47 GMT
   Thanks for your explanation. It helped me understand why we are 
getting deadlocks as well. I (foolishly) thought of tables working on 
exclusive regions. In my mind I thought I had a DAG since Table 1 was 
always writing to table 2, and the graph never connected back or 
cycled. I failed to realize that although the tables were a DAG, the 
regions obviously were not.
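
To make the table-versus-region point concrete, here is a minimal sketch in
Python. It is not from the thread: the server names, the placements, and the
helpers has_cycle and server_edges are all hypothetical. It expands "table 1
writes to table 2" into server-to-server calls and checks the result for a
cycle. A cycle only means a deadlock is possible, not that one will occur.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())

    in_progress, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in in_progress:
            return True  # reached a node that is still on the DFS stack
        in_progress.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        in_progress.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


def server_edges(table_edges, placement):
    """Expand table-to-table writes into server-to-server calls.

    Assumption: any server hosting a region of the source table may call any
    server hosting a region of the target table. Same-server calls are
    ignored in this sketch.
    """
    return {
        (a, b)
        for src, dst in table_edges
        for a in placement[src]
        for b in placement[dst]
        if a != b
    }


# Hypothetical layout: table1's coprocessor writes to table2.
table_edges = [("table1", "table2")]

# Both tables have regions on the same two region servers.
shared = {"table1": ["rs1", "rs2"], "table2": ["rs1", "rs2"]}

# Tom's suggestion: primary table on N1..N3, secondary table on N4..N6.
disjoint = {"table1": ["n1", "n2", "n3"], "table2": ["n4", "n5", "n6"]}

print(has_cycle(table_edges))                          # table level
print(has_cycle(server_edges(table_edges, shared)))    # server level, shared
print(has_cycle(server_edges(table_edges, disjoint)))  # server level, disjoint
```

This prints False, True, False: the table graph is a DAG, the server graph
with shared placement is not, and the disjoint placement restores a DAG.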


On Mon 14 Oct 2013 09:36:09 AM PDT, Tom Brown wrote:
> If you allow coprocessors to connect to any other server the connections
> between the HBase nodes can be represented as a directed graph. This means
> that deadlock is possible (when N1 is blocked waiting for N2, and N2 is
> blocked waiting for N1). In order to guarantee that no deadlock will occur,
> you have to be able to represent the connections as a directed acyclic
> graph. And to do this, you cannot have nodes connecting directly to each
> other.
> You could also achieve this by carefully assigning your regions. If the
> primary table (PT) is located exclusively on nodes N1..N3, and the
> secondary/index table is located exclusively on nodes N4..N6, (assuming
> updates to the secondary/index table never change the PT) the connections
> would remain a DAG and deadlock would be impossible.
> You're right that by putting the business logic into the coprocessor you
> gain the ability to easily allow any group to access your cluster. But that
> access isn't free. To use a SQL analogy: large organizations always protect
> their SQL servers with a DBA. They do this because the potential downsides
> of allowing unsupervised and unstructured access are too great.
> --Tom
> On Mon, Oct 14, 2013 at 8:50 AM, Michael Segel <msegel_hadoop@hotmail.com> wrote:
>> Anil,
>> I wasn't suggesting that you can't do what you're doing, but you end up
>> running into the risks which coprocessors are supposed to remove. The
>> standard YMMV always applies.
>> You have a cluster… another team in your company wants to use the cluster.
>> So instead of the cluster being a single resource for your app/team, it now
>> becomes a shared resource. So now you have people accessing HBase for
>> multiple apps.
>> You could then run multiple HBase HMasters with different locations for
>> files, however… this can get messy.
>> HOYA seems to suggest this as the future.  If so, then you have to wonder
>> about data locality.
>> Having your app update the primary table and then the secondary index is
>> always a good fallback, however you need to ensure that you understand the
>> risks.
>> With respect to secondary indexes… if you decouple the writes… you can get
>> better throughput. Note that the code becomes a bit more complex because
>> you're going to have to introduce a couple of different things.  But that's
>> something for a different discussion…
>> On Oct 13, 2013, at 10:15 AM, anil gupta <anilgupta84@gmail.com> wrote:
>>> Inline.
>>> On Sun, Oct 13, 2013 at 6:02 AM, Michael Segel <msegel_hadoop@hotmail.com> wrote:
>>>> Ok…
>>>> Sure you can have your app update the secondary index table.
>>>> The only issue with that is if someone updates the base table outside of
>>>> your app,
>>>> they may or may not increment the secondary index.
>>> Anil: We don't allow people to write data into HBase from their own HBase
>>> client. We control the writes into HBase, so we don't have the problem of
>>> the secondary index not getting written.
>>> For example, if you expose a RESTful web service, you can easily control the
>>> writes to HBase. Even if a user requests to write one row in the "main table",
>>> your application can have the logic to write to the "secondary index" tables.
>>> In this way, it is also transparent to users. You can add/remove secondary
>>> indexes as you want.
>>>> Note that your secondary index doesn't have to be an inverted table, but
>>>> could be SOLR, LUCENE or something else.
>>> Anil: As of now, we are happy with inverted tables as they fit our use
>>> case.
>>>> So you really want to secondary indexes on the server.
>>>> There are a couple of things that could improve the performance, although
>>>> the write to the secondary index would most likely lag under heavy load.
>>>> On Oct 12, 2013, at 11:27 PM, anil gupta <anilgupta84@gmail.com> wrote:
>>>>> John,
>>>>> My 2 cents:
>>>>> I tried implementing Secondary Index by using Region Observers on Put. It
>>>>> works well under low load. But under heavy load the RO could not keep up
>>>>> with the load of cross region server writes.
>>>>> Then I decided not to use RO as per Andrew's explanation, and I moved all
>>>>> the logic of building secondary index tables into my HBase client. Since
>>>>> then, the system has been running fine under heavy load.
>>>>> IMO, if you use RO and do cross RS reads/writes, then perhaps this will
>>>>> become your bottleneck in HBase.
>>>>> Is it possible for you to avoid RO and control the writes/updates from
>>>>> client side?
>>>>> Thanks,
>>>>> Anil Gupta
>>>>> On Fri, Oct 11, 2013 at 6:06 PM, John Weatherford <john.weatherford@telescope.tv> wrote:
>>>>>> OP Here :)
>>>>>> Our current design involves a Region Observer on a table that does
>>>>>> increments on a second table. We took the approach that Michael said and
>>>>>> inside the RO, we got a new connection and everything. We believe this is
>>>>>> causing deadlocks for us. Our next attempt is going to be writing
>>>>>> another row in the same table where we will store the increments. If this
>>>>>> doesn't work, we are going to simply pull the increments out of the [...] and
>>>>>> do them in the application or in Flume.
>>>>>> @Tom Brown
>>>>>> I would be very interested to hear more about your solution of
>>>>>> aggregating the increments in another system that is then responsible for
>>>>>> updating in HBase.
>>>>>> -jW
>>>>>> On Fri 11 Oct 2013 10:26:58 AM PDT, Vladimir Rodionov wrote:
>>>>>>> With respect to the OP's design… does the deadlock occur because
>>>>>>>>> trying to update a column in a different row within the same table?
>>>>>>> Because he is trying to update *row* in a different Region (and
>>>>>>> potentially in different RS).
>>>>>>> Best regards,
>>>>>>> Vladimir Rodionov
>>>>>>> Principal Platform Engineer
>>>>>>> Carrier IQ, www.carrieriq.com
>>>>>>> e-mail: vrodionov@carrieriq.com
>>>>>>> ______________________________**__________
>>>>>>> From: Michael Segel [msegel_hadoop@hotmail.com]
>>>>>>> Sent: Friday, October 11, 2013 9:10 AM
>>>>>>> To: user@hbase.apache.org
>>>>>>> Cc: Vladimir Rodionov
>>>>>>> Subject: Re: Coprocessor Increments
>>>>> --
>>>>> Thanks & Regards,
>>>>> Anil Gupta
>>> --
>>> Thanks & Regards,
>>> Anil Gupta
