Date: Wed, 16 May 2012 14:40:52 -0700
Subject: Re: EndPoint Coprocessor could be dealocked?
From: Dave Revell
To: user@hbase.apache.org

Many people will probably try to use coprocessors as a way of implementing
app logic on top of HBase without the headaches of writing a daemon.
Sometimes client-side approaches are inadvisable; for example, there may be
several client languages/runtimes and the app logic should not be
reimplemented in each. It's understandable that people wouldn't want to deal
with setting up a daemon and RPC mechanism if they can piggyback on the
existing HBase coprocessor mechanism.

Are HBase coprocessors explicitly wrong for this use case if the app logic
needs to access multiple regions in a single call?

Cheers,
Dave

On Wed, May 16, 2012 at 12:07 PM, Michael Segel wrote:

> I think we need to look at the base problem that is trying to be solved.
>
> I mean, the discussion is about the RPC mechanism, but the problem that
> the OP is trying to solve is how to use multiple indexes in a 'query'.
>
> Note: I put ' ' around query because it's an m/r job or a single thread
> where the user is trying to get a result set which is a significantly
> smaller subset, using more than 1 index.
>
> So the idea is to do a quick get() against each index and the result would
> be a list of row keys.
> The next step is to get the intersection(s) quickly (which I proposed),
> and then you would just need to do a quick series of get()s to pull back
> the list of rows.
>
> If I understand the OP's problem, it's not a coprocessor type of problem.
>
> It's one where you submit an m/r job. Within your toolRunner, you would
> actually do the fetches against the indexes and then build the ultimate
> result set. Then you just need a map job to take your result set as an
> input.
>
> Drawback... if the list of rows is very, very long, you may run out of
> memory. So you need to resolve that...
> (Which is why I was suggesting using a temp table; then you can use the
> rows in the temp table as input to your fetch...)
>
> While not something I would use for 'real time', it's something where I
> can really shrink the number of rows you have to fetch for further
> processing. So if your full table scan takes an hour, but we can do N
> get()s to get the rows in the index, find the intersection I, and then do
> I.size() get()s to fetch the data, this should take much less time.
>
> Again, I don't see this as a coprocessor-based solution; however, the N
> get()s and intersection could be done at the start of the job, or could be
> part of a map-only job.
>
> Kind of an interesting problem... but if anyone has a large set of data
> and some time to play, you will end up solving a problem that you can't do
> in an RDBMS easily.
>
> On May 16, 2012, at 1:17 PM, Andrew Purtell wrote:
>
> >> On May 16, 2012, at 1:12 AM, fding hbase wrote:
> >>> But sadly, HBase IPC doesn't allow a coprocessor chaining mechanism...
> >>> Someone mentioned on
> >>> http://grokbase.com/t/hbase/user/116hrhhf8m/coprocessor-failure-question-and-examples
> >>> :
> >>>
> >>> If a RegionObserver issues RPC to another table from any of the hooks
> >>> that are called out of RPC handlers (for Gets, Puts, Deletes, etc.),
> >>> you risk deadlock.
> >>> Whatever activity you want to check should be in the same region as
> >>> account data to avoid that.
> >>> (Or HBase RPC needs to change.)
> >>>
> >>> So that means the deadlock is inevitable under the current
> >>> circumstances. The coprocessors are still limited.
> >>>
> >>> What I'm seeking is possible extensions of coprocessors, or
> >>> workarounds for situations where an extra RPC is needed in the RPC
> >>> handlers.
> >
> > This isn't a limitation, this is a design choice. Such extensions of
> > coprocessors most likely won't happen. What a RegionObserver allows
> > you to do is exactly this: Intercept and potentially modify lifecycle
> > or user operations on that single region alone. If it helps, think of
> > each region as its own independent database.
> >
> > If you need to take cross-region actions according to some user
> > action, then you should be looking first at extending the client, not
> > the server.
> >
> > Best regards,
> >
> > - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein (via Tom White)
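
Editor's note: the multi-index lookup that Michael Segel describes in the quoted message (one get() per index, intersect the returned row keys, then one get() per surviving key) might look roughly like the sketch below. It is only an illustration under stated assumptions: the table and the indexes are plain in-memory dictionaries, and index_get, intersect_row_keys, fetch_rows and query are hypothetical names, not HBase client APIs. A real job would replace the dictionary lookups with client-side get() calls, and would need the temp-table approach from the thread once the key lists no longer fit in memory.

```python
# Minimal in-memory sketch of the client-side "multiple index" query.
# ASSUMPTION: dicts stand in for the data table and the index tables;
# every function name here is hypothetical and not part of any HBase API.

def index_get(index, value):
    """Simulate one get() against an index: indexed value -> list of row keys."""
    return index.get(value, [])

def intersect_row_keys(key_lists):
    """Intersect row-key lists, starting from the shortest list."""
    if not key_lists:
        return []
    ordered = sorted(key_lists, key=len)
    common = set(ordered[0])
    for keys in ordered[1:]:
        common &= set(keys)
        if not common:
            break  # nothing can match any more, stop early
    return sorted(common)

def fetch_rows(table, row_keys):
    """Simulate the final series of get()s, one per surviving row key."""
    return [(key, table[key]) for key in row_keys if key in table]

def query(table, lookups):
    """lookups is a list of (index, value) pairs, one per index consulted."""
    key_lists = [index_get(index, value) for index, value in lookups]
    return fetch_rows(table, intersect_row_keys(key_lists))

if __name__ == "__main__":
    table = {
        "r1": {"city": "Portland", "plan": "pro"},
        "r2": {"city": "Portland", "plan": "free"},
        "r3": {"city": "Austin", "plan": "pro"},
    }
    city_index = {"Portland": ["r1", "r2"], "Austin": ["r3"]}
    plan_index = {"pro": ["r1", "r3"], "free": ["r2"]}
    print(query(table, [(city_index, "Portland"), (plan_index, "pro")]))
```

All of the work here happens on the client side, which matches Andrew Purtell's advice in the thread to extend the client rather than the server for cross-region actions.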