Date: Mon, 11 Nov 2013 20:23:19 +0000 (UTC)
From: "Andrew Purtell (JIRA)"
To: dev@hbase.apache.org
Subject: [jira] [Resolved] (HBASE-5982) HBase Coprocessor Local Aggregation

     [ https://issues.apache.org/jira/browse/HBASE-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-5982.
-----------------------------------

       Resolution: Incomplete
    Fix Version/s:     (was: 0.92.1)
         Assignee:     (was: dengpeng)

No patch and no activity for a year, resolving as 'Incomplete'

> HBase Coprocessor Local Aggregation
> -----------------------------------
>
>                 Key: HBASE-5982
>                 URL: https://issues.apache.org/jira/browse/HBASE-5982
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors
>    Affects Versions: 0.92.1
>         Environment: cloudera-cdh3u3,hbase-0.92.1
>            Reporter: dengpeng
>              Labels: Coprocessor
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> In our application, we need to handle the following SQL-like process on HBase. Each region does very complex processing, and in the current region-based endpoint framework the 'top #' result from every region is sent back to the coprocessor client.
> Let's take the following SQL as an example. Suppose there are 100 regions in each RS and 100 RSs in the cluster. The client will receive 100*100*1M = 10G records from all the regions, and then select the top 1M records from those 10G records. The client needs a lot of RAM to handle this data, and the cluster network may be the bottleneck.
> If we had an RS-based endpoint, each RS would merge the results from its own regions, and the client would receive only 100*1M = 0.1G records. The burden on the client and the network would be dramatically reduced.
> example:
> select top 1000000 count(1) as A , sum(intRxlevDL)/count(intRxlevDL) as B , intBscPc as bscPc , intLac as LAC , intCI as CI from ftbMrMsg t1 where ( t1.dtTime >= '2012-03-02 04:00:00.000' and t1.dtTime < '2012-03-02 05:00:00.000' ) group by bscPc , LAC , CI having B >= 0.2 order by bscPc ASC , LAC ASC , CI ASC
> So far, the network is a bottleneck in our application when using a coprocessor to handle the above SQL. I think the RS-based endpoint is worth doing, especially for the 'top #' process. What's your opinion about this? I think we can open a jira.
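
The record counts in the description can be illustrated with a small simulation. This is only a sketch in plain Python, not HBase coprocessor code: the function names, the toy cluster, and the integer rows are all hypothetical, and it assumes each row is already a final, sortable record. The GROUP BY aggregates in the example SQL would additionally need partial aggregates combined across regions, which is not shown here.

```python
import heapq
from itertools import islice


def top_n(rows, n):
    """Return the n smallest rows in ascending order (stands in for ORDER BY ... ASC)."""
    return heapq.nsmallest(n, rows)


def merge_top_n(sorted_lists, n):
    """Merge already-sorted partial results and keep only the first n rows."""
    return list(islice(heapq.merge(*sorted_lists), n))


def region_based(cluster, n):
    """Every region ships its own top-n straight to the client."""
    partials = [top_n(region, n) for server in cluster for region in server]
    received = sum(len(p) for p in partials)  # rows crossing the network to the client
    return merge_top_n(partials, n), received


def server_based(cluster, n):
    """Each region server merges its regions' top-n before replying to the client."""
    partials = [
        merge_top_n([top_n(region, n) for region in server], n)
        for server in cluster
    ]
    received = sum(len(p) for p in partials)  # rows crossing the network to the client
    return merge_top_n(partials, n), received


# Toy cluster (hypothetical data): 4 region servers x 5 regions x 50 integer rows.
cluster = [
    [[(s * 7 + r * 13 + i * 31) % 1009 for i in range(50)] for r in range(5)]
    for s in range(4)
]

result_a, received_a = region_based(cluster, 10)
result_b, received_b = server_based(cluster, 10)
print(result_a == result_b)    # True
print(received_a, received_b)  # 200 40
```

Both schemes return the same top 10 rows, but the client receives servers*regions*n rows in the first case and servers*n rows in the second, which is the same ratio as the 10G versus 0.1G figures above.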
--
This message was sent by Atlassian JIRA
(v6.1#6144)