From: "Chris K Wensel (JIRA)"
To: hbase-dev@hadoop.apache.org
Date: Thu, 2 Jul 2009 11:31:47 -0700 (PDT)
Subject: [jira] Commented: (HBASE-1605) TableInputFormat should support 'limit'
Message-ID: <1805378928.1246559507309.JavaMail.jira@brutus>
In-Reply-To: <1155700141.1246554107626.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HBASE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726607#action_12726607 ]

Chris K Wensel
commented on HBASE-1605:
---------------------------------------

Good questions.

In SQL, LIMIT returns the first N rows of the result set, and is typically used with OFFSET to allow pagination.

In Cascading, the Limit operation only allows each of M tasks to see N/M rows (accounting for remainders). There is no notion of OFFSET, as limit in this case is really used for unit/integration testing or sampling.

Re HBase, you guys should choose the model that makes the most sense for typical HBase consumer applications. What I'm after is an even load across many mappers while, orthogonally, limiting the total number of rows processed.

Having this work with a Filter would also be very nice, i.e. give me 1k rows that satisfy this condition. But I guess if I want the first 1k rows that satisfy the filter, we might be limited to a single region (and a single mapper, as I read the code now).

So maybe there are two modes, sample and result. Sample returns 'random' N rows (the top N/M from each of M regions). Result returns ordered N rows (ordered by virtue of coming from a region).

Anyway, just throwing that out there. My current use case would be happy with either, though 'result' is probably the most useful coupled with HBASE-1172.

> TableInputFormat should support 'limit'
> ---------------------------------------
>
>                 Key: HBASE-1605
>                 URL: https://issues.apache.org/jira/browse/HBASE-1605
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Chris K Wensel
>
> Would be useful if TableInputFormat could be passed a 'limit' property value that limited the total result set to the value of 'limit'.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
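
To make the two modes from the comment concrete, here is a rough sketch in Python. It is not HBase or Cascading code: the function names (per_split_limits, sample_mode, result_mode) are made up for illustration, regions are modeled as plain in-memory lists already sorted by row key, and the optional keep predicate stands in for a Filter. It only shows how a total limit N might be divided across M splits with the remainder accounted for, and how that differs from taking the first N rows in key order.

```python
from itertools import islice
from typing import Callable, Iterator, List, Sequence

Row = str


def per_split_limits(limit: int, num_splits: int) -> List[int]:
    """Divide a total row limit N across M splits, spreading the remainder."""
    if limit < 0 or num_splits <= 0:
        raise ValueError("limit must be >= 0 and num_splits must be > 0")
    base, extra = divmod(limit, num_splits)
    # The first `extra` splits take one additional row so the quotas sum to N.
    return [base + (1 if i < extra else 0) for i in range(num_splits)]


def sample_mode(
    regions: Sequence[Sequence[Row]],
    limit: int,
    keep: Callable[[Row], bool] = lambda row: True,
) -> List[Row]:
    """'sample': every split reads only its own quota from the top of its region."""
    quotas = per_split_limits(limit, len(regions))
    rows: List[Row] = []
    for region, quota in zip(regions, quotas):
        # Each region is handled independently, as a separate mapper would do.
        rows.extend(islice(filter(keep, region), quota))
    return rows


def result_mode(
    regions: Sequence[Sequence[Row]],
    limit: int,
    keep: Callable[[Row], bool] = lambda row: True,
) -> List[Row]:
    """'result': a single reader walks the regions in key order, stops after N rows."""
    def in_key_order() -> Iterator[Row]:
        for region in regions:
            yield from region

    return list(islice(filter(keep, in_key_order()), limit))


if __name__ == "__main__":
    regions = [["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1", "c2", "c3"]]
    print(per_split_limits(4, 3))
    print(sample_mode(regions, 4))
    print(result_mode(regions, 4))
```

One thing the sketch makes visible: in sample mode a region with fewer matching rows than its quota leaves the total below N, since splits do not coordinate with each other. In result mode the total is exact, but the work is sequential rather than spread across mappers.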