From: ccalugaru
To: user@hbase.apache.org
Date: Sun, 11 Aug 2013 14:17:24 -0700 (PDT)
Subject: Hbase update use case

Hi all,

I have the following HBase use case: one HBase table, with a row key (built from a combination of MD5 hashes) and two column families. Logically, the table stores sentences.
The table has hundreds of millions of records. I have a webapp that connects to this HBase table and needs to randomly export sentences, based on some conditions. Currently, all these conditions can be looked up using just the row key. Typically, one export contains only a couple of hundred sentences.

The important restriction is that once some sentences (segments) are exported, they must not be present in any subsequent export. So my question is: how should I make sure the same segments do not get exported again?

Should I 'mark' the exported segments by updating a flag after each export? The drawback is that, when looking up which segments meet my conditions, I could no longer identify those records by row key alone; I would also have to check the flag. Hence, I would need to use filters, which I know are way slower.

Is there a better approach for this?
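
To make the flag idea concrete, here is a minimal Python sketch of what I have in mind. It is an in-memory stand-in, not HBase client code: `SentenceTable`, `make_row_key`, the `exported` field, and the `lang=en` condition are all made-up names, and I am assuming for illustration that the row key starts with the MD5 of the lookup condition.

```python
import hashlib


def md5_hex(text):
    """Return the hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_row_key(condition, sentence):
    # Hypothetical key layout: hash of the lookup condition,
    # followed by hash of the sentence itself.
    return md5_hex(condition) + md5_hex(sentence)


class SentenceTable:
    """In-memory stand-in for the table: row key -> text plus an 'exported' flag."""

    def __init__(self):
        self.rows = {}

    def put(self, condition, sentence):
        key = make_row_key(condition, sentence)
        self.rows[key] = {"text": sentence, "exported": False}

    def export(self, condition, limit):
        """Return up to `limit` not-yet-exported sentences and mark them exported."""
        prefix = md5_hex(condition)
        batch = []
        # Walking the sorted keys stands in for a row-key prefix scan.
        for key in sorted(self.rows):
            if len(batch) >= limit:
                break
            if not key.startswith(prefix):
                continue
            row = self.rows[key]
            if row["exported"]:
                # This is the extra check that the row key alone cannot express.
                continue
            row["exported"] = True
            batch.append(row["text"])
        return batch


table = SentenceTable()
for sentence in ["first sentence", "second sentence", "third sentence"]:
    table.put("lang=en", sentence)

first_export = table.export("lang=en", 2)
second_export = table.export("lang=en", 2)
# The two batches never share a sentence.
assert set(first_export).isdisjoint(second_export)
```

The sketch shows the drawback I am worried about: every export still walks over all rows matching the row-key prefix, including the ones already exported, and has to check the flag on each of them.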