From: Raghava Mutharaju
Date: Tue, 22 Jun 2010 03:49:56 -0400
Subject: multiple reads from a Map - optimization question
To: user@hbase.apache.org

Hello all,

In my data, I have to check multiple conditions and then work with the rows that satisfy all of them. I am doing this as an MR job with no reduce phase, and the conditions are translated to a set of filters. Of the conditions (2 or 3 at most), one is set as the initial filter on the scan that feeds the mappers, so only data satisfying it arrives as input to the map. For each row that comes through, I then check the remaining 1 or 2 conditions. Since map() is called once per row, this means 1 or 2 read calls (with a filter) to HBase tables for every row.

This setup is very slow, even for small data (the data fits in one region, so a single map task takes in all of it). Note that the remaining conditions are not filters on the incoming rows themselves; rather, the next set of filter conditions is built from that data.

Can this be improved? Would constructing secondary indexes help (I would need a dramatic improvement)? Or is this type of problem not suitable for HBase?

Thank you.

Regards,
Raghava.
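
To make the access pattern above concrete, here is a minimal sketch in plain Java. It does not use the HBase API: `CountingTable` is a hypothetical in-memory stand-in for a table that only counts reads, `runMapPhase` is a hypothetical stand-in for the per-row map() calls, and the assumption that each remaining condition is an existence check keyed on the incoming row's value is made up for illustration. It only shows how the number of reads grows with the number of rows that pass the initial scan filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PerRowLookupSketch {

    /** Hypothetical in-memory stand-in for a table; it counts every read issued against it. */
    static class CountingTable {
        private final Set<String> keys = new HashSet<>();
        int reads = 0;

        void add(String key) {
            keys.add(key);
        }

        /** One simulated read: does this table hold the given key? */
        boolean exists(String key) {
            reads++;
            return keys.contains(key);
        }
    }

    /**
     * Simulates the map-only job. Every entry in scannedRows is assumed to have
     * already passed the first condition (the scan filter). For each row, the
     * remaining conditions are derived from the row's value and checked with
     * one read per condition, stopping at the first condition that fails.
     */
    static List<String> runMapPhase(Map<String, String> scannedRows,
                                    List<CountingTable> remainingConditions) {
        List<String> accepted = new ArrayList<>();
        for (Map.Entry<String, String> row : scannedRows.entrySet()) {
            boolean allMatch = true;
            for (CountingTable table : remainingConditions) {
                if (!table.exists(row.getValue())) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                accepted.add(row.getKey());
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Rows that passed the initial scan filter: row key -> value.
        Map<String, String> scannedRows = new LinkedHashMap<>();
        scannedRows.put("r1", "a");
        scannedRows.put("r2", "b");
        scannedRows.put("r3", "c");

        CountingTable second = new CountingTable();
        second.add("a");
        second.add("b");
        CountingTable third = new CountingTable();
        third.add("a");

        List<String> accepted = runMapPhase(scannedRows, Arrays.asList(second, third));
        System.out.println("accepted rows: " + accepted);
        System.out.println("total reads: " + (second.reads + third.reads));
    }
}
```

In this sketch every input row costs at least one read and at most one read per remaining condition. In the real job each of those reads would be a separate call to HBase rather than an in-memory check.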