From: "Hiller, Dean x66079"
To: "user@hbase.apache.org"
Date: Fri, 17 Jun 2011 16:21:31 -0400
Subject: RE: What's the best approach to search in HBase?

What about using Hbasene? Is it pretty good? It looks just like a
distributed Lucene, with the same API and everything.

Later,
Dean

-----Original Message-----
From: Mark Kerzner [mailto:markkerzner@gmail.com]
Sent: Wednesday, June 15, 2011 10:10 PM
To: user@hbase.apache.org
Subject: Re: What's the best approach to search in HBase?

Thank you, everybody. I summarized your advice here,
http://shmsoft.blogspot.com/2011/06/search-in-ediscovery.html, because I
need it for my open source eDiscovery, and now just need to try it all :)

Sincerely,
Mark

On Mon, Jun 6, 2011 at 11:18 AM, Buttler, David wrote:

> I store over 500M documents in HBase, and index using Solr with dynamic
> fields. This gives you tremendous flexibility to do the type of queries you
> are looking for -- and to make them simple and intuitive via a faceted
> interface.
>
> However, there was quite a bit of software that we had to write to get
> things going, and I can neither release all of it open source, nor support
> other people using it. If I had to start again, I would seriously look at
> solutions like Elasticsearch and Lily.
>
> Dave
>
> -----Original Message-----
> From: Mark Kerzner [mailto:markkerzner@gmail.com]
> Sent: Friday, June 03, 2011 5:57 PM
> To: HBase Discussion Group
> Subject: What's the best approach to search in HBase?
>
> Hi,
>
> I need to store, say, 10M-100M documents, with each document having, say, 100
> fields, like author, creation date, access date, etc., and then I want to
> ask questions like
>
> give me all documents whose author is like abc**, and creation date any time
> in 2010 and access date in 2010-2011, and so on, perhaps 10-20 conditions,
> matching a list of some keywords.
>
> What's best, Lucene, Katta, HBase CF with secondary indices, or plain scan
> and compare of every record?
>
> Thanks a bunch!
>
> Mark
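
For readers comparing the options raised in this thread, here is a rough sketch (not code from any of the posters) of the example question expressed two ways: as the "plain scan and compare of every record" baseline, and as a query string in standard Solr syntax. The sample records, the function names, and the field names author_s, created_dt, accessed_dt and text_t are all assumptions made up for illustration; the suffixes follow a common Solr dynamic-field naming convention, but a real schema may differ.

```python
from datetime import date

# Hypothetical sample records; field names and values are illustrative only.
DOCS = [
    {"id": 1, "author": "abcde", "created": date(2010, 3, 14),
     "accessed": date(2011, 1, 5), "text": "contract draft for review"},
    {"id": 2, "author": "abcxy", "created": date(2009, 7, 1),
     "accessed": date(2010, 2, 2), "text": "contract signed"},
    {"id": 3, "author": "zed", "created": date(2010, 5, 5),
     "accessed": date(2010, 6, 6), "text": "contract notes"},
    {"id": 4, "author": "abc", "created": date(2010, 12, 31),
     "accessed": date(2012, 1, 1), "text": "contract archive"},
]


def matches(doc, author_prefix, created_year, accessed_years, keywords):
    """Return True only if the document satisfies every condition."""
    words = set(doc["text"].lower().split())
    first, last = accessed_years
    return (
        doc["author"].startswith(author_prefix)
        and doc["created"].year == created_year
        and first <= doc["accessed"].year <= last
        and all(k.lower() in words for k in keywords)
    )


def scan(docs, **conditions):
    """Plain scan and compare: every record is tested, so cost grows
    linearly with the number of documents."""
    return [d["id"] for d in docs if matches(d, **conditions)]


def solr_query(author_prefix, created_year, accessed_years, keywords):
    """Build the same conditions as a Solr query string.

    Field names are assumed dynamic fields; inputs are not escaped,
    so this is only suitable for trusted, simple values.
    """
    first, last = accessed_years
    clauses = [
        f"author_s:{author_prefix}*",
        f"created_dt:[{created_year}-01-01T00:00:00Z TO {created_year}-12-31T23:59:59Z]",
        f"accessed_dt:[{first}-01-01T00:00:00Z TO {last}-12-31T23:59:59Z]",
    ]
    clauses += [f"text_t:{k}" for k in keywords]
    return " AND ".join(clauses)


CONDITIONS = dict(author_prefix="abc", created_year=2010,
                  accessed_years=(2010, 2011), keywords=["contract"])

if __name__ == "__main__":
    print(scan(DOCS, **CONDITIONS))
    print(solr_query(**CONDITIONS))
```

The scan version needs no extra infrastructure but touches every record on every query, which is the cost an external index such as Solr is meant to avoid at the 10M-100M document scale mentioned in the question.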