From: Ted Dunning
Date: Tue, 31 May 2011 11:59:29 -0700
Subject: Re: trying to select technology
To: common-user@hadoop.apache.org
Cc: Matthew Foley
Reply-To: common-user@hadoop.apache.org
References: <31743063.post@talk.nabble.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=20cf307c9aa419466004a497021c

--20cf307c9aa419466004a497021c
Content-Type: text/plain; charset=ISO-8859-1

To pile on, thousands or millions of documents are well within the range
that Lucene handles comfortably.

Solr may be an even better option than bare Lucene, since it takes care of
a lot of the boilerplate problems such as document parsing and index update
scheduling.

On Tue, May 31, 2011 at 11:56 AM, Matthew Foley wrote:

> Sounds like you're looking for a full-text inverted index. Lucene is a
> good open-source implementation of that. I believe it has an option for
> storing the original full text as well as the indexes.
> --Matt
>
> On May 31, 2011, at 10:50 AM, cs230 wrote:
>
> > Hello All,
> > I am planning to start a project where I have to do extensive storage of
> > XML and text files. On top of that, I have to implement an efficient
> > algorithm for searching over thousands or millions of files, and also
> > build some indexes to make searches faster next time.
> >
> > I looked into Oracle database, but it delivers very poor results. Can I
> > use Hadoop for this? Which Hadoop project would be the best fit for this?
> >
> > Is there anything from Google I can use?
> >
> > Thanks a lot in advance.
> > --
> > View this message in context:
> > http://old.nabble.com/trying-to-select-technology-tp31743063p31743063.html
> > Sent from the Hadoop core-user mailing list archive at Nabble.com.

--20cf307c9aa419466004a497021c--
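For readers unfamiliar with the term, the "full-text inverted index" mentioned in the thread can be illustrated with a toy sketch. This is not Lucene's or Solr's API; the class name `TinyInvertedIndex`, its methods, and the sample documents are all hypothetical, and the tokenizer is deliberately naive. It only shows the idea of mapping each term to the documents that contain it while also keeping the original text, as Matt describes.

```python
import re
from collections import defaultdict


class TinyInvertedIndex:
    """Toy in-memory inverted index; illustrative only, not Lucene's API."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.stored = {}                  # document id -> original full text

    @staticmethod
    def tokenize(text):
        # Lowercase, then split on runs of non-alphanumeric characters.
        return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

    def add(self, doc_id, text):
        # Keep the original text alongside the index entries.
        self.stored[doc_id] = text
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term."""
        terms = self.tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


# Hypothetical sample documents, made up for this sketch.
index = TinyInvertedIndex()
index.add("a.xml", "<note>Hadoop stores large files</note>")
index.add("b.txt", "Lucene builds an inverted index over text files")
index.add("c.txt", "Solr wraps Lucene and schedules index updates")

hits = index.search("lucene index")
print(hits)
```

A lookup touches only the posting sets for the query terms rather than scanning every file, which is why this structure suits searching over many documents. A real engine adds ranking, on-disk segments, and proper text analysis on top of this idea.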