From: Matthew Foley
To: "common-user@hadoop.apache.org"
Date: Tue, 31 May 2011 11:56:44 -0700
Subject: Re: trying to select technology

Sounds like you're looking for a full-text inverted index. Lucene is a good open-source implementation of that. I believe it has an option for storing the original full text as well as the indexes.

--Matt

On May 31, 2011, at 10:50 AM, cs230 wrote:

Hello All,

I am planning to start a project where I have to do extensive storage of XML and text files. On top of that, I have to implement an efficient algorithm for searching over thousands or millions of files, and also build some indexes to make searches faster the next time.

I looked into Oracle database, but it delivers very poor results. Can I use Hadoop for this? Which Hadoop project would be the best fit for this?

Is there anything from Google I can use?

Thanks a lot in advance.

--
View this message in context: http://old.nabble.com/trying-to-select-technology-tp31743063p31743063.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
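
For readers unfamiliar with the term used in the reply, here is a minimal sketch of what a full-text inverted index does, written in plain Python. It is not Lucene and does not use Lucene's API; the class name InvertedIndex, its methods, and the sample file names are all hypothetical and chosen only for illustration. It maps each term to the set of documents containing it, and also keeps the original text, loosely analogous to the "store the original full text" option mentioned above.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy full-text inverted index (illustrative only, not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.documents = {}               # doc id -> original text, kept alongside the index

    @staticmethod
    def tokenize(text):
        # Lowercase and keep runs of letters/digits; real analyzers do far more
        # (stemming, stop words, XML-aware parsing).
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        """Store the document and record each of its terms in the postings."""
        self.documents[doc_id] = text
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every term in the query."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self.postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return result


# Hypothetical sample documents.
index = InvertedIndex()
index.add("a.xml", "<note><body>Hadoop stores large files</body></note>")
index.add("b.txt", "Lucene builds an inverted index over text files")
index.add("c.txt", "Oracle is a relational database")

print(sorted(index.search("files")))           # ['a.xml', 'b.txt']
print(sorted(index.search("inverted index")))  # ['b.txt']
```

A lookup touches only the postings for the query terms rather than scanning every file, which is why this structure suits searching across thousands or millions of documents. A production library adds on-disk storage, relevance ranking, and incremental updates on top of the same idea.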