From: Matthew Foley <mattf@yahoo-inc.com>
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
CC: Matthew Foley <mattf@yahoo-inc.com>
Date: Tue, 31 May 2011 11:56:44 -0700
Subject: Re: trying to select technology
Message-ID: <B1A438AC-486B-47E6-9538-F08E55813EC8@yahoo-inc.com>
References: <31743063.post@talk.nabble.com>
In-Reply-To: <31743063.post@talk.nabble.com>

Sounds like you're looking for a full-text inverted index.  Lucene is a
good open-source implementation of that.  I believe it has an option for
storing the original full text as well as the indexes.
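
To make the term concrete, here is a minimal sketch of an inverted index in plain Python. It is an illustration of the general idea only, not Lucene's API or implementation: the names TinyIndex, tokenize, postings and stored are made up for this example, and the tokenizer is a deliberately naive assumption (lowercase alphanumeric runs, so XML tag names get indexed as ordinary words).

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the set of document ids containing it.

    Hypothetical illustration only; a real engine adds ranking, on-disk
    storage, proper text analysis, and much more.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {doc_id, ...}
        self.stored = {}                  # doc_id -> original full text

    @staticmethod
    def tokenize(text):
        # Naive analysis step: lowercase, then split into alphanumeric runs.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        # Keep the original text alongside the index entries.
        self.stored[doc_id] = text
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term."""
        terms = self.tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


if __name__ == "__main__":
    index = TinyIndex()
    index.add("a.xml", "<doc>Hadoop stores large files</doc>")
    index.add("b.txt", "Lucene builds an inverted index over text files")
    for doc_id in index.search("inverted files"):
        print(doc_id, "->", index.stored[doc_id])
```

The point of the structure is that a query only touches the posting sets for its own terms instead of scanning every file, which is why it scales to large collections far better than a linear search.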
--Matt

On May 31, 2011, at 10:50 AM, cs230 wrote:


Hello All,

I am planning to start a project where I have to store a large volume of XML
and text files. On top of that, I have to implement an efficient algorithm for
searching over thousands or millions of files, and also build indexes to
make subsequent searches faster.

I looked into Oracle Database, but it delivered very poor results. Can I use
Hadoop for this? Which Hadoop project would be the best fit?

Is there anything from Google I can use?

Thanks a lot in advance.
--
View this message in context: http://old.nabble.com/trying-to-select-technology-tp31743063p31743063.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.