Reply-To: common-user@hadoop.apache.org
In-Reply-To: <B1A438AC-486B-47E6-9538-F08E55813EC8@yahoo-inc.com>
References: <31743063.post@talk.nabble.com>
 <B1A438AC-486B-47E6-9538-F08E55813EC8@yahoo-inc.com>
From: Ted Dunning <tdunning@maprtech.com>
Date: Tue, 31 May 2011 11:59:29 -0700
Message-ID: <BANLkTi=h=G_E3MWY_FDk_yJRpC2_s31Nog@mail.gmail.com>
Subject: Re: trying to select technology
To: common-user@hadoop.apache.org
Cc: Matthew Foley <mattf@yahoo-inc.com>

To pile on, thousands or millions of documents are well within the range
that Lucene is designed to handle.

Solr may be an even better option than bare Lucene, since it handles a lot of
the boilerplate, such as document parsing and index update scheduling.
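For anyone unfamiliar with the term, here is a toy Python sketch of the
full-text inverted index Matt mentions below. It only illustrates the idea;
it is not how Lucene works internally (Lucene adds analyzers, compressed
postings, and relevance scoring). Each term maps to the set of documents
containing it, and a separate map keeps the original text, loosely analogous
to Lucene's stored fields. The class and method names are made up for
illustration:

```python
import re
from collections import defaultdict


def tokenize(text):
    # Crude analyzer (hypothetical): lowercase, then split on anything
    # that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical toy index, not a Lucene API."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.stored = {}                  # doc id -> original full text

    def add(self, doc_id, text):
        # Keep the original text so it can be returned with search hits.
        self.stored[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Boolean AND: ids of documents containing every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add(1, "Hadoop stores large files")
    idx.add(2, "Lucene indexes text files")
    idx.add(3, "Solr wraps Lucene")
    print(idx.search("lucene files"))    # {2}
    print(sorted(idx.search("lucene")))  # [2, 3]
```

A query only touches the posting sets for its own terms, which is why lookups
stay fast even when the collection grows to millions of documents.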

On Tue, May 31, 2011 at 11:56 AM, Matthew Foley <mattf@yahoo-inc.com> wrote:

> Sounds like you're looking for a full-text inverted index. Lucene is a
> good open-source implementation of one. I believe it has an option for
> storing the original full text as well as the indexes.
> --Matt
>
> On May 31, 2011, at 10:50 AM, cs230 wrote:
>
> Hello All,
>
> I am planning to start a project where I have to do extensive storage of XML
> and text files. On top of that, I have to implement an efficient algorithm for
> searching over thousands or millions of files, and also build some indexes to
> make subsequent searches faster.
>
> I looked into Oracle Database, but it delivered very poor results. Can I use
> Hadoop for this? Which Hadoop project would be the best fit for this?
>
> Is there anything from Google I can use?
>
> Thanks a lot in advance.
> --
> View this message in context:
> http://old.nabble.com/trying-to-select-technology-tp31743063p31743063.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
