lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Evans" <j...@jpevans.com>
Subject Re: Using lucene as a database... good idea or bad idea?
Date Wed, 30 Jul 2008 17:47:48 GMT
Hi All,

Thanks for all of the feedback.  Largely as a result of the responses I've
received from the mailing list, Lucene has made it's way on to our short
list of possible solutions.  I'm not sure what the timeframe is for
implementing a prototype and testing it, but I will try to report back with
the results when/if it happens.

Thanks again,
-
John



On Tue, Jul 29, 2008 at 2:37 PM, Chris Lu <chris.lu@gmail.com> wrote:

> It surely is possible. AFAIK, LinkedIn use lucene to store some data.
>
> But, Lucene index in a sense is similar to database index. Both are data
> structures for a specialized and limited query execution path.
>
> So this depends on your applications' query, and how you create the lucene
> index. The normal usage you listed sounds reasonable.
> But you may also need to think about maintenance. In case the index is
> corrupted somehow, you may also consider store the data into database,
> which
> are more easier to manually manipulate.
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
>
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
> On Mon, Jul 28, 2008 at 6:53 PM, John Evans <john@jpevans.com> wrote:
>
> > Hi All,
> >
> > I have successfully used Lucene in the "tradtiional" way to provide
> > full-text search for various websites.  Now I am tasked with developing a
> > data-store to back a web crawler.  The crawler can be configured to
> > retrieve
> > arbitrary fields from arbitrary pages, so the result is that each
> document
> > may have a random assortment of fields.  It seems like Lucene may be a
> > natural fit for this scenario since you can obviously add arbitrary
> fields
> > to each document and you can store the actually data in the database.
> I've
> > done some research to make sure that it would meet all of our individual
> > requirements (that we can iterate over documents, update (delete/replace)
> > documents, etc.) and everything looks good.  I've also seen a couple of
> > references around the net to other people trying similar things...
> however,
> > I know it's not meant to be used this way, so I thought I would post here
> > and ask for guidance?  Has anyone done something similar?  Is there any
> > specific reason to think this is a bad idea?
> >
> > The one thing that I am least certain about his how well it will scale.
>  We
> > may reach the point where we have tens of millions of documents and a
> high
> > percentage of those documents may be relatively large (10k-50k each).  We
> > actually would NOT be expecting/needing Lucene's normal extreme fast text
> > search times for this, but we would need reasonable times for adding new
> > documents to the index, retrieving documents by ID (for iterating over
> all
> > documents), optimizing the index after a series of changes, etc.
> >
> > Any advice/input/theories anyone can contribute would be greatly
> > appreciated.
> >
> > Thanks,
> > -
> > John
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message