lucene-dev mailing list archives

From "Jochen" <>
Subject RE: Lucene Optimized Query Broken?
Date Thu, 08 Jan 2004 17:10:17 GMT

	Could you share some details of the implementation and performance
of the relational data store you implemented? I would be especially
interested in the DB design. How does a very large number of documents
affect your performance and DB size (as you hinted in another mail of yours)?

	Do you think it is worth the effort even if the indexes do not
change frequently (i.e. only increase in size over time)?


> -----Original Message-----
> From: Robert Engels []
> Sent: Wednesday, January 07, 2004 9:13 AM
> To: Lucene Developers List
> Subject: RE: Lucene Optimized Query Broken?
------ snip -----
> I do have a Lucene IndexReader & IndexWriter implementation that uses a
> relational datastore, and it is extremely fast, in many ways much faster
> than Lucene's file system based indexing, especially for indexes that
> change frequently.
> This is the last holdup on having a truly lightning fast search system
> in the relational store.
> It sounds like your proposed changes will work. If you need any assistance
> in debugging, etc. please let me know.
> Robert
> -----Original Message-----
> From: Doug Cutting []
> Sent: Wednesday, January 07, 2004 10:56 AM
> To: Lucene Developers List
> Subject: Re: Lucene Optimized Query Broken?
> Robert Engels wrote:
> > I have an index with documents that have only 2 fields, the first (unique)
> > is 'very unique', in that most documents have at least somewhat varying
> > terms, the second is a boolean that contains only (boolean) 'true' or
> > 'false'. The index contains 100,000,000+ documents.
> >
> > If I perform the following search "+unique:somevalue +boolean:true",
> > Lucene will search on the first term, returning very few documents, but
> > then it will search the second term, returning possibly a million+
> > documents, then it will intersect the lists and return 'hits' of only a
> > few documents.
> First, this is not the sort of query that Lucene is designed to
> efficiently handle.  Rather, this is the sort of thing that a relational
> database is designed for.  Lucene is primarily designed to support text
> searching, where field values are natural language text and query terms
> are words describing a user's interest.  You can implement full text
> search with a relational database, but it will be slow.  Similarly, you
> can search tabular data with Lucene, but it may be slow.
> That said, I'm currently working on an optimization that will make such
> queries substantially faster in Lucene.  The heart of it is to add data
> to the index so that TermDocs.skipTo() is much faster.  Then the search
> algorithms are modified to call TermDocs.skipTo().  This should make
> conjunctive queries (ANDs and phrases) significantly faster when one
> term occurs much less frequently than others.  I hope to check this in
> in the next week or so.
> Doug
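
[Illustration, not part of the original message.] The idea Doug describes
for conjunctive queries can be sketched as a "leapfrog" intersection: each
term's document list jumps straight to the other's current document
instead of scanning every entry. This is only a rough sketch under stated
assumptions, not Lucene's implementation. The PostingList class and the
intersect method are hypothetical names, and a binary search over an
in-memory array stands in for the skip data that would be added to the
index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipToSketch {

    /** Hypothetical stand-in for one term's sorted document IDs; NOT Lucene's TermDocs. */
    public static final class PostingList {
        private final int[] docs; // sorted, distinct document IDs
        private int pos = -1;     // index of the current document; -1 before the first move

        public PostingList(int[] sortedDocs) { this.docs = sortedDocs; }

        /** Current document ID; only valid after next()/skipTo() returned true. */
        public int doc() { return docs[pos]; }

        /** Moves to the next document; returns false when the list is exhausted. */
        public boolean next() {
            pos++;
            return pos < docs.length;
        }

        /**
         * Moves to the first document >= target at or after the current position.
         * Assumption: binary search plays the role of on-disk skip data.
         */
        public boolean skipTo(int target) {
            int from = Math.max(pos, 0);
            int idx = Arrays.binarySearch(docs, from, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1; // insertion point when target is absent
            return pos < docs.length;
        }
    }

    /** Intersects two lists by letting whichever is behind skip to the other's document. */
    public static List<Integer> intersect(PostingList rare, PostingList common) {
        List<Integer> hits = new ArrayList<>();
        if (!rare.next() || !common.skipTo(rare.doc())) return hits;
        while (true) {
            int r = rare.doc();
            int c = common.doc();
            if (r == c) {
                hits.add(r); // both terms occur in this document
                if (!rare.next() || !common.skipTo(rare.doc())) break;
            } else if (r < c) {
                if (!rare.skipTo(c)) break;
            } else {
                if (!common.skipTo(r)) break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] rare = {7, 500_000, 999_999};     // e.g. unique:somevalue
        int[] common = new int[500_000];        // e.g. boolean:true
        for (int i = 0; i < common.length; i++) common[i] = 2 * i; // every even document
        System.out.println(intersect(new PostingList(rare), new PostingList(common)));
        // prints [500000]
    }
}
```

In this sketch the common list is never scanned entry by entry: it is
repositioned only a handful of times, driven by the three documents of the
rare term. That is the case described above, where one term occurs much
less frequently than the others.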
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail: