lucene-dev mailing list archives

From "Robert Engels" <reng...@ix.netcom.com>
Subject RE: Lucene Optimized Query Broken?
Date Sat, 10 Jan 2004 01:15:08 GMT
My goof, the numbers are actually much better (I had the SQL Server Profiler
ON...)

Find documents for single term returning 21,000 hits: 350ms
Find documents for single term returning 100 hits: 0ms (immeasurable)


-----Original Message-----
From: Robert Engels [mailto:rengels@ix.netcom.com]
Sent: Friday, January 09, 2004 6:32 PM
To: Jochen
Subject: RE: Lucene Optimized Query Broken?


Some performance numbers:

Number of Fields Per Document: 3 (external unique id, date, tokenized text)
Number of Unique Terms: 67,972
Number of Document Terms: 2,800,000
Time to index 225,000 documents: 20 mins.

Incremental Indexing: depends on number of terms, but < 5ms per term

Search Performance:

Find documents for single term returning 13,000 hits: 500ms
Find documents for single term returning 2000 hits: 10ms
Find documents for 2 terms containing 2 hits: 90ms

Obviously these numbers are highly dependent on the hardware, especially the
DBMS. I do not have enough machines to test the performance properly, with
the DBMS, app server, and client each on separate hardware.

For reference purposes:

DBMS: SQL Server 2000 on a 1.5 GHz Pentium IV, 784 MB memory
Application server and client on the same machine: Pentium III 500 MHz laptop,
256 MB memory

If I move the app server onto the DBMS machine, the searches are much faster,
but the incremental updates are slower, since the DBMS commit processing
competes with the app server (IndexWriter) code.

This is really the first pass, and I expect to be able to perform more
optimizations, especially when Lucene supports the skipTo() for document
terms.

Robert Engels

-----Original Message-----
From: Jochen [mailto:lucenelist@quontis.com]
Sent: Thursday, January 08, 2004 11:10 AM
To: 'Lucene Developers List'; rengels@ix.netcom.com
Subject: RE: Lucene Optimized Query Broken?


Robert,

	Could you share some details of the implementation, and performance
of the relational data store you implemented? I would be especially
interested in the DB design. How does a very large number of documents
affect your performance and DB size (as you hinted in other mail of yours)?

	Do you think it is worth the effort even if the indexes do not
change frequently (i.e. only increase in size over time)?

	Best,
		Jochen

> -----Original Message-----
> From: Robert Engels [mailto:rengels@ix.netcom.com]
> Sent: Wednesday, January 07, 2004 9:13 AM
> To: Lucene Developers List
> Subject: RE: Lucene Optimized Query Broken?
>
------ snip -----
>
> I do have a Lucene IndexReader & IndexWriter implementation that uses a
> relational datastore, and it is extremely fast, in many ways much faster
> than Lucene's file system based indexing, especially for indexes that
> change frequently.
>
> This is the last holdup on having a truly lightning-fast search system
> in the relational store.
>
> It sounds like your proposed changes will work. If you need any assistance
> in debugging, etc., please let me know.
>
> Robert
>
> -----Original Message-----
> From: Doug Cutting [mailto:cutting@apache.org]
> Sent: Wednesday, January 07, 2004 10:56 AM
> To: Lucene Developers List
> Subject: Re: Lucene Optimized Query Broken?
>
>
> Robert Engels wrote:
> > I have an index with documents that have only 2 fields. The first (unique)
> > is 'very unique', in that most documents have at least somewhat varying
> > terms; the second is a boolean that contains only 'true' or 'false'. The
> > index contains 100,000,000+ documents.
> >
> > If I perform the following search, "+unique:somevalue +boolean:true",
> > Lucene will search on the first term, returning very few documents, but
> > then it will search the second term, returning possibly a million+
> > documents, and then it will intersect the lists, returning 'hits' of only
> > a few documents.
>
> First, this is not the sort of query that Lucene is designed to
> efficiently handle.  Rather, this is the sort of thing that a relational
> database is designed for.  Lucene is primarily designed to support text
> searching, where field values are natural language text and query terms
> are words describing a user's interest.  You can implement full text
> search with a relational database, but it will be slow.  Similarly, you
> can search tabular data with Lucene, but it may be slow.
>
> That said, I'm currently working on an optimization that will make such
> queries substantially faster in Lucene.  The heart of it is to add data
> to the index so that TermDocs.skipTo() is much faster.  Then the search
> algorithms are modified to call TermDocs.skipTo().  This should make
> conjunctive queries (ANDs and phrases) significantly faster when one
> term occurs much less frequently than others.  I hope to check this in
> in the next week or so.
>
> Doug
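
The skipTo()-driven conjunction Doug describes can be illustrated with a
minimal sketch. This is not Lucene's code: PostingCursor, intersect, and
the class name SkipToSketch are hypothetical names invented for this
example, the posting lists are plain sorted int arrays of non-negative
document ids, and a binary search stands in for the extra skip data that
would be stored in the index. The point is only that each list jumps
straight to the other's current document instead of enumerating every
entry of the frequent term.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipToSketch {

    /** Hypothetical cursor over one term's sorted, duplicate-free doc ids. */
    static final class PostingCursor {
        private final int[] docs;
        private int pos = -1; // -1 until the first skipTo() call

        PostingCursor(int[] docs) {
            this.docs = docs;
        }

        /** Current document id; valid only after skipTo() returned true. */
        int doc() {
            return docs[pos];
        }

        /** Moves to the first doc id >= target; false when the list is exhausted. */
        boolean skipTo(int target) {
            // Assumption: binary search over the remaining entries stands in
            // for real skip data; it never moves the cursor backwards.
            int lo = Math.max(pos, 0);
            int hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            pos = lo;
            return pos < docs.length;
        }
    }

    /** Returns the doc ids present in both sorted lists (an AND of two terms). */
    static List<Integer> intersect(int[] rare, int[] common) {
        List<Integer> hits = new ArrayList<>();
        PostingCursor a = new PostingCursor(rare);
        PostingCursor b = new PostingCursor(common);
        if (!a.skipTo(0) || !b.skipTo(0)) {
            return hits; // one list is empty, so there can be no matches
        }
        while (true) {
            if (a.doc() == b.doc()) {
                hits.add(a.doc());
                if (!a.skipTo(a.doc() + 1)) break; // b catches up on the next pass
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) break;
            } else {
                if (!b.skipTo(a.doc())) break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] unique = {3, 50, 999};                      // rare term
        int[] bool = {1, 2, 3, 4, 50, 60, 70, 1000};      // frequent term
        System.out.println(intersect(unique, bool));
    }
}
```

With a rare term driving the loop, the number of skipTo() calls is
bounded by the length of the shorter list rather than the longer one,
which is the case raised in the "+unique:somevalue +boolean:true" query.
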
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


