lucene-java-user mailing list archives

From "Spencer, Dave" <d...@lumos.com>
Subject RE: Performance with 5 Millions indexed items
Date Tue, 10 Sep 2002 18:13:22 GMT
I have a 1 GHz P4 with 512 MB of RAM and probably a standard 7200 RPM disk,
running JDK 1.4.

I have indexed the content from dmoz.org [maybe I should donate this as a
kind of example]. The index size is 1 GB and it has 3.2M docs in it. I think
it takes around 4 hours to produce the index.

Briefly, for one quick test, a fuzzy 2-word search takes about 9x as long as
the same search unfuzzy (11276 ms vs. 1272 ms).

	Searching for: title:kasparov
	35 total matching documents after 1232(ms)

	Searching for: title:kasparov title:chess
	1046 total matching documents after 1272(ms)

	Searching for: title:kasparov~ title:chess~
	18965 total matching documents after 11276(ms)
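
One plausible reason for the gap is that a fuzzy term cannot be looked up
directly: each candidate term in the field has to be compared against the
query term by edit distance, and every term that is close enough adds its
documents to the result (which would also fit the jump from 1046 to 18965
hits). The following is a minimal sketch of that idea, not Lucene's actual
code. The class name, the method names and the similarity threshold are
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: scan a list of terms and keep those "close enough"
// to the query term. Not taken from Lucene.
class FuzzyScanSketch {

    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Every term is visited once, so the cost grows with the number of
    // distinct terms, not with the number of matches.
    // minSimilarity is an assumed, illustrative cut-off.
    static List<String> fuzzyMatches(String query, List<String> terms,
                                     double minSimilarity) {
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            int shorter = Math.min(query.length(), term.length());
            if (shorter == 0) {
                continue;
            }
            double similarity =
                1.0 - (double) editDistance(query, term) / shorter;
            if (similarity > minSimilarity) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

With a 0.5 cut-off, fuzzyMatches("chess", ...) over the terms chess, chest,
cheese and kasparov keeps the first three, so a single fuzzy term expands
into several exact terms that all have to be searched.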

As an aside, you can get the dmoz.org content here:
http://dmoz.org/rdf.html
I indexed "content.rdf.u8.gz".
It is invalid XML(!) and I couldn't get several SAX parsers to work, so I
had to use Electric XML.



-----Original Message-----
From: Mader, Volker [mailto:VMader@heiler.com]
Sent: Tuesday, September 10, 2002 12:00 AM
To: lucene-user@jakarta.apache.org
Subject: Performance with 5 Millions indexed items


Hi,

I've got a question about performance with "bigger" indexes. We used
IndexWriter with GermanAnalyzer to index data with the following fields:

Field1: ID (a long value)
Field2: Description (a free text)
Field3: Groups (a list of up to 10 long values encoded in a single string)
Field4: Classes (a list of up to 10 long values encoded in a single string)
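
The message does not say how the long values are packed into one string. A
minimal sketch, assuming a space-separated encoding so that an analyzer can
tokenize each value as its own term, might look like this. The class and
method names are hypothetical.

```java
// Hypothetical helper: pack a list of long values into one field string
// and unpack it again. The space separator is an assumption.
class LongListField {

    // Join the values with single spaces, e.g. {12, 345} -> "12 345".
    static String encode(long[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    // Split on whitespace and parse each token back into a long.
    static long[] decode(String field) {
        String trimmed = field.trim();
        if (trimmed.isEmpty()) {
            return new long[0];
        }
        String[] parts = trimmed.split("\\s+");
        long[] out = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Long.parseLong(parts[i]);
        }
        return out;
    }
}
```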

Documents are created with the 4 fields and then added to the
IndexWriter.
Afterwards, the index is optimized.

Searching now for a word in the field "Description" using
IndexSearcher(GermanAnalyzer) with FuzzyQuery leads to search times of up
to 30 seconds on a 1.4 GHz Pentium 4.
Retrieval with hits.doc(..) is also very slow.

Any ideas?

Volker

--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>




