lucene-java-user mailing list archives

From "luan xl" <helix_...@hotmail.com>
Subject A problem on performance
Date Sat, 26 Aug 2006 00:58:15 GMT
I have nearly 4 million Chinese documents, each ranging in size from 1k
to 300k, so I use org.apache.lucene.analysis.cn.ChineseAnalyzer as the
analyzer for the text. The index has four fields:

content - tokenized, not stored
title - tokenized and stored
path - stored only
date - stored only

For various reasons, I split these documents into 12 sets and search
them with an IndexSearcher over a MultiReader. All English queries are
very fast, taking only about 10-100 ms. But when I query with Chinese
words, the behavior is confusing:
If the word is a single character, so that the Query is actually a
TermQuery, the search is very fast.
However, if the word has more than one character, the Query is actually
a PhraseQuery with slop 0, and IndexSearcher usually takes 3000-5000 ms
to return the Hits.
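
To make the two query shapes concrete, here is a minimal sketch in plain
Java. It is not Lucene code and not a diagnosis of the slowdown. It only
assumes that the analyzer emits one token per Chinese character, and the
class PhraseSketch and its methods are hypothetical names made up for the
illustration. A single-character query only needs the list of documents
for one term, while a multi-character query with slop 0 also has to check
that the characters sit at consecutive positions in each candidate
document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy positional index; a simplification, not Lucene's implementation.
class PhraseSketch {
    // term -> (docId -> positions at which the term occurs in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    // Assumed analyzer behavior: one token per non-whitespace character.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isWhitespace(c)) {
                tokens.add(String.valueOf(c));
            }
        }
        return tokens;
    }

    // Index a document, recording the position of every token.
    void add(int docId, String text) {
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), k -> new HashMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Single-term lookup: returns the documents containing the term, no position checks.
    List<Integer> termSearch(String term) {
        Map<Integer, List<Integer>> docs = postings.get(term);
        List<Integer> hits = new ArrayList<>();
        if (docs != null) {
            hits.addAll(docs.keySet());
        }
        Collections.sort(hits);
        return hits;
    }

    // Exact phrase (slop 0): every token must appear at consecutive positions.
    List<Integer> phraseSearch(String phrase) {
        List<String> terms = tokenize(phrase);
        List<Integer> hits = new ArrayList<>();
        if (terms.isEmpty() || !postings.containsKey(terms.get(0))) {
            return hits;
        }
        for (Map.Entry<Integer, List<Integer>> e : postings.get(terms.get(0)).entrySet()) {
            for (int start : e.getValue()) {
                if (matchesAt(e.getKey(), start, terms)) {
                    hits.add(e.getKey());
                    break; // one match per document is enough
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    // True if terms[1..] occur in docId at positions start+1, start+2, ...
    private boolean matchesAt(int docId, int start, List<String> terms) {
        for (int i = 1; i < terms.size(); i++) {
            Map<Integer, List<Integer>> docs = postings.get(terms.get(i));
            if (docs == null) {
                return false;
            }
            List<Integer> positions = docs.get(docId);
            if (positions == null || !positions.contains(start + i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PhraseSketch idx = new PhraseSketch();
        // The strings are Chinese text written as Unicode escapes to avoid encoding issues.
        idx.add(1, "\u4e2d\u56fd\u4eba\u6c11"); // zhong guo ren min
        idx.add(2, "\u4eba\u6c11\u4e2d\u56fd"); // ren min zhong guo
        idx.add(3, "\u4e2d\u4eba\u56fd\u6c11"); // zhong ren guo min
        System.out.println(idx.termSearch("\u4e2d"));         // one character: [1, 2, 3]
        System.out.println(idx.phraseSearch("\u4e2d\u56fd")); // two characters: [1, 2]
    }
}
```

In this sketch the phrase search touches the position lists of every
character in the query, which is extra work that the single-term lookup
never does. Whether that accounts for the 3000-5000 ms I see is only a
guess.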

I have also tested with the QueryParser and got the same results. My
environment is a Dell PE2600 with 2G*2 Xeon, 2G RAM, 10000R/s SCSI,
Debian/sarge, Sun JDK 1.5 and Lucene 2.0.0.

Thanks.


