lucene-lucene-net-dev mailing list archives

From "Digy" <digyd...@gmail.com>
Subject RE: lucene performance questions
Date Tue, 18 May 2010 20:20:44 GMT
Whether you tokenize them or not, there shouldn't be any performance change
(ignoring the parsing of a few words of the user's query).
Is this some kind of XY problem?
(http://dictionary.babylon.com/xy%20problem/)

DIGY



-----Original Message-----
From: Ravi Patel [mailto:rpatel4@live.com] 
Sent: Tuesday, May 18, 2010 4:34 PM
To: lucene-net-dev@lucene.apache.org
Subject: lucene performance questions


 

I have a bunch of fields that have single values such as "date", "id",
"flagged"

 

I've noticed that if I index them as Tokenized, my queries are much faster
than if they are UnTokenized.


In my query, I'm using a BooleanQuery or RangeFilter/Query and
querying/sorting/filtering based on these values.

Example uses:

SortField minuteSort = new SortField("date", SortField.STRING, reverse);

filter = new RangeFilter("id", lowerId, upperId, false, false);

booleanQuery.Add(new TermQuery(new Term("flagged", "true")),
BooleanClause.Occur.MUST_NOT);

 

Two Questions:

1.  Is there a cost at search-time in making fields Tokenized that don't
need to be?  I assume there's a cost at Index time, but I'm not too worried
about the Index cost.

2.  Should fields that are used in my 3 example lines above be Tokenized?
If not, why am I seeing a huge performance difference when they are
UnTokenized?  I'm really not running any queries that require some sort of
analysis on these fields other than that they are indexed as-is.
 		 	   		  
