lucene-java-user mailing list archives

From Alex Steward <>
Subject Re: lucene code changes
Date Tue, 19 May 2009 14:14:52 GMT
I need to implement a custom inverted index in Lucene. I
have files like the ones I have attached here. The files contain words
and the scores assigned to each word. There will be hundreds of such files.
Each file will have at least 50,000 such name-value pairs.

Note: Currently the attached files only show tens of such name-value pairs,
but my real production data will have 50,000 or more name-value pairs per file.

I index the data using Lucene's inverted index. The query that is
executed against the index has 100 words. When the query is executed
against the index, the result is returned in about 100 milliseconds.

Problem: Once I have the results of the query, I have to go
through each file (for example, the first attached file). Then, for each
word in the user's input query, I have to look up its score in that file
and add it to the file's total score. Doing this against hundreds of files
and hundreds of keywords makes the score computation slow, i.e. about
3-5 seconds.

I need help resolving the above problem so that score computation takes less than
about 200 milliseconds.
One resolution I was thinking of is modifying the Lucene source code
that creates the inverted index, so that the score is stored in the
index itself. When the results of the query are returned, we would get
the scores along with the file names, thereby eliminating the need to
search each file for the keyword and its corresponding score. I would
then only need to compute the total of all scores that belong to each
single file.
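
To make the idea concrete, here is a minimal sketch of an inverted index that keeps the score next to each posting, so the per-file totals can be computed without reopening the files. It is plain Java, not Lucene's API and not a patch to Lucene; the class name ScoredInvertedIndex and its methods add and totalScores are hypothetical names chosen for illustration, and it assumes all (word, score) pairs fit in memory.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory inverted index that stores a score with every posting. */
class ScoredInvertedIndex {
    // word -> (file name -> score assigned to that word in that file)
    private final Map<String, Map<String, Double>> postings = new HashMap<>();

    /** Records one (word, score) pair read from the given file. */
    void add(String fileName, String word, double score) {
        postings.computeIfAbsent(word, w -> new HashMap<>()).put(fileName, score);
    }

    /** Sums, per file, the scores of every query word that occurs in that file. */
    Map<String, Double> totalScores(Collection<String> queryWords) {
        Map<String, Double> totals = new HashMap<>();
        for (String word : queryWords) {
            Map<String, Double> perFile = postings.get(word);
            if (perFile == null) {
                continue; // word does not occur in any indexed file
            }
            for (Map.Entry<String, Double> entry : perFile.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Double::sum);
            }
        }
        return totals;
    }
}
```

With this layout the work per query is proportional to the number of postings for the query words, rather than to the number of files times the number of keywords. A word that appears twice in the query is counted twice, so the caller would need to deduplicate the query words if that is not wanted.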

I am also open to any other ideas that you may have. Any suggestions regarding this will be
very helpful.


