lucene-java-user mailing list archives

From yanyanzeng <luckyyanyan...@hotmail.com>
Subject bad index by batch indexing
Date Thu, 07 Aug 2008 00:39:25 GMT

Hi,
    I am building a search engine for text transcript documents from the
database of an enterprise messaging system, and have designed a batch
processing job to build the index incrementally, because the production
database is huge, around 10G.
   I am still testing in the DEV environment, and have been puzzled by this
problem for a couple of days.
If I build the index in a single pass (possible because the DEV database is
very small), the index is correct: I get hits for my queries, and what Luke
shows looks fine: 4800 documents, 450 terms.
However, if I build it with my batch processing job, I get an index that
looks fine, but searching always returns 0 hits. I checked with Luke, which
shows 5200 documents and 0 terms.
There is no exception, runtime error, or anything else abnormal during
indexing or searching. I am really at a loss.
The only difference between the two is this: in the single-pass approach,
the whole index is built using the same IndexWriter object; in the batch
approach, an IndexWriter object is opened per batch and closed when the
batch is finished.
But I think I have taken care of that with
               IndexWriter writer = new IndexWriter(FSDir, Analyser,
                                                    !FSDir.exists());
 
Since Lucene is designed to add to an existing index when the 3rd
parameter is false, I do not understand where it went wrong.
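
To make the create-versus-append decision easier to check per batch, here is
a minimal sketch using only the Java standard library. It assumes that an
existing Lucene index directory contains at least one file whose name starts
with "segments"; the class name CreateFlagSketch and the helper shouldCreate
are hypothetical names for illustration, not part of Lucene or of my job.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class CreateFlagSketch {

    // Hypothetical helper: returns true when the directory is missing or
    // holds no "segments*" file, i.e. when a fresh index would be created.
    // Assumption: an existing index always has such a file.
    static boolean shouldCreate(Path indexDir) throws IOException {
        if (!Files.isDirectory(indexDir)) {
            return true;
        }
        try (DirectoryStream<Path> entries =
                 Files.newDirectoryStream(indexDir, "segments*")) {
            return !entries.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-sketch");
        // Empty directory: the first batch would create the index.
        System.out.println(shouldCreate(dir)); // prints "true"
        // Simulate the file left behind by a first batch.
        Files.createFile(dir.resolve("segments_1"));
        // Later batches would append instead of recreating.
        System.out.println(shouldCreate(dir)); // prints "false"
    }
}
```

Logging this value at the start of every batch would show whether any batch
after the first one is opening the writer with the create flag set to true.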
Should I have kept one singleton instance of the writer until all
documents in the database are processed, rather than opening and closing one
for each batch? Or should I have kept a single instance of the analyser?
This does not seem necessary, but I really cannot figure out where it went
wrong, or what explains this strange behavior: 5200 documents but 0 terms.

I would be very grateful if anyone could advise. Thanks very much.

yanyan



