lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: bad index by batch indexing
Date Thu, 07 Aug 2008 02:14:03 GMT
Only one Writer should be active on an index at any given time. You
don't want to batch unless you need to see the docs in the index as
you build it; batching is slower overall. You can add from multiple
threads at the same time, but they should all use the same Writer.
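
A minimal sketch of that pattern follows: one writer stays open for the whole job while several threads add documents through it. It is not Lucene code. `SingleWriter` is a hypothetical in-memory stand-in for the one open IndexWriter, and the class name, `indexAll`, the thread count, and the document strings are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SharedWriterDemo {

    /** Hypothetical stand-in for the single open writer; NOT a Lucene class. */
    static final class SingleWriter implements AutoCloseable {
        private final List<String> docs = new ArrayList<>();
        private boolean closed = false;

        // Synchronized so that many threads can safely add through one instance.
        synchronized void addDocument(String text) {
            if (closed) {
                throw new IllegalStateException("writer is closed");
            }
            docs.add(text);
        }

        synchronized int docCount() {
            return docs.size();
        }

        @Override
        public synchronized void close() {
            closed = true;
        }
    }

    /** Adds totalDocs documents from several threads through ONE writer. */
    static int indexAll(int threads, int totalDocs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // The writer is opened once and closed once, after every thread is done.
        try (SingleWriter writer = new SingleWriter()) {
            for (int t = 0; t < threads; t++) {
                final int offset = t;
                // Each worker takes every threads-th document, starting at its offset.
                pool.submit(() -> {
                    for (int i = offset; i < totalDocs; i += threads) {
                        writer.addDocument("transcript " + i);
                    }
                });
            }
            pool.shutdown();
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
            return writer.docCount();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(indexAll(4, 1000) + " documents added through one writer");
    }
}
```

In a real job the stand-in would be replaced by the single IndexWriter that is kept open until all documents are processed. The point of the sketch is only the lifecycle: open once, share across threads, close once.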

Sent from my iPhone

On Aug 6, 2008, at 8:39 PM, yanyanzeng <luckyyanyan888@hotmail.com>  
wrote:

>
> Hi,
> I am building a search engine for text transcript documents from the
> database of an enterprise messaging system, and have designed a batch
> processing job to build the index incrementally, because the production
> database is huge, around 10G.
> I am still testing in the DEV environment, and have been puzzled by this
> problem for a couple of days.
> If I build the index in a single run (possible because the DEV database is
> very small), the index is correct: I get hits for my queries, and what
> Luke shows looks fine, 4800 documents and 450 terms.
> However, if I build it with my batch processing job, I get an index that
> looks fine, but every search returns 0 hits. I checked with Luke, which
> shows 5200 documents and 0 terms.
> There is no exception, runtime error, or anything abnormal during indexing
> or searching. I am really at a loss.
> The only difference between the two is this: in the single-run approach,
> the whole index is built using the same IndexWriter object; in the batch
> approach, an IndexWriter object is opened per batch and closed when the
> batch is finished.
> But I think I have taken care of that with:
>     IndexWriter writer = new IndexWriter(FSDir, Analyser, !FSdir.exists)
>
> Since Lucene is designed to add to an existing index when the 3rd
> parameter is false, I do not understand where it went wrong.
> Should I have kept one singleton instance of the writer until all
> documents in the database are processed, rather than opening and closing
> one for each batch? Or should I have kept a single instance of the
> analyser? This does not seem necessary, but I really cannot figure out
> where it went wrong, or what explains this strange behavior: 5200
> documents but 0 terms.
>
> I would be very grateful if anyone could advise. Thanks very much.
>
> yanyan
>
>
> -- 
> View this message in context: http://www.nabble.com/bad-index-by-batch-indexing-tp18862037p18862037.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


