lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anshum <>
Subject Re: bad index by batch indexing
Date Thu, 07 Aug 2008 02:08:21 GMT
This really seems like an issue the batching mechanism (one of those errors
which seem trivial on discovery :) ). I work with batched indexing and it
works absolutely fine on data that is a lot higher in magnitude. You could
try calling the indexwriter without the 3rd argument and see if it helps.
Also, which version of lucene are you using?


On Thu, Aug 7, 2008 at 6:09 AM, yanyanzeng <>wrote:

> Hi,
>    I am building a search engine for text transcript documents from the
> database of an enterprise messaging system,  and have designed a batch
> processing job to incrementally build the index,because the database from
> production is around huge, around 10G.
>   Now I am still testing in DEV environment, and have been puzzled by this
> problem for a couple of days.
> If I build the index in one setting(because DEV database is very very
> small),  the index is correct because I can get hits for my queries,  also,
> what luke shows looks fine,  4800 documents, 450 terms.
> However, if I test building using my batch processing job,  I do get the
> index which looks fine, but, when I search, it already returns 0 hits.  I
> checked with Luke, which shows there are 5200 documents, 0 terms .
> There is no exception or runtime error or anything abnormal during indexing
> or searching,  I am really at a loss.
> The only difference between the two is that:   in the one setting approach,
> the whole index is built using the same indexwriter object.
> in the batch approach,  an indexwriter object is opened per batch and
> closed
> when the batch is finished.
> But,  I  think I have taken care of it by
>               IndexWriter  writer = new IndexWriter(FSDir, Analyser,
> !FSdir.exists)
> Since lucene is designed for adding to exisiting index when the 3rd
> parameter is false,   I do not understand where it went wrong.
> Should I have kept one singleton instance of the writer  until  all
> documents in the database are processed, rather than opening &closing one
> for each batch?     Or,  should I have kept a single instance of analyser?
> This does not seem necessary, but I really can not figure out where it went
> wrong, and how come this strange behavior:  520 documents but 0 terms.
> I would be very grateful if anyone could advise.  THanks very much.
> yanyan
> --
> View this message in context:
> Sent from the Lucene - Java Users mailing list archive at
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message