lucene-java-user mailing list archives

From Anshum <ansh...@gmail.com>
Subject Re: bad index by batch indexing
Date Thu, 07 Aug 2008 02:08:21 GMT
This really seems like an issue with the batching mechanism (one of those errors
that seem trivial on discovery :) ). I work with batched indexing and it
works absolutely fine on data that is much larger in magnitude. You could
try constructing the IndexWriter without the 3rd argument and see if it helps.
Also, which version of Lucene are you using?
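
To illustrate why the 3rd (create) argument deserves a second look: a test on whether the directory exists is not the same as a test on whether an index exists inside it. Below is a minimal sketch in plain Java, with no Lucene dependency, of deriving the flag from the directory contents instead. The class and method names are made up for illustration, and the check relies on the assumption that a Lucene index directory contains at least one file whose name starts with "segments". Lucene's own IndexReader.indexExists is probably the better choice in real code, if your version has it.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper, not part of Lucene: decides the value of the
// "create" flag from what is actually inside the index directory.
class CreateFlagSketch {

    /** True if dir holds at least one file whose name starts with "segments". */
    public static boolean looksLikeIndex(File dir) {
        String[] names = dir.list(); // null if dir is missing or not a directory
        if (names == null) {
            return false;
        }
        for (String name : names) {
            if (name.startsWith("segments")) {
                return true;
            }
        }
        return false;
    }

    /** Value to pass as the create argument: true only when no index is present. */
    public static boolean shouldCreate(File dir) {
        return !looksLikeIndex(dir);
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("idx").toFile();
        // Directory exists but is empty: a fresh index is still needed.
        System.out.println("create before first batch: " + shouldCreate(dir));
        // Simulate the first batch having written an index.
        new File(dir, "segments_1").createNewFile();
        System.out.println("create on later batches: " + shouldCreate(dir));
    }
}
```

With a check like this, the first batch creates the index and every later batch appends to it, even if something else has already created the empty directory.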

--
Anshum
http://ai-cafe.blogspot.com

On Thu, Aug 7, 2008 at 6:09 AM, yanyanzeng <luckyyanyan888@hotmail.com> wrote:

>
> Hi,
>    I am building a search engine for text transcript documents from the
> database of an enterprise messaging system, and have designed a batch
> processing job to build the index incrementally, because the production
> database is huge, around 10G.
>   I am still testing in the DEV environment, and have been puzzled by this
> problem for a couple of days.
> If I build the index in one sitting (because the DEV database is very, very
> small), the index is correct: I get hits for my queries, and what Luke
> shows looks fine: 4800 documents, 450 terms.
> However, if I build it using my batch processing job, I do get an index
> that looks fine, but when I search, it always returns 0 hits. I checked
> with Luke, which shows 5200 documents and 0 terms.
> There is no exception, runtime error, or anything abnormal during indexing
> or searching. I am really at a loss.
> The only difference between the two is this: in the one-sitting approach,
> the whole index is built using the same IndexWriter object; in the batch
> approach, an IndexWriter object is opened per batch and closed when the
> batch is finished.
> But I think I have taken care of that with
>               IndexWriter  writer = new IndexWriter(FSDir, Analyser,
> !FSdir.exists)
>
> Since Lucene is designed to add to an existing index when the 3rd
> parameter is false, I do not understand where it went wrong.
> Should I have kept one singleton instance of the writer until all
> documents in the database are processed, rather than opening and closing one
> for each batch? Or should I have kept a single instance of the analyser?
> This does not seem necessary, but I really cannot figure out where it went
> wrong, or what explains this strange behavior: 5200 documents but 0 terms.
>
> I would be very grateful if anyone could advise. Thanks very much.
>
> yanyan
>
>


-- 
--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............
