lucene-java-user mailing list archives

From yanyanzeng <luckyyanyan...@hotmail.com>
Subject Re: bad index by batch indexing
Date Thu, 07 Aug 2008 03:47:06 GMT

Hi, thank you very much for your reply. I am using the latest version,
Lucene 2.3.2. I will try the constructor with two arguments and post my
results later.

yanyan




Anshum-2 wrote:
> 
> This really seems like an issue with the batching mechanism (one of those
> errors that seem trivial on discovery :) ). I work with batched indexing
> and it works absolutely fine on data that is a lot larger in magnitude.
> You could try calling the IndexWriter constructor without the 3rd argument
> and see if it helps.
> Also, which version of Lucene are you using?
> 
> --
> Anshum
> http://ai-cafe.blogspot.com
> 
> On Thu, Aug 7, 2008 at 6:09 AM, yanyanzeng
> <luckyyanyan888@hotmail.com> wrote:
> 
>>
>> Hi,
>>    I am building a search engine for text transcript documents from the
>> database of an enterprise messaging system, and have designed a batch
>> processing job to build the index incrementally, because the production
>> database is huge, around 10 GB.
>>   I am still testing in the DEV environment, and have been puzzled by
>> this problem for a couple of days.
>> If I build the index in a single run (possible because the DEV database
>> is very small), the index is correct: I get hits for my queries, and
>> what Luke shows looks fine, 4800 documents and 450 terms.
>> However, if I build it with my batch processing job, I do get an index
>> that looks fine, but every search returns 0 hits. I checked with Luke,
>> which shows 5200 documents and 0 terms.
>> There is no exception, runtime error, or anything else abnormal during
>> indexing or searching. I am really at a loss.
>> The only difference between the two is this: in the single-run approach,
>> the whole index is built using the same IndexWriter object;
>> in the batch approach, an IndexWriter object is opened per batch and
>> closed when the batch is finished.
>> But I think I have taken care of that with:
>>     IndexWriter writer = new IndexWriter(FSDir, Analyser, !FSdir.exists)
>>
>> Since Lucene is designed to add to an existing index when the 3rd
>> parameter is false, I do not understand where it went wrong.
>> Should I have kept one singleton instance of the writer until all
>> documents in the database are processed, rather than opening and closing
>> one for each batch? Or should I have kept a single instance of the
>> analyser?
>> That does not seem necessary, but I really cannot figure out where it
>> went wrong, or what explains this strange behavior: 5200 documents but
>> 0 terms.
>>
>> I would be very grateful if anyone could advise. Thanks very much.
>>
>> yanyan
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/bad-index-by-batch-indexing-tp18862037p18862037.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 
> -- 
> --
> The facts expressed here belong to everybody, the opinions to me.
> The distinction is yours to draw............
> 
> 
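The per-batch pattern described in the quoted message can be modelled without Lucene. What follows is a minimal sketch and not Lucene code: ToyIndexWriter, indexExists, and indexInBatches are hypothetical names, and a plain text file (docs.txt) stands in for the index. It only illustrates the intended meaning of the third constructor argument as the message describes it: create a fresh index when none exists yet, otherwise append to the existing one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Stdlib-only model of per-batch indexing. Nothing here is Lucene API.
class BatchIndexingSketch {

    // Hypothetical stand-in for an index writer: one line per document in docs.txt.
    static final class ToyIndexWriter implements AutoCloseable {
        private final Path indexFile;

        ToyIndexWriter(Path dir, boolean create) throws IOException {
            Files.createDirectories(dir);
            indexFile = dir.resolve("docs.txt");
            if (create) {
                // create == true discards anything indexed earlier.
                Files.write(indexFile, new byte[0]);
            } else if (!Files.exists(indexFile)) {
                throw new IOException("no index to append to in " + dir);
            }
        }

        void addDocument(String text) throws IOException {
            // Append one document; the file exists after the constructor has run.
            Files.write(indexFile, (text + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);
        }

        @Override
        public void close() {
            // Nothing is buffered in this toy model.
        }
    }

    // Assumed existence check: the toy index exists once docs.txt is present.
    static boolean indexExists(Path dir) {
        return Files.exists(dir.resolve("docs.txt"));
    }

    // Opens and closes one writer per batch; returns the final document count.
    static int indexInBatches(Path dir, List<List<String>> batches) throws IOException {
        for (List<String> batch : batches) {
            // Re-evaluated for every batch: true only before the first one.
            boolean create = !indexExists(dir);
            try (ToyIndexWriter writer = new ToyIndexWriter(dir, create)) {
                for (String doc : batch) {
                    writer.addDocument(doc);
                }
            }
        }
        return Files.readAllLines(dir.resolve("docs.txt")).size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("toy-index");
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("first transcript", "second transcript"),
                Arrays.asList("third transcript"));
        // prints "documents after two batches: 3"
        System.out.println("documents after two batches: " + indexInBatches(dir, batches));
    }
}
```

In this model, the documents from earlier batches survive only if the existence check really reports the index that the previous batch wrote. If the check looked at the wrong path and always returned false, each batch would start from an empty file. Whether that has anything to do with the 0-terms symptom reported above is not something this sketch can establish.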

-- 
View this message in context: http://www.nabble.com/bad-index-by-batch-indexing-tp18862037p18863533.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



