lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: addIndexes() Size
Date Mon, 06 Dec 2004 17:27:17 GMT
There was a bug in 1.4 (and maybe 1.4.1?) that kept some index files 
around that were not used.

Are you using Lucene 1.4.3?  It not, try that and see if it helps.

	Erik

On Dec 6, 2004, at 12:17 PM, Garrett Heaver wrote:

> No there are no duplicate copies - I've the correct number when I view
> through luke and I don't overlap - the temporary index is destroyed 
> after it
> is added to the main index - I'm currently at index version 159 and it 
> seems
> that all of my .prx files come in at around 1435 megs (ouch)
>
> Thanks
> Garrett
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: 06 December 2004 17:12
> To: Lucene Users List
> Subject: Re: addIndexes() Size
>
> If I were you, I would first use Luke to peek at the index.  You may
> find something obvious there, like multiple copies of the same
> Document.
> Does your temp index 'overlap' with A index in terms of Documents?  If
> so, you will end up with multliple copies, as addIndexes method doesn't
> detect and remove duplicate Documents.
>
> Otis
>
> --- Garrett Heaver <garrett.heaver@researchandmarkets.com> wrote:
>
>> Hi.
>>
>>
>>
>> Its probably really simple to explain this but since I'm not up to
>> speed on
>> the way Lucene stores the data I'm a little confused.
>>
>>
>>
>> I'm building an Index, which resides on Server A, with the Lucene
>> Service
>> running on Server B. Now not to bore you with the details but because
>> of the
>> network transfer rate etc I'm running the actual index on
>> \\ServerA\idx
>> <file:///\\ServerA\idx>  and building a temp Index at
>> \\ServerB\idx\temp
>> <file:///\\ServerB\idx\temp>  (obviously because the Local FS is much
>> faster
>> for the service) and then calling addIndexes to import the temp index
>> to the
>> ServerA index before destroying the ServerB index, holding for a bit
>> and
>> then checking for new documents.
>>
>>
>>
>> All works grand BUT the size of the resultant index on ServerA is
>> HUGE in
>> comparison to one I'd build from start to finish (i.e. a simple
>> addDocument
>> Index) - 38gig for 220,000 Unstored Items cannot be right (to give
>> you and
>> idea of how mad this seems, the backed up version of the database
>> from which
>> the data is pulled is only 2gigs)
>>
>>
>>
>> I've considered it being perhaps the number of Items that had to be
>> integrated each time addIndexes was called - right now I'm adding
>> around
>> 10,000 at a time (I had done 1000 at a time but this looked like it
>> was
>> going to end up even larger still)
>>
>>
>>
>> I'm holding off twiddling the minMergeDocs and mergeFactor until I
>> can get a
>> better understanding of whats going on here.
>>
>>
>>
>> Many thanks for any reply's
>>
>> Garrett
>>
>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message