lucene-java-user mailing list archives

From "Garrett Heaver" <garrett.hea...@researchandmarkets.com>
Subject RE: addIndexes() Size
Date Mon, 06 Dec 2004 17:17:25 GMT
No, there are no duplicate copies - I get the correct number of documents when I
view the index through Luke, and the two indexes don't overlap - the temporary
index is destroyed after it is added to the main index. I'm currently at index
version 159 and it seems that all of my .prx files come in at around 1435 megs (ouch)
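
To see which file types account for the space, a rough sketch like the one below could total the on-disk size per file extension in the index directory. It is plain Java and makes no Lucene calls; the class name IndexSizeReport and the method sizeByExtension are made up for illustration, and it only looks at files directly inside the directory it is given.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene: totals on-disk bytes per file extension.
final class IndexSizeReport {

    // Returns extension -> total bytes for the regular files directly inside dir.
    // Files without a dot in their name are grouped under "(none)".
    static Map<String, Long> sizeByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (Stream<Path> files = Files.list(dir)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and other non-file entries
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot >= 0 ? name.substring(dot + 1) : "(none)";
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the index directory is passed as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        sizeByExtension(dir).forEach((ext, bytes) ->
                System.out.printf("%-8s %,d bytes%n", ext, bytes));
    }
}
```

Run against the index directory, this prints one total per extension, which should show at a glance whether the .prx files dominate the 38gig or whether the space is spread across other files.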

Thanks
Garrett

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
Sent: 06 December 2004 17:12
To: Lucene Users List
Subject: Re: addIndexes() Size

If I were you, I would first use Luke to peek at the index.  You may
find something obvious there, like multiple copies of the same
Document.
Does your temp index 'overlap' with A index in terms of Documents?  If
so, you will end up with multiple copies, as the addIndexes method doesn't
detect and remove duplicate Documents.

Otis

--- Garrett Heaver <garrett.heaver@researchandmarkets.com> wrote:

> Hi.
> 
>  
> 
> It's probably really simple to explain this but since I'm not up to
> speed on
> the way Lucene stores the data I'm a little confused.
> 
>  
> 
> I'm building an Index, which resides on Server A, with the Lucene Service
> running on Server B. Now not to bore you with the details but because of the
> network transfer rate etc I'm running the actual index on \\ServerA\idx
> and building a temp Index at \\ServerB\idx\temp (obviously because the
> Local FS is much faster for the service) and then calling addIndexes to
> import the temp index to the ServerA index before destroying the ServerB
> index, holding for a bit and then checking for new documents.
> 
>  
> 
> All works grand BUT the size of the resultant index on ServerA is HUGE in
> comparison to one I'd build from start to finish (i.e. a simple addDocument
> Index) - 38gig for 220,000 Unstored Items cannot be right (to give you an
> idea of how mad this seems, the backed up version of the database from which
> the data is pulled is only 2gigs)
> 
>  
> 
> I've considered it being perhaps the number of Items that had to be
> integrated each time addIndexes was called - right now I'm adding around
> 10,000 at a time (I had done 1000 at a time but this looked like it was
> going to end up even larger still)
> 
>  
> 
> I'm holding off twiddling the minMergeDocs and mergeFactor until I can get a
> better understanding of what's going on here.
> 
>  
> 
> Many thanks for any replies
> 
> Garrett
> 
>  
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org



