lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <>
Subject Re: addIndexes() Size
Date Mon, 06 Dec 2004 17:12:21 GMT
If I were you, I would first use Luke to peek at the index.  You may
find something obvious there, like multiple copies of the same
Does your temp index 'overlap' with A index in terms of Documents?  If
so, you will end up with multliple copies, as addIndexes method doesn't
detect and remove duplicate Documents.


--- Garrett Heaver <> wrote:

> Hi.
> Its probably really simple to explain this but since I'm not up to
> speed on
> the way Lucene stores the data I'm a little confused.
> I'm building an Index, which resides on Server A, with the Lucene
> Service
> running on Server B. Now not to bore you with the details but because
> of the
> network transfer rate etc I'm running the actual index on
> \\ServerA\idx
> <file:///\\ServerA\idx>  and building a temp Index at
> \\ServerB\idx\temp
> <file:///\\ServerB\idx\temp>  (obviously because the Local FS is much
> faster
> for the service) and then calling addIndexes to import the temp index
> to the
> ServerA index before destroying the ServerB index, holding for a bit
> and
> then checking for new documents.
> All works grand BUT the size of the resultant index on ServerA is
> HUGE in
> comparison to one I'd build from start to finish (i.e. a simple
> addDocument
> Index) - 38gig for 220,000 Unstored Items cannot be right (to give
> you and
> idea of how mad this seems, the backed up version of the database
> from which
> the data is pulled is only 2gigs)
> I've considered it being perhaps the number of Items that had to be
> integrated each time addIndexes was called - right now I'm adding
> around
> 10,000 at a time (I had done 1000 at a time but this looked like it
> was
> going to end up even larger still)
> I'm holding off twiddling the minMergeDocs and mergeFactor until I
> can get a
> better understanding of whats going on here.
> Many thanks for any reply's
> Garrett

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message