From: Andrzej Bialecki
Date: Wed, 07 Jul 2004 11:57:53 +0200
To: Lucene Users List <lucene-user@jakarta.apache.org>
Subject: Re: Most efficient way to index 14M documents (out of memory/file handles)

markharw00d@yahoo.co.uk wrote:

> A colleague of mine found the fastest way to index was to use a RAMDirectory, letting it grow to a
> pre-defined maximum size, then merging it to a new temporary file-based index to
> flush it. Repeat this, creating new directories for all the file-based indexes, then perform
> a merge into one index once all docs are indexed.
>
> I haven't managed to test this for myself, but my colleague says he noticed a
> considerable speed-up by merging once at the end with this approach, so you may want
> to give it a try. (This was with Lucene 1.3)

I can confirm that this approach works quite well - I use it myself in some
applications, both with Lucene 1.3 and 1.4. The disadvantage is of course that
memory consumption goes up, so you have to be careful to cap the maximum size of
the RAMDirectory according to your max heap size limits.

--
Best regards,
Andrzej Bialecki
-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
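For readers of the archive: the batching strategy described above can be sketched roughly as follows. This is a minimal illustration, not code from the thread - the class name, batch size, paths, and the fetchDocuments() helper are all hypothetical, while the Lucene calls (RAMDirectory, FSDirectory.getDirectory, IndexWriter.addIndexes, optimize) match the 1.3/1.4-era API the posters were using. Note it mixes that old API with modern Java syntax for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class BatchedIndexer {

    // Cap the in-memory batch by document count; tune this against your
    // max heap, as the reply above warns. (Illustrative value.)
    private static final int BATCH_SIZE = 50000;

    public static void main(String[] args) throws Exception {
        List<Directory> segments = new ArrayList<Directory>();
        RAMDirectory ram = new RAMDirectory();
        IndexWriter ramWriter = new IndexWriter(ram, new StandardAnalyzer(), true);
        int inBatch = 0;
        int batchNo = 0;

        for (Document doc : fetchDocuments()) {   // fetchDocuments() is app-specific
            ramWriter.addDocument(doc);
            if (++inBatch >= BATCH_SIZE) {
                // Batch full: flush the RAMDirectory to its own temporary
                // on-disk index and start a fresh in-memory buffer.
                ramWriter.close();
                segments.add(flushToDisk(ram, batchNo++));
                ram = new RAMDirectory();
                ramWriter = new IndexWriter(ram, new StandardAnalyzer(), true);
                inBatch = 0;
            }
        }
        ramWriter.close();
        if (inBatch > 0) {
            segments.add(flushToDisk(ram, batchNo));
        }

        // The single merge at the end, which is where the speed-up was seen.
        IndexWriter finalWriter = new IndexWriter(
                FSDirectory.getDirectory("/tmp/final-index", true),
                new StandardAnalyzer(), true);
        finalWriter.addIndexes(segments.toArray(new Directory[0]));
        finalWriter.optimize();
        finalWriter.close();
    }

    // Merge one in-memory batch into its own temporary file-based index.
    private static Directory flushToDisk(RAMDirectory ram, int n) throws Exception {
        Directory disk = FSDirectory.getDirectory("/tmp/batch-" + n, true);
        IndexWriter w = new IndexWriter(disk, new StandardAnalyzer(), true);
        w.addIndexes(new Directory[] { ram });
        w.close();
        return disk;
    }

    // Placeholder for the application's document source.
    private static Iterable<Document> fetchDocuments() {
        return Collections.emptyList();
    }
}
```

The point of the final addIndexes() call over all the temporary directories is that Lucene performs one large merge instead of repeatedly rewriting a growing on-disk index, which also keeps the number of open file handles per batch bounded.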