lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mateusz Berezecki <mateu...@gmail.com>
Subject Re: indexing performance problems
Date Wed, 10 Jun 2009 08:42:13 GMT
Hi list!

I'm forwarding as somehow I did not put the list in the CC but the
answer I think is noteworthy, so here it is. Please remember to use
StringBuffer before blaming lucene ;-)

Actual time consumed by lucene is now ~130 minutes as opposed to 20
hours which is neat. I can do much more passes during the day.

Mateusz

On Wed, Jun 10, 2009 at 2:31 AM, Michael
McCandless<lucene@mikemccandless.com> wrote:
> Oh actually could you send this response to the full list (so they see
> closure too)?  Thanks.
>
> Mike
>
> On Tue, Jun 9, 2009 at 6:32 PM, Mateusz Berezecki<mateuszb@gmail.com> wrote:
>> Hi Michael,
>>
>> It took a while but thanks to your suggestions I've started poking
>> around and it turned out that the bottleneck was with GC for JVM and
>> in my use of String instead of StringBuffer. The thing is now over 20
>> times faster !
>>
>> Thanks a lot !
>>
>> Mateusz
>>
>> On Mon, Jun 8, 2009 at 2:13 PM, Michael
>> McCandless<lucene@mikemccandless.com> wrote:
>>> On Mon, Jun 8, 2009 at 7:54 AM, Mateusz Berezecki<mateuszb@gmail.com> wrote:
>>>
>>>> Thanks for a prompt response.
>>>
>>> You're welcome!
>>>
>>>>> A mergeFactor of 150 is way too high; I'd put that back to 10 and see
>>>>> if the problem persists.  Also make sure you're using
>>>>> autoCommit=false, and try the suggestions here:
>>>>>
>>>>>    http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
>>>>
>>>> I've set mergeFactor to 10, 15 and 20 before trying out 150 and the
>>>> problem persisted, although I have to admit that 2.9 gives some
>>>> serious speed improvements as compared to 2.4.1 which I believe is a
>>>> good sign, i.e. it reaches the same document that causes deadlock much
>>>> faster than 2.4.1 does
>>>
>>> Hmm: do you know for certain that a particular document causes this?
>>> If you make a standalone test indexing only that document, does the
>>> problem happen?
>>>
>>>>> You're sure the JRE's heap size is big enough?
>>>>
>>>> I've set it to 3.8 GB and I'm running it on a desktop with 4 GB of RAM.
>>>
>>> OK sounds like plenty, though likely the OS won't give you 3.8 GB (if
>>> the JRE is 32-bit).
>>>
>>>>> If the problem persists... can you turn on IndexWriter's infoStream
>>>>> and post the resulting output leading up to the 100% CPU?  You might
>>>>> also try "kill -QUIT" when the 100% CPU problem is happening, to catch
>>>>> the stack trace of all threads, and post that too...
>>>>
>>>> Not sure how do I turn on the infoStream and autoCommit? WRT to
>>>> autoCommit I did not use the deprecated API with autoCommit flags in
>>>> constructors, so assuming I used the recommended API is the autoCommit
>>>> on/off by default?
>>>
>>> For infoStream, eg:  IndexWriter.setInfoStream(System.out);
>>>
>>> And, yes, since you're using a non-deprecated ctor of IndexWriter, you
>>> are getting autoCommit=false, so that's good.
>>>
>>> Mike
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message