lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: indexing performance problems
Date Wed, 10 Jun 2009 10:36:20 GMT
Thanks for bringing closure!

Mike

On Wed, Jun 10, 2009 at 4:42 AM, Mateusz Berezecki<mateuszb@gmail.com> wrote:
> Hi list!
>
> I'm forwarding this because I somehow left the list off the CC, and I
> think the answer is noteworthy, so here it is. Please remember to use
> StringBuffer before blaming Lucene ;-)
>
> Actual time consumed by Lucene is now ~130 minutes as opposed to 20
> hours, which is neat. I can do many more passes during the day.
>
> Mateusz
>
> On Wed, Jun 10, 2009 at 2:31 AM, Michael
> McCandless<lucene@mikemccandless.com> wrote:
>> Oh actually could you send this response to the full list (so they see
>> closure too)?  Thanks.
>>
>> Mike
>>
>> On Tue, Jun 9, 2009 at 6:32 PM, Mateusz Berezecki<mateuszb@gmail.com> wrote:
>>> Hi Michael,
>>>
>>> It took a while, but thanks to your suggestions I started poking
>>> around, and it turned out that the bottleneck was the JVM's GC and
>>> my use of String instead of StringBuffer. The thing is now over 20
>>> times faster!
>>>
>>> Thanks a lot!
>>>
>>> Mateusz
>>>
>>> On Mon, Jun 8, 2009 at 2:13 PM, Michael
>>> McCandless<lucene@mikemccandless.com> wrote:
>>>> On Mon, Jun 8, 2009 at 7:54 AM, Mateusz Berezecki<mateuszb@gmail.com> wrote:
>>>>
>>>>> Thanks for a prompt response.
>>>>
>>>> You're welcome!
>>>>
>>>>>> A mergeFactor of 150 is way too high; I'd put that back to 10 and see
>>>>>> if the problem persists.  Also make sure you're using
>>>>>> autoCommit=false, and try the suggestions here:
>>>>>>
>>>>>>    http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
>>>>>
>>>>> I've set mergeFactor to 10, 15 and 20 before trying out 150 and the
>>>>> problem persisted, although I have to admit that 2.9 gives some
>>>>> serious speed improvements as compared to 2.4.1 which I believe is a
>>>>> good sign, i.e. it reaches the same document that causes the deadlock
>>>>> much faster than 2.4.1 does.
>>>>
>>>> Hmm: do you know for certain that a particular document causes this?
>>>> If you make a standalone test indexing only that document, does the
>>>> problem happen?
>>>>
>>>>>> You're sure the JRE's heap size is big enough?
>>>>>
>>>>> I've set it to 3.8 GB and I'm running it on a desktop with 4 GB of RAM.
>>>>
>>>> OK sounds like plenty, though likely the OS won't give you 3.8 GB (if
>>>> the JRE is 32-bit).
>>>>
>>>>>> If the problem persists... can you turn on IndexWriter's infoStream
>>>>>> and post the resulting output leading up to the 100% CPU?  You might
>>>>>> also try "kill -QUIT" when the 100% CPU problem is happening, to catch
>>>>>> the stack trace of all threads, and post that too...
>>>>>
>>>>> Not sure how to turn on the infoStream and autoCommit. WRT
>>>>> autoCommit, I did not use the deprecated API with autoCommit flags in
>>>>> constructors, so assuming I used the recommended API, is autoCommit
>>>>> on or off by default?
>>>>
>>>> For infoStream, eg:  writer.setInfoStream(System.out);  (called on your IndexWriter instance)
>>>>
>>>> And, yes, since you're using a non-deprecated ctor of IndexWriter, you
>>>> are getting autoCommit=false, so that's good.
>>>>
>>>> Mike
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>
>
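
For readers landing on this thread: the String vs. StringBuffer point
above can be illustrated with a minimal sketch. The original code was
never posted, so everything here (class name, method names, the token
array) is hypothetical. The sketch uses StringBuilder, the
unsynchronized sibling of StringBuffer with the same append API.
Building text with += in a loop copies the whole accumulated string on
every iteration and leaves a lot of garbage for the GC, while a builder
appends into one growing buffer.

```java
import java.util.Arrays;

// Hypothetical illustration; not the code from the original thread.
class ConcatDemo {
    // Repeated String concatenation: each += allocates a new String and
    // copies everything accumulated so far (quadratic work, heavy GC load).
    static String joinWithString(String[] tokens) {
        String text = "";
        for (String t : tokens) {
            text += t + " ";
        }
        return text;
    }

    // Same result with a single growing buffer (amortized linear work).
    static String joinWithBuilder(String[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            sb.append(t).append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = new String[20000];
        Arrays.fill(tokens, "token");

        long t0 = System.nanoTime();
        String a = joinWithString(tokens);
        long t1 = System.nanoTime();
        String b = joinWithBuilder(tokens);
        long t2 = System.nanoTime();

        // Timings vary by machine and JVM; only the relative gap matters.
        System.out.println("identical output: " + a.equals(b));
        System.out.println("String +=     : " + (t1 - t0) / 1000000 + " ms");
        System.out.println("StringBuilder : " + (t2 - t1) / 1000000 + " ms");
    }
}
```

If the text is assembled per document before being handed to Lucene,
the indexing loop itself can look slow even though the time is going
into string copying and garbage collection.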

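The "kill -QUIT" suggestion above makes the JVM print the stack trace
of every thread. When sending a signal is inconvenient, a roughly
similar dump can be produced from inside the process with the standard
Thread.getAllStackTraces() call. This is a swapped-in alternative, not
what was used in the thread, and the class and method names are
hypothetical.

```java
import java.util.Map;

// Hypothetical helper; an in-process stand-in for "kill -QUIT <pid>".
class ThreadDumpDemo {
    // Formats the name, state and stack frames of every live thread.
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append('"')
               .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In a real run this would be called while the 100% CPU
        // condition is happening, e.g. from a watchdog thread.
        System.out.print(dumpAllThreads());
    }
}
```

Taking a few dumps some seconds apart and comparing them shows which
threads stay in the same frames, which is usually where the CPU time
is going.
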
