lucene-solr-user mailing list archives

From Alexey Kovyrin <ale...@kovyrin.net>
Subject Re: Solr branch_3x problems
Date Sat, 25 Dec 2010 19:15:25 GMT
Today I've managed to collect the following from a "dead" server:
- GC log from startup until the death of the service
- jstat -gc -t 1000 output from startup until the end
- a thread stack dump taken before killing the server
- a heap histogram taken before killing the server
- a heap dump
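For reference, diagnostics like the ones listed above can be gathered with the standard JDK command-line tools. A minimal sketch (the pid, interval, and output file names below are placeholders, not values from this thread):

```python
# Sketch of the JDK CLI invocations for gathering JVM diagnostics.
# The pid and file names are illustrative placeholders.

def diagnostic_commands(pid: int, interval_ms: int = 1000) -> list[str]:
    """Build command lines for GC stats, a thread dump, a heap
    histogram, and a full heap dump of a running JVM."""
    return [
        f"jstat -gc -t {pid} {interval_ms}",                      # GC stats each interval, with timestamps
        f"jstack -l {pid} > threads-{pid}.txt",                   # thread stack dump
        f"jmap -histo:live {pid} > histo-{pid}.txt",              # heap histogram of live objects
        f"jmap -dump:live,format=b,file=heap-{pid}.hprof {pid}",  # full binary heap dump
    ]

for cmd in diagnostic_commands(12345):
    print(cmd)
```

The resulting `.hprof` file can then be browsed with jhat, as done later in this message.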



On Fri, Dec 24, 2010 at 11:36 PM, Lance Norskog <goksron@gmail.com> wrote:
> More details, please. You tried all of the different GC
> implementations? Is there enough memory assigned to the JVM to run
> comfortably, but not much more? (The OS uses spare memory as disk
> buffers a lot better than Java does.)

We have 24 GB of RAM on the server; we dedicate 6 GB to the JVM
(-Xmx6000m). We used to give it 12 GB, but that made no difference.

> How many threads are there? Distributed search uses two searches, both
> parallelized with 1 thread per shard. Perhaps they're building up?


There are usually around 150-200 threads in the jvm.

> Do a heap scan with text output every, say, 6 hours. If there is
> something building up, you might spot it.

I did a heap dump + heap histogram before killing the JVM today, and the
only really suspicious thing was the top line in the histogram:
class [B,
81883 instances,
3,974,092,842 bytes

Most of the instances (actually all of the hundred or so I checked with
jhat) look almost the same in terms of references:

org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput@0x2aab14a47120
(98 bytes) : field buffer
java.nio.HeapByteBuffer@0x2aab14a475a8 (55 bytes) : field hb
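A quick back-of-the-envelope check on those histogram numbers (the instance count and byte total come straight from the output above; the heap-size comparison assumes the 6 GB -Xmx mentioned earlier in the thread):

```python
# Figures from the jmap histogram above: 81,883 byte[] ("[B") instances
# totalling 3,974,092,842 bytes.
instances = 81_883
total_bytes = 3_974_092_842

avg_bytes = total_bytes // instances  # average size of one byte[] instance
total_gib = total_bytes / 2**30       # total retained by byte[] arrays, in GiB

print(avg_bytes)             # 48533 -> roughly 47 KiB per buffer
print(round(total_gib, 2))   # 3.7
```

So the byte[] arrays average roughly 47 KiB each and add up to about 3.7 GiB, well over half of a 6 GB heap, which is consistent with these NIOFSIndexInput read buffers dominating the histogram.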


> Also RMI is very bad on GC. Are you connecting to Solr or the Tomcat with it?

I believe we don't use it here.

> On Tue, Dec 21, 2010 at 7:09 PM, Alexey Kovyrin <alexey@kovyrin.net> wrote:
>> Hello guys,
>>
>> We at scribd.com have recently deployed our new search cluster based
>> on the Dec 1st, 2010 branch_3x Solr code, and we're very happy about
>> the new features it brings.
>> However, it looks like we have a weird problem: once a day, the servers
>> handling sharded search queries (frontend servers that receive
>> requests and then fan them out to backend machines) die. Everything
>> looks fine for a day, memory usage is stable, GC is doing its work as
>> usual... and then eventually we get a weird GC activity spike that
>> kills the whole VM, and the only way to bring it back is to kill -9 the
>> tomcat6 VM and restart it. We've tried different GC tuning options and
>> tried to reduce the caches to almost zero size, still no luck.
>>
>> So I was wondering if there are any known issues with Solr branch_3x
>> from the last month that could have caused this kind of problem, or
>> whether we could provide any more information that would help track
>> down the issue.
>>
>> Thanks.
>>
>> --
>> Alexey Kovyrin
>> http://kovyrin.net/
>>
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>



-- 
Alexey Kovyrin
http://kovyrin.net/
