lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: [JENKINS] Lucene-Solr-4.x-Linux-Java7-64 - Build # 288 - Failure!
Date Sun, 01 Jul 2012 18:01:27 GMT
Yeah,
The problems I saw on my server (all CPUs frozen in the kernel, cores no
longer responding) were caused by a Linux kernel bug that also affects
Ubuntu 10.04 LTS:

http://www.heise.de/newsticker/meldung/Verlaengertes-Wochenende-kann-Linux-einfrieren-1629683.html
http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-during-a-leap-second

I have seen this:
Jun 30 23:59:59 serv1 kernel: [43447.313430] Clock: inserting leap second 23:59:60 UTC
Jul  1 01:32:57 serv1 kernel: [49022.733141] BUG: soft lockup - CPU#3 stuck for 78s! [ksoftirqd/3:13]
Jul  1 01:32:57 serv1 kernel: [49022.747702] Pid: 13, comm: ksoftirqd/3 Not tainted 2.6.32-41-server #91-Ubuntu PRIMERGY RX100 S6
Jul  1 01:32:57 serv1 kernel: [49022.747705] RIP: 0010:[<ffffffff810586e9>] [<ffffffff810586e9>] finish_task_switch+0x59/0xe0
Jul  1 01:32:57 serv1 kernel: [49022.747715] RSP: 0018:ffff88027711fd90 EFLAGS: 00000206
Jul  1 01:32:57 serv1 kernel: [49022.747717] RAX: 0000000000011bc0 RBX: ffff88027711fdc0 RCX: ffff88026b219700
Jul  1 01:32:57 serv1 kernel: [49022.747720] RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880008ed5e00
Jul  1 01:32:57 serv1 kernel: [49022.747723] RBP: ffffffff81013ace R08: ffff88027711e000 R09: 0000000000000000
Jul  1 01:32:57 serv1 kernel: [49022.747726] R10: 00007f4b9099d828 R11: 0000000000000001 R12: ffff88027711fd10
Jul  1 01:32:57 serv1 kernel: [49022.747729] R13: 0000000300000c00 R14: ffff880008ecfba0 R15: 0000000200000000
Jul  1 01:32:57 serv1 kernel: [49022.747732] FS:  0000000000000000(0000) GS:ffff880008ec0000(0000) knlGS:0000000000000000
Jul  1 01:32:57 serv1 kernel: [49022.747735] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jul  1 01:32:57 serv1 kernel: [49022.747738] CR2: 00007f8e1e9e63a0 CR3: 000000026baed000 CR4: 00000000000006e0
Jul  1 01:32:57 serv1 kernel: [49022.747741] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul  1 01:32:57 serv1 kernel: [49022.747743] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul  1 01:32:57 serv1 kernel: [49022.747746] Call Trace:
Jul  1 01:32:57 serv1 kernel: [49022.747752]  [<ffffffff8155e8d9>] ? thread_return+0x48/0x41f
Jul  1 01:32:57 serv1 kernel: [49022.747758]  [<ffffffff8106493a>] ? __cond_resched+0x2a/0x40
Jul  1 01:32:57 serv1 kernel: [49022.747761]  [<ffffffff8155ee00>] ? _cond_resched+0x30/0x40
Jul  1 01:32:59 serv1 kernel: [49022.747766]  [<ffffffff8106edf5>] ? ksoftirqd+0x85/0x110
Jul  1 01:32:59 serv1 kernel: [49022.747769]  [<ffffffff8106ed70>] ? ksoftirqd+0x0/0x110
Jul  1 01:32:59 serv1 kernel: [49022.747774]  [<ffffffff810860f6>] ? kthread+0x96/0xa0
Jul  1 01:32:59 serv1 kernel: [49022.747778]  [<ffffffff810141aa>] ? child_rip+0xa/0x20
Jul  1 01:32:59 serv1 kernel: [49022.747782]  [<ffffffff81086060>] ? kthread+0x0/0xa0
Jul  1 01:32:59 serv1 kernel: [49022.747785]  [<ffffffff810141a0>] ? child_rip+0x0/0x20

This was repeated multiple times, once for each core. I had to reboot to get
rid of the load, which was 3 times the number of cores.
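
For anyone who wants to check their own logs for the same pattern, here is a
minimal Python sketch. It only looks for the two message texts that appear in
the log excerpt above ("Clock: inserting leap second" and "BUG: soft lockup -
CPU#N stuck for Ns!") and counts the lockups per CPU. The function name
summarize and the sample lines are made up for illustration, and the sketch
assumes each log entry is on a single, unwrapped line, as it would be in the
syslog file itself. Other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Message texts taken from the log excerpt above; other kernels may differ.
LEAP_RE = re.compile(r"Clock: inserting leap second")
LOCKUP_RE = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")


def summarize(lines):
    """Return (leap_second_seen, Counter mapping CPU number -> lockup count).

    Hypothetical helper: expects an iterable of unwrapped syslog lines.
    """
    leap_seen = False
    lockups = Counter()
    for line in lines:
        if LEAP_RE.search(line):
            leap_seen = True
        match = LOCKUP_RE.search(line)
        if match:
            lockups[int(match.group(1))] += 1
    return leap_seen, lockups


if __name__ == "__main__":
    # Two sample lines copied from the excerpt, joined onto one line each.
    sample = [
        "Jun 30 23:59:59 serv1 kernel: [43447.313430] Clock: inserting leap second 23:59:60 UTC",
        "Jul  1 01:32:57 serv1 kernel: [49022.733141] BUG: soft lockup - CPU#3 stuck for 78s! [ksoftirqd/3:13]",
    ]
    leap_seen, lockups = summarize(sample)
    print("leap second inserted:", leap_seen)
    for cpu, count in sorted(lockups.items()):
        print(f"CPU#{cpu}: {count} soft lockup(s)")
```

To run it against a real file, pass an open file object to summarize, since
iterating over a file yields one line at a time.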

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Sunday, July 01, 2012 7:39 PM
> To: dev@lucene.apache.org
> Subject: Re: [JENKINS] Lucene-Solr-4.x-Linux-Java7-64 - Build # 288 - Failure!
> 
> 
> On Jul 1, 2012, at 12:32 PM, Uwe Schindler wrote:
> 
> > My other Ubuntu box with 11.04 (non-LTS, 2.6.38) was responsive, but it
> > runs no Java processes, so I have no idea about Java there. All three
> > machines had NTP set to de.pool.ntp.org.
> 
> I think Linux had its own problems with the leap second even without Java.
> 
> http://www.wired.com/wiredenterprise/2012/07/leap-second-bug-wreaks-havoc-with-java-linux/
> 
> http://gigaom.com/2012/07/01/leap-second-bugs-take-out-some-prominent-websites/
> 
> http://googleblog.blogspot.com/2011/09/time-technology-and-leaping-seconds.html?m=1
> 
> 
> - Mark Miller
> lucidimagination.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org



