hbase-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Zookeeper session lost
Date Wed, 31 Mar 2010 15:23:00 GMT
OK. Swapping, especially when combined with GC, can definitely account for 
very long delays.

I'm not sure if anyone has pointed this out before, but take a look at the 
swapping section on the ZK troubleshooting page. That section, or perhaps 
one of the other sections on that page, might give you additional insight.
http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting
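
One rough way to confirm that a JVM is stalling (from swapping, GC, or both) is to time a fixed sleep and log how far it overshoots, similar in spirit to the "We slept Xms" message mentioned in this thread. The following is a minimal Java sketch, not HBase or ZooKeeper code; the class name, the 1-second interval, and the 60-second session timeout are assumptions for illustration only.

```java
import java.util.concurrent.TimeUnit;

public class PauseDetector {

    /** Extra time (ms) a sleep took beyond what was requested; never negative. */
    public static long overshootMillis(long requestedMs, long startNanos, long endNanos) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0L, elapsedMs - requestedMs);
    }

    /** True when the stall alone is at least as long as the session timeout. */
    public static boolean exceedsTimeout(long overshootMs, long sessionTimeoutMs) {
        return overshootMs >= sessionTimeoutMs;
    }

    public static void main(String[] args) throws InterruptedException {
        final long sleepMs = 1000;            // assumed check interval
        final long sessionTimeoutMs = 60000;  // assumed value; use your configured timeout

        for (int i = 0; i < 3; i++) {
            long start = System.nanoTime();
            Thread.sleep(sleepMs);
            long over = overshootMillis(sleepMs, start, System.nanoTime());

            // A large overshoot means this thread was not scheduled for that long,
            // which is what a swapped-out or GC-paused process looks like from inside.
            if (over > sleepMs) {
                System.out.println("Stalled for an extra " + over + " ms"
                        + (exceedsTimeout(over, sessionTimeoutMs)
                                ? " (long enough to lose the session)" : ""));
            }
        }
    }
}
```

Running something like this alongside `vmstat` output makes it easier to see whether stalls line up with swap activity on the node.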

Good Luck,

Patrick

Peter Falk wrote:
> We had a 4GB heap for the region server, on a machine with 8GB that was 
> also running a data node and a ZooKeeper instance. We tried the 
> incremental garbage collector before, but had problems with a runaway 
> heap size, resulting in swapping. We were, and still are, running with 
> the parallel GC. When the session expiry problem occurred, we had 
> noticed swapping on the node just before. Therefore, we are a bit afraid 
> to increase the heap size further, or to try the incremental GC again. 
> We are not running in any virtualized environment.
> 
> Thanks for the various responses and recommendations. I think it 
> would be nice to have an option to automatically restart the region 
> server in situations like this.
> 
> TIA,
> Peter
> 
> On Tue, Mar 30, 2010 at 18:25, Patrick Hunt <phunt@apache.org> wrote:
> 
>     Are you running in a virtualized environment by chance (EC2,
>     VMware, etc.)? VMs, especially oversubscribed or overloaded VMs,
>     can result in significant I/O- and memory-related performance problems.
> 
>     Patrick
> 
> 
>     Peter Falk wrote:
> 
>         Thanks Jean-Daniel. I was not clear about what we have already
>         tried. We have tried everything that you recommend in the updated
>         wiki page, including raising the ZooKeeper session timeout. The
>         node was heavily loaded at the time, and it seems the cluster was
>         simply overloaded.
> 
>         However, would it not be possible to automatically start the
>         region server again and let it request new regions? It seems
>         dangerous to let region servers die under heavy load like this,
>         which increases the load further on the remaining nodes...
> 
>         Sincerely,
>         Peter
> 
>         On Mon, Mar 29, 2010 at 19:38, Jean-Daniel Cryans
>         <jdcryans@apache.org> wrote:
> 
>             We already had an entry in the wiki for this issue, but it
>             wasn't super explicit about what's happening, so I completely
>             rewrote it using the logs from this thread. See
>             http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A9
> 
>             Also, I created a JIRA about putting that link directly into
>             the "We slept Xms, ..." message so that people can get some
>             answers quickly.
>             See https://issues.apache.org/jira/browse/HBASE-2388
> 
>             J-D
> 
