hbase-user mailing list archives

From Ruben Quintero <rfq_...@yahoo.com>
Subject Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
Date Wed, 13 Apr 2011 19:45:54 GMT
The problem is that the connections are never closed... so they just keep
piling up until they hit the max. My max is at 400 right now, so after 14-15
hours of running, it gets stuck in an endless connection retry.

I saw that the HConnectionManager will kick older HConnections out, but the 
problem is that their ZooKeeper threads continue on. Those need to be explicitly 
closed. 



Again, this is only an issue inside JVMs set to run forever, like Venkatesh 
said, because that's when the orphaned ZK connections will have a chance to 
build up to whatever your maximum is. Setting that higher and higher is just 
prolonging uptime before the eventual crash. It's essentially a memory 
(connection) leak within TableInputFormat, since there is no way that I can see 
to properly access and close those spawned connections.

One question for you, JD: Inside of TableInputFormat.setConf, does the
Configuration need to be cloned? (i.e. setHTable(new HTable(new
Configuration(conf), tableName));) I'm guessing this is to prevent changes
within the job from affecting the table and vice versa... but if it weren't
cloned, then you could use the job configuration (job.getConfiguration()) to
close the connection.
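
To make that concrete, here's a rough sketch (against the 0.90 API, as I read
it) of why the job's own Configuration can't reach the leaked connection as
long as setConf clones it:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.mapreduce.Job;

    public class TifLeakSketch {
        static void runScanJob(String tableName) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.set(TableInputFormat.INPUT_TABLE, tableName);

            Job job = new Job(conf, "scan-" + tableName);
            job.setInputFormatClass(TableInputFormat.class);
            // ... mapper / output setup elided ...
            job.waitForCompletion(true);

            // TableInputFormat.setConf builds its HTable with a *cloned*
            // config: setHTable(new HTable(new Configuration(conf), tableName)).
            // HConnectionManager keys connections by Configuration, so the
            // clone's HConnection (and its ZK thread) is invisible to us:
            HConnectionManager.deleteConnection(job.getConfiguration(), true);
            // ^ closes nothing useful; the cloned config's connection leaks.
        }
    }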
      
Other quick fixes that I can think of, none of which are very pretty:
1 - Just call deleteAllConnections(boolean), and have any other processes using
HConnections recover from that (rough sketch below).
2 - Make the static HBASE_INSTANCES map accessible (public); then you could
iterate through the open connections and try to match configs.

Venkatesh - unless you have other processes in your JVM accessing HBase (I have 
one), #1 might be the best bet.
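
Roughly, fix #1 would look like this in the long-running JVM (a sketch, not
tested):

    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.mapreduce.Job;

    public class QuickFixOne {
        // Quick fix #1: after each internally-triggered job completes, drop
        // every cached HConnection (orphaned ZK threads included). Passing
        // true also stops the RPC proxies. Anything else in this JVM that
        // talks to HBase loses its connection and has to re-open it.
        static void runThenPurge(Job job) throws Exception {
            job.waitForCompletion(true);
            HConnectionManager.deleteAllConnections(true);
        }
    }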

- Ruben



________________________________
From: Jean-Daniel Cryans <jdcryans@apache.org>
To: user@hbase.apache.org
Sent: Wed, April 13, 2011 3:22:48 PM
Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job

Like I said, it's a zookeeper configuration that you can change. If
hbase is managing your zookeeper, then set
hbase.zookeeper.property.maxClientCnxns to something higher than 30
and restart the zk server (this can be done while hbase is running).

J-D
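
For reference, that setting would look roughly like this in hbase-site.xml (a
sketch assuming HBase manages the ZooKeeper quorum; the 400 just mirrors the
max Ruben mentions above):

    <property>
      <!-- Per-client ZooKeeper connection cap. Only takes effect when
           HBase manages the ZK quorum; 400 is an example value. -->
      <name>hbase.zookeeper.property.maxClientCnxns</name>
      <value>400</value>
    </property>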

On Wed, Apr 13, 2011 at 12:04 PM, Venkatesh <vramanathan00@aol.com> wrote:
> Ruben:
> Yes.. I have the exact same issue now, and I'm also kicking it off from
> another JVM that runs forever.
> I don't have an alternate solution: either modify the hbase code, or modify
> my code to kick it off as a standalone JVM, or hopefully 0.90.3 releases
> soon :)
> J-D/St.Ack may have some suggestions
>
> V
>
> -----Original Message-----
> From: Ruben Quintero <rfq_dev@yahoo.com>
> To: user@hbase.apache.org
> Sent: Wed, Apr 13, 2011 2:39 pm
> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
>
> The problem I'm having is in getting the conf that is used to init the table
> within TableInputFormat. That's the one that's leaving open ZK connections
> for me.
>
> Following the code through, TableInputFormat initializes its HTable with new
> Configuration(new JobConf(conf)), where conf is the config I pass in via job
> initiation. I don't see a way of getting the initialized TableInputFormat in
> order to then get its table and its config to be able to properly close that
> connection. Cloned configs don't appear to produce similar hashes, either.
> The only other option I'm left with is closing all connections, but that
> disrupts things across the board.
>
> For MapReduce jobs run in their own JVM, this wouldn't be much of an issue,
> as the connection would just be closed on completion, but in my case (our
> code triggers the jobs internally), they simply pile up until the
> ConnectionLoss hits due to too many ZK connections.
>
> Am I missing a way to get that buried table's config, or another way to kill
> the orphaned connections?
>
> - Ruben
>
> ________________________________
> From: Venkatesh <vramanathan00@aol.com>
> To: user@hbase.apache.org
> Sent: Wed, April 13, 2011 10:20:50 AM
> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
>
> Thanks J-D
> I made sure to pass conf objects around in places where I wasn't..
> will give it a try
>
> -----Original Message-----
> From: Jean-Daniel Cryans <jdcryans@apache.org>
> To: user@hbase.apache.org
> Sent: Tue, Apr 12, 2011 6:40 pm
> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
>
> Yes there are a few places like that. Also, when you create new
> HTables, you should close their connections (this is not done in
> htable.close).
>
> See HTable's javadoc which says:
>
> Instances of HTable passed the same Configuration instance will share
> connections to servers out on the cluster and to the zookeeper
> ensemble as well as caches of region locations. This is usually a
> *good* thing. This happens because they will all share the same
> underlying HConnection instance. See HConnectionManager for more on
> how this mechanism works.
>
> and it points to HCM which has more information:
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html
>
> J-D
>
> On Tue, Apr 12, 2011 at 3:09 PM, Ruben Quintero <rfq_dev@yahoo.com> wrote:
>> I'm running into the same issue, but did some poking around and it seems
>> that Zookeeper connections are being left open by an HBase internal.
>>
>> Basically, I'm running a mapreduce job within another program, and noticed
>> in the logs that every time the job is run, a connection is opened, but I
>> never see it closed again. The connection is opened within the job.submit().
>>
>> I looked closer and checked the jstack after running it for just under an
>> hour, and sure enough there are a ton of Zookeeper threads just sitting
>> there. Here's a pastebin link: http://pastebin.com/MccEuvrc
>>
>> I'm running 0.90.0 right now.
>>
>> - Ruben
>>
>> ________________________________
>> From: Jean-Daniel Cryans <jdcryans@apache.org>
>> To: user@hbase.apache.org
>> Sent: Tue, April 12, 2011 4:23:05 PM
>> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
>>
>> It's more in the vein of
>> https://issues.apache.org/jira/browse/HBASE-3755 and
>> https://issues.apache.org/jira/browse/HBASE-3771
>>
>> Basically 0.90 has a regression regarding the handling of zookeeper
>> connections that makes it so you have to be super careful not to have
>> more than 30 per machine (each new Configuration is one new ZK
>> connection). Upping your zookeeper max connection config should get
>> rid of your issues since you only get it occasionally.
>>
>> J-D
>>
>> On Tue, Apr 12, 2011 at 7:59 AM, Venkatesh <vramanathan00@aol.com> wrote:
>>>
>>> I get this occasionally.. (not all the time).. Upgrading from 0.20.6 to
>>> 0.90.2
>>> Is this issue the same as this JIRA?
>>> https://issues.apache.org/jira/browse/HBASE-3578
>>>
>>> I'm using HBaseConfiguration.create() & setting that in the job
>>> thx
>>> v
>>>
>>> 2011-04-12 02:13:06,870 ERROR Timer-0
>>> org.apache.hadoop.hbase.mapreduce.TableInputFormat -
>>> org.apache.hadoop.hbase.ZooKeeperConnectionException:
>>> org.apache.hadoop.hbase.ZooKeeperConnectionException:
>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
>>>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1000)
>>>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:303)
>>>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:294)
>>>         at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:156)
>>>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
>>>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
>>>         at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:91)
>>>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>>>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>         at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:882)
>>>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>>>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>>>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:448)
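
To illustrate the HTable javadoc J-D quotes above, a minimal sketch of the
sharing behavior and of the close step that htable.close() skips (0.90 API;
the table name is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTable;

    public class SharedConnectionSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();

            // Same Configuration instance -> same underlying HConnection:
            // one ZK connection and one region-location cache between them.
            HTable a = new HTable(conf, "mytable");  // placeholder table name
            HTable b = new HTable(conf, "mytable");

            a.close();  // flushes write buffers; leaves the HConnection open
            b.close();

            // The step htable.close() does not do: evict the HConnection
            // keyed by conf and stop its ZooKeeper thread and RPC proxies.
            HConnectionManager.deleteConnection(conf, true);
        }
    }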



      