hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
Date Wed, 13 Apr 2011 20:41:50 GMT
My comment on HBASE-3734 at 05/Apr/11 05:20 is related to this discussion.

On Wed, Apr 13, 2011 at 1:25 PM, Ruben Quintero <rfq_dev@yahoo.com> wrote:

> Venkatesh, I guess the two quick and dirty solutions are:
>
> - Call deleteAllConnections(bool) at the end of your MapReduce jobs, or
> periodically. If you have no other tables or pools, etc. open, then no
> problem. If you do, they'll start throwing IOExceptions, but you can
> re-instantiate them with a new config and then continue as usual. (You do
> have to change the config or it'll simply grab the closed, cached one from
> the HCM.)
>
> - As J-D said, subclass TIF and basically copy the old setConf, except
> don't clone the conf that gets sent to the table.
>
> Each one has a downside and neither is ideal, but if you either don't
> modify the config in your job or don't have any other important hbase
> connections, then you can use the appropriate one.
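>
> (A rough sketch of both, assuming the 0.90 client API; untested, and the
> subclass name is made up:)
>
>   import java.io.IOException;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.hbase.client.HTable;
>   import org.apache.hadoop.hbase.client.Scan;
>   import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
>
>   public class NonCloningTableInputFormat extends TableInputFormat {
>     private Configuration conf;
>
>     @Override
>     public Configuration getConf() {
>       return conf;
>     }
>
>     @Override
>     public void setConf(Configuration conf) {
>       this.conf = conf;
>       try {
>         // The one change from the stock setConf: pass conf itself rather
>         // than new Configuration(conf), so the HConnection it creates is
>         // keyed by the job's own Configuration and can be closed later.
>         setHTable(new HTable(conf, conf.get(INPUT_TABLE)));
>       } catch (IOException e) {
>         throw new RuntimeException(e);
>       }
>       // The stock setConf also builds the Scan (columns, caching, etc.)
>       // from the conf; copy that logic here too if you rely on it.
>       setScan(new Scan());
>     }
>   }
>
> and then, after each job finishes (job being your mapreduce Job):
>
>   // Releases only the connection keyed by this exact Configuration:
>   HConnectionManager.deleteConnection(job.getConfiguration(), true);
>   // Or the blunt version, dropping every cached connection in the JVM:
>   HConnectionManager.deleteAllConnections(true);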
>
>
> Thanks for the assistance, J-D. It's great that these forums are active and
> helpful.
>
> - Ruben
>
> ________________________________
> From: Jean-Daniel Cryans <jdcryans@apache.org>
> To: user@hbase.apache.org
> Sent: Wed, April 13, 2011 3:50:42 PM
> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
>
> Yeah, for a JVM running forever it won't work.
>
> If you know for a fact that the configuration passed to TIF won't be
> changed, then you can subclass it and override setConf to not clone
> the conf.
>
> J-D
>
> On Wed, Apr 13, 2011 at 12:45 PM, Ruben Quintero <rfq_dev@yahoo.com> wrote:
> > The problem is the connections are never closed... so they just keep
> > piling up until it hits the max. My max is at 400 right now, so after
> > 14-15 hours of running, it gets stuck in an endless connection retry.
> >
> > I saw that the HConnectionManager will kick older HConnections out, but
> > the problem is that their ZooKeeper threads continue on. Those need to be
> > explicitly closed.
> >
> > Again, this is only an issue inside JVMs set to run forever, like
> > Venkatesh said, because that's when the orphaned ZK connections will have
> > a chance to build up to whatever your maximum is. Setting that higher and
> > higher is just prolonging uptime before the eventual crash. It's
> > essentially a memory (connection) leak within TableInputFormat, since
> > there is no way that I can see to properly access and close those
> > spawned connections.
> >
> > One question for you, J-D: inside of TableInputFormat.setConf, does the
> > Configuration need to be cloned? (i.e. setHTable(new HTable(new
> > Configuration(conf), tableName));) I'm guessing this is to prevent
> > changes within the job from affecting the table and vice versa... but if
> > it weren't cloned, then you could use the job configuration
> > (job.getConfiguration()) to close the connection...
> >
> > Other quick fixes that I can think of, none of which are very pretty:
> > 1 - Just call deleteAllConnections(bool), and have any other processes
> > using HConnections recover from that.
> > 2 - Make the static HBASE_INSTANCES map accessible (public)... then you
> > could iterate through open connections and try to match configs...
> >
> > Venkatesh - unless you have other processes in your JVM accessing HBase
> > (I have one), #1 might be the best bet.
> >
> > - Ruben
> >
> > ________________________________
> > From: Jean-Daniel Cryans <jdcryans@apache.org>
> > To: user@hbase.apache.org
> > Sent: Wed, April 13, 2011 3:22:48 PM
> > Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
> >
> > Like I said, it's a zookeeper configuration that you can change. If
> > hbase is managing your zookeeper, then set
> > hbase.zookeeper.property.maxClientCnxns to something higher than 30
> > and restart the zk server (can be done while hbase is running).
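> >
> > For example, in hbase-site.xml on the node(s) running the hbase-managed
> > zookeeper (300 is just an illustrative value; the limit applies per
> > client IP):
> >
> >   <property>
> >     <name>hbase.zookeeper.property.maxClientCnxns</name>
> >     <value>300</value>
> >   </property>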
> >
> > J-D
> >
> > On Wed, Apr 13, 2011 at 12:04 PM, Venkatesh <vramanathan00@aol.com> wrote:
> >> Ruben:
> >> Yes..I've the exact same issue now..& I'm also kicking off from another
> >> jvm that runs forever..
> >> I don't have an alternate solution..either modify hbase code (or) modify
> >> my code to kick off as a standalone jvm (or) hopefully 0.90.3 release
> >> soon :)
> >> J-D/St.Ack may have some suggestions
> >>
> >> V
> >>
> >> -----Original Message-----
> >> From: Ruben Quintero <rfq_dev@yahoo.com>
> >> To: user@hbase.apache.org
> >> Sent: Wed, Apr 13, 2011 2:39 pm
> >> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
> >>
> >> The problem I'm having is in getting the conf that is used to init the
> >> table within TableInputFormat. That's the one that's leaving open ZK
> >> connections for me.
> >>
> >> Following the code through, TableInputFormat initializes its HTable with
> >> new Configuration(new JobConf(conf)), where conf is the config I pass in
> >> via job initiation. I don't see a way of getting the initialized
> >> TableInputFormat in order to then get its table and its config to be
> >> able to properly close that connection. Cloned configs don't appear to
> >> produce similar hashes, either. The only other option I'm left with is
> >> closing all connections, but that disrupts things across the board.
> >>
> >> For MapReduce jobs run in their own JVM, this wouldn't be much of an
> >> issue, as the connection would just be closed on completion, but in my
> >> case (our code triggers the jobs internally), they simply pile up until
> >> the ConnectionLoss hits due to too many ZK connections.
> >>
> >> Am I missing a way to get that buried table's config, or another way to
> >> kill the orphaned connections?
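> >>
> >> (On the "similar hashes" point, a standalone demo of why a clone can't
> >> be used as the lookup key: Hadoop's Configuration doesn't override
> >> equals()/hashCode(), so two clones with identical properties are still
> >> different keys to the HCM's map.)
> >>
> >>   import org.apache.hadoop.conf.Configuration;
> >>   import org.apache.hadoop.hbase.HBaseConfiguration;
> >>
> >>   public class ConfKeyDemo {
> >>     public static void main(String[] args) {
> >>       Configuration a = HBaseConfiguration.create();
> >>       Configuration b = new Configuration(a); // same contents as a
> >>       // Both print false: identity-based hash and equality only.
> >>       System.out.println(a.hashCode() == b.hashCode());
> >>       System.out.println(a.equals(b));
> >>     }
> >>   }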
> >>
> >> - Ruben
> >>
> >> ________________________________
> >>
> >> From: Venkatesh <vramanathan00@aol.com>
> >> To: user@hbase.apache.org
> >> Sent: Wed, April 13, 2011 10:20:50 AM
> >> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
> >>
> >> Thanks J-D
> >> I made sure to pass conf objects around in places where I wasn't..
> >> will give it a try
> >>
> >> -----Original Message-----
> >> From: Jean-Daniel Cryans <jdcryans@apache.org>
> >> To: user@hbase.apache.org
> >> Sent: Tue, Apr 12, 2011 6:40 pm
> >> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
> >>
> >> Yes there are a few places like that. Also when you create new
> >> HTables, you should also close their connections (this is not done in
> >> htable.close).
> >>
> >> See HTable's javadoc which says:
> >>
> >> Instances of HTable passed the same Configuration instance will share
> >> connections to servers out on the cluster and to the zookeeper
> >> ensemble as well as caches of region locations. This is usually a
> >> *good* thing. This happens because they will all share the same
> >> underlying HConnection instance. See HConnectionManager for more on
> >> how this mechanism works.
> >>
> >> and it points to HCM which has more information:
> >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html
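> >>
> >> (A minimal illustration of that sharing/cleanup pattern, as an untested
> >> 0.90-era sketch; the table names are made up:)
> >>
> >>   import java.io.IOException;
> >>   import org.apache.hadoop.conf.Configuration;
> >>   import org.apache.hadoop.hbase.HBaseConfiguration;
> >>   import org.apache.hadoop.hbase.client.HConnectionManager;
> >>   import org.apache.hadoop.hbase.client.HTable;
> >>
> >>   public class SharedConnectionDemo {
> >>     public static void main(String[] args) throws IOException {
> >>       Configuration conf = HBaseConfiguration.create();
> >>       HTable t1 = new HTable(conf, "table_a"); // opens HConnection + ZK session
> >>       HTable t2 = new HTable(conf, "table_b"); // reuses the same HConnection
> >>       t1.close(); // flushes write buffers; the connection stays open
> >>       t2.close();
> >>       // Actually release the shared connection; it is keyed by this
> >>       // exact Configuration instance:
> >>       HConnectionManager.deleteConnection(conf, true);
> >>     }
> >>   }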
> >> J-D
> >>
> >> On Tue, Apr 12, 2011 at 3:09 PM, Ruben Quintero <rfq_dev@yahoo.com> wrote:
> >>
> >>> I'm running into the same issue, but did some poking around and it
> >>> seems that Zookeeper connections are being left open by an HBase
> >>> internal.
> >>>
> >>> Basically, I'm running a mapreduce job within another program, and
> >>> noticed in the logs that every time the job is run, a connection is
> >>> opened, but I never see it closed again. The connection is opened
> >>> within the job.submit().
> >>>
> >>> I looked closer and checked the jstack after running it for just under
> >>> an hour, and sure enough there are a ton of Zookeeper threads just
> >>> sitting there. Here's a pastebin link: http://pastebin.com/MccEuvrc
> >>>
> >>> I'm running 0.90.0 right now.
> >>>
> >>> - Ruben
> >>>
> >>> ________________________________
> >>>
> >>> From: Jean-Daniel Cryans <jdcryans@apache.org>
> >>> To: user@hbase.apache.org
> >>> Sent: Tue, April 12, 2011 4:23:05 PM
> >>> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
> >>>
> >>> It's more in the vein of
> >>> https://issues.apache.org/jira/browse/HBASE-3755 and
> >>> https://issues.apache.org/jira/browse/HBASE-3771
> >>>
> >>> Basically 0.90 has a regression in the handling of zookeeper
> >>> connections that means you have to be super careful not to have more
> >>> than 30 per machine (each new Configuration is one new ZK connection).
> >>> Upping your zookeeper max connection config should get rid of your
> >>> issues since you only get it occasionally.
> >>>
> >>> J-D
> >>>
> >>> On Tue, Apr 12, 2011 at 7:59 AM, Venkatesh <vramanathan00@aol.com> wrote:
> >>>>
> >>>> I get this occasionally..(not all the time)..Upgrading from 0.20.6 to 0.90.2
> >>>> Is this issue the same as this JIRA?
> >>>> https://issues.apache.org/jira/browse/HBASE-3578
> >>>>
> >>>> I'm using HBaseConfiguration.create() & setting that in job
> >>>>
> >>>> thx
> >>>> v
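> >>>>
> >>>> (Presumably the job setup looks roughly like the sketch below; the
> >>>> table, job, and class names are made up, 0.90-era API:)
> >>>>
> >>>>   import org.apache.hadoop.conf.Configuration;
> >>>>   import org.apache.hadoop.hbase.HBaseConfiguration;
> >>>>   import org.apache.hadoop.hbase.client.Result;
> >>>>   import org.apache.hadoop.hbase.client.Scan;
> >>>>   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> >>>>   import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
> >>>>   import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> >>>>   import org.apache.hadoop.mapreduce.Job;
> >>>>
> >>>>   public class ScanJob {
> >>>>     public static void main(String[] args) throws Exception {
> >>>>       Configuration conf = HBaseConfiguration.create();
> >>>>       Job job = new Job(conf, "scan-job"); // Job copies conf internally
> >>>>       TableMapReduceUtil.initTableMapperJob("mytable", new Scan(),
> >>>>           IdentityTableMapper.class, ImmutableBytesWritable.class,
> >>>>           Result.class, job);
> >>>>       // submit() runs TableInputFormat.setConf, which clones the conf
> >>>>       // again, so each run opens a fresh HConnection + ZK session.
> >>>>       job.waitForCompletion(true);
> >>>>     }
> >>>>   }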
> >>>>
> >>>> 2011-04-12 02:13:06,870 ERROR Timer-0 org.apache.hadoop.hbase.mapreduce.TableInputFormat -
> >>>> org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
> >>>>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1000)
> >>>>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:303)
> >>>>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:294)
> >>>>         at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:156)
> >>>>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
> >>>>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
> >>>>         at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:91)
> >>>>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> >>>>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >>>>         at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:882)
> >>>>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
> >>>>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
> >>>>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:448)
