accumulo-user mailing list archives

From Adam Fuchs <afu...@apache.org>
Subject Re: Using Accumulo as input to a MapReduce job frequently hangs due to lost Zookeeper connection
Date Thu, 16 Aug 2012 13:32:55 GMT
That was going to be my suggestion as well, except the ZooKeeper property
is maxClientCnxns.

Cheers,
Adam
On Aug 16, 2012 7:22 AM, "Jim Klucar" <klucar@gmail.com> wrote:

> Just shooting from the hip here.
>
> ZooKeeper's maxClientCnxns in zoo.cfg should be increased from the default to
> something like 100. Check the zookeeper log file to see if it is shutting
> down connections.
>
> Check what your max open files setting is for your OS with 'ulimit
> -n' and increase it if necessary.
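>
> For reference, the zoo.cfg entry would look something like this (the
> value of 100 is just an example; I believe the default in ZooKeeper
> 3.3 is 10, which a local job opening many connections can exhaust):
>
> ```
> # zoo.cfg -- limit on concurrent connections from a single client IP
> maxClientCnxns=100
> ```
>
> After changing it, restart ZooKeeper so the new limit takes effect.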
>
>
> Sent from my iPhone
>
> On Aug 16, 2012, at 4:00 AM, Arjumand Bonhomme <jumand@gmail.com> wrote:
>
> Hello,
>
> I'm fairly new to both Accumulo and Hadoop, so I think my problem may be
> due to poor configuration on my part, but I'm running out of ideas.
>
> I'm running this on a Mac laptop, with Hadoop (hadoop-0.20.2 from cdh3u4)
> in pseudo-distributed mode and ZooKeeper 3.3.5 (also from cdh3u4).
> I'm using the 1.4.1 release of Accumulo with a configuration copied from
> "conf/examples/512MB/standalone".
>
> I've got a Map task that is using an Accumulo table as its input.
> I'm fetching all rows, but just a single column family, which has hundreds
> or even thousands of different column qualifiers.
> The table has a SummingCombiner installed for the given column family.
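>
> (For context, I attached the combiner from the Accumulo shell with
> something along these lines -- the table and iterator names here are
> just placeholders:
>
> ```
> user@instance mytable> setiter -t mytable -p 10 -scan -minc -majc -n sum \
>     -class org.apache.accumulo.core.iterators.user.SummingCombiner
> ```
>
> and then answered the shell's prompts for the column family and the
> encoding type.)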
>
> The task runs fine at first, but after ~9-15K records (I print the record
> count to the console every 1K records), it hangs and the following messages
> are printed to the console where I'm running the job:
> 12/08/16 02:57:08 INFO zookeeper.ClientCnxn: Unable to read additional
> data from server sessionid 0x1392cc35b460d1c, likely server has closed
> socket, closing socket connection and attempting reconnect
> 12/08/16 02:57:08 INFO zookeeper.ClientCnxn: Opening socket connection to
> server localhost/fe80:0:0:0:0:0:0:1%1:2181
> 12/08/16 02:57:08 INFO zookeeper.ClientCnxn: Socket connection established
> to localhost/fe80:0:0:0:0:0:0:1%1:2181, initiating session
> 12/08/16 02:57:08 INFO zookeeper.ClientCnxn: Unable to reconnect to
> ZooKeeper service, session 0x1392cc35b460d1c has expired, closing socket
> connection
> 12/08/16 02:57:08 INFO zookeeper.ClientCnxn: EventThread shut down
> 12/08/16 02:57:10 INFO zookeeper.ZooKeeper: Initiating client connection,
> connectString=localhost sessionTimeout=30000
> watcher=org.apache.accumulo.core.zookeeper.ZooSession$AccumuloWatcher@32f5c51c
> 12/08/16 02:57:10 INFO zookeeper.ClientCnxn: Opening socket connection to
> server localhost/0:0:0:0:0:0:0:1:2181
> 12/08/16 02:57:10 INFO zookeeper.ClientCnxn: Socket connection established
> to localhost/0:0:0:0:0:0:0:1:2181, initiating session
> 12/08/16 02:57:10 INFO zookeeper.ClientCnxn: Session establishment
> complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid =
> 0x1392cc35b460d25, negotiated timeout = 30000
> 12/08/16 02:57:11 INFO mapred.LocalJobRunner:
> 12/08/16 02:57:14 INFO mapred.LocalJobRunner:
> 12/08/16 02:57:17 INFO mapred.LocalJobRunner:
>
> Sometimes the messages contain a stacktrace like this below:
> 12/08/16 01:57:40 WARN zookeeper.ClientCnxn: Session 0x1392cc35b460b40 for
> server localhost/fe80:0:0:0:0:0:0:1%1:2181, unexpected error, closing
> socket connection and attempting reconnect
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcher.read0(Native Method)
>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
>   at sun.nio.ch.IOUtil.read(IOUtil.java:166)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
>   at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:856)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1154)
> 12/08/16 01:57:40 INFO zookeeper.ClientCnxn: Opening socket connection to
> server localhost/127.0.0.1:2181
> 12/08/16 01:57:40 INFO zookeeper.ClientCnxn: Socket connection established
> to localhost/127.0.0.1:2181, initiating session
> 12/08/16 01:57:40 INFO zookeeper.ClientCnxn: Unable to reconnect to
> ZooKeeper service, session 0x1392cc35b460b40 has expired, closing socket
> connection
> 12/08/16 01:57:40 INFO zookeeper.ClientCnxn: EventThread shut down
> 12/08/16 01:57:41 INFO zookeeper.ZooKeeper: Initiating client connection,
> connectString=localhost sessionTimeout=30000
> watcher=org.apache.accumulo.core.zookeeper.ZooSession$AccumuloWatcher@684a26e8
> 12/08/16 01:57:41 INFO zookeeper.ClientCnxn: Opening socket connection to
> server localhost/fe80:0:0:0:0:0:0:1%1:2181
> 12/08/16 01:57:41 INFO zookeeper.ClientCnxn: Socket connection established
> to localhost/fe80:0:0:0:0:0:0:1%1:2181, initiating session
> 12/08/16 01:57:41 INFO zookeeper.ClientCnxn: Session establishment
> complete on server localhost/fe80:0:0:0:0:0:0:1%1:2181, sessionid =
> 0x1392cc35b460b46, negotiated timeout = 30000
>
>
> I've poked through the logs in accumulo, and I've noticed that when it
> hangs, the following is written to the "logger_HOSTNAME.debug.log" file:
> 16 03:29:46,332 [logger.LogService] DEBUG: event null None Disconnected
> 16 03:29:47,248 [zookeeper.ZooSession] DEBUG: Session expired, state of
> current session : Expired
> 16 03:29:47,248 [logger.LogService] DEBUG: event null None Expired
> 16 03:29:47,249 [logger.LogService] WARN : Logger lost zookeeper
> registration at null
> 16 03:29:47,452 [logger.LogService] INFO : Logger shutting down
> 16 03:29:47,453 [logger.LogWriter] INFO : Shutting down
>
>
> I've noticed that if I make the map task print out the record count more
> frequently (i.e., every 10 records), it seems to be able to get through more
> records than when I only print every 1K records. My assumption was that
> this had something to do with more time being spent in the map task rather
> than fetching data from Accumulo.  There was at least one occasion where I
> printed to the console for every record, and in that situation it managed
> to process 47K records, although I have been unable to repeat that behavior.
>
> I've also noticed that if I stop and start Accumulo, the map-reduce job
> will pick up where it left off, but seems to fail more quickly.
>
>
>
> Could someone suggest what my problem might be? It would be greatly
> appreciated.  If you need any additional information from me, just let me
> know.  I'd paste my config files, driver setup, and example data into this
> post, but I think it's probably long enough already.
>
>
> Thanks in advance,
> -Arjumand
>
>
