From: Amandeep Khurana <amansk@gmail.com>
To: hbase-user@hadoop.apache.org, core-user@hadoop.apache.org
Date: Sat, 21 Feb 2009 10:55:22 -0800
Subject: Re: Connection problem during data import into hbase

Here's what's happening in the logs...
I get these messages pretty often:

2009-02-21 10:47:27,252 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /hbase/in_table/compaction.dir/29712919/b2b/mapfiles/6353513045069085254/data retrying...

Sometimes I get these too:

2009-02-21 10:48:46,273 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 5 on 60020' on region in_table,,1235241411727: Memcache size 128.0m is >= than blocking 128.0m size

Here's what it logs when the job starts to fail:

2009-02-21 10:50:52,510 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 5000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/171.69.102.51:8270 remote=/171.69.102.51:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(Unknown Source)
        at java.io.DataOutputStream.write(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)
2009-02-21 10:50:52,511 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush of ~64.0m for region in_table,,1235241411727 in 5144ms, sequence id=30842181, compaction requested=true
2009-02-21 10:50:52,511 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region in_table,,1235241411727/29712919 because: regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher
2009-02-21 10:50:52,512 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-2896903198415069285_18306 bad datanode[0] 171.69.102.51:50010
2009-02-21 10:50:52,513 FATAL org.apache.hadoop.hbase.regionserver.LogRoller: Log rolling failed with ioe:
java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-21 10:50:52,513 FATAL org.apache.hadoop.hbase.regionserver.HLog: Could not append. Requesting close of log
java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-21 10:50:52,515 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
2009-02-21 10:50:52,515 FATAL org.apache.hadoop.hbase.regionserver.HLog: Could not append. Requesting close of log
java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-21 10:50:52,515 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=11, regions=2, stores=33, storefiles=167, storefileIndexSize=0, memcacheSize=1, usedHeap=156, maxHeap=963
2009-02-21 10:50:52,515 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2009-02-21 10:50:52,516 FATAL org.apache.hadoop.hbase.regionserver.HLog: Could not append. Requesting close of log
java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-21 10:50:52,516 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
2009-02-21 10:50:52,516 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
2009-02-21 10:50:52,516 FATAL org.apache.hadoop.hbase.regionserver.HLog: Could not append. Requesting close of log
java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-21 10:50:52,516 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
2009-02-21 10:50:52,517 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020, call batchUpdates([B@cfdbc2, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@64c0d9) from 171.69.102.51:8468: error: java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-21 10:50:52,517 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020, call batchUpdates([B@b0fc4d, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@184425c) from 171.69.102.51:8469: error: java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-21 10:50:52,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020, call batchUpdates([B@20a9de, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@706d7c) from 171.69.102.52:9279: error: java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-21 10:50:52,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020, call batchUpdates([B@1240afe, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@14dc2e6) from 171.69.102.52:9280: error: java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-21 10:50:53,049 DEBUG org.apache.hadoop.hbase.RegionHistorian: Offlined
2009-02-21 10:50:53,050 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
2009-02-21 10:50:53,050 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020: exiting
2009-02-21 10:50:53,050 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60020
2009-02-21 10:50:53,051 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-02-21 10:50:53,051 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2009-02-21 10:50:53,052 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
2009-02-21 10:50:53,052 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: exiting
2009-02-21 10:50:53,052 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: exiting
2009-02-21 10:50:53,052 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020: exiting
2009-02-21 10:50:53,052 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020: exiting
2009-02-21 10:50:53,052 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020: exiting
2009-02-21 10:50:53,052 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020: exiting
2009-02-21 10:50:53,053 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020: exiting
2009-02-21 10:50:53,053 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020: exiting
2009-02-21 10:50:53,053 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020: exiting
2009-02-21 10:50:54,503 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:60030
2009-02-21 10:50:54,670 INFO org.mortbay.util.Container: Stopped HttpContext[/logs,/logs]
2009-02-21 10:50:54,671 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.WebApplicationHandler@c3c315
2009-02-21 10:50:54,771 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/static,/static]
2009-02-21 10:50:54,772 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.WebApplicationHandler@aae86e
2009-02-21 10:50:54,893 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/,/]
2009-02-21 10:50:54,893 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.Server@1f3ce5c
2009-02-21 10:50:54,893 DEBUG org.apache.hadoop.hbase.regionserver.HLog: closing log writer in hdfs://rndpc0:9000/hbase/log_171.69.102.51_1235215460389_60020
2009-02-21 10:50:54,893 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to close log in abort
java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-21 10:50:54,893 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: closing region in_table,,1235241411727
2009-02-21 10:50:54,893 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing in_table,,1235241411727: compactions & flushes disabled
2009-02-21 10:50:54,893 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: waiting for compaction to complete for region in_table,,1235241411727
2009-02-21 10:50:54,893 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher exiting
2009-02-21 10:50:54,894 INFO org.apache.hadoop.hbase.regionserver.LogFlusher: regionserver/0:0:0:0:0:0:0:0:60020.logFlusher exiting
2009-02-21 10:50:54,894 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker: regionserver/0:0:0:0:0:0:0:0:60020.majorCompactionChecker exiting
2009-02-21 10:50:57,251 INFO org.apache.hadoop.hbase.Leases: regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker closing leases
2009-02-21 10:50:57,251 INFO org.apache.hadoop.hbase.Leases: regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker closed leases
2009-02-21 10:51:01,199 DEBUG org.apache.hadoop.hbase.regionserver.HStore: moving /hbase/in_table/compaction.dir/29712919/tac_product_hw_key/mapfiles/6810647399799866363 to /hbase/in_table/29712919/tac_product_hw_key/mapfiles/4524637696729699312
2009-02-21 10:51:01,943 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
2009-02-21 10:51:02,587 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Completed compaction of 29712919/tac_product_hw_key store size is 17.0m
2009-02-21 10:51:02,828 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Compaction size of 29712919/summary: 23.9m; Skipped 1 file(s), size: 10737371
2009-02-21 10:51:03,896 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Started compaction of 7 file(s) into /hbase/in_table/compaction.dir/29712919/summary/mapfiles/3985892544092790173

and then followed by:

2009-02-21 10:50:54,503 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:60030
2009-02-21 10:50:54,670 INFO org.mortbay.util.Container: Stopped HttpContext[/logs,/logs]
2009-02-21 10:50:54,671 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.WebApplicationHandler@c3c315
2009-02-21 10:50:54,771 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/static,/static]
2009-02-21 10:50:54,772 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.WebApplicationHandler@aae86e
2009-02-21 10:50:54,893 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/,/]
2009-02-21 10:50:54,893 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.Server@1f3ce5c
2009-02-21 10:50:54,893 DEBUG org.apache.hadoop.hbase.regionserver.HLog: closing log writer in hdfs://rndpc0:9000/hbase/log_171.69.102.51_1235215460389_60020
2009-02-21 10:50:54,893 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to close log in abort
java.io.IOException: All datanodes 171.69.102.51:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Sat, Feb 21, 2009 at 3:17 AM, Amandeep Khurana wrote:

> I changed the config and restarted the cluster. This time the job went up to
> 16% and the same problem started.
>
> I'll do the stuff with the logs now and see what comes out.
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Sat, Feb 21, 2009 at 3:08 AM, Ryan Rawson wrote:
>
>> you have to change hadoop-site.xml and restart HDFS.
>>
>> you should also change the logging to be more verbose in hbase - check out
>> the hbase FAQ (link missing -ed).
>>
>> if you get the problem again, peruse the hbase logs and post what is going
>> on there. the client errors don't really include the root cause on the
>> regionserver side.
>>
>> good luck,
>> -ryan
>>
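[ed: the two settings Ryan describes are normally made in hadoop-site.xml on the datanodes and picked up after the HDFS restart he mentions. A minimal sketch using his values follows; the thread never names the timeout property, so dfs.datanode.socket.write.timeout here is an assumption on my part:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2047</value>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>

Turning up HBase logging is typically a one-line change in HBase's conf/log4j.properties, e.g. log4j.logger.org.apache.hadoop.hbase=DEBUG. -ed]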
>> On Sat, Feb 21, 2009 at 2:21 AM, Amandeep Khurana wrote:
>>
>> > I have 1 master + 2 slaves. I did set the timeout to zero. I'll set the
>> > xceivers to 2047 and try again. Can this be done in the job config or does
>> > the site.xml need to be changed and the cluster restarted?
>> >
>> > Amandeep
>> >
>> >
>> > Amandeep Khurana
>> > Computer Science Graduate Student
>> > University of California, Santa Cruz
>> >
>> >
>> > On Sat, Feb 21, 2009 at 2:16 AM, Ryan Rawson wrote:
>> >
>> > > So the usual suspects are:
>> > >
>> > > - xcievers (i have mine set to 2047)
>> > > - timeout (i have mine set to 0)
>> > >
>> > > I can import a few hundred million records with these settings.
>> > >
>> > > how many nodes do you have again?
>> > >
>> > > On Sat, Feb 21, 2009 at 2:14 AM, Amandeep Khurana wrote:
>> > >
>> > > > Yes, I noticed it this time. The regionserver gets slow or stops
>> > > > responding and then this error comes. How do I get this to work? Is there
>> > > > a way of limiting the resources that the map red job should take?
>> > > >
>> > > > I did make the changes in the site config similar to Larry Compton's
>> > > > config. It only made the job go from dying at 7% to 12% this time.
>> > > >
>> > > > Amandeep
>> > > >
>> > > >
>> > > > Amandeep Khurana
>> > > > Computer Science Graduate Student
>> > > > University of California, Santa Cruz
>> > > >
>> > > >
>> > > > On Sat, Feb 21, 2009 at 1:14 AM, stack wrote:
>> > > >
>> > > > > It looks like the regionserver hosting root crashed:
>> > > > >
>> > > > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > > > > trying to locate root region
>> > > > >
>> > > > > How many servers are you running?
>> > > > >
>> > > > > You made similar config. to that reported by Larry Compton in a mail
>> > > > > from earlier today? (See FAQ and Troubleshooting page for more on his
>> > > > > listed configs.)
>> > > > >
>> > > > > St.Ack
>> > > > >
>> > > > >
>> > > > > On Sat, Feb 21, 2009 at 1:01 AM, Amandeep Khurana <amansk@gmail.com> wrote:
>> > > > >
>> > > > > > Yes, the table exists before I start the job.
>> > > > > >
>> > > > > > I am not using TableOutputFormat. I picked up the sample code from
>> > > > > > the docs and am using it.
>> > > > > >
>> > > > > > Here's the job conf:
>> > > > > >
>> > > > > > JobConf conf = new JobConf(getConf(), IN_TABLE_IMPORT.class);
>> > > > > > FileInputFormat.setInputPaths(conf, new Path("import_data"));
>> > > > > > conf.setMapperClass(MapClass.class);
>> > > > > > conf.setNumReduceTasks(0);
>> > > > > > conf.setOutputFormat(NullOutputFormat.class);
>> > > > > > JobClient.runJob(conf);
>> > > > > >
>> > > > > > Interestingly, the hbase shell isn't working now either. It's giving
>> > > > > > errors even when I give the command "list"...
>> > > > > >
>> > > > > >
>> > > > > > Amandeep Khurana
>> > > > > > Computer Science Graduate Student
>> > > > > > University of California, Santa Cruz
>> > > > > >
>> > > > > >
>> > > > > > On Sat, Feb 21, 2009 at 12:10 AM, stack wrote:
>> > > > > >
>> > > > > > > The table exists before you start the MR job?
>> > > > > > >
>> > > > > > > When you say 'midway through the job', are you using tableoutputformat
>> > > > > > > to insert into your table?
>> > > > > > >
>> > > > > > > Which version of hbase?
>> > > > > > >
>> > > > > > > St.Ack
>> > > > > > >
>> > > > > > > On Fri, Feb 20, 2009 at 9:55 PM, Amandeep Khurana <amansk@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > I don't know if this is related or not, but it seems to be. After this
>> > > > > > > > map reduce job, I tried to count the number of entries in the table in
>> > > > > > > > hbase through the shell. It failed with the following error:
>> > > > > > > >
>> > > > > > > > hbase(main):002:0> count 'in_table'
>> > > > > > > > NativeException: java.lang.NullPointerException: null
>> > > > > > > >     from java.lang.String:-1:in `'
>> > > > > > > >     from org/apache/hadoop/hbase/util/Bytes.java:92:in `toString'
>> > > > > > > >     from org/apache/hadoop/hbase/client/RetriesExhaustedException.java:50:in `getMessage'
>> > > > > > > >     from org/apache/hadoop/hbase/client/RetriesExhaustedException.java:40:in `'
>> > > > > > > >     from org/apache/hadoop/hbase/client/HConnectionManager.java:841:in `getRegionServerWithRetries'
>> > > > > > > >     from org/apache/hadoop/hbase/client/MetaScanner.java:56:in `metaScan'
>> > > > > > > >     from org/apache/hadoop/hbase/client/MetaScanner.java:30:in `metaScan'
>> > > > > > > >     from org/apache/hadoop/hbase/client/HConnectionManager.java:411:in `getHTableDescriptor'
>> > > > > > > >     from org/apache/hadoop/hbase/client/HTable.java:219:in `getTableDescriptor'
>> > > > > > > >     from sun.reflect.NativeMethodAccessorImpl:-2:in `invoke0'
>> > > > > > > >     from sun.reflect.NativeMethodAccessorImpl:-1:in `invoke'
>> > > > > > > >     from sun.reflect.DelegatingMethodAccessorImpl:-1:in `invoke'
>> > > > > > > >     from java.lang.reflect.Method:-1:in `invoke'
>> > > > > > > >     from org/jruby/javasupport/JavaMethod.java:250:in `invokeWithExceptionHandling'
>> > > > > > > >     from org/jruby/javasupport/JavaMethod.java:219:in `invoke'
>> > > > > > > >     from org/jruby/javasupport/JavaClass.java:416:in `execute'
>> > > > > > > >     ... 145 levels...
>> > > > > > > >     from org/jruby/internal/runtime/methods/DynamicMethod.java:74:in `call'
>> > > > > > > >     from org/jruby/internal/runtime/methods/CompiledMethod.java:48:in `call'
>> > > > > > > >     from org/jruby/runtime/CallSite.java:123:in `cacheAndCall'
>> > > > > > > >     from org/jruby/runtime/CallSite.java:298:in `call'
>> > > > > > > >     from ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:429:in `__file__'
>> > > > > > > >     from ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:-1:in `__file__'
>> > > > > > > >     from ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:-1:in `load'
>> > > > > > > >     from org/jruby/Ruby.java:512:in `runScript'
>> > > > > > > >     from org/jruby/Ruby.java:432:in `runNormally'
>> > > > > > > >     from org/jruby/Ruby.java:312:in `runFromMain'
>> > > > > > > >     from org/jruby/Main.java:144:in `run'
>> > > > > > > >     from org/jruby/Main.java:89:in `run'
>> > > > > > > >     from org/jruby/Main.java:80:in `main'
>> > > > > > > >     from /hadoop/install/hbase/bin/../bin/HBase.rb:444:in `count'
>> > > > > > > >     from /hadoop/install/hbase/bin/../bin/hirb.rb:348:in `count'
>> > > > > > > >     from (hbase):3:in `binding'
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Amandeep Khurana
>> > > > > > > > Computer Science Graduate Student
>> > > > > > > > University of California, Santa Cruz
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Fri, Feb 20, 2009 at 9:46 PM, Amandeep Khurana <amansk@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > Here's what it throws on the console:
>> > > > > > > > >
>> > > > > > > > > 09/02/20 21:45:29 INFO mapred.JobClient: Task Id : attempt_200902201300_0019_m_000006_0, Status : FAILED
>> > > > > > > > > java.io.IOException: table is null
>> > > > > > > > >         at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:33)
>> > > > > > > > >         at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:1)
>> > > > > > > > >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> > > > > > > > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>> > > > > > > > >         at org.apache.hadoop.mapred.Child.main(Child.java:155)
>> > > > > > > > >
>> > > > > > > > > attempt_200902201300_0019_m_000006_0: org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:768)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:448)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:430)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:557)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:457)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:430)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:557)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:461)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:423)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.hbase.client.HTable.(HTable.java:114)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.hbase.client.HTable.(HTable.java:97)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at IN_TABLE_IMPORT$MapClass.configure(IN_TABLE_IMPORT.java:120)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
>> > > > > > > > > attempt_200902201300_0019_m_000006_0:         at org.apache.hadoop.mapred.Child.main(Child.java:155)
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Amandeep Khurana
>> > > > > > > > > Computer Science Graduate Student
>> > > > > > > > > University of California, Santa Cruz
>> > > > > > > > >
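[ed: IN_TABLE_IMPORT itself isn't posted in this thread, but the two traces above show the shape of the failure: the HTable is built in the mapper's configure(), the NoServerForRegionException from that constructor never surfaces, and map() then dies with "table is null". A rough sketch of that kind of mapper against the 0.19 client API; the class layout, column name and input format below are illustrative, not the poster's actual code:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class IN_TABLE_IMPORT {

  public static class MapClass extends MapReduceBase
      implements Mapper<LongWritable, Text, LongWritable, Text> {

    private HTable table;  // stays null if the constructor below fails

    public void configure(JobConf job) {
      try {
        // This is the call the second trace dies in: locating -ROOT- times out
        // inside the HTable constructor (NoServerForRegionException).
        table = new HTable(new HBaseConfiguration(), "in_table");
      } catch (IOException e) {
        // Swallowing the failure here is what turns a root-region timeout
        // into the later "table is null" error in map().
      }
    }

    public void map(LongWritable key, Text value,
        OutputCollector<LongWritable, Text> output, Reporter reporter)
        throws IOException {
      if (table == null) {
        throw new IOException("table is null");
      }
      // Illustrative only: tab-separated "rowkey<TAB>value" input, written
      // straight to HBase; nothing is collected (the job uses NullOutputFormat).
      String[] fields = value.toString().split("\t");
      BatchUpdate update = new BatchUpdate(fields[0]);
      update.put("data:value", fields[1].getBytes());
      table.commit(update);
    }
  }
}

Either way, the thing to chase is why the root region can't be located from the task (the regionserver abort in the logs at the top of this mail), not the mapper itself. -ed]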
>> > > > > > > > > On Fri, Feb 20, 2009 at 9:43 PM, Amandeep Khurana <amansk@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > >> I am trying to import data from a flat file into Hbase using a Map Reduce
>> > > > > > > > >> job. There are close to 2 million rows. Midway into the job, it starts
>> > > > > > > > >> giving me connection problems and eventually kills the job. When the error
>> > > > > > > > >> comes, the hbase shell also stops working.
>> > > > > > > > >>
>> > > > > > > > >> This is what I get:
>> > > > > > > > >>
>> > > > > > > > >> 2009-02-20 21:37:14,407 INFO org.apache.hadoop.ipc.HBaseClass: Retrying connect to server: /171.69.102.52:60020. Already tried 0 time(s).
>> > > > > > > > >>
>> > > > > > > > >> What could be going wrong?
>> > > > > > > > >>
>> > > > > > > > >> Amandeep
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> Amandeep Khurana
>> > > > > > > > >> Computer Science Graduate Student
>> > > > > > > > >> University of California, Santa Cruz