Date: Sat, 20 Dec 2008 17:04:44 -0800 (PST)
From: "Jim Kellerman (JIRA)"
To: hbase-dev@hadoop.apache.org
Subject: [jira] Reopened: (HBASE-1052) Stopping a HRegionServer with unflushed cache causes data loss from org.apache.hadoop.hbase.DroppedSnapshotException

[ https://issues.apache.org/jira/browse/HBASE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reopened HBASE-1052:
----------------------------------

Reopening. This fix makes tests (TestRegionRebalancing) fail when there are multiple region servers and one is stopped "nicely".

> Stopping a HRegionServer with unflushed cache causes data loss from org.apache.hadoop.hbase.DroppedSnapshotException
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1052
>                 URL: https://issues.apache.org/jira/browse/HBASE-1052
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>            Reporter: Cosmin Lehene
>            Assignee: Jim Kellerman
>            Priority: Critical
>             Fix For: 0.19.0, 0.18.2
>
>
> 1. Start an HBase cluster.
> 2. Create a table t1: create 't1', {NAME => 'f1'}
> 3. Put a cell in the table: put 't1', 'r1', 'f1:', 'value'
> 4. Scan it and confirm the cell is there.
> 5. Stop the HRegionServer hosting the t1 region: hbase/bin/hbase-daemon.sh stop regionserver
> 6. Watch the region being reassigned away from the original HRegionServer.
> 7. Scan the t1 table again. It is now empty.
>
> If the cache is flushed between step 4 and step 5 (e.g. by an HBase cluster restart), no data is lost. However, it means that if you stop a region server with a dirty cache, you will lose some data.
>
> HRegionServer log after issuing hbase-daemon.sh stop regionserver:
>
> 2008-12-09 06:37:46,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 60020: exiting
> 2008-12-09 06:37:46,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 60020: exiting
> 2008-12-09 06:37:46,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 60020: exiting
> 2008-12-09 06:37:46,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60020: exiting
> 2008-12-09 06:37:46,874 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
> 2008-12-09 06:37:46,874 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
> 2008-12-09 06:37:46,874 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
> 2008-12-09 06:37:46,886 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:60030
> 2008-12-09 06:37:46,948 INFO org.mortbay.util.Container: Stopped HttpContext[/static,/static]
> 2008-12-09 06:37:47,007 INFO org.mortbay.util.Container: Stopped HttpContext[/logs,/logs]
> 2008-12-09 06:37:47,007 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.WebApplicationHandler@60ded0f0
> 2008-12-09 06:37:47,094 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/,/]
> 2008-12-09 06:37:47,094 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.Server@6490832e
> 2008-12-09 06:37:47,094 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: closing region t1,,1228833363456
> 2008-12-09 06:37:47,094 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Compactions and cache flushes disabled for region t1,,1228833363456
> 2008-12-09 06:37:47,094 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Scanners disabled for region t1,,1228833363456
> 2008-12-09 06:37:47,094 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No more active scanners for region t1,,1228833363456
> 2008-12-09 06:37:47,095 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region t1,,1228833363456
> 2008-12-09 06:37:47,095 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No more row locks outstanding on region t1,,1228833363456
> 2008-12-09 06:37:47,095 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region t1,,1228833363456. Current region memcache size 18.0
> 2008-12-09 06:37:47,095 INFO org.apache.hadoop.hbase.regionserver.Flusher: regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher exiting
> 2008-12-09 06:37:47,096 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
> 2008-12-09 06:37:47,096 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver/0:0:0:0:0:0:0:0:60020.compactor exiting
> 2008-12-09 06:37:47,099 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: error closing region t1,,1228833363456
> org.apache.hadoop.hbase.DroppedSnapshotException: region: t1,,1228833363456
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1071)
>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:619)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.closeAllRegions(HRegionServer.java:951)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:459)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: Filesystem closed
>         at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196)
>         at org.apache.hadoop.dfs.DFSClient.getFileInfo(DFSClient.java:564)
>         at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
>         at org.apache.hadoop.hbase.regionserver.HStoreFile.<init>(HStoreFile.java:152)
>         at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:599)
>         at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:577)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1058)
>         ... 4 more
> 2008-12-09 06:37:47,100 DEBUG org.apache.hadoop.hbase.regionserver.HLog: closing log writer in hdfs://h1:54310/hbase/log_10.131.237.51_1228833326838_60020
> 2008-12-09 06:37:47,101 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Close and delete failed
> java.io.IOException: Filesystem closed
>         at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196)
>         at org.apache.hadoop.dfs.DFSClient.access$600(DFSClient.java:59)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:2689)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:2655)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:59)
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:79)
>         at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:962)
>         at org.apache.hadoop.hbase.regionserver.HLog.close(HLog.java:349)
>         at org.apache.hadoop.hbase.regionserver.HLog.closeAndDelete(HLog.java:333)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:461)
>         at java.lang.Thread.run(Thread.java:619)
> 2008-12-09 06:37:47,102 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 10.131.237.51:60020
> 2008-12-09 06:37:47,104 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 10.131.237.51:60020
> 2008-12-09 06:37:47,882 INFO org.apache.hadoop.hbase.Leases: regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker closing leases
> 2008-12-09 06:37:47,882 INFO org.apache.hadoop.hbase.Leases: regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker closed leases
> 2008-12-09 06:37:54,919 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> 2008-12-09 06:37:54,920 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/0:0:0:0:0:0:0:0:60020 exiting
> 2008-12-09 06:37:54,920 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
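In the quoted trace, HRegion.close reaches HStore.flushCache, but DFSClient.checkOpen throws "Filesystem closed", so the memcache snapshot is dropped. The minimal Java sketch below shows why shutdown order matters in that situation. It is a toy model built on assumptions. ToyFileSystem, ToyRegion and ShutdownOrderDemo are hypothetical classes made up for illustration. They are not HBase or Hadoop APIs, and the sketch is not the HBASE-1052 patch, which was reopened as described above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical classes, NOT HBase/Hadoop code) of the failure in the
// log above: a region's unflushed cache is lost if the shared filesystem client
// is closed before the region is flushed on shutdown.
public class ShutdownOrderDemo {

    /** Hypothetical stand-in for a shared DFS client. */
    static class ToyFileSystem {
        private boolean closed = false;
        private final List<String> persisted = new ArrayList<>();

        void write(String cell) throws IOException {
            if (closed) {
                // Mirrors the "Filesystem closed" IOException seen in the trace.
                throw new IOException("Filesystem closed");
            }
            persisted.add(cell);
        }

        void close() {
            closed = true;
        }

        int persistedCount() {
            return persisted.size();
        }
    }

    /** Hypothetical stand-in for a region holding an in-memory cache. */
    static class ToyRegion {
        private final List<String> memcache = new ArrayList<>();
        private final ToyFileSystem fs;

        ToyRegion(ToyFileSystem fs) {
            this.fs = fs;
        }

        void put(String cell) {
            memcache.add(cell);
        }

        /** Flushes the cache on close; a failed flush drops the snapshot. */
        void close() {
            try {
                for (String cell : memcache) {
                    fs.write(cell);
                }
            } catch (IOException e) {
                System.out.println("error closing region: " + e.getMessage());
            } finally {
                // The cache is discarded whether or not the flush succeeded.
                memcache.clear();
            }
        }
    }

    /**
     * Simulates stopping a server that holds one unflushed cell and returns
     * how many cells actually reached the filesystem.
     */
    public static int shutdown(boolean flushRegionsFirst) {
        ToyFileSystem fs = new ToyFileSystem();
        ToyRegion region = new ToyRegion(fs);
        region.put("r1/f1:/value");
        if (flushRegionsFirst) {
            region.close(); // flush succeeds while the filesystem is still open
            fs.close();
        } else {
            fs.close();     // filesystem closed too early...
            region.close(); // ...so the flush fails and the cell is lost
        }
        return fs.persistedCount();
    }

    public static void main(String[] args) {
        System.out.println("filesystem closed first: " + shutdown(false) + " cell(s) persisted");
        System.out.println("regions flushed first:   " + shutdown(true) + " cell(s) persisted");
    }
}
```

Running main prints an "error closing region: Filesystem closed" line. It reports 0 persisted cells when the filesystem is closed first and 1 when the region is flushed first. The model covers only this ordering constraint. It does not address how regions are reassigned when one of several region servers is stopped.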