hbase-user mailing list archives

From Bryan Beaudreault <bbeaudrea...@hubspot.com>
Subject IOException: Cannot append; log is closed -- data lost?
Date Tue, 18 Sep 2012 18:37:51 GMT
We are running cdh3u2 on a 150-node cluster, where 50 nodes run HBase and 100 run
MapReduce.  The underlying HDFS spans all nodes.

We just had to restart one of our region servers because it started having
problems serving data.  Looking back at the logs, while the restart was
happening and the memstores were being flushed, we saw a bunch of messages
like this:

12/09/18 12:34:00 DEBUG regionserver.HRegion: Started memstore flush for visitor-activities-c7,\x00\x00\x00\x00n\x8E62608,1345736694990.fe4fe1d88a1b5cbcb9cdb6121e902f76., current region memstore size 5.8m
12/09/18 12:34:00 DEBUG regionserver.HRegion: Finished snapshotting, commencing flushing stores
12/09/18 12:34:00 DEBUG wal.HLog: Moved 34 log files to /hbase/.oldlogs
12/09/18 12:34:00 INFO regionserver.HRegionServer: stopping server at: SERVER_IP,60020,1347074761294
12/09/18 12:34:00 DEBUG catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@1f415056
12/09/18 12:34:00 INFO regionserver.HRegionServer: Waiting on 223 regions to close
12/09/18 12:34:01 ERROR regionserver.HRegionServer:
java.io.IOException: Cannot append; log is closed
        at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:912)
        at org.apache.hadoop.hbase.regionserver.HRegion.delete(HRegion.java:1317)
        at org.apache.hadoop.hbase.regionserver.HRegion.delete(HRegion.java:1237)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.delete(HRegionServer.java:1957)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2563)
        at sun.reflect.GeneratedMethodAccessor135.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
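
From the stack trace it looks like the append simply hit the closed flag on the
shared HLog after the shutdown path had already closed it.  Roughly this shape
(a simplified sketch of the race, not the actual HBase code; the names here are
made up):

import java.io.IOException;

// Simplified sketch: once the region server's shutdown thread closes the
// shared write-ahead log, any in-flight delete/multi that still needs a WAL
// entry fails with "Cannot append; log is closed".
class WalSketch {
    private volatile boolean closed = false;

    void close() {
        closed = true;                 // shutdown path closes the log
    }

    void append(Object edit) throws IOException {
        if (closed) {
            throw new IOException("Cannot append; log is closed");
        }
        // ... otherwise the edit would be written and synced to the log ...
    }
}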

We saw this for a bunch of regions.  This may or may not be related (that's
what I'm trying to figure out), but we are now starting to see evidence
that some data may not have been persisted.  For instance, we use the value
returned by an increment as the id for another record, and we are starting to
see increments return values that we have already used as ids.
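
To be concrete about the pattern, this is a minimal sketch of what we do (the
table, family, and qualifier names below are placeholders, not our actual
schema):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementAsId {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable counters = new HTable(conf, "counters");
        HTable records = new HTable(conf, "records");

        // Get the next id from a counter cell; the returned value is only as
        // durable as the WAL entry behind it.
        long id = counters.incrementColumnValue(
                Bytes.toBytes("record-id"),    // counter row
                Bytes.toBytes("c"),            // family
                Bytes.toBytes("seq"),          // qualifier
                1L);

        // Use the returned id as the row key of a new record.
        Put put = new Put(Bytes.toBytes(id));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("..."));
        records.put(put);

        counters.close();
        records.close();
    }
}

If that counter edit never made it into the log, the region could come back
with an older counter value after the restart, which would explain the
duplicate ids we are seeing.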

Does this exception mean that we lost data?  Is there any way to recover it
if so?  I don't see any hlogs in the .corrupt folder, and I'm not sure how
else to check.

One important thing to note is that we were decommissioning 50 of the 150
datanodes at the time, using the dfs.hosts.exclude setting on the namenode.
I thought that was supposed to be a safe operation, so hopefully it didn't
cause this.

