hbase-issues mailing list archives

From "gaojinchao (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4695) WAL logs get deleted before region server can fully flush
Date Mon, 31 Oct 2011 07:54:32 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139986#comment-13139986 ]

gaojinchao commented on HBASE-4695:
-----------------------------------

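Tested the latest trunk version on a real cluster; the test passed. The WAL files are moved to /hbase/.oldlogs, and the log directory is deleted, only after all regions have closed: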

Region Server logs:
2011-10-31 03:32:42,922 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server C3S31,20020,1320034091400
2011-10-31 03:32:46,974 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server C3S31,20020,1320034091400; all regions closed.
2011-10-31 03:32:48,633 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Moved 7 log files to /hbase/.oldlogs
2011-10-31 03:32:49,200 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server C3S31,20020,1320034091400; zookeeper connection closed.

Namenode logs:
2011-10-31 03:32:46,988 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditEvent(192)) -
ugi=root,root,sfcb	ip=/158.1.130.31	cmd=listStatus	src=/hbase/.logs/C3S31,20020,1320034091400
perm=root:supergroup:rwxr-xr-x
2011-10-31 03:32:46,991 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditEvent(177)) -
ugi=root,root,sfcb	ip=/158.1.130.31	cmd=rename	src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320045179340
dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320045179340	perm=root:supergroup:rw-r--r--
2011-10-31 03:32:46,992 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditEvent(177)) -
ugi=root,root,sfcb	ip=/158.1.130.31	cmd=rename	src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320046155808
dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320046155808	perm=root:supergroup:rw-r--r--
2011-10-31 03:32:46,994 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditEvent(177)) -
ugi=root,root,sfcb	ip=/158.1.130.31	cmd=rename	src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320046186294
dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320046186294	perm=root:supergroup:rw-r--r--
2011-10-31 03:32:46,996 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditEvent(177)) -
ugi=root,root,sfcb	ip=/158.1.130.31	cmd=rename	src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320046216288
dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320046216288	perm=root:supergroup:rw-r--r--
2011-10-31 03:32:46,998 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditEvent(177)) -
ugi=root,root,sfcb	ip=/158.1.130.31	cmd=rename	src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320046255166
dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320046255166	perm=root:supergroup:rw-r--r--
2011-10-31 03:32:47,206 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditEvent(192)) -
ugi=webuser,webgroup	ip=/158.1.130.33	cmd=listStatus	src=/hbase/.logs/C3S31,20020,1320034091400
perm=root:supergroup:rwxr-xr-x
2011-10-31 03:32:48,518 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditEvent(177)) -
ugi=root,root,sfcb	ip=/158.1.130.31	cmd=rename	src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320046295501
dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320046295501	perm=root:supergroup:rw-r--r--
2011-10-31 03:32:48,633 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditEvent(177)) -
ugi=root,root,sfcb	ip=/158.1.130.31	cmd=rename	src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320046325013
dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320046325013	perm=root:supergroup:rw-r--r--
2011-10-31 03:32:48,650 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditEvent(206)) -
ugi=root,root,sfcb	ip=/158.1.130.31	cmd=delete	src=/hbase/.logs/C3S31,20020,1320034091400

2011-10-31 03:32:49,389 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditEvent(206)) -
ugi=root,root,sfcb	ip=/158.1.130.32	cmd=delete	src=/hbase/.META./1028785192/.tmp	
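
The ordering shown in the two logs above can also be checked mechanically. The following is a minimal sketch, not part of the patch: the function names are made up for illustration, each wrapped log entry is assumed to have been rejoined into a single line, and the region server and namenode clocks are assumed to be comparable. It lists any rename or delete under the WAL directory that the namenode recorded before the region server logged "all regions closed".

```python
import re
from datetime import datetime


def parse_ts(line):
    # The leading 'YYYY-MM-DD HH:MM:SS,mmm' stamp is exactly 23 characters.
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def wal_ops_before_close(rs_lines, audit_lines, wal_dir):
    """Return audit lines that rename/delete under wal_dir before all regions closed.

    If the region server never logged 'all regions closed', every such
    operation is reported.
    """
    closed_at = next(
        (parse_ts(l) for l in rs_lines if "all regions closed" in l), None)
    early = []
    for line in audit_lines:
        m = re.search(r"cmd=(rename|delete)\s+src=(\S+)", line)
        if m and m.group(2).startswith(wal_dir):
            if closed_at is None or parse_ts(line) < closed_at:
                early.append(line)
    return early


# Lines copied from the logs above (first rename and the final directory delete).
WAL_DIR = "/hbase/.logs/C3S31,20020,1320034091400"
rs_log = [
    "2011-10-31 03:32:42,922 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: "
    "stopping server C3S31,20020,1320034091400",
    "2011-10-31 03:32:46,974 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: "
    "stopping server C3S31,20020,1320034091400; all regions closed.",
]
audit_log = [
    "2011-10-31 03:32:46,991 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditEvent(177)) - "
    "ugi=root,root,sfcb ip=/158.1.130.31 cmd=rename "
    "src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320045179340 "
    "dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320045179340 perm=root:supergroup:rw-r--r--",
    "2011-10-31 03:32:48,650 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditEvent(206)) - "
    "ugi=root,root,sfcb ip=/158.1.130.31 cmd=delete src=/hbase/.logs/C3S31,20020,1320034091400",
]

print(wal_ops_before_close(rs_log, audit_log, WAL_DIR))  # prints []
```

An empty result is what the fix is expected to produce: the first rename (03:32:46,991) comes after "all regions closed" (03:32:46,974).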


                
> WAL logs get deleted before region server can fully flush
> ---------------------------------------------------------
>
>                 Key: HBASE-4695
>                 URL: https://issues.apache.org/jira/browse/HBASE-4695
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.90.4
>            Reporter: jack levin
>            Assignee: gaojinchao
>            Priority: Blocker
>             Fix For: 0.90.5
>
>         Attachments: HBASE-4695_branch90_trial.patch, hbase-4695-0.92.txt
>
>
> To replicate the problem, do the following:
> 1. Check the /hbase/.logs/XXXX directory to confirm that it holds WAL logs for the region server you are shutting down.
> 2. Run kill <pid> (where pid is the regionserver pid).
> 3. Watch the regionserver log as it starts flushing; it reports how many regions are left to flush:
> 09:36:54,665 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 489 regions to close
> 09:56:35,779 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 116 regions to close
> 4. Check /hbase/.logs/XXXX again -- you will notice that it has disappeared.
> 5. Check the namenode logs:
> 09:26:41,607 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root ip=/10.101.1.5 cmd=delete src=/hbase/.logs/rdaa5.prod.imageshack.com,60020,1319749
> Note that if you kill -9 the RS at this point, or it crashes during the flush, you won't have any WAL logs to replay. We need to make sure that logs are deleted or moved out only once the RS has fully flushed. Otherwise it's possible to lose data.
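
The symptom in steps 3 to 5 can be confirmed from the two logs with a short script. This is a rough sketch under stated assumptions: the function names are hypothetical, each log entry is on a single line, both logs come from the same day, and the two hosts' clocks are comparable. It reports every "Waiting on N regions to close" line that was logged after the namenode had already recorded the delete of the WAL directory.

```python
import re
from datetime import datetime

WAITING = re.compile(r"Waiting on (\d+) regions to close")


def time_of_day(line):
    # The leading 'HH:MM:SS,mmm' stamp is exactly 12 characters.
    return datetime.strptime(line[:12], "%H:%M:%S,%f").time()


def regions_open_after_delete(rs_lines, delete_line):
    """Return (time, regions_left) pairs logged after the WAL directory delete."""
    deleted_at = time_of_day(delete_line)
    pending = []
    for line in rs_lines:
        m = WAITING.search(line)
        if m and int(m.group(1)) > 0 and time_of_day(line) > deleted_at:
            pending.append((time_of_day(line), int(m.group(1))))
    return pending


# Lines copied from the issue description (the src path is truncated in the report).
rs_log = [
    "09:36:54,665 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: "
    "Waiting on 489 regions to close",
    "09:56:35,779 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: "
    "Waiting on 116 regions to close",
]
delete_event = (
    "09:26:41,607 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: "
    "ugi=root ip=/10.101.1.5 cmd=delete "
    "src=/hbase/.logs/rdaa5.prod.imageshack.com,60020,1319749"
)

for when, left in regions_open_after_delete(rs_log, delete_event):
    print(f"{when}: {left} regions still open after WAL directory delete")
```

A non-empty result means the WAL directory was removed while regions were still flushing, which is the window in which a crash would lose data.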


        
