accumulo-notifications mailing list archives

From "Luke Brassard (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ACCUMULO-1759) Empty walogs block recovery after power outage.
Date Wed, 09 Oct 2013 20:00:46 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Brassard updated ACCUMULO-1759:
------------------------------------

    Description: 

Power was abruptly cut to the cluster. Upon restart of HDFS, there was a single Rfile that
was missing a block. 



After restarting Accumulo, the Master was complaining with a series of these:

{code}
2013-10-09 18:15:16,649 [recovery.HadoopLogCloser] INFO : Waiting for file to be closed /accumulo/wal/10.10.0.1+9997/d52ab315-5ac1-4a5c-9085-67ae29b98b88
2013-10-09 18:15:16,663 [recovery.HadoopLogCloser] INFO : Waiting for file to be closed /accumulo/wal/10.10.0.2+9997/d0192739-74e2-43a0-985f-3ed668259995
2013-10-09 18:15:16,742 [recovery.HadoopLogCloser] INFO : Waiting for file to be closed /accumulo/wal/10.10.0.3+9997/de54e6dc-964a-4b33-b4fb-052e81749913
2013-10-09 18:15:16,833 [recovery.HadoopLogCloser] INFO : Waiting for file to be closed /accumulo/wal/10.10.0.4+9997/cda5daec-25f3-443b-818a-990d3eddd56f
{code}

Inspection of the files above showed that they were all empty, but referenced in the {{!METADATA}}
table. The solution was to move or remove the files from HDFS and delete the references from
the metadata. The instance was then able to stabilize and assign the rest of the tablets.
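
As an illustration only, the first step of that workaround (working out which walogs the Master is stuck on, and which of those are empty) could be sketched roughly as below. This is not part of Accumulo; the function names and the {{sizes}} mapping are hypothetical, and the sizes are assumed to have been gathered separately, for example from an HDFS file listing. The sketch only reports candidates; it does not touch HDFS or the {{!METADATA}} table.

```python
import re

# Matches the "Waiting for file to be closed" lines quoted in this report
# and captures the walog path at the end of the line.
WAITING_RE = re.compile(
    r"\[recovery\.HadoopLogCloser\] INFO : Waiting for file to be closed (\S+)"
)


def waiting_wal_paths(log_text):
    """Return the distinct walog paths the Master is waiting on, in first-seen order."""
    seen = []
    for match in WAITING_RE.finditer(log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen


def empty_wals(paths, sizes):
    """Return the paths whose recorded size is exactly zero bytes.

    `sizes` is a hypothetical path -> size-in-bytes mapping supplied by the
    caller. Paths with no recorded size are skipped rather than assumed empty.
    """
    return [p for p in paths if sizes.get(p) == 0]
```

Feeding the Master log through {{waiting_wal_paths}} and then passing the result to {{empty_wals}} yields the list of files to inspect by hand before moving anything or deleting any metadata references.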

It is unclear why these empty walogs existed in the first place. Is it possible that there
should have been data in these walogs? Or should the files have been disregarded since they
were empty?

  was:
Power was abruptly cut to the cluster. Upon restart of HDFS, there was a single Rfile that
was missing a block. After restarting Accumulo, the Master was complaining with a series of
these:

{code}
2013-10-09 18:15:16,649 [recovery.HadoopLogCloser] INFO : Waiting for file to be closed /accumulo/wal/10.10.0.1+9997/d52ab315-5ac1-4a5c-9085-67ae29b98b88
2013-10-09 18:15:16,663 [recovery.HadoopLogCloser] INFO : Waiting for file to be closed /accumulo/wal/10.10.0.2+9997/d0192739-74e2-43a0-985f-3ed668259995
2013-10-09 18:15:16,742 [recovery.HadoopLogCloser] INFO : Waiting for file to be closed /accumulo/wal/10.10.0.3+9997/de54e6dc-964a-4b33-b4fb-052e81749913
2013-10-09 18:15:16,833 [recovery.HadoopLogCloser] INFO : Waiting for file to be closed /accumulo/wal/10.10.0.4+9997/cda5daec-25f3-443b-818a-990d3eddd56f
{code}

Inspection of the files above showed that they were all empty, but referenced in the {{!METADATA}}
table. The solution was to move or remove the files from HDFS and delete the references from
the metadata. The instance was then able to stabilize and assign the rest of the tablets.

It is unclear why these empty walogs existed in the first place. Is it possible that there
should have been data in these walogs? Or should the files have been disregarded since they
were empty?


> Empty walogs block recovery after power outage.
> -----------------------------------------------
>
>                 Key: ACCUMULO-1759
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1759
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>         Environment: * HDP 1.3
> ** {{dfs.durable.sync=true}}
> ** {{dfs.datanode.synconclose=true}}
> * encryption patch from ACCUMULO-998
>            Reporter: Luke Brassard
>
> Power was abruptly cut to the cluster. Upon restart of HDFS, there was a single Rfile that was missing a block.
> After restarting Accumulo, the Master was complaining with a series of these:
> {code}
> 2013-10-09 18:15:16,649 [recovery.HadoopLogCloser] INFO : Waiting for file to be closed /accumulo/wal/10.10.0.1+9997/d52ab315-5ac1-4a5c-9085-67ae29b98b88
> 2013-10-09 18:15:16,663 [recovery.HadoopLogCloser] INFO : Waiting for file to be closed /accumulo/wal/10.10.0.2+9997/d0192739-74e2-43a0-985f-3ed668259995
> 2013-10-09 18:15:16,742 [recovery.HadoopLogCloser] INFO : Waiting for file to be closed /accumulo/wal/10.10.0.3+9997/de54e6dc-964a-4b33-b4fb-052e81749913
> 2013-10-09 18:15:16,833 [recovery.HadoopLogCloser] INFO : Waiting for file to be closed /accumulo/wal/10.10.0.4+9997/cda5daec-25f3-443b-818a-990d3eddd56f
> {code}
> Inspection of the files above showed that they were all empty, but referenced in the {{!METADATA}} table. The solution was to move or remove the files from HDFS and delete the references from the metadata. The instance was then able to stabilize and assign the rest of the tablets.
> It is unclear why these empty walogs existed in the first place. Is it possible that there should have been data in these walogs? Or should the files have been disregarded since they were empty?



--
This message was sent by Atlassian JIRA
(v6.1#6144)
