hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: HBase all files corrupt / missing blocks
Date Tue, 03 Feb 2015 22:11:59 GMT
Those files and the related data are most probably lost. I don't see any
other option than deleting them.

Are you sure those blocks were not already missing before the migration? Did
anything crash during the migration process?

JM
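
To see how many HFiles per region are affected, the per-file lines of the fsck output quoted below can be grouped by region directory. This is only a minimal sketch. It assumes raw, unwrapped fsck output (one path and status per line, not the email-wrapped form) and the /hbase/data/<namespace>/<table>/<region>/<family>/<hfile> layout seen in the paths in this thread. The function name and the regular expression are illustrative, and the exact fsck line format may differ between Hadoop versions.

```python
import re
from collections import defaultdict

# Assumed shape of a per-file fsck line (unwrapped):
#   /hbase/data/<ns>/<table>/<region>/<family>/<hfile>: MISSING 1 blocks of ...
#   /hbase/data/<ns>/<table>/<region>/<family>/<hfile>: CORRUPT blockpool ...
LINE_RE = re.compile(
    r"^/hbase/data/[^/]+/[^/]+/"
    r"(?P<region>[0-9a-f]{32})/(?P<family>[^/]+)/(?P<hfile>[0-9a-f]+):"
    r"\s+(?P<kind>MISSING|CORRUPT)\b"
)


def affected_hfiles_by_region(fsck_lines):
    """Map region name -> set of (family, hfile) flagged MISSING or CORRUPT."""
    regions = defaultdict(set)
    for line in fsck_lines:
        match = LINE_RE.match(line.strip())
        if match:
            regions[match.group("region")].add(
                (match.group("family"), match.group("hfile"))
            )
    return regions


# A few lines taken from the thread, joined back into single lines.
REGION = "ffa95306f599dbff99497e71841724fe"
BASE = "/hbase/data/default/table/" + REGION
sample = [
    BASE + "/extracted/18c428413d7b4a89959911c9112a6eb9: CORRUPT blockpool "
    "BP-2037521063-37.59.17.102-1418127576413 block blk_1076062948",
    BASE + "/extracted/18c428413d7b4a89959911c9112a6eb9: MISSING 1 blocks "
    "of total size 52243482 B..",
    BASE + "/pipeline/ef3fc67a835b451aa7d18094ea141451: MISSING 1 blocks "
    "of total size 11747149 B..",
    BASE + "/processed/35186fe43fed47989ddb4ace3648b109: MISSING 1 blocks "
    "of total size 929610 B...",
    " Total files: 33311",  # summary lines are ignored
]

result = affected_hfiles_by_region(sample)
for region, hfiles in result.items():
    print(region, len(hfiles), "affected HFile(s)")
```

A file that is reported both as CORRUPT and as MISSING is counted once, since the set is keyed by column family and HFile name.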

2015-02-03 13:14 GMT-08:00 Ellimilial K <ellimilial@googlemail.com>:

> Thank you for the responses!
>
> @Jean-Marc
> This comes from fsck /. I see a flood of these, at least hundreds of them.
> For this particular region:
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/18c428413d7b4a89959911c9112a6eb9:
> CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
> blk_1076062948
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/18c428413d7b4a89959911c9112a6eb9:
> MISSING 1 blocks of total size 52243482 B..
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/49b265ba5c7942b0b8e2b788fd9d7362:
> CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
> blk_1076077963
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/49b265ba5c7942b0b8e2b788fd9d7362:
> MISSING 1 blocks of total size 6181 B...
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/ef3fc67a835b451aa7d18094ea141451:
> CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
> blk_1076062891
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/ef3fc67a835b451aa7d18094ea141451:
> MISSING 1 blocks of total size 11747149 B..
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/fedeb8062c454238bf1d1112b0f80b4b:
> CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
> blk_1076077964
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/fedeb8062c454238bf1d1112b0f80b4b:
> MISSING 1 blocks of total size 10431742 B..
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
> CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
> blk_1076062900
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
> MISSING 1 blocks of total size 929610 B...
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127:
> CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block
> blk_1076077966
>
> /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127:
> MISSING 1 blocks of total size 119139 B.........
> (...) ending with:
> ..........Status: CORRUPT
>  Total size: 23155170955674 B (Total open files size: 1577 B)
>  Total dirs: 21232
>  Total files: 33311
>  Total symlinks: 0 (Files currently being written: 61)
>  Total blocks (validated): 199618 (avg. block size 115997409 B) (Total open
> file blocks (not validated): 19)
>   ********************************
>   CORRUPT FILES: 8245
>   MISSING BLOCKS: 8245
>   MISSING SIZE: 162010861748 B
>   CORRUPT BLOCKS:  8245
>   ********************************
>  Minimally replicated blocks: 191373 (95.86961 %)
>  Over-replicated blocks: 3241 (1.6236011 %)
>  Under-replicated blocks: 0 (0.0 %)
>  Mis-replicated blocks: 0 (0.0 %)
>  Default replication factor: 3
>  Average block replication: 2.916185
>  Corrupt blocks: 8245
>  Missing replicas: 0 (0.0 %)
>  Number of data-nodes: 17
>  Number of racks: 1
>
> There are 8 files in the directories within
> hbase/data/default/table/ffa95306f599dbff99497e71841724fe, so I imagine 6 of
> the 8 are affected.
> The size of the missing blocks ranges from 2 KB up to ~70 MB. The table
> concerned had ~3500 regions. All datanodes are up and appear to be reporting
> correctly, so unfortunately there is no replica lying around.
>
> @esteban I double-checked: the volumes seem fine, and the total HDFS size
> also looks unchanged. The datanodes look fine. It is a single cluster (i.e.
> no cluster replication, if I'm answering the question?), fresh after an
> upgrade from 0.94 to 0.98 (or CDH 4.7 to 5.3), with HDFS replication set to
> 3.
>
> Many thanks,
> Mateusz
>
> On 3 February 2015 at 20:30, Esteban Gutierrez <esteban@cloudera.com>
> wrote:
>
> > Hi Mateusz,
> >
> > As JMS mentioned, it is very likely that the data is lost, but that type
> > of corruption is usually due to some DNs being down or data volumes having
> > been removed for some reason. Have you tried to recover the data from
> > those DNs first?
> >
> > "For what looks like a continuous stream of regions" sounds like you had
> > a single replica configured for HBase. Is that the case?
> >
> > esteban.
> >
> > --
> > Cloudera, Inc.
> >
> >
> > On Tue, Feb 3, 2015 at 12:04 PM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> > > Hi Mateusz,
> > >
> > > Data from this HFile is most probably lost. Is fsck also reporting the
> > > block as missing? Do you have any datanode down which might contain this
> > > block? How big is this HFile? Only 929610 bytes? If so, one option might
> > > be just to delete this HFile.
> > >
> > > How many HFiles are within this region?
> > >
> > > JM
> > >
> > > 2015-02-03 10:04 GMT-08:00 Ellimilial K <ellimilial@googlemail.com>:
> > >
> > > > We have recently experienced some issues with our namenodes in an HA
> > > > arrangement and had to recreate the namenode metadata from a backup,
> > > > while some new data was pushed to the region servers in the meantime.
> > > > We're on HBase 0.98.6.
> > > >
> > > > After launching the cluster again, we realised that we're missing
> > > > ~8000/190000 blocks. Looking at the fsck output, we can see, for what
> > > > looks like a continuous stream of regions:
> > > >
> > > > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
> > > > MISSING 1 blocks of total size 929610 B...
> > > >
> > > > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127:
> > > > CORRUPT blockpool BP-2037521063-<IP>-1418127576413 block blk_1076077966
> > > >
> > > > I did not want to run fsck -delete, and hbck complains because the
> > > > files would not be allocated to region servers, reporting missing
> > > > blocks.
> > > >
> > > > The total size of this table is circa 22 TB on HDFS and recreating it
> > > > would be quite a drag (pushing it from our previous HBase cluster took
> > > > about a month). Is there any known way of dealing with such a
> > > > situation?
> > > >
> > > > Mateusz Kaczyński
> > > >
> > >
> >
>
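
For a sense of scale, the figures in the fsck summary quoted in this thread can be put into proportion with a few lines of arithmetic. The sketch below is only illustrative: the numbers are copied from the summary, and the function and key names are hypothetical.

```python
def summarize(total_blocks, missing_blocks, total_size, missing_size):
    """Derive simple ratios from the counters of an fsck summary."""
    return {
        "healthy_blocks": total_blocks - missing_blocks,
        "missing_block_pct": 100.0 * missing_blocks / total_blocks,
        "missing_size_pct": 100.0 * missing_size / total_size,
        "avg_block_size": total_size // total_blocks,
        "avg_missing_block_size": missing_size // missing_blocks,
    }


# Values copied from the fsck summary in the thread.
stats = summarize(
    total_blocks=199618,
    missing_blocks=8245,
    total_size=23155170955674,
    missing_size=162010861748,
)

for name, value in stats.items():
    print(f"{name}: {value}")
```

With these inputs the healthy block count equals the 191373 "Minimally replicated blocks" in the summary, roughly 4% of the blocks but well under 1% of the bytes are missing, and the average missing block is much smaller than the average block overall. That would be consistent with the lost blocks belonging mostly to smaller, more recently written files, although the summary alone does not prove it.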
