hbase-user mailing list archives

From Ellimilial K <ellimil...@googlemail.com>
Subject Re: HBase all files corrupt / missing blocks
Date Tue, 03 Feb 2015 22:26:53 GMT
That's quite horrible, oh well, thanks for the help!

Yes, positive. We started having issues with the HA quorum a couple of days
after the migration; HBase has constantly been taking ~200 requests a
second via Stargate, and things seemed to work fine.

Mateusz

On 3 February 2015 at 22:11, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
wrote:

> Those files and related data are most probably lost. I don't see any
> other option than deleting them.
>
> Are you sure those blocks were not missing before the migration? Did you
> have any crash during the migration process?
>
> JM
>
> 2015-02-03 13:14 GMT-08:00 Ellimilial K <ellimilial@googlemail.com>:
>
> > Thank you for the responses!
> >
> > @Jean-Marc
> > This comes from fsck /. I see a flood of those, numbering at least in the
> > hundreds. For this particular region:
> >
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/18c428413d7b4a89959911c9112a6eb9: CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076062948
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/18c428413d7b4a89959911c9112a6eb9: MISSING 1 blocks of total size 52243482 B..
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/49b265ba5c7942b0b8e2b788fd9d7362: CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076077963
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/49b265ba5c7942b0b8e2b788fd9d7362: MISSING 1 blocks of total size 6181 B...
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/ef3fc67a835b451aa7d18094ea141451: CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076062891
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/ef3fc67a835b451aa7d18094ea141451: MISSING 1 blocks of total size 11747149 B..
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/fedeb8062c454238bf1d1112b0f80b4b: CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076077964
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/fedeb8062c454238bf1d1112b0f80b4b: MISSING 1 blocks of total size 10431742 B..
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109: CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076062900
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109: MISSING 1 blocks of total size 929610 B...
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127: CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076077966
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127: MISSING 1 blocks of total size 119139 B.........
> > (...) ending with:
> > ..........Status: CORRUPT
> >  Total size: 23155170955674 B (Total open files size: 1577 B)
> >  Total dirs: 21232
> >  Total files: 33311
> >  Total symlinks: 0 (Files currently being written: 61)
> >  Total blocks (validated): 199618 (avg. block size 115997409 B) (Total open file blocks (not validated): 19)
> >   ********************************
> >   CORRUPT FILES: 8245
> >   MISSING BLOCKS: 8245
> >   MISSING SIZE: 162010861748 B
> >   CORRUPT BLOCKS:  8245
> >   ********************************
> >  Minimally replicated blocks: 191373 (95.86961 %)
> >  Over-replicated blocks: 3241 (1.6236011 %)
> >  Under-replicated blocks: 0 (0.0 %)
> >  Mis-replicated blocks: 0 (0.0 %)
> >  Default replication factor: 3
> >  Average block replication: 2.916185
> >  Corrupt blocks: 8245
> >  Missing replicas: 0 (0.0 %)
> >  Number of data-nodes: 17
> >  Number of racks: 1
> >
> > There are 8 files in the directories within
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe, so I imagine 6/8
> > are affected.
> > The size of the missing blocks ranges from 2 KB up to ~70 MB. The table
> > concerned had ~3500 regions. All datanodes are up and look like they report
> > correctly, so unfortunately there is no replica lying around.
> >
> > @esteban I double checked: the volumes seem fine, and the total HDFS size also
> > looks unchanged. Datanodes look fine. It is a single cluster (i.e. no
> > cluster replication, if I'm answering the question?), fresh after an
> > upgrade from 0.94 to 0.98 (CDH 4.7 to 5.3), with HDFS replication set to 3.
> >
> > Many thanks,
> > Mateusz
> >
> > On 3 February 2015 at 20:30, Esteban Gutierrez <esteban@cloudera.com>
> > wrote:
> >
> > > Hi Mateusz,
> > >
> > > As JMS mentioned, it is very likely the data is lost, but that type of
> > > corruption is usually due to some DNs being down or data volumes having
> > > been removed for some reason. Have you tried to recover that data from
> > > those DNs first?
> > >
> > > From "for what looks like a continuous stream of regions" it sounds like
> > > you had a single replica configured for HBase. Is that the case?
> > >
> > > esteban.
> > >
> > > --
> > > Cloudera, Inc.
> > >
> > >
> > > On Tue, Feb 3, 2015 at 12:04 PM, Jean-Marc Spaggiari <
> > > jean-marc@spaggiari.org> wrote:
> > >
> > > > Hi Mateusz,
> > > >
> > > > Data from this HFile is most probably lost. Is the block also reported
> > > > as missing by fsck? Do you have any datanode down which might contain
> > > > this block? How big is this HFile? 929610 bytes only? If so, one option
> > > > might just be to delete this HFile.
> > > >
> > > > How many HFiles are within this region?
> > > >
> > > > JM
> > > >
> > > > 2015-02-03 10:04 GMT-08:00 Ellimilial K <ellimilial@googlemail.com>:
> > > >
> > > > > We have recently experienced some issues with our namenodes in an HA
> > > > > arrangement and had to recreate the namenode metadata from a backup,
> > > > > while some new data had been pushed to the region servers in the
> > > > > meantime. We're on HBase 0.98.6.
> > > > >
> > > > > After launching the cluster again, we realised that we're missing
> > > > > ~8000/190000 blocks. Looking at the fsck output, we can see, for what
> > > > > looks like a continuous stream of regions:
> > > > >
> > > > > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109: MISSING 1 blocks of total size 929610 B...
> > > > >
> > > > > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127: CORRUPT blockpool BP-2037521063-<IP>-1418127576413 block blk_1076077966
> > > > >
> > > > > I did not want to run fsck -delete, and hbck complains because the
> > > > > files would not be allocated to region servers, reporting missing
> > > > > blocks.
> > > > >
> > > > > The total size of this table is circa 22 TB on HDFS and recreating it
> > > > > would be quite a drag (pushing it from our previous HBase cluster took
> > > > > about a month). Is there any known way of dealing with such a
> > > > > situation?
> > > > >
> > > > > Mateusz Kaczyński
> > > > >
> > > >
> > >
> >
>
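
For triaging fsck output like that quoted in the thread, it can help to group the MISSING entries by region and see how many HFiles and bytes each region has lost before deciding what to delete. The following is only a minimal sketch and not something posted by the participants: it assumes each fsck entry sits on a single line (path, colon, message) and that paths follow the /hbase/data/<namespace>/<table>/<region>/<family>/<hfile> layout seen in the thread. The names summarize_missing and SAMPLE are hypothetical.

```python
import re
from collections import defaultdict

# One unwrapped fsck entry, e.g.
# /hbase/data/default/table/<region>/<family>/<hfile>: MISSING 1 blocks of total size 929610 B...
MISSING_RE = re.compile(
    r"(?P<path>/hbase/data/[^:\s]+): MISSING (?P<blocks>\d+) blocks "
    r"of total size (?P<size>\d+) B"
)


def summarize_missing(lines):
    """Group MISSING entries by (namespace:table, region).

    Returns a dict mapping each key to the set of affected HFiles
    (as "family/hfile"), the missing block count and the missing bytes.
    Lines that do not match the assumed format are ignored.
    """
    summary = defaultdict(lambda: {"files": set(), "blocks": 0, "bytes": 0})
    for line in lines:
        match = MISSING_RE.search(line)
        if not match:
            continue
        parts = match.group("path").strip("/").split("/")
        # Assumed layout: hbase/data/<namespace>/<table>/<region>/<family>/<hfile>
        if len(parts) != 7:
            continue
        _, _, namespace, table, region, family, hfile = parts
        entry = summary[(f"{namespace}:{table}", region)]
        entry["files"].add(f"{family}/{hfile}")
        entry["blocks"] += int(match.group("blocks"))
        entry["bytes"] += int(match.group("size"))
    return dict(summary)


# Entries copied from the thread, joined onto single lines.
SAMPLE = [
    "/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/"
    "35186fe43fed47989ddb4ace3648b109: MISSING 1 blocks of total size 929610 B...",
    "/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/"
    "bd41ca895f3749188c08dd2e540bc127: CORRUPT blockpool "
    "BP-2037521063-37.59.17.102-1418127576413 block blk_1076077966",
    "/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/"
    "bd41ca895f3749188c08dd2e540bc127: MISSING 1 blocks of total size 119139 B.........",
]

if __name__ == "__main__":
    for (table, region), entry in sorted(summarize_missing(SAMPLE).items()):
        print(
            f"{table} {region}: {len(entry['files'])} HFiles, "
            f"{entry['blocks']} blocks, {entry['bytes']} bytes missing"
        )
```

In practice the list would be replaced by the saved fsck output read line by line. CORRUPT entries are skipped on purpose, since in the output above every CORRUPT entry is paired with a MISSING entry for the same file.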
