hbase-user mailing list archives

From Tom Brown <tombrow...@gmail.com>
Subject Re: Why is this region compacting?
Date Tue, 24 Sep 2013 20:27:40 GMT
I tried the workaround, and it is working very well. The number of store
files for all regions is now sane (went from about 8000 total store files
to 1000), and scans are now much more efficient.
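
For reference, a rough way to sanity-check the count for one table (assuming the
usual /hbase/<table>/<region>/<family> layout and the "d" family shown in the
listings below) is something like:

hadoop fs -lsr /hbase/compound3 | grep '/d/' | wc -l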

Thanks for all your help, Jean-Marc and Sergey!

--Tom


On Tue, Sep 24, 2013 at 2:11 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Tom,
>
> Thanks for this information and the offer. I think we have enough to start
> looking at this issue. I'm still trying to reproduce it locally. In the
> meantime, I sent a patch to fix the NullPointerException you faced before.
>
> I will post back here if I'm able to reproduce it. Have you tried Sergey's
> workaround?
>
> JM
>
>
> 2013/9/24 Tom Brown <tombrown52@gmail.com>
>
> > Yes, it is empty.
> >
> > 13/09/24 13:03:03 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 2.9g
> > 13/09/24 13:03:03 ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false
> > 13/09/24 13:03:03 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path /fca0882dc7624342a8f4fce4b89420ff. Expecting at least 5 path components.
> > 13/09/24 13:03:03 WARN snappy.LoadSnappy: Snappy native library is available
> > 13/09/24 13:03:03 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> > 13/09/24 13:03:03 INFO snappy.LoadSnappy: Snappy native library loaded
> > 13/09/24 13:03:03 INFO compress.CodecPool: Got brand-new decompressor
> > Stats:
> >   no data available for statistics
> > Scanned kv count -> 0
> >
> > If you want to examine the actual file, I would be happy to email it to
> > you directly.
> >
> > --Tom
> >
> >
> > On Tue, Sep 24, 2013 at 12:42 PM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> > > Can you try with fewer parameters and see if you are able to get
> > > something from it? This exception is caused by "printMeta", so if you
> > > remove -m it should be OK. However, printMeta was what I was looking for ;)
> > >
> > > getFirstKey for this file seems to return null, so it might simply be an
> > > empty file, not necessarily a corrupted one.
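> > >
> > > For example, something like this (the same file, just without -m) should
> > > still print the stats and the scanned key count:
> > >
> > > bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -s -v -f /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff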
> > >
> > >
> > > 2013/9/24 Tom Brown <tombrown52@gmail.com>
> > >
> > > > /usr/lib/hbase/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff
> > > > 13/09/24 12:33:40 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
> > > > Scanning -> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff
> > > > 13/09/24 12:33:41 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 2.9g
> > > > 13/09/24 12:33:41 ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false
> > > > 13/09/24 12:33:41 WARN snappy.LoadSnappy: Snappy native library is available
> > > > 13/09/24 12:33:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> > > > 13/09/24 12:33:41 INFO snappy.LoadSnappy: Snappy native library loaded
> > > > 13/09/24 12:33:41 INFO compress.CodecPool: Got brand-new decompressor
> > > > Block index size as per heapsize: 336
> > > > Exception in thread "main" java.lang.NullPointerException
> > > >         at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:716)
> > > >         at org.apache.hadoop.hbase.io.hfile.AbstractHFileReader.toStringFirstKey(AbstractHFileReader.java:138)
> > > >         at org.apache.hadoop.hbase.io.hfile.AbstractHFileReader.toString(AbstractHFileReader.java:149)
> > > >         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.printMeta(HFilePrettyPrinter.java:318)
> > > >         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:234)
> > > >         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:189)
> > > >         at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:756)
> > > >
> > > > Does this mean the problem might have been caused by a corrupted file(s)?
> > > >
> > > > --Tom
> > > >
> > > >
> > > > On Tue, Sep 24, 2013 at 12:21 PM, Jean-Marc Spaggiari <
> > > > jean-marc@spaggiari.org> wrote:
> > > >
> > > > > One more thing, Tom:
> > > > >
> > > > > When you have been able to capture the HFile locally, please run the
> > > > > HFile class on it to see the number of keys (is it empty?) and the
> > > > > other specific information.
> > > > >
> > > > > bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f HFILENAME
> > > > >
> > > > > Thanks,
> > > > >
> > > > > JM
> > > > >
> > > > >
> > > > > 2013/9/24 Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> > > > >
> > > > > > We get -1 because of this:
> > > > > >
> > > > > >       byte [] timerangeBytes = metadataMap.get(TIMERANGE_KEY);
> > > > > >       if (timerangeBytes != null) {
> > > > > >         this.reader.timeRangeTracker = new TimeRangeTracker();
> > > > > >         Writables.copyWritable(timerangeBytes, this.reader.timeRangeTracker);
> > > > > >       }
> > > > > >
> > > > > > this.reader.timeRangeTracker will return -1 for the maximumTimestamp
> > > > > > value. So now, we need to figure out whether it's normal or not to have
> > > > > > TIMERANGE_KEY not null here.
> > > > > >
> > > > > > I have created the same table locally on 0.94.10 with the same attributes
> > > > > > and I'm not facing this issue.
> > > > > >
> > > > > > We need to look at the related HFile, but files are rolled VERY quickly,
> > > > > > so it might be difficult to get one.
> > > > > >
> > > > > > Maybe something like
> > > > > >
> > > > > > hadoop fs -get hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/* .
> > > > > >
> > > > > > might help to get the file? Then we can start to look at it and see what
> > > > > > exactly triggers this behaviour.
> > > > > >
> > > > > > JM
> > > > > >
> > > > > >
> > > > > > 2013/9/24 Sergey Shelukhin <sergey@hortonworks.com>
> > > > > >
> > > > > >> Yeah, I think c3580bdb62d64e42a9eeac50f1c582d2 store file is a good
> > > > > >> example. Can you grep for c3580bdb62d64e42a9eeac50f1c582d2 and post the
> > > > > >> log just to be sure? Thanks.
> > > > > >> It looks like an interaction between deleting expired files and
> > > > > >>
> > > > > >>           // Create the writer even if no kv(Empty store file is also ok),
> > > > > >>           // because we need record the max seq id for the store file, see
> > > > > >>           // HBASE-6059
> > > > > >>
> > > > > >> in the compactor. The newly created file is immediately collected the same
> > > > > >> way and replaced by another file, which does not seem like intended
> > > > > >> behavior, even though both pieces of code are technically correct (the
> > > > > >> empty file is expired, and the new file is generally needed).
> > > > > >>
> > > > > >> I filed HBASE-9648.
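> > > > > >>
> > > > > >> Roughly, the cycle would look like this (a simplified, self-contained
> > > > > >> simulation of the interaction described above; all of the names below are
> > > > > >> made up, this is not the real compaction code):
> > > > > >>
> > > > > >> import java.util.ArrayList;
> > > > > >> import java.util.List;
> > > > > >>
> > > > > >> public class ExpiredEmptyFileLoop {
> > > > > >>   static class FakeStoreFile {
> > > > > >>     final long maxTimestamp;
> > > > > >>     FakeStoreFile(long maxTimestamp) { this.maxTimestamp = maxTimestamp; }
> > > > > >>   }
> > > > > >>
> > > > > >>   public static void main(String[] args) {
> > > > > >>     long maxExpired = System.currentTimeMillis() - 8640000L * 1000L; // 100-day TTL cutoff
> > > > > >>     List<FakeStoreFile> store = new ArrayList<FakeStoreFile>();
> > > > > >>     store.add(new FakeStoreFile(-1));            // empty file, maxTimeStamp == -1
> > > > > >>     for (int pass = 1; pass <= 3; pass++) {      // capped here; the real loop never ends
> > > > > >>       List<FakeStoreFile> expired = new ArrayList<FakeStoreFile>();
> > > > > >>       for (FakeStoreFile f : store) {
> > > > > >>         if (f.maxTimestamp < maxExpired) {       // -1 always qualifies as expired
> > > > > >>           expired.add(f);
> > > > > >>         }
> > > > > >>       }
> > > > > >>       if (expired.isEmpty()) break;
> > > > > >>       store.removeAll(expired);                  // "delete expired store file" compaction
> > > > > >>       store.add(new FakeStoreFile(-1));          // empty replacement kept for the max seq id
> > > > > >>       System.out.println("pass " + pass + ": dropped " + expired.size()
> > > > > >>           + ", wrote another empty file");
> > > > > >>     }
> > > > > >>   }
> > > > > >> }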
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Sep 24, 2013 at 10:55 AM, Sergey Shelukhin
> > > > > >> <sergey@hortonworks.com> wrote:
> > > > > >>
> > > > > >> > To mitigate, you can change hbase.store.delete.expired.storefile to
> > > > > >> > false on one region server, or for the entire table, and restart that
> > > > > >> > RS. Hopefully this will trigger a different compaction.
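> > > > > >> >
> > > > > >> > For example (just the standard hbase-site.xml way of setting it; adjust
> > > > > >> > to however you manage configuration on that RS, and roll the RS after):
> > > > > >> >
> > > > > >> > <property>
> > > > > >> >   <name>hbase.store.delete.expired.storefile</name>
> > > > > >> >   <value>false</value>
> > > > > >> > </property>
> > > > > >> >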
> > > > > >> > We'd need to find out what the bug is. My gut feeling (which is known
> > > > > >> > to be wrong often) is that it has to do with it selecting one file,
> > > > > >> > probably an invalid check somewhere, or an interaction with the code
> > > > > >> > that ensures that at least one file needs to be written to preserve
> > > > > >> > metadata; it might just be cycling through such files.
> > > > > >> >
> > > > > >> >
> > > > > >> > On Tue, Sep 24, 2013 at 10:20 AM, Jean-Marc Spaggiari <
> > > > > >> > jean-marc@spaggiari.org> wrote:
> > > > > >> >
> > > > > >> >> So. Looking at the code, this, to me, sounds like a bug.
> > > > > >> >>
> > > > > >> >> I will try to reproduce it locally. It seems to be related to the
> > > > > >> >> combination of TTL + BLOOM.
> > > > > >> >>
> > > > > >> >> Creating a table for that right now; will keep you posted very shortly.
> > > > > >> >>
> > > > > >> >> JM
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> 2013/9/24 Tom Brown <tombrown52@gmail.com>
> > > > > >> >>
> > > > > >> >> > -rw-------   1 hadoop supergroup       2194 2013-09-21 14:32 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/014ead47a9484d67b55205be16802ff1
> > > > > >> >> > -rw-------   1 hadoop supergroup      31321 2013-09-24 05:49 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1305d625bd4a4be39a98ae4d91a66140
> > > > > >> >> > -rw-------   1 hadoop supergroup       1350 2013-09-24 10:31 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1352e0828f974f08b1f3d7a9dff04abd
> > > > > >> >> > -rw-------   1 hadoop supergroup       4194 2013-09-21 10:38 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/17a546064bd840619816809ae0fc4c49
> > > > > >> >> > -rw-------   1 hadoop supergroup       1061 2013-09-20 22:55 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1cb3df115da244288bd076968ab4ccf6
> > > > > >> >> > -rw-------   1 hadoop supergroup       1375 2013-08-24 10:17 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1e41a96c49fc4e5ab59392d26935978d
> > > > > >> >> > -rw-------   1 hadoop supergroup      96296 2013-08-26 15:48 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/22d72fd897e34424b5420a96483a571e
> > > > > >> >> > -rw-------   1 hadoop supergroup       1356 2013-08-26 15:23 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/25fee1ffadbe42549bd0b7b13d782b72
> > > > > >> >> > -rw-------   1 hadoop supergroup       6229 2013-09-21 11:14 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/26289c777ec14dc5b7021b4d6b1050c5
> > > > > >> >> > -rw-------   1 hadoop supergroup       1223 2013-09-21 02:42 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2757d7ba9c8448d6a3d5d46bd4d59758
> > > > > >> >> > -rw-------   1 hadoop supergroup    5302248 2013-08-24 02:22 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2ec40943787246ea983608dd6591db24
> > > > > >> >> > -rw-------   1 hadoop supergroup       1596 2013-08-24 03:37 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3157fd1cabe4483aaa4d9a21f75e4d88
> > > > > >> >> > -rw-------   1 hadoop supergroup       1338 2013-09-22 04:25 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/36b0f80a4a7b492f97358b64d879a2df
> > > > > >> >> > -rw-------   1 hadoop supergroup       3264 2013-09-21 12:05 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/39e249fcb532400daed73aed6689ceeb
> > > > > >> >> > -rw-------   1 hadoop supergroup       4549 2013-09-21 08:56 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3bc9e2a566ad460a9b0ed336b2fb5ed9
> > > > > >> >> > -rw-------   1 hadoop supergroup       1630 2013-09-22 03:22 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/48026d08aae748f08aad59e4eea903be
> > > > > >> >> > -rw-------   1 hadoop supergroup     105395 2013-09-20 21:12 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/53198825f085401cbbd4322faa0e3aae
> > > > > >> >> > -rw-------   1 hadoop supergroup       3859 2013-09-21 09:09 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/71c2f9b2a8ff4c049fcc5a9a22af5cfe
> > > > > >> >> > -rw-------   1 hadoop supergroup     311688 2013-09-20 21:12 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/97ff16d6da974c30835c6e0acc7c737a
> > > > > >> >> > -rw-------   1 hadoop supergroup       1897 2013-08-24 08:43 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/a172d7577641434d82abcce88a433213
> > > > > >> >> > -rw-------   1 hadoop supergroup       3380 2013-09-21 13:04 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/be678e5c60534c65a012a798fbc7e284
> > > > > >> >> > -rw-------   1 hadoop supergroup      43710 2013-09-22 02:15 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/e2508a23acf1491f9d38b9a8594e41e8
> > > > > >> >> > -rw-------   1 hadoop supergroup       5409 2013-09-21 10:10 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f432846182714b93a1c3df0f5835c09b
> > > > > >> >> > -rw-------   1 hadoop supergroup        491 2013-09-24 11:18 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f7d8669cf7a047b98c1d3b13c16cfaec
> > > > > >> >> > -rw-------   1 hadoop supergroup        491 2013-09-24 11:18 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fa1b8f6cc9584eb28365dcd8f10d3f0a
> > > > > >> >> > -rw-------   1 hadoop supergroup        491 2013-09-13 11:28 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> > On Tue, Sep 24, 2013 at 11:14 AM, Jean-Marc Spaggiari <
> > > > > >> >> > jean-marc@spaggiari.org> wrote:
> > > > > >> >> >
> > > > > >> >> > > TTL seems to be fine.
> > > > > >> >> > >
> > > > > >> >> > > -1 is the default value for TimeRangeTracker.maximumTimestamp.
> > > > > >> >> > >
> > > > > >> >> > > Can you run:
> > > > > >> >> > >
> > > > > >> >> > > hadoop fs -lsr hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/
> > > > > >> >> > >
> > > > > >> >> > > Thanks,
> > > > > >> >> > >
> > > > > >> >> > > JM
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > > 2013/9/24 Tom Brown <tombrown52@gmail.com>
> > > > > >> >> > >
> > > > > >> >> > > > 1. Hadoop version is 1.1.2.
> > > > > >> >> > > > 2. All servers are synched with NTP.
> > > > > >> >> > > > 3. Table definition is: 'compound0', {
> > > > > >> >> > > >   NAME => 'd',
> > > > > >> >> > > >   DATA_BLOCK_ENCODING => 'NONE',
> > > > > >> >> > > >   BLOOMFILTER => 'ROW',
> > > > > >> >> > > >   REPLICATION_SCOPE => '0',
> > > > > >> >> > > >   VERSIONS => '1',
> > > > > >> >> > > >   COMPRESSION => 'SNAPPY',
> > > > > >> >> > > >   MIN_VERSIONS => '0',
> > > > > >> >> > > >   TTL => '8640000',
> > > > > >> >> > > >   KEEP_DELETED_CELLS => 'false',
> > > > > >> >> > > >   BLOCKSIZE => '65536',
> > > > > >> >> > > >   IN_MEMORY => 'false',
> > > > > >> >> > > >   ENCODE_ON_DISK => 'true',
> > > > > >> >> > > >   BLOCKCACHE => 'true'
> > > > > >> >> > > > }
> > > > > >> >> > > >
> > > > > >> >> > > > The TTL is supposed to be 100 days.
> > > > > >> >> > > >
> > > > > >> >> > > > --Tom
> > > > > >> >> > > >
> > > > > >> >> > > >
> > > > > >> >> > > > On Tue, Sep 24, 2013 at 10:53 AM, Jean-Marc Spaggiari <
> > > > > >> >> > > > jean-marc@spaggiari.org> wrote:
> > > > > >> >> > > >
> > > > > >> >> > > > > Another important piece of information which might be the root
> > > > > >> >> > > > > cause of this issue...
> > > > > >> >> > > > >
> > > > > >> >> > > > > Do you have any TTL defined for this table?
> > > > > >> >> > > > >
> > > > > >> >> > > > > JM
> > > > > >> >> > > > >
> > > > > >> >> > > > >
> > > > > >> >> > > > > 2013/9/24 Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> > > > > >> >> > > > >
> > > > > >> >> > > > > > Strange.
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > A few questions then.
> > > > > >> >> > > > > > 1) What is your Hadoop version?
> > > > > >> >> > > > > > 2) Is the clock on all your servers synched with NTP?
> > > > > >> >> > > > > > 3) What is your table definition? Bloom filters, etc.?
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > This is the reason why it keeps compacting:
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > 2013-09-24 10:04:00,548 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/7426f128469242ec8ee09f3965fd5a1a whose maxTimeStamp is -1 while the max expired timestamp is 1371398640548
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > maxTimeStamp = -1
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > Each time there is a comparison between the maxTimeStamp for this
> > > > > >> >> > > > > > store file and the configured maxExpiredTimeStamp, and since
> > > > > >> >> > > > > > maxTimeStamp returns -1, it's always elected for a compaction.
> > > > > >> >> > > > > > Now, we need to find out why...
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > JM
> > > > > >> >> > > > > >
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > 2013/9/24 Tom Brown <tombrown52@gmail.com>
> > > > > >> >> > > > > >
> > > > > >> >> > > > > >> My cluster is fully distributed (2 regionserver nodes).
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >> Here is a snippet of log entries that may explain why it started:
> > > > > >> >> > > > > >> http://pastebin.com/wQECif8k. I had to go back 2 days to find when
> > > > > >> >> > > > > >> it started for this region.
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >> This is not the only region experiencing this issue (but this is
> > > > > >> >> > > > > >> the smallest one it's happened to).
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >> --Tom
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >> On Tue, Sep 24, 2013 at 10:13 AM, Jean-Marc Spaggiari <
> > > > > >> >> > > > > >> jean-marc@spaggiari.org> wrote:
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >> > Can you paste logs from a bit before that, to see if anything
> > > > > >> >> > > > > >> > triggered the compaction? Before the 1M compaction entries.
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >> > Also, what is your setup? Are you running in Standalone?
> > > > > >> >> > > > > >> > Pseudo-Dist? Fully-Dist?
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >> > Thanks,
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >> > JM
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >> > 2013/9/24 Tom Brown <tombrown52@gmail.com>
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >> > > There is one column family, d. Each row has about 10 columns,
> > > > > >> >> > > > > >> > > and each row's total data size is less than 2K.
> > > > > >> >> > > > > >> > >
> > > > > >> >> > > > > >> > > Here is a small snippet of logs from the region server:
> > > > > >> >> > > > > >> > > http://pastebin.com/S2jE4ZAx
> > > > > >> >> > > > > >> > >
> > > > > >> >> > > > > >> > > --Tom
> > > > > >> >> > > > > >> > >
> > > > > >> >> > > > > >> > >
> > > > > >> >> > > > > >> > > On Tue, Sep 24, 2013 at 9:59 AM, Bharath Vissapragada <
> > > > > >> >> > > > > >> > > bharathv@cloudera.com> wrote:
> > > > > >> >> > > > > >> > >
> > > > > >> >> > > > > >> > > > It would help if you can show your RS log (via pastebin?).
> > > > > >> >> > > > > >> > > > Are there frequent flushes for this region too?
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > > > On Tue, Sep 24, 2013 at 9:20 PM, Tom Brown <
> > > > > >> >> > > > > >> > > > tombrown52@gmail.com> wrote:
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > > > >
I have a region that is very small, only
> > 5MB.
> > > > > >> Despite
> > > > > >> >> it's
> > > > > >> >> > > > size,
> > > > > >> >> > > > > >> it
> > > > > >> >> > > > > >> > has
> > > > > >> >> > > > > >> > > > 24
> > > > > >> >> > > > > >> > > > >
store files. The logs show that it's
> > > compacting
> > > > > >> (over
> > > > > >> >> and
> > > > > >> >> > > over
> > > > > >> >> > > > > >> > again).
> > > > > >> >> > > > > >> > > > >
> > > > > >> >> > > > > >> > > > >
The odd thing is that even though there
> are
> > 24
> > > > > store
> > > > > >> >> > files,
> > > > > >> >> > > it
> > > > > >> >> > > > > >> only
> > > > > >> >> > > > > >> > > does
> > > > > >> >> > > > > >> > > > >
one at a time. Even more strange is that
> my
> > > logs
> > > > > are
> > > > > >> >> > filling
> > > > > >> >> > > > up
> > > > > >> >> > > > > >> with
> > > > > >> >> > > > > >> > > > >
compacting this one region. In the last 10
> > > > hours,
> > > > > >> there
> > > > > >> >> > have
> > > > > >> >> > > > > been
> > > > > >> >> > > > > >> > > > 1,876,200
> > > > > >> >> > > > > >> > > > >
log entries corresponding to compacting
> this
> > > > > region
> > > > > >> >> alone.
> > > > > >> >> > > > > >> > > > >
> > > > > >> >> > > > > >> > > > >
My cluster is 0.94.10, and using almost
> all
> > > > > default
> > > > > >> >> > > settings.
> > > > > >> >> > > > > >> Only a
> > > > > >> >> > > > > >> > > few
> > > > > >> >> > > > > >> > > > >
are not default:
> > > > > >> >> > > > > >> > > > >
hbase.hregion.max.filesize = 4294967296
> > > > > >> >> > > > > >> > > > >
hbase.hstore.compaction.min = 6
> > > > > >> >> > > > > >> > > > >
> > > > > >> >> > > > > >> > > > >
I am at a total loss as to why this
> behavior
> > > is
> > > > > >> >> occurring.
> > > > > >> >> > > Any
> > > > > >> >> > > > > >> help
> > > > > >> >> > > > > >> > is
> > > > > >> >> > > > > >> > > > >
appreciated.
> > > > > >> >> > > > > >> > > > >
> > > > > >> >> > > > > >> > > > >
--Tom
> > > > > >> >> > > > > >> > > > >
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > > > --
> > > > > >> >> > > > > >> > > > Bharath Vissapragada
> > > > > >> >> > > > > >> > > > <http://www.cloudera.com>
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > >
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >
> > > > > >> >> > > > > >
> > > > > >> >> > > > >
> > > > > >> >> > > >
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
