hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Why is this region compacting?
Date Tue, 24 Sep 2013 18:21:26 GMT
One more Tom,

When you will have been able capture de HFile locally, please run run the
HFile class on it to see the number of keys (is it empty?) and the other
specific information.

bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f HFILENAME

Thanks,

JM


2013/9/24 Jean-Marc Spaggiari <jean-marc@spaggiari.org>

> We get -1 because of this:
>
>       byte [] timerangeBytes = metadataMap.get(TIMERANGE_KEY);
>       if (timerangeBytes != null) {
>         this.reader.timeRangeTracker = new TimeRangeTracker();
>         Writables.copyWritable(timerangeBytes,
> this.reader.timeRangeTracker);
>       }
> this.reader.timeRangeTracker will return -1 for the maximumTimestamp
> value. So now, we need to figure if it's normal or not to have
> TIMERANGE_KEY not null here.
>
> I have created the same table locally on 0,94.10 with the same attributes
> and I'm not facing this issue.
>
> We need to look at the related HFile, but files are rolled VERY quickly,
> so might be difficult to get one.
>
> Maybe something like
> hadoop fs -get hdfs://
> hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/*
> .
>
> might help to get the file? Then we can start to look at it and see what
> exactly trigger this behaviour?
>
> JM
>
>
> 2013/9/24 Sergey Shelukhin <sergey@hortonworks.com>
>
>> Yeah, I think c3580bdb62d64e42a9eeac50f1c582d2 store file is a good
>> example.
>> Can you grep for c3580bdb62d64e42a9eeac50f1c582d2 and post the log just to
>> be sure? Thanks.
>> It looks like an interaction of deleting expired files and
>>           // Create the writer even if no kv(Empty store file is also ok),
>>           // because we need record the max seq id for the store file, see
>>           // HBASE-6059
>> in compactor.
>> The newly created file is immediately collected the same way and replaced
>> by another file, which seems like not an intended behavior, even though
>> both pieces of code are technically correct (the empty file is expired,
>> and
>> the new file is generally needed).
>>
>> I filed HBASE-9648
>>
>>
>> On Tue, Sep 24, 2013 at 10:55 AM, Sergey Shelukhin
>> <sergey@hortonworks.com>wrote:
>>
>> > To mitigate, you can change hbase.store.delete.expired.storefile to
>> false
>> > on one region server, or for entire table, and restart this RS.
>> > This will trigger a different compaction, hopefully.
>> > We'd need to find what the bug is. My gut feeling (which is known to be
>> > wrong often) is that it has to do with it selecting one file, probably
>> > invalid check somewhere, or interaction with the code that ensures that
>> at
>> > least one file needs to be written to preserve metadata, it might be
>> just
>> > cycling thru such files.
>> >
>> >
>> > On Tue, Sep 24, 2013 at 10:20 AM, Jean-Marc Spaggiari <
>> > jean-marc@spaggiari.org> wrote:
>> >
>> >> So. Looking at the code, this, for me, sound like a bug.
>> >>
>> >> I will try to reproduce it locally. Seems to be related to the
>> combination
>> >> of TTL + BLOOM.
>> >>
>> >> Creating a table for that right now, will keep you posted very shortly.
>> >>
>> >> JM
>> >>
>> >>
>> >> 2013/9/24 Tom Brown <tombrown52@gmail.com>
>> >>
>> >> > -rw-------   1 hadoop supergroup       2194 2013-09-21 14:32
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/014ead47a9484d67b55205be16802ff1
>> >> > -rw-------   1 hadoop supergroup      31321 2013-09-24 05:49
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1305d625bd4a4be39a98ae4d91a66140
>> >> > -rw-------   1 hadoop supergroup       1350 2013-09-24 10:31
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1352e0828f974f08b1f3d7a9dff04abd
>> >> > -rw-------   1 hadoop supergroup       4194 2013-09-21 10:38
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/17a546064bd840619816809ae0fc4c49
>> >> > -rw-------   1 hadoop supergroup       1061 2013-09-20 22:55
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1cb3df115da244288bd076968ab4ccf6
>> >> > -rw-------   1 hadoop supergroup       1375 2013-08-24 10:17
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1e41a96c49fc4e5ab59392d26935978d
>> >> > -rw-------   1 hadoop supergroup      96296 2013-08-26 15:48
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/22d72fd897e34424b5420a96483a571e
>> >> > -rw-------   1 hadoop supergroup       1356 2013-08-26 15:23
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/25fee1ffadbe42549bd0b7b13d782b72
>> >> > -rw-------   1 hadoop supergroup       6229 2013-09-21 11:14
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/26289c777ec14dc5b7021b4d6b1050c5
>> >> > -rw-------   1 hadoop supergroup       1223 2013-09-21 02:42
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2757d7ba9c8448d6a3d5d46bd4d59758
>> >> > -rw-------   1 hadoop supergroup    5302248 2013-08-24 02:22
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2ec40943787246ea983608dd6591db24
>> >> > -rw-------   1 hadoop supergroup       1596 2013-08-24 03:37
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3157fd1cabe4483aaa4d9a21f75e4d88
>> >> > -rw-------   1 hadoop supergroup       1338 2013-09-22 04:25
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/36b0f80a4a7b492f97358b64d879a2df
>> >> > -rw-------   1 hadoop supergroup       3264 2013-09-21 12:05
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/39e249fcb532400daed73aed6689ceeb
>> >> > -rw-------   1 hadoop supergroup       4549 2013-09-21 08:56
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3bc9e2a566ad460a9b0ed336b2fb5ed9
>> >> > -rw-------   1 hadoop supergroup       1630 2013-09-22 03:22
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/48026d08aae748f08aad59e4eea903be
>> >> > -rw-------   1 hadoop supergroup     105395 2013-09-20 21:12
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/53198825f085401cbbd4322faa0e3aae
>> >> > -rw-------   1 hadoop supergroup       3859 2013-09-21 09:09
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/71c2f9b2a8ff4c049fcc5a9a22af5cfe
>> >> > -rw-------   1 hadoop supergroup     311688 2013-09-20 21:12
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/97ff16d6da974c30835c6e0acc7c737a
>> >> > -rw-------   1 hadoop supergroup       1897 2013-08-24 08:43
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/a172d7577641434d82abcce88a433213
>> >> > -rw-------   1 hadoop supergroup       3380 2013-09-21 13:04
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/be678e5c60534c65a012a798fbc7e284
>> >> > -rw-------   1 hadoop supergroup      43710 2013-09-22 02:15
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/e2508a23acf1491f9d38b9a8594e41e8
>> >> > -rw-------   1 hadoop supergroup       5409 2013-09-21 10:10
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f432846182714b93a1c3df0f5835c09b
>> >> > -rw-------   1 hadoop supergroup        491 2013-09-24 11:18
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f7d8669cf7a047b98c1d3b13c16cfaec
>> >> > -rw-------   1 hadoop supergroup        491 2013-09-24 11:18
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fa1b8f6cc9584eb28365dcd8f10d3f0a
>> >> > -rw-------   1 hadoop supergroup        491 2013-09-13 11:28
>> >> >
>> >> >
>> >>
>> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Sep 24, 2013 at 11:14 AM, Jean-Marc Spaggiari <
>> >> > jean-marc@spaggiari.org> wrote:
>> >> >
>> >> > > TTL seems to be fine.
>> >> > >
>> >> > > -1 is the default value for TimeRangeTracker.maximumTimestamp.
>> >> > >
>> >> > > Can you run:
>> >> > > hadoop fs -lsr hdfs://
>> >> > >
>> >> > >
>> >> >
>> >>
>> hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/
>> >> > >
>> >> > > Thanks,
>> >> > >
>> >> > > JM
>> >> > >
>> >> > >
>> >> > > 2013/9/24 Tom Brown <tombrown52@gmail.com>
>> >> > >
>> >> > > > 1. Hadoop version is 1.1.2.
>> >> > > > 2. All servers are synched with NTP.
>> >> > > > 3. Table definition is: 'compound0', {
>> >> > > > NAME => 'd',
>> >> > > > DATA_BLOCK_ENCODING => 'NONE',
>> >> > > > BLOOMFILTER => 'ROW',
>> >> > > > REPLICATION_SCOPE => '0',
>> >> > > > VERSIONS => '1',
>> >> > > > COMPRESSION => 'SNAPPY',
>> >> > > > MIN_VERSIONS => '0',
>> >> > > > TTL => '8640000',
>> >> > > > KEEP_DELETED_CELLS => 'false',
>> >> > > > BLOCKSIZE => '65536',
>> >> > > > IN_MEMORY => 'false',
>> >> > > > ENCODE_ON_DISK => 'true',
>> >> > > > BLOCKCACHE => 'true'
>> >> > > > }
>> >> > > >
>> >> > > > The TTL is supposed to be 100 days.
>> >> > > >
>> >> > > > --Tom
>> >> > > >
>> >> > > >
>> >> > > > On Tue, Sep 24, 2013 at 10:53 AM, Jean-Marc Spaggiari <
>> >> > > > jean-marc@spaggiari.org> wrote:
>> >> > > >
>> >> > > > > Another important information why might be the root
cause of
>> this
>> >> > > > issue...
>> >> > > > >
>> >> > > > > Do you have any TTL defines for this table?
>> >> > > > >
>> >> > > > > JM
>> >> > > > >
>> >> > > > >
>> >> > > > > 2013/9/24 Jean-Marc Spaggiari <jean-marc@spaggiari.org>
>> >> > > > >
>> >> > > > > > Strange.
>> >> > > > > >
>> >> > > > > > Few questions then.
>> >> > > > > > 1) What is your hadoop version?
>> >> > > > > > 2) Is the clock on all your severs synched with
NTP?
>> >> > > > > > 3) What is you table definition? Bloom filters,
etc.?
>> >> > > > > >
>> >> > > > > > This is the reason why it keep compacting:
>> >> > > > > >
>> >> > > > > > 2013-09-24 10:04:00,548 INFO
>> >> > > > >
>> org.apache.hadoop.hbase.regionserver.compactions.CompactSelection:
>> >> > > > Deleting
>> >> > > > > the expired store file by compaction: hdfs://
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/7426f128469242ec8ee09f3965fd5a1awhosemaxTimeStampis-1whilethe
max expired timestamp is 1371398640548
>> >> > > > > >
>> >> > > > > > maxTimeStamp = -1
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > Each time there is a comparison between maxTimeStamp
for this
>> >> store
>> >> > > > file
>> >> > > > > > and the configured maxExpiredTimeStamp and since
maxTimeStamp
>> >> > returns
>> >> > > > -1,
>> >> > > > > > it's always elected for a compaction. Now, we need
to find
>> >> why...
>> >> > > > > >
>> >> > > > > > JM
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > 2013/9/24 Tom Brown <tombrown52@gmail.com>
>> >> > > > > >
>> >> > > > > >> My cluster is fully distributed (2 regionserver
nodes).
>> >> > > > > >>
>> >> > > > > >> Here is a snippet of log entries that may explain
why it
>> >> started:
>> >> > > > > >> http://pastebin.com/wQECif8k. I had to go back
2 days to
>> find
>> >> > when
>> >> > > it
>> >> > > > > >> started for this region.
>> >> > > > > >>
>> >> > > > > >> This is not the only region experiencing this
issue (but
>> this
>> >> is
>> >> > the
>> >> > > > > >> smallest one it's happened to).
>> >> > > > > >>
>> >> > > > > >> --Tom
>> >> > > > > >>
>> >> > > > > >>
>> >> > > > > >> On Tue, Sep 24, 2013 at 10:13 AM, Jean-Marc
Spaggiari <
>> >> > > > > >> jean-marc@spaggiari.org> wrote:
>> >> > > > > >>
>> >> > > > > >> > Can you past logs a bit before that? To
see if anything
>> >> > triggered
>> >> > > > the
>> >> > > > > >> > compaction?
>> >> > > > > >> > Before the 1M compactions entries.
>> >> > > > > >> >
>> >> > > > > >> > Also, what is your setup? Are you running
in Standalone?
>> >> > > > Pseudo-Dist?
>> >> > > > > >> > Fully-Dist?
>> >> > > > > >> >
>> >> > > > > >> > Thanks,
>> >> > > > > >> >
>> >> > > > > >> > JM
>> >> > > > > >> >
>> >> > > > > >> >
>> >> > > > > >> > 2013/9/24 Tom Brown <tombrown52@gmail.com>
>> >> > > > > >> >
>> >> > > > > >> > > There is one column family, d. Each
row has about 10
>> >> columns,
>> >> > > and
>> >> > > > > each
>> >> > > > > >> > > row's total data size is less than
2K.
>> >> > > > > >> > >
>> >> > > > > >> > > Here is a small snippet of logs from
the region server:
>> >> > > > > >> > > http://pastebin.com/S2jE4ZAx
>> >> > > > > >> > >
>> >> > > > > >> > > --Tom
>> >> > > > > >> > >
>> >> > > > > >> > >
>> >> > > > > >> > > On Tue, Sep 24, 2013 at 9:59 AM,
Bharath Vissapragada <
>> >> > > > > >> > > bharathv@cloudera.com
>> >> > > > > >> > > > wrote:
>> >> > > > > >> > >
>> >> > > > > >> > > > It would help if you can show
your RS log (via
>> >> pastebin?) .
>> >> > > Are
>> >> > > > > >> there
>> >> > > > > >> > > > frequent flushes for this region
too?
>> >> > > > > >> > > >
>> >> > > > > >> > > >
>> >> > > > > >> > > > On Tue, Sep 24, 2013 at 9:20
PM, Tom Brown <
>> >> > > > tombrown52@gmail.com>
>> >> > > > > >> > wrote:
>> >> > > > > >> > > >
>> >> > > > > >> > > > > I have a region that is
very small, only 5MB.
>> Despite
>> >> it's
>> >> > > > size,
>> >> > > > > >> it
>> >> > > > > >> > has
>> >> > > > > >> > > > 24
>> >> > > > > >> > > > > store files. The logs show
that it's compacting
>> (over
>> >> and
>> >> > > over
>> >> > > > > >> > again).
>> >> > > > > >> > > > >
>> >> > > > > >> > > > > The odd thing is that even
though there are 24 store
>> >> > files,
>> >> > > it
>> >> > > > > >> only
>> >> > > > > >> > > does
>> >> > > > > >> > > > > one at a time. Even more
strange is that my logs are
>> >> > filling
>> >> > > > up
>> >> > > > > >> with
>> >> > > > > >> > > > > compacting this one region.
In the last 10 hours,
>> there
>> >> > have
>> >> > > > > been
>> >> > > > > >> > > > 1,876,200
>> >> > > > > >> > > > > log entries corresponding
to compacting this region
>> >> alone.
>> >> > > > > >> > > > >
>> >> > > > > >> > > > > My cluster is 0.94.10,
and using almost all default
>> >> > > settings.
>> >> > > > > >> Only a
>> >> > > > > >> > > few
>> >> > > > > >> > > > > are not default:
>> >> > > > > >> > > > > hbase.hregion.max.filesize
= 4294967296
>> >> > > > > >> > > > > hbase.hstore.compaction.min
= 6
>> >> > > > > >> > > > >
>> >> > > > > >> > > > > I am at a total loss as
to why this behavior is
>> >> occurring.
>> >> > > Any
>> >> > > > > >> help
>> >> > > > > >> > is
>> >> > > > > >> > > > > appreciated.
>> >> > > > > >> > > > >
>> >> > > > > >> > > > > --Tom
>> >> > > > > >> > > > >
>> >> > > > > >> > > >
>> >> > > > > >> > > >
>> >> > > > > >> > > >
>> >> > > > > >> > > > --
>> >> > > > > >> > > > Bharath Vissapragada
>> >> > > > > >> > > > <http://www.cloudera.com>
>> >> > > > > >> > > >
>> >> > > > > >> > >
>> >> > > > > >> >
>> >> > > > > >>
>> >> > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>> --
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified
>> that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender
>> immediately
>> and delete it from your system. Thank You.
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message