hbase-user mailing list archives

From S L <slouie.at.w...@gmail.com>
Subject Re: How does changing TTL affect compaction
Date Wed, 19 Jul 2017 17:54:47 GMT
Thanks Ted for answering all my questions this week.

I don't remember when the change was made.  I want to say it occurred 4
months ago.  There are a lot of these RetriesExhaustedException errors so I
can't say what percentage of these "bad" rowkeys are before/after the
estimated change.  I don't think the cluster has been restarted since the
change, although I could be wrong.

Good catch on the mismatch between the server log and the Exception log.  I
logged into the wrong server and couldn't find the string, but I wanted to
show that the cluster was running major compaction, so I picked a random
snippet from that log.  Here is the log from the correct server, showing
major compactions of the region that contains the rowkey:

>>>>>  from region server log snippet
hbase-cmf-hbase-REGIONSERVER-hslave35118.ams9.mydomain.com.log.out.1:2017-07-18
16:39:06,078 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed
major compaction of 5 (all) file(s) in raw_data of
dbi_based_data,170_p3431.db160044.hkg1.dbi_1476454680,1490311699762.e00f8737bfc5010b6bd9be11520ca1d3.
into 7df0e1450cec483da48f93c87f1c044f(size=3.3 G), total size for store is
3.3 G. This selection was in queue for 0sec, and took 1mins, 47sec to
execute.

hbase-cmf-hbase-REGIONSERVER-hslave35118.ams9.mydomain.com.log.out.6:2017-07-14
22:49:48,808 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed
major compaction of 4 (all) file(s) in raw_data of
dbi_based_data,170_p3431.db160044.hkg1.dbi_1476454680,1490311699762.e00f8737bfc5010b6bd9be11520ca1d3.
into bc26bfcd2e074bd4a73901500e43840c(size=3.3 G), total size for store is
3.3 G. This selection was in queue for 30mins, 20sec, and took 2mins, 12sec
to execute

>>>>>> end of region server log.


Regarding the parts from the Exception log, I'll look at the files you
advised me to look at and if I still have questions, I'll send another
question to the user list.

Thanks again.


On Wed, Jul 19, 2017 at 8:22 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> bq. at some point we changed it to 30.
>
> Do you remember when the change was made ?
> After the change, was there server restart ?
>
> You showed part of log for 35188 but the RetriesExhaustedException was for
> 35118. Was this due to redaction or were you checking a different server ?
>
> The RetriesExhaustedException happened at Jul 18 13:49:59 PDT 2017 which
> didn't match the server log snippet (2017-07-14 22:25:58,692)
>
> The coloring didn't go thru.
>
> For SocketTimeoutException, please see
> hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCaller.java
> around line 169 (in branch-1.3).
>
> For the remaining exception message,
> see getExceptionMessageAdditionalDetail() in
> hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionServerCallable.java
>
> On Tue, Jul 18, 2017 at 10:48 PM, S L <slouie.at.work@gmail.com> wrote:
>
> > Just wondering how changing TTL on a table affects minor and major
> > compaction when it's trying to combine/delete things.
> >
> > There is a rowkey that shows up when my program does a scan.  According
> > to the TTL, it shouldn't be there but it is.  It gives me
> > RetriesExhaustedException.  Looking at the table description, the TTL is
> > 30 days.  We initially had it at 14 days, then at some point we changed
> > it to 30.
> >
> > Table description
> > describe 'dbi_based_data'
> > Table dbi_based_data is ENABLED
> >
> >
> > dbi_based_data
> >
> > COLUMN FAMILIES DESCRIPTION
> >
> >
> > {NAME => 'raw_data', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
> > REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY',
> > MIN_VERSIONS => '0', TTL => '2592000 SECONDS (30 DAYS)',
> > KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> > BLOCKCACHE => 'true'}
> >
> > 1 row(s) in 0.5310 seconds
> >
> > The rowkey exception shows up as follows in the hadoop output:
> >
> > Error: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > after attempts=36, exceptions: Tue Jul 18 13:49:59 PDT 2017, null,
> > java.net.SocketTimeoutException: callTimeout=75000, callDuration=75301:
> > row '170_p3431.db160044.hkg1.dbi_1476454680' on table 'dbi_based_data' at
> > region=dbi_based_data,170_p3431.db160044.hkg1.dbi_1476454680,1490311699762.e00f8737bfc5010b6bd9be11520ca1d3.,
> > hostname=hslave35118.ams9.mydomain.com,60020,1483570489305,
> > seqNum=16311452
> >
> > I went to the region server above to look at the logs
> > (/var/log/hbase/hbase*.out).  I didn't find anything with that rowkey
> > and I can see it is performing major_compaction on other elements.
> >
> > hbase-cmf-hbase-REGIONSERVER-hslave35188.ams9.mydomain.com.log.out.7:2017-07-14
> > 22:25:58,692 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed
> > major compaction of 4 (all) file(s) in raw_data of
> > dbi_based_data,109_p3400.db160093.sjc5.dbi_1493088600,1495713795634.358bee5b1db95d10e7533c7fe0365c0e.
> > into 51a8c31e32b8445eb781bdc580c2d5d1(size=4.6 G), total size for store is
> > 4.6 G. This selection was in queue for 6mins, 21sec, and took 2mins, 21sec
> > to execute.
> >
> > Two questions:
> > 1) Can this change in TTL make some keys not get deleted due to timing
> > or whatever other reason?  If so, how do I delete these undeletable keys?
> >
> > 2) What is the name of the java file that outputs the exception message
> > above?  What do the different fields from the output mean?
> > Tue Jul 18 13:49:59 PDT 2017, null, java.net.SocketTimeoutException:
> > callTimeout=75000, callDuration=75301: row
> > '170_p3431.db160044.hkg1.dbi_1476454680' on table 'dbi_based_data' at
> > region=dbi_based_data,170_p3431.db160044.hkg1.dbi_1476454680,1490311699762.e00f8737bfc5010b6bd9be11520ca1d3.,
> > hostname=hslave35118.ams9.mydomain.com,60020,1483570489305, seqNum=16311452
> >
> >
> > I've color coded the fields that I don't know what they mean.  Hopefully
> > you can see the color.
> > a) What does "null" mean (light blue)?
> > b) The yellow is the rowkey but the red is some timestamp that I don't
> > know.  The red is from March 2017 so it's way past the 30 day TTL.
> > c) Don't know what the purple alphanumeric string is but it might be the
> > region name?
> > d) The orange timestamp is again past the 30-day TTL and is from Jan 2017,
> > but again, I don't know what it is from/for.
> >
> > Thanks in advance for your help.
> >
>
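
For readers trying to decode the fields asked about in the original question, here is a minimal Python sketch. It assumes the usual HBase 1.x conventions as I understand them: a region name has the form `<table>,<start key>,<region id>.<encoded name>.`, where the region id is normally the region's creation time in epoch milliseconds, and a server name has the form `<hostname>,<port>,<start code>`, where the start code is the region server's start time in epoch milliseconds. Under that reading, neither number is a cell timestamp, so neither is subject to the TTL. The `null` in the message appears to be an empty extra-context field that precedes the exception text. The function names below are hypothetical helpers, not part of any HBase API.

```python
from datetime import datetime, timezone


def parse_region_name(region_name):
    """Split '<table>,<start key>,<region id>.<encoded name>.' into parts.

    Assumes the start key may itself contain dots and commas, so the
    string is split from both ends rather than on every separator.
    """
    body, encoded_name = region_name.rstrip(".").rsplit(".", 1)
    table, rest = body.split(",", 1)
    start_key, region_id = rest.rsplit(",", 1)
    return table, start_key, int(region_id), encoded_name


def parse_server_name(server_name):
    """Split '<hostname>,<port>,<start code>' into parts."""
    hostname, port, start_code = server_name.rsplit(",", 2)
    return hostname, int(port), int(start_code)


def millis_to_utc(millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Values copied from the exception message quoted in this thread.
region = ("dbi_based_data,170_p3431.db160044.hkg1.dbi_1476454680,"
          "1490311699762.e00f8737bfc5010b6bd9be11520ca1d3.")
server = "hslave35118.ams9.mydomain.com,60020,1483570489305"

table, start_key, region_id, encoded_name = parse_region_name(region)
hostname, port, start_code = parse_server_name(server)

print("table:         ", table)
print("start key:     ", start_key)
print("region created:", millis_to_utc(region_id).date())
print("encoded name:  ", encoded_name)
print("server started:", millis_to_utc(start_code).date())
```

With the values from this thread, the region id falls in March 2017 and the start code in January 2017, which matches the months noted in the question. The row shown in the message is the region's start key, so it equals the first part of the region name.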
