accumulo-user mailing list archives

From "Dickson, Matt MR"
Subject RE: Identify tablets with no new data loaded [SEC=UNOFFICIAL]
Date Wed, 30 Apr 2014 03:17:12 GMT

Based on this, can I then query !METADATA to get the timestamps for each rfile in a specified
table, filter to timestamps older than a certain date, and then force a compaction on those?

From the shell I ran "scan -b 2e -c file -st" to get the timestamps for the files.  An
example result from this is:
2e;aaadfdsssdf_2 file:/t-34234afafas.rf [] 312312 519,13

So would "compact -b aaadfdsssdf_2 -e aaadfdsssdf_2" force a compaction on that rfile only?
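
The query-and-filter step asked about here can be sketched in Python. This is only a sketch under stated assumptions: it assumes the shell output has the shape of the example line above (row, column, visibility, timestamp, value, separated by whitespace), that rows contain no spaces, and that the timestamp can be compared directly with a cutoff value. The names FileEntry, parse_scan_line and stale_tablets are hypothetical, not part of Accumulo.

```python
from typing import List, NamedTuple, Optional, Tuple


class FileEntry(NamedTuple):
    table_id: str
    end_row: Optional[str]  # None stands for the default (last) tablet
    path: str
    timestamp: int


def parse_scan_line(line: str) -> FileEntry:
    """Parse one line shaped like '<row> file:<path> [<vis>] <timestamp> <value>'."""
    row, column, rest = line.split(None, 2)
    if ";" in row:
        table_id, end_row = row.split(";", 1)
    else:
        # A row such as '2<' marks the default tablet, which has no end row.
        table_id, end_row = row.rstrip("<"), None
    path = column[len("file:"):]
    # Skip the visibility brackets; the next token is the timestamp.
    timestamp = int(rest[rest.index("]") + 1:].split()[0])
    return FileEntry(table_id, end_row, path, timestamp)


def stale_tablets(lines: List[str], cutoff: int) -> List[Tuple[str, Optional[str]]]:
    """Return (table_id, end_row) for tablets whose newest file entry is older than cutoff."""
    newest = {}
    for line in lines:
        if not line.strip():
            continue
        entry = parse_scan_line(line)
        key = (entry.table_id, entry.end_row)
        newest[key] = max(newest.get(key, entry.timestamp), entry.timestamp)
    return [key for key, ts in newest.items() if ts < cutoff]
```

Two caveats apply. Compaction selects tablets by row range rather than by individual rfile, and the begin row of a range is believed to be exclusive, so the exact -b and -e values for a single tablet are worth checking against the shell help for compact. Whether the metadata timestamp is a wall-clock time or a counter is also an assumption to verify before choosing a cutoff.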

From: David Medinets
Sent: Wednesday, 30 April 2014 13:03
To: accumulo-user
Subject: Re: Identify tablets with no new data loaded [SEC=UNOFFICIAL]

Again, answering myself. I ran a major compaction after my insert but did not specify the
start and end values. That's why the rfile names and all of the timestamps changed.

On Tue, Apr 29, 2014 at 10:52 PM, David Medinets wrote:
Apparently using the timestamp of the !METADATA entries won't work. I created a table with
four splits:

timestamp(56) row(2;1 file:/t-000008j/A000008n.rf [] 56 false)
timestamp(58) row(2;2 file:/t-000008g/A000004f.rf [] 58 false)
timestamp(57) row(2;3 file:/t-000008h/A000008o.rf [] 57 false)
timestamp(59) row(2;4 file:/t-000008k/A000004g.rf [] 59 false)
timestamp(60) row(2< file:/default_tablet/A000004h.rf [] 60 false)

Then I just inserted into the first split. But the timestamps of all tablets changed:

timestamp(1345) row(2;1 file:/t-000008j/A00000kj.rf [] 1345 false)
timestamp(1347) row(2;2 file:/t-000008g/A00000ha.rf [] 1347 false)
timestamp(1346) row(2;3 file:/t-000008h/A00000kk.rf [] 1346 false)
timestamp(1348) row(2;4 file:/t-000008k/A00000h8.rf [] 1348 false)
timestamp(1349) row(2< file:/default_tablet/A00000h9.rf [] 1349 false)

Hmm. I just noticed that the rfiles also changed. I did not expect that.
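
A before/after comparison like the one above can be automated. The following is a minimal sketch, assuming each snapshot is text in the "timestamp(N) row(<row> file:<path> [] N false)" format shown in this message; the names files_by_tablet and changed_tablets are hypothetical.

```python
import re
from typing import Dict, List, Set

# Matches lines like: timestamp(56) row(2;1 file:/t-000008j/A000008n.rf [] 56 false)
LINE = re.compile(r"timestamp\((\d+)\) row\((\S+) file:(\S+) ")


def files_by_tablet(snapshot: str) -> Dict[str, Set[str]]:
    """Map each tablet row (e.g. '2;1' or '2<') to the set of rfile paths listed for it."""
    tablets: Dict[str, Set[str]] = {}
    for match in LINE.finditer(snapshot):
        _timestamp, row, path = match.groups()
        tablets.setdefault(row, set()).add(path)
    return tablets


def changed_tablets(before: str, after: str) -> List[str]:
    """Return tablet rows whose set of rfiles differs between the two snapshots."""
    old, new = files_by_tablet(before), files_by_tablet(after)
    return sorted(row for row in new if new[row] != old.get(row))
```

Run against the two listings above, this reports all five tablets as changed, which matches the observation that every rfile was rewritten.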

On Tue, Apr 29, 2014 at 10:22 PM, David Medinets wrote:
Wouldn't the timestamp of the !METADATA entries for each tablet give the last time the tablet
was compacted since the number of entries in each tablet is tracked?

On Tue, Apr 29, 2014 at 9:41 PM, Mike Drob wrote:

It's a bit crude, but you could look at the timestamps of the files in HDFS to get the time of
the last minor compaction.
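
The HDFS approach can be sketched in Python as well. This is a rough sketch that assumes the input is the text of a recursive HDFS listing (for example from "hdfs dfs -ls -R" on the table directory) in the usual eight-column layout of permissions, replication, owner, group, size, date, time and path. The function names are hypothetical, and any path layout fed to it (such as /accumulo/tables/<id>/<tablet dir>/) is an assumption about the installation.

```python
import posixpath
from datetime import datetime, timedelta
from typing import Dict, List


def newest_file_per_tablet_dir(listing: str) -> Dict[str, datetime]:
    """Map each directory containing .rf files to the newest modification time seen."""
    newest: Dict[str, datetime] = {}
    for line in listing.splitlines():
        fields = line.split(None, 7)
        # Skip summary lines ("Found N items") and directory entries.
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        date_part, time_part, path = fields[5], fields[6], fields[7]
        if not path.endswith(".rf"):
            continue
        mtime = datetime.strptime(date_part + " " + time_part, "%Y-%m-%d %H:%M")
        tablet_dir = posixpath.dirname(path)
        if tablet_dir not in newest or mtime > newest[tablet_dir]:
            newest[tablet_dir] = mtime
    return newest


def idle_tablet_dirs(listing: str, now: datetime,
                     max_age: timedelta = timedelta(days=7)) -> List[str]:
    """Return directories whose newest .rf file is older than max_age."""
    newest = newest_file_per_tablet_dir(listing)
    return sorted(d for d, mtime in newest.items() if now - mtime > max_age)
```

The directory names returned (such as t-000008g) also appear in the file entries of the metadata table, which is one way to map an idle directory back to a tablet's row. Any file write updates the modification time, including major compactions, and a tablet with no files at all will not show up in the result.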

On Apr 29, 2014 7:35 PM, "Dickson, Matt MR" wrote:

Is there a way to identify tablets that have had no data loaded into them for a period of
time, e.g. 7 days?  My guess is that this information is in the metadata table, but I'm not
sure how to get it.  The reason for asking is that I'd like to be able to list these tablets
and force a compaction on them to ageoff old data.  Because no data is being added, the ageoff
never occurs and our disk space usage continues to climb.

Thanks in advance,
