accumulo-user mailing list archives

From "Dickson, Matt MR" <matt.dick...@defence.gov.au>
Subject Efficient Tablet Merging [SEC=UNOFFICIAL]
Date Wed, 02 Oct 2013 03:58:37 GMT
UNOFFICIAL

I have a table for which we create splits of the form yyyymmdd-nnnn, where nnnn ranges from
0000 to 0840.  The bulk of our data is loaded for the current date, and no data is loaded for
days older than 3 days, so my understanding is that it would be wise to merge splits older than
3 days to reduce the overall tablet count.  It would still be optimal to keep each day's tablets
somewhat distributed across the cluster, so I'm looking at merging splits in groups of 10,
e.g. merge -b 20130901-0000 -e 20130901-0009, reducing 840 splits per day to 84.
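
As an illustration of the grouping described above, here is a minimal Python sketch that
generates one merge line per group of 10 splits for a given day, in the same form as the
example command. The function name merge_commands and its parameters are hypothetical. By
default it assumes the split suffixes run from 0000 to 0839, which gives the 840 splits per
day mentioned above; pass last_split=840 if 0840 is itself a split. Whether -b is treated as
inclusive or exclusive is not checked here and should be confirmed against the shell's help
for merge before running anything.

```python
def merge_commands(day, last_split=839, group=10):
    """Build one 'merge -b ... -e ...' line per group of consecutive splits.

    day        -- date prefix of the split, e.g. "20130901"
    last_split -- highest nnnn suffix (assumed 839, i.e. 840 splits per day)
    group      -- number of consecutive splits covered by each merge
    """
    commands = []
    for start in range(0, last_split + 1, group):
        # The final group may be shorter if the count is not a multiple of `group`.
        end = min(start + group - 1, last_split)
        commands.append(f"merge -b {day}-{start:04d} -e {day}-{end:04d}")
    return commands


cmds = merge_commands("20130901")
print(len(cmds))   # 84
print(cmds[0])     # merge -b 20130901-0000 -e 20130901-0009
print(cmds[-1])    # merge -b 20130901-0830 -e 20130901-0839
```

The generated lines could be written to a file and reviewed before being fed to the shell,
which keeps the merge ranges auditable.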

Currently we have 120K tablets (size 1G) on a cluster of 56 nodes, and our ingest has slowed
as the data quantity and tablet count have grown.  Initially we were achieving 200-300K; now
we achieve 50-100K.
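
To put these figures in perspective, here is a rough Python sketch of the arithmetic: the
average number of tablets per node today, and the tablet count after merging a number of old
days from 840 splits down to 84. The value of 100 merged days is purely hypothetical, and the
calculation assumes every merged day currently holds exactly 840 tablets, which may not be
true if tablets have split further.

```python
TABLETS = 120_000      # "120K tablets" from the message
NODES = 56             # cluster size from the message
SPLITS_PER_DAY = 840   # tablets per day before merging
MERGED_PER_DAY = 84    # tablets per day after merging in groups of 10


def tablets_per_node(tablets, nodes):
    """Average number of tablets hosted by each node."""
    return tablets / nodes


def tablets_after_merging(tablets, merged_days):
    """Tablet count after merging `merged_days` days from 840 down to 84 tablets."""
    return tablets - merged_days * (SPLITS_PER_DAY - MERGED_PER_DAY)


print(round(tablets_per_node(TABLETS, NODES)))   # 2143
remaining = tablets_after_merging(TABLETS, 100)  # 100 days is a hypothetical value
print(remaining)                                 # 44400
print(round(tablets_per_node(remaining, NODES))) # 793
```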

My question is: what is the best way to do this merge?  Should we use the merge command with
the size option set to something like 5G, or perhaps use the compaction command instead?

From my tests this process could take some time so I'm keen to understand the most efficient
approach.

Thanks in advance,
Matt Dickson
