accumulo-user mailing list archives

From: Eric Newton
Subject: Re: How to pre-split a table for UUID rowkeys
Date: Fri, 02 Aug 2013 22:35:15 GMT

Apparently 5 million 1K documents aren't enough to split the tablet.  I'm
guessing that your documents are compressing well, or that they all still
fit in memory.  You could try flushing the table to see if it splits:

shell > flush -t table -w

Or, you could just add splits if you know the UUIDs are uniformly
distributed:

shell > addsplits -t table 1 2 3 4 5 6 7 8 9 a b c d e f
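
For a different number of tablets, the split points can be generated rather
than typed by hand. This is only a sketch: hex_split_points is a made-up
helper name, and it assumes the row keys are random UUIDs written as
lowercase hex, so the leading characters are evenly spread.

```python
def hex_split_points(num_tablets, width=2):
    """Return num_tablets - 1 evenly spaced lowercase hex prefixes.

    Hypothetical helper: assumes row keys are uniformly distributed
    lowercase hex strings (e.g. random UUIDs with the dashes removed).
    """
    space = 16 ** width  # number of distinct prefixes of this width
    if not 1 <= num_tablets <= space:
        raise ValueError("num_tablets must be between 1 and 16**width")
    # Boundary i sits at i/num_tablets of the way through the prefix space.
    return [format(i * space // num_tablets, "0{}x".format(width))
            for i in range(1, num_tablets)]


if __name__ == "__main__":
    # 16 tablets with one-character prefixes gives the same points as above.
    print("addsplits -t table " + " ".join(hex_split_points(16, width=1)))
```

The printed line can be pasted into the shell. Longer prefixes (width=2 or
more) allow tablet counts that are not a divisor of 16.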

Or, if you just want Accumulo to split at a certain size under the 1G
default:

shell > config -t table -s table.split.threshold=10M
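
To get a rough feel for what a given threshold means, a back-of-the-envelope
estimate can help. This is a sketch under loud assumptions: the function name
and the compression_ratio parameter are hypothetical, the ratio in the example
is a guess and not a measurement, and "M"/"G" are treated as binary multiples.
Real tablet counts will differ, since a tablet splits only after its files
grow past the threshold.

```python
import math

MB = 1024 ** 2  # assumption: "10M" means 10 * 1024 * 1024 bytes
GB = 1024 ** 3  # assumption: "1G" means 1024 ** 3 bytes


def estimated_tablets(num_docs, doc_bytes, threshold_bytes,
                      compression_ratio=1.0):
    """Very rough tablet-count estimate (hypothetical helper).

    compression_ratio is on-disk size divided by raw size; it is a guess
    supplied by the caller, not something read from Accumulo.
    """
    on_disk = num_docs * doc_bytes * compression_ratio
    return max(1, math.ceil(on_disk / threshold_bytes))


if __name__ == "__main__":
    docs, size = 5000000, 1024
    for label, threshold in (("1G", GB), ("10M", 10 * MB)):
        n = estimated_tablets(docs, size, threshold, compression_ratio=0.15)
        print("threshold", label, "-> roughly", n, "tablet(s)")
```

If the data compresses to well under 1G on disk, a single tablet is what
you would expect with the default threshold.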


On Fri, Aug 2, 2013 at 5:41 PM, Terry P. wrote:

> Greetings folks,
> Have a bit of a non-typical Accumulo use case using Accumulo as a backend
> data store for a search index to provide fault tolerance should the index
> get corrupted.  Max docs stored in Accumulo will be under 1 billion at full
> volume.
> The search index is used to "find" the data a user is interested in, and
> the search index then retrieves the document from Accumulo using its RowKey
> which was gotten from the search index.  The RowKey is a java.util.UUID
> string that has had the '-' dashes stripped out.
> I have a 3 node cluster and as a quick test have ingested 5 million 1K
> documents into it, yet they all went to a single TabletServer.  I was kind
> of surprised -- I knew this would be the case for a row key using a
> monotonically increasing number, but I thought with a UUID type rowkey the
> entries would have been spread across the TabletServers at least some, even
> without pre-splitting the table.
> Clearly my understanding of how Accumulo spreads the data out is lacking.
> Can anyone shed more light on it?  And possibly recommend a table split
> strategy for a 3-node cluster such as I have described?
> Many thanks in advance,
> Terry
