accumulo-dev mailing list archives

From "Keith Turner (Commented) (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-348) Adding splits to table via the shell with addsplits is very slow when adding a lot of split points
Date Fri, 30 Mar 2012 18:32:27 GMT


Keith Turner commented on ACCUMULO-348:

I put together a workaround for 1.3.5 and 1.4.0 and posted it on GitHub. It adds lots of
splits to a table much faster.

While testing this I discovered more about why adding lots of splits is slow, and found another
workaround. While trying to add 99,999 splits to a table using the addsplits command in the shell, I
noticed on the monitor page that the rate seemed to be slowing down. I used jstack to look
at the process adding split points and noticed the stack traces were always doing metadata
lookups. After a split, the client has to refresh its tablet location cache by looking in
the metadata table. I went to the tablet server and saw that metadata lookups were taking
more than a quarter of a second.
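
A toy Python model of the cost described above: every split makes the client's cached location for that tablet stale, so the next split pays for a metadata lookup. This is a simplified simulation, not Accumulo's client code. `LocationCache`, `count_lookups_for_splits`, the `metadata_lookup` callback and the `tserver-1` server name are hypothetical, and the model assumes the client drops a cached extent as soon as it splits it.

```python
import bisect
from typing import Callable, Dict, List, Optional, Tuple

# A tablet covers the rows in (prev_end_row, end_row]; None means unbounded.
Extent = Tuple[Optional[str], Optional[str]]


def contains(extent: Extent, row: str) -> bool:
    prev, end = extent
    return (prev is None or row > prev) and (end is None or row <= end)


class LocationCache:
    """Hypothetical client-side cache of tablet extent -> tablet server."""

    def __init__(self, metadata_lookup: Callable[[str], Tuple[Extent, str]]) -> None:
        self._metadata_lookup = metadata_lookup
        self._cache: Dict[Extent, str] = {}
        self.metadata_lookups = 0

    def locate(self, row: str) -> Tuple[Extent, str]:
        # A linear scan keeps the sketch short; a real cache would use a sorted map.
        for extent, server in self._cache.items():
            if contains(extent, row):
                return extent, server
        # Miss: consult the simulated metadata table and cache the answer.
        self.metadata_lookups += 1
        extent, server = self._metadata_lookup(row)
        self._cache[extent] = server
        return extent, server

    def invalidate(self, extent: Extent) -> None:
        self._cache.pop(extent, None)


def count_lookups_for_splits(split_points: List[str]) -> int:
    """Add splits one at a time and count the metadata lookups they trigger."""
    table_splits: List[str] = []  # stands in for the metadata table

    def metadata_lookup(row: str) -> Tuple[Extent, str]:
        i = bisect.bisect_left(table_splits, row)
        prev = table_splits[i - 1] if i > 0 else None
        end = table_splits[i] if i < len(table_splits) else None
        return (prev, end), "tserver-1"

    cache = LocationCache(metadata_lookup)
    for row in split_points:
        extent, _server = cache.locate(row)  # which tablet holds this row?
        bisect.insort(table_splits, row)     # the split changes the metadata table
        cache.invalidate(extent)             # so the cached extent is now stale
    return cache.metadata_lookups
```

In this model `count_lookups_for_splits([f"row{i:05d}" for i in range(1000)])` returns 1000, one lookup per split, so if each lookup takes 0.29 secs as in the log line that follows, the lookups alone would cap the rate at about 3.4 splits per second.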

30 17:36:09,458 [tabletserver.TabletServer] DEBUG: MultiScanSess 4 entries in 0.29 secs (lookup_time:0.29 secs tablets:1 ranges:1)

I thought about why this was going on, and it occurred to me that the code was always splitting
the last tablet. This meant that the same columns in the metadata table (those of the last
tablet's entry) were updated on every split and therefore had lots of versions. These versions
were all kept in memory and had to be suppressed by the versioning iterator at scan time. About
60k tablets had been added. I knew that flushing the metadata table would get rid of all of
these versions. Below is the log entry for the minor compaction caused by flushing the metadata
table. It read 1.4M entries and wrote 724K, so it dropped almost 700K old versions
(1,394,754 - 724,252 = 670,502).
30 17:36:09,698 [tabletserver.Compactor] DEBUG: Compaction !0;~;p\\;3c7 1,394,754 read | 724,252 written | 581,874 entries/sec |  2.397 secs
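
To make the mechanism above concrete, here is a toy Python model, not Accumulo code. It assumes split points arrive in sorted order, keys metadata rows as `tableId;endRow` with `tableId<` for the last tablet (modeled on Accumulo's metadata layout, treated here as an assumption), records a single hypothetical `prev_row` column, and assumes the metadata table keeps one version per key. `ToyMemTable`, `add_splits` and the table id `"2"` are illustrative names.

```python
import bisect
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (metadata row, column)


class ToyMemTable:
    """In-memory map that keeps every version of a key until it is flushed."""

    def __init__(self) -> None:
        self.versions: Dict[Key, List[Tuple[int, str]]] = {}
        self.clock = 0

    def put(self, row: str, column: str, value: str) -> None:
        self.clock += 1
        self.versions.setdefault((row, column), []).append((self.clock, value))

    def lookup(self, row: str, column: str) -> Tuple[str, int]:
        """Return the newest value and how many versions the read had to examine."""
        vs = self.versions[(row, column)]
        return max(vs)[1], len(vs)

    def flush(self) -> Tuple[int, int]:
        """Keep only the newest version per key; return (entries read, entries written)."""
        read = sum(len(vs) for vs in self.versions.values())
        self.versions = {k: [max(vs)] for k, vs in self.versions.items()}
        return read, len(self.versions)


def add_splits(mem: ToyMemTable, table_id: str, points: List[str]) -> None:
    """Record simplified metadata updates for each split, in the given order."""
    splits: List[str] = []
    for p in points:
        i = bisect.bisect_left(splits, p)
        prev = splits[i - 1] if i > 0 else ""
        # The tablet being split is keyed by its end row; the last tablet uses "<".
        old_row = f"{table_id};{splits[i]}" if i < len(splits) else f"{table_id}<"
        mem.put(f"{table_id};{p}", "prev_row", prev)  # new tablet (prev, p]
        mem.put(old_row, "prev_row", p)               # old tablet now starts after p
        bisect.insort(splits, p)


mem = ToyMemTable()
add_splits(mem, "2", [f"row{i:05d}" for i in range(10_000)])  # sorted order
print(mem.lookup("2<", "prev_row"))  # ('row09999', 10000)
print(mem.flush())                   # (20000, 10001)
print(mem.lookup("2<", "prev_row"))  # ('row09999', 1)
```

With sorted split points every split lands in the last tablet, so the `2<` row collects one version per split and a read of it steps over all of them until the flush. The flush then reads roughly twice what it writes, the same shape as the real compaction logged above.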

After the flush, metadata lookups by the client doing the splits were much faster and the rate
of adding splits shot up.

30 17:36:09,773 [tabletserver.TabletServer] DEBUG: MultiScanSess 4 entries in 0.00 secs (lookup_time:0.00 secs tablets:1 ranges:1)

So another workaround is to periodically flush the metadata table when adding lots of splits.
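
The periodic-flush workaround can be sketched as a loop that adds splits and flushes the metadata table every N splits. This is a hedged illustration: `add_splits_with_flushes`, `add_split`, `flush_metadata` and `flush_every` are hypothetical names, the callables stand in for whatever client or shell calls actually add a split and flush the metadata table (that wiring is not shown), and the default interval of 10,000 is arbitrary rather than a value taken from this comment.

```python
from typing import Callable, Iterable


def add_splits_with_flushes(
    split_points: Iterable[str],
    add_split: Callable[[str], None],
    flush_metadata: Callable[[], None],
    flush_every: int = 10_000,
) -> int:
    """Add splits one by one, flushing the metadata table every `flush_every` splits.

    Returns the number of flushes issued.
    """
    if flush_every <= 0:
        raise ValueError("flush_every must be positive")
    flushes = 0
    for count, row in enumerate(split_points, start=1):
        add_split(row)
        if count % flush_every == 0:
            # Flushing writes the in-memory versions out through a minor compaction,
            # so later metadata lookups no longer wade through old versions.
            flush_metadata()
            flushes += 1
    return flushes


# Usage with stand-in callables that just record what happened.
log = []
n = add_splits_with_flushes(
    (f"row{i:05d}" for i in range(25)),
    add_split=lambda row: log.append(("split", row)),
    flush_metadata=lambda: log.append(("flush", None)),
    flush_every=10,
)
print(n)  # 2
```

Passing the split and flush operations in as callables keeps the batching logic separate from any particular client API, which also makes it easy to test with fakes as above.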

> Adding splits to table via the shell with addsplits is very slow when adding a lot of split points
> --------------------------------------------------------------------------------------------------
>                 Key: ACCUMULO-348
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>    Affects Versions: 1.3.5
>            Reporter: Dave Marion
>            Priority: Minor
>             Fix For: 1.5.0
