accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (ACCUMULO-3901) tserver.tablet.split.midpoint.files.max default value is probably too small
Date Tue, 23 Jun 2015 20:42:43 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Newton resolved ACCUMULO-3901.
-----------------------------------
    Resolution: Fixed

> tserver.tablet.split.midpoint.files.max default value is probably too small
> ---------------------------------------------------------------------------
>
>                 Key: ACCUMULO-3901
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3901
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Minor
>             Fix For: 1.8.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> On a large cluster, 50K files were bulk loaded into a single tablet.
> This is bad, and is not the result of "normal" ingest.
> Each file was fairly small (50-100 KB).
> Once the files were loaded, the tablet server decided to try to split the tablet. Because of the number of files, it attempted to determine the split point using multiple passes. This took a very long time and held a tablet lock, preventing additional bulk imports.
> In desperation, we set tserver.tablet.split.midpoint.files.max high enough to cover all of the tablet's files and restarted the tablet server. The tablet was re-hosted elsewhere, and the multi-pass approach was not used. Within a few minutes, the tablet was examined and split.
> So, using tserver.tablet.split.midpoint.files.max=55000 works perfectly well. Of course, this is on production nodes, and we tend to make the default settings appropriate for a single-node development system.
> I suggest raising the default for this setting to at least 300; that value should be safe.
> I spoke offline with [~kturner], who confirmed that the original default was chosen arbitrarily.
> On other production systems I examined, the multi-pass approach is used more often than expected, probably because those systems rely on massive numbers of bulk imports.
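To make the cost of the multi-pass approach more concrete, here is a rough Java model. It is a simplification under a stated assumption, not taken from the Accumulo source: while more than tserver.tablet.split.midpoint.files.max files remain, each pass merges every batch of up to that many files into one temporary file. The class and method names (MidpointPassModel, passes, tempFilesWritten) are hypothetical, and a limit of 10 is a hypothetical small value used only for comparison; 50,000 files and the limits 300 and 55000 come from the report above.

```java
// A rough, hypothetical model of multi-pass midpoint computation; it is NOT
// Accumulo's implementation. Assumption: while more than maxOpen files remain,
// each pass merges every batch of up to maxOpen files into one temporary file.
public class MidpointPassModel {

    // Reject limits for which the model would never converge.
    private static void checkMaxOpen(long maxOpen) {
        if (maxOpen < 2) {
            throw new IllegalArgumentException("maxOpen must be at least 2");
        }
    }

    /** Number of reduction passes needed before at most maxOpen files remain. */
    static int passes(long files, long maxOpen) {
        checkMaxOpen(maxOpen);
        int passes = 0;
        while (files > maxOpen) {
            // Ceiling division: each batch of up to maxOpen files becomes one file.
            files = (files + maxOpen - 1) / maxOpen;
            passes++;
        }
        return passes;
    }

    /** Total temporary files written across all reduction passes. */
    static long tempFilesWritten(long files, long maxOpen) {
        checkMaxOpen(maxOpen);
        long written = 0;
        while (files > maxOpen) {
            files = (files + maxOpen - 1) / maxOpen;
            written += files;
        }
        return written;
    }

    public static void main(String[] args) {
        long files = 50_000; // tablet file count from the report
        // 10 is a hypothetical small limit; 300 and 55000 come from the report.
        for (long maxOpen : new long[] {10, 300, 55_000}) {
            System.out.printf("maxOpen=%d passes=%d tempFiles=%d%n",
                    maxOpen, passes(files, maxOpen), tempFilesWritten(files, maxOpen));
        }
    }
}
```

Under this model, a limit of 10 needs 4 passes and 5555 temporary files, a limit of 300 needs a single pass over 167 temporary files, and a limit of 55000 (larger than the file count) skips the reduction entirely, which is consistent with the multi-pass approach not being used after the restart.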



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
