From notifications-return-44141-archive-asf-public=cust-asf.ponee.io@accumulo.apache.org  Wed Jul 18 17:58:15 2018
Return-Path: <notifications-return-44141-archive-asf-public=cust-asf.ponee.io@accumulo.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 42AD6180636
	for <archive-asf-public@cust-asf.ponee.io>; Wed, 18 Jul 2018 17:58:15 +0200 (CEST)
Received: (qmail 22457 invoked by uid 500); 18 Jul 2018 15:58:14 -0000
Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:notifications-help@accumulo.apache.org>
List-Unsubscribe: <mailto:notifications-unsubscribe@accumulo.apache.org>
List-Post: <mailto:notifications@accumulo.apache.org>
List-Id: <notifications.accumulo.apache.org>
Reply-To: jira@apache.org
Delivered-To: mailing list notifications@accumulo.apache.org
Received: (qmail 22446 invoked by uid 99); 18 Jul 2018 15:58:14 -0000
Received: from ec2-52-202-80-70.compute-1.amazonaws.com (HELO gitbox.apache.org) (52.202.80.70)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 18 Jul 2018 15:58:14 +0000
From: GitBox <git@apache.org>
To: notifications@accumulo.apache.org
Subject: [GitHub] keith-turner opened a new issue #564: Add multiple compaction
 thread pools and allow multiple compactions per tablet
Message-ID: <153192949367.25853.5829216658486538049.gitbox@gitbox.apache.org>
Date: Wed, 18 Jul 2018 15:58:13 -0000
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

keith-turner opened a new issue #564: Add multiple compaction thread pools and allow multiple compactions per tablet
URL: https://github.com/apache/accumulo/issues/564
 
 
   Currently there is a single thread pool/executor for compactions, and only a single compaction can run per tablet.  This can cause problems when a user initiates a single long-running filter or transform compaction, because new files build up and are not compacted.  Ideally, a long-running compaction for a tablet could run in executor1 while the tablet's new files are compacted in executor2.
   
   The current user-pluggable CompactionStrategy class is not well suited to handling multiple executors and multiple compactions per tablet.  The following design is better suited to managing this concurrency in a way that is easy to understand.  In this design the CompactionManager and CompactionPrioritizer are user pluggable.  Currently, the prioritization of queued compactions is not configurable.
   
   | Functional components | Description |
   |-----------------------|-------------|
   | CompactionJob         | Immutable class that describes work to be done.  Contains list of files to compact, info about iterators for user compactions, info about output file (like compression type). |
   | CompactionManager     | Per table class that decides what compactions to do for a tablet. Can create and cancel compaction jobs.  Can see the list of existing jobs.  Can submit multiple jobs for a tablet as long as their files are disjoint. This class decides which executor should process a job. |
   | CompactionPrioritizer | Per executor class that decides which compaction job to execute next. |
   | CompactionExecutor    | Each tablet server has one or more executors that process compaction jobs.  These are configured system wide. Number of threads, rate limits, and max files per compaction are some of the things that can be configured.  If a job exceeds the max files, then the executor will process it in multiple passes. |
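
The roles in the table could be sketched in Java roughly as follows. This is only an illustration: every class, interface, and method name below (including `CompactionDecision` and `filesDisjoint`) is an assumption made for the sketch, not an existing Accumulo API, and the job carries only a file set and an executor name rather than the iterator and output-file details the table mentions.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: all names and signatures here are assumptions.

/** Immutable description of the work to be done. */
final class CompactionJob {
    private final String id;
    private final Set<String> files;
    private final String executorName;

    CompactionJob(String id, Set<String> files, String executorName) {
        this.id = id;
        // Defensive copy so later changes to the caller's set cannot alter the job.
        this.files = Collections.unmodifiableSet(new HashSet<>(files));
        this.executorName = executorName;
    }

    String id() { return id; }
    Set<String> files() { return files; }
    String executorName() { return executorName; }
}

/** What a manager returns: jobs to cancel and new jobs to run. */
final class CompactionDecision {
    final List<CompactionJob> toCancel;
    final List<CompactionJob> toSubmit;

    CompactionDecision(List<CompactionJob> toCancel, List<CompactionJob> toSubmit) {
        this.toCancel = List.copyOf(toCancel);
        this.toSubmit = List.copyOf(toSubmit);
    }
}

/** Per table: decides what to compact for a tablet and on which executor. */
interface CompactionManager {
    CompactionDecision decide(Set<String> tabletFiles, List<CompactionJob> currentJobs);
}

/** Per executor: orders queued jobs; the job that sorts first runs next. */
interface CompactionPrioritizer extends Comparator<CompactionJob> {}

final class CompactionJobs {
    /** Returns true if no file appears in more than one of the given jobs. */
    static boolean filesDisjoint(List<CompactionJob> jobs) {
        Set<String> seen = new HashSet<>();
        for (CompactionJob job : jobs) {
            for (String file : job.files()) {
                if (!seen.add(file)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A prioritizer in this sketch is just a comparator, so something like `(a, b) -> Integer.compare(a.files().size(), b.files().size())` would run jobs with fewer input files first.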
   
   One major goal of this design is to make it easy for the user to write code that avoids concurrency mayhem.  The underlying idea is that a compaction manager will be called in the following way.
   
    * System gathers a snapshot of tablet files and current compaction jobs.
    * System calls the compaction manager with the gathered snapshot.
    * The compaction manager returns jobs to cancel and new jobs to run.
    * If the set of files and/or jobs has changed since the snapshot was taken, the decisions are ignored and the manager is called again.
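
The call pattern above resembles an optimistic retry loop. A minimal sketch, assuming hypothetical names (`SnapshotRetry`, `Snapshot`, `decide`) and representing files and jobs as plain string sets, might look like this. In a real tablet server the final check and the application of the decision would need to happen atomically (for example under a tablet lock), which this sketch does not attempt.

```java
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of the snapshot-and-retry call pattern; none of these
// names is an Accumulo API.
final class SnapshotRetry {

    /** Point-in-time view of a tablet: its files and the ids of its compaction jobs. */
    static final class Snapshot {
        final Set<String> files;
        final Set<String> jobIds;

        Snapshot(Set<String> files, Set<String> jobIds) {
            this.files = Set.copyOf(files);   // immutable copies
            this.jobIds = Set.copyOf(jobIds);
        }

        boolean sameAs(Snapshot other) {
            return files.equals(other.files) && jobIds.equals(other.jobIds);
        }
    }

    /**
     * Calls the manager with a snapshot and returns its decision only if the
     * tablet still matches that snapshot afterwards; otherwise tries again.
     */
    static <D> D decide(Supplier<Snapshot> tablet, Function<Snapshot, D> manager) {
        while (true) {
            Snapshot before = tablet.get();
            D decision = manager.apply(before);
            if (before.sameAs(tablet.get())) {
                return decision; // nothing changed, so the decision can be applied
            }
            // Files or jobs changed while deciding: ignore the decision and retry.
        }
    }
}
```

Because the manager only ever sees an immutable snapshot, user code in this model never has to reason about files or jobs changing underneath it.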
   
   With this model the prioritizer deals with immutable jobs that will not magically change when it's time to run the job (as happens with the current compaction strategy).  This makes reasoning about creating, canceling, and prioritizing jobs sane.
   
   The following is an example of how this might work.  In this example assume executor E1 is intended for small compactions and executor E2 is for large compactions. Small vs large could be a function of the input file sizes.
   
    * Tablet T1 has three files F1,F2,F3
    * Compaction manager decides to compact F2 and F3 on executor E1 as job J1
    * A new file F4 is added to T1
    * J1 is still queued on E1
    * Compaction manager decides to cancel J1 and compact F1, F2, F3, and F4 on executor E2 as J2.
    * Nothing changed since the snapshot was taken, so the decisions are applied: J1 is canceled and J2 is submitted.
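
The "small vs large" routing in this example could be expressed as a simple size threshold. The sketch below is an assumption-laden illustration: the class name `ExecutorChooser`, the threshold parameter, and any file sizes passed to it are invented for the example, since the issue does not specify them.

```java
import java.util.Collection;
import java.util.Map;

// Hypothetical sketch: routes a job to E1 (small) or E2 (large) by total input size.
final class ExecutorChooser {

    /**
     * Returns "E1" if the combined size of the files is at most smallLimitBytes,
     * otherwise "E2". Throws NullPointerException if a file has no known size.
     */
    static String choose(Collection<String> files, Map<String, Long> sizes, long smallLimitBytes) {
        long total = 0;
        for (String file : files) {
            total += sizes.get(file);
        }
        return total <= smallLimitBytes ? "E1" : "E2";
    }
}
```

With made-up sizes where F1 is much larger than the others, F2 and F3 together would stay under the limit and go to E1, while a job covering all four files would exceed it and go to E2, matching the sequence above.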
   
   For user-initiated compactions, compaction strategies would still be used for compatibility.  The behavior should be as follows:
     * Cancel existing queued jobs (that are system initiated) and prevent more jobs from queueing
     * Wait for any running jobs to complete
     * Apply the user's strategy and create a job.
     * Ask the compaction manager which executor the job should be queued on. 
   
   