Return-Path:
X-Original-To: apmail-accumulo-notifications-archive@minotaur.apache.org
Delivered-To: apmail-accumulo-notifications-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 32876DC23 for ; Tue, 18 Dec 2012 18:54:15 +0000 (UTC)
Received: (qmail 18525 invoked by uid 500); 18 Dec 2012 18:54:15 -0000
Delivered-To: apmail-accumulo-notifications-archive@accumulo.apache.org
Received: (qmail 18379 invoked by uid 500); 18 Dec 2012 18:54:14 -0000
Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: jira@apache.org
Delivered-To: mailing list notifications@accumulo.apache.org
Received: (qmail 18347 invoked by uid 99); 18 Dec 2012 18:54:13 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 18 Dec 2012 18:54:13 +0000
Date: Tue, 18 Dec 2012 18:54:13 +0000 (UTC)
From: "Christopher Tubbs (JIRA)"
To: notifications@accumulo.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Updated] (ACCUMULO-416) reevaluate limiting the number of open files given HDFS improvements
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

[ https://issues.apache.org/jira/browse/ACCUMULO-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Tubbs updated ACCUMULO-416:
---------------------------------------

Fix Version/s: 1.5.0

> reevaluate limiting the number of open files given HDFS improvements
> --------------------------------------------------------------------
>
> Key: ACCUMULO-416
> URL: https://issues.apache.org/jira/browse/ACCUMULO-416
> Project: Accumulo
> Issue Type: Improvement
> Components: tserver
> Reporter: Adam Fuchs
> Assignee: Keith Turner
> Fix For: 1.5.0
>
>
> Tablet servers limit the
number of files that can be opened for scans and for major compactions. The two main reasons for this limit were to reduce our impact on HDFS, primarily regarding connections to data nodes, and to limit our memory usage related to preloading file indexes. A third reason might be that disk thrashing could become a problem if we try to read from too many places at once.
> Two improvements may have made (or may soon make) this limit obsolete: HDFS now pools connections, and RFile now uses a multi-level index. With these improvements, is it reasonable to lift some of our open file restrictions? The tradeoff on the query side might be availability vs. overall resource usage. On the compaction side, the tradeoff is probably write replication vs. thrashing on reads. I think we can make an argument that queries should be available at almost any cost, but the compaction tradeoff is not as clear. We should test the efficiency of compacting a large number of files to get a better feeling for how the two extremes affect read and write performance across the system.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira