Date: Mon, 14 Mar 2016 17:55:33 +0000 (UTC)
From: "Keith Turner (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Created] (ACCUMULO-4164) Avoid copy of RFile Index blocks when in cache

Keith Turner created ACCUMULO-4164:
--------------------------------------

             Summary: Avoid copy of RFile Index blocks when in cache
                 Key: ACCUMULO-4164
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4164
             Project: Accumulo
          Issue Type: Improvement
    Affects Versions: 1.7.1, 1.6.5
            Reporter: Keith Turner
             Fix For: 1.6.6, 1.7.2, 1.8.0

I have been doing performance experiments with RFile.
During the course of these experiments I noticed that RFile is not as fast as it should be in the case where index blocks are in cache and the RFile is not already open. The reason is that the RFile code copies and deserializes the index data even though it's already in memory.

I made the following changes to RFile in a branch:

 * Avoid copying index data when it's in cache.
 * Deserialize offsets lazily (instead of up front) during binary search.
 * Stop calling lots of synchronized methods during deserialization of index info. The existing code uses ByteArrayInputStream, which results in lots of fine-grained synchronization. Switching to an input stream that offers the same functionality without synchronization showed a measurable performance difference.

These changes improve performance in the following two situations:

 * When an RFile's data is in cache, but it's not open on the tserver.
 * For RFiles with multilevel indexes with index data in cache. Currently an open RFile only keeps the root node in memory. Lower-level index nodes are always read from the cache or DFS. The changes I made would always avoid the copy and deserialization of lower-level index nodes when in cache.

I have seen significant performance improvements testing with the two cases above. My tests are currently based on a new API I am creating for RFile, so I cannot easily share them until I get that pushed.

For the case where a tserver has all frequently used files already open and those files have a single-level index, these changes should not make a significant performance difference.

These changes should result in less memory use when the same RFile is opened multiple times for different scans (when data is in cache). In this case all of the RFiles would share the same byte array holding the serialized index data.
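
The idea of searching the serialized index in place can be sketched roughly as follows. This is only an illustration, not the code from the branch: the class name LazyIndexSketch, its methods, and the byte layout (an entry count, then a table of int offsets, then length-prefixed UTF-8 keys) are all assumptions made for the example and do not reflect RFile's actual index format. It wraps the cached byte array without copying it, decodes an offset only for the entries the binary search actually probes, and reads through ByteBuffer absolute getters rather than the synchronized ByteArrayInputStream.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical layout (NOT RFile's real format):
 * [int count][count x int offset][entries: int length + UTF-8 key bytes]
 */
final class LazyIndexSketch {
    private final ByteBuffer buf; // wraps the cached array; no copy is made
    private final int count;

    LazyIndexSketch(byte[] cachedBlock) {
        this.buf = ByteBuffer.wrap(cachedBlock);
        this.count = buf.getInt(0);
    }

    /** Decodes one offset on demand instead of building an int[] up front. */
    private int offsetOf(int i) {
        return buf.getInt(4 + 4 * i);
    }

    /** Decodes only the key stored at entry i. */
    private String keyAt(int i) {
        int off = offsetOf(i);
        int len = buf.getInt(off);
        return new String(buf.array(), off + 4, len, StandardCharsets.UTF_8);
    }

    /** Returns the position of the first key >= target, or count if none. */
    int lowerBound(String target) {
        int lo = 0;
        int hi = count;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keyAt(mid).compareTo(target) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Test helper that builds a block in the hypothetical layout above. */
    static byte[] serialize(String... sortedKeys) {
        int n = sortedKeys.length;
        byte[][] raw = new byte[n][];
        int size = 4 + 4 * n;
        for (int i = 0; i < n; i++) {
            raw[i] = sortedKeys[i].getBytes(StandardCharsets.UTF_8);
            size += 4 + raw[i].length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(n);
        int off = 4 + 4 * n; // first entry starts right after the offset table
        for (byte[] r : raw) {
            out.putInt(off);
            off += 4 + r.length;
        }
        for (byte[] r : raw) {
            out.putInt(r.length);
            out.put(r);
        }
        return out.array();
    }
}
```

Two LazyIndexSketch instances constructed from the same byte[] both read from that one array, which mirrors the memory saving described above for multiple scans of the same RFile. A lookup touches roughly log2(count) entries, so most offsets and keys are never decoded.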