Date: Tue, 14 Jun 2016 23:59:20 +0000 (UTC)
From: "James Taylor (JIRA)"
To: dev@phoenix.incubator.apache.org
Subject: [jira] [Commented] (PHOENIX-2995) Write performance severely degrades with large number of views

    [ https://issues.apache.org/jira/browse/PHOENIX-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330892#comment-15330892 ]

James Taylor commented on PHOENIX-2995:
---------------------------------------

Thanks, [~mujtabachohan]. What does a sample table/view DDL statement look like? Are the column names particularly long? You can take a look at the member variables in PTableImpl - does 7K or 11K per table add up? Where's all the space being used? Once PHOENIX-2940 is in, stats won't be stored in PTable any longer.
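If you want to measure rather than estimate, something like the following sketch would print both the total retained size of a cached PTable and a per-class breakdown (just a sketch - it assumes the JOL library, org.openjdk.jol:jol-core, is on the classpath, and the quorum string and view name are placeholders):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;

import org.apache.phoenix.schema.PTable;
import org.apache.phoenix.util.PhoenixRuntime;
import org.openjdk.jol.info.GraphLayout;

public class PTableFootprint {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            // Resolving the view pulls its PTable into the client-side metadata cache
            PTable table = PhoenixRuntime.getTable(conn, "MY_VIEW"); // placeholder view name
            GraphLayout layout = GraphLayout.parseInstance(table);
            // Total bytes retained by this PTable instance and everything it references
            System.out.println("retained: " + layout.totalSize() + " bytes");
            // Per-class histogram - shows whether Strings, byte[]s, or map entries dominate
            System.out.println(layout.toFootprint());
        }
    }
}
{code}

Note the retained-size number includes anything the PTable happens to share with other cached objects, so treat it as an upper bound.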
We could potentially decrease the size further (probably by half) if we don't store both the String and byte[] forms of column names, but then GC cost would go up a bit, since we usually access by String. We also keep duplicate Maps for column families, keyed by byte[] and by String. We could switch to a TreeMap, which uses less memory, or even drop the map and search linearly - that's probably fine for column families. It also sounds like there's a discrepancy between the actual size and the estimated size that should be straightened out - would you mind filing a separate JIRA for that? Do you know what the requirements are in terms of caching? There are likely views that are accessed more frequently than others, which should mitigate this some, no?

> Write performance severely degrades with large number of views
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-2995
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2995
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Mujtaba Chohan
>            Assignee: James Taylor
>              Labels: Argus
>         Attachments: upsert_rate.png
>
>
> Write performance for each 1K batch degrades significantly when there are *10K* views being written to at random with the default {{phoenix.client.maxMetaDataCacheSize}}. With all views created, each 1K batch takes around 25 seconds, i.e. an upsert rate of ~2K rows/min.
> When {{phoenix.client.maxMetaDataCacheSize}} is increased to 100MB+, views no longer need to be re-resolved and the upsert rate returns to the normal ~60K rows/min.
> With *100K* views and {{phoenix.client.maxMetaDataCacheSize}} set to 1GB, I wasn't able to create all 100K views, as the upsert time for each 1K batch keeps steadily increasing.
> The following graph shows the 1K-batch upsert rate over time as the number of views varies. Rows are upserted to random views; {{CREATE VIEW IF NOT EXISTS ... APPEND_ONLY_SCHEMA = true, UPDATE_CACHE_FREQUENCY = 900000}} is executed before each upsert statement.
> !upsert_rate.png!
> The base table is also created with {{APPEND_ONLY_SCHEMA = true, UPDATE_CACHE_FREQUENCY = 900000, AUTO_PARTITION_SEQ}}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
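For anyone reproducing the workaround described in the report: a minimal sketch of raising the client cache when connecting, assuming (as the 100MB/1GB figures suggest) that {{phoenix.client.maxMetaDataCacheSize}} takes a byte count; the quorum string is a placeholder:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class LargeMetaDataCacheConnection {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // 100 MB expressed as a byte count, per the 100MB+ workaround above
        props.setProperty("phoenix.client.maxMetaDataCacheSize",
                Long.toString(100L * 1024 * 1024));
        // With the larger cache, frequently touched views stay resolved client-side
        // instead of being evicted and re-resolved before each upsert batch
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181", props)) {
            // ... run the CREATE VIEW IF NOT EXISTS / UPSERT workload here ...
        }
    }
}
{code}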