Date: Tue, 14 Jun 2016 23:59:20 +0000 (UTC)
From: "James Taylor (JIRA)"
To: dev@phoenix.incubator.apache.org
Subject: [jira] [Commented] (PHOENIX-2995) Write performance severely degrades with large number of views

    [ https://issues.apache.org/jira/browse/PHOENIX-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330892#comment-15330892 ]

James Taylor commented on PHOENIX-2995:
---------------------------------------

Thanks, [~mujtabachohan]. What does a sample table/view DDL statement look like? Are the column names particularly long? You can take a look at the member variables in PTableImpl - does 7K or 11K per table add up? Where's all the space being used? Once PHOENIX-2940 is in, stats won't be stored in PTable any longer.
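If you want to measure rather than estimate, something like the following sketch would print both the total retained size of a cached PTable and a per-class breakdown (just a sketch - it assumes the JOL library, org.openjdk.jol:jol-core, is on the classpath, and the quorum string and view name are placeholders):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;

import org.apache.phoenix.schema.PTable;
import org.apache.phoenix.util.PhoenixRuntime;
import org.openjdk.jol.info.GraphLayout;

public class PTableFootprint {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            // Resolving the view pulls its PTable into the client-side metadata cache
            PTable table = PhoenixRuntime.getTable(conn, "MY_VIEW"); // placeholder view name
            GraphLayout layout = GraphLayout.parseInstance(table);
            // Total bytes retained by this PTable instance and everything it references
            System.out.println("retained: " + layout.totalSize() + " bytes");
            // Per-class histogram - shows whether Strings, byte[]s, or map entries dominate
            System.out.println(layout.toFootprint());
        }
    }
}
{code}

Note the retained-size number includes anything the PTable happens to share with other cached objects, so treat it as an upper bound.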
We could potentially decrease the size further (probably by half) if we don't store both the String and byte[] forms of column names, but then GC cost would go up a bit, since we usually access by String. We also keep duplicate Maps for column families, keyed by byte[] and by String. We could switch to a TreeMap, which uses less memory, or even drop the map and search linearly - that's probably fine for column families. It also sounds like there's a discrepancy between the actual size and the estimated size that should be straightened out - would you mind filing a separate JIRA for that? Do you know what the requirements are in terms of caching? There are likely views that are accessed more frequently than others, which should mitigate this some, no?

> Write performance severely degrades with large number of views
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-2995
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2995
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Mujtaba Chohan
>            Assignee: James Taylor
>              Labels: Argus
>         Attachments: upsert_rate.png
>
>
> Write performance for each 1K batch degrades significantly when there are *10K* views being written to at random with the default {{phoenix.client.maxMetaDataCacheSize}}. With all views created, each 1K batch takes around 25 seconds, i.e. an upsert rate of ~2K rows/min.
> When {{phoenix.client.maxMetaDataCacheSize}} is increased to 100MB+, views no longer need to be re-resolved and the upsert rate returns to the normal ~60K rows/min.
> With *100K* views and {{phoenix.client.maxMetaDataCacheSize}} set to 1GB, I wasn't able to create all 100K views, as the upsert time for each 1K batch keeps steadily increasing.
> The following graph shows the 1K-batch upsert rate over time as the number of views varies. Rows are upserted to random views; {{CREATE VIEW IF NOT EXISTS ... APPEND_ONLY_SCHEMA = true, UPDATE_CACHE_FREQUENCY = 900000}} is executed before each upsert statement.
> !upsert_rate.png!
> The base table is also created with {{APPEND_ONLY_SCHEMA = true, UPDATE_CACHE_FREQUENCY = 900000, AUTO_PARTITION_SEQ}}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
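For anyone reproducing the workaround described in the report: a minimal sketch of raising the client cache when connecting, assuming (as the 100MB/1GB figures suggest) that {{phoenix.client.maxMetaDataCacheSize}} takes a byte count; the quorum string is a placeholder:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class LargeMetaDataCacheConnection {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // 100 MB expressed as a byte count, per the 100MB+ workaround above
        props.setProperty("phoenix.client.maxMetaDataCacheSize",
                Long.toString(100L * 1024 * 1024));
        // With the larger cache, frequently touched views stay resolved client-side
        // instead of being evicted and re-resolved before each upsert batch
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181", props)) {
            // ... run the CREATE VIEW IF NOT EXISTS / UPSERT workload here ...
        }
    }
}
{code}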