phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2995) Write performance severely degrades with large number of views
Date Tue, 14 Jun 2016 23:59:20 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330892#comment-15330892 ]

James Taylor commented on PHOENIX-2995:
---------------------------------------

Thanks, [~mujtabachohan]. What does a sample table/view DDL statement look like? Are the column
names particularly long? You can take a look at the member variables in PTableImpl - does
the 7K or 11K per table add up? Where's all the space being used?

Once PHOENIX-2940 is in, stats won't be stored in PTable any longer. We could potentially
decrease the size further (probably by half) if we don't store both the String and byte[] of
column names, but then GC cost would go up a bit, since we usually access by String. We also
have duplicate Maps keyed by byte[] and by String for column families. We could switch to a
TreeMap, which uses less memory. Or we could drop the map entirely and do a linear search -
that's probably fine for column families, since there are usually only a few.
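To illustrate the last idea, here's a minimal sketch (hypothetical, not Phoenix's actual PTableImpl code) of replacing the duplicate byte[]/String map pair with a single array plus linear search. The class and method names are invented for the example:

```java
import java.util.Arrays;

// Sketch: one byte[][] array replaces two HashMaps (keyed by byte[]
// and by String). Column-family counts are typically tiny, so an
// O(n) scan is cheap and the per-entry map overhead goes away.
public class FamilyLookup {
    private final byte[][] names;

    public FamilyLookup(String... families) {
        names = new byte[families.length][];
        for (int i = 0; i < families.length; i++) {
            names[i] = families[i].getBytes();
        }
    }

    // Linear search by byte[]; n is the number of column families.
    public int indexOf(byte[] name) {
        for (int i = 0; i < names.length; i++) {
            if (Arrays.equals(names[i], name)) {
                return i;
            }
        }
        return -1;
    }

    // String lookup reuses the same array: no second map needed.
    public int indexOf(String name) {
        return indexOf(name.getBytes());
    }

    public static void main(String[] args) {
        FamilyLookup lookup = new FamilyLookup("0", "CF1", "CF2");
        System.out.println(lookup.indexOf("CF1"));                // 1
        System.out.println(lookup.indexOf("missing".getBytes())); // -1
    }
}
```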

Sounds like there's a discrepancy in the actual size versus estimated size that should be
straightened out as well - would you mind filing a separate JIRA for that?

Do you know what the requirements are in terms of caching? There are likely views that are
more frequently accessed than others which should mitigate this some, no? 
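The skew toward frequently accessed views is exactly what an LRU cache exploits. As a rough sketch (not the actual Phoenix client cache, which is size-bounded in bytes rather than entries), a bounded LRU over view metadata could look like:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: entry-bounded LRU cache. Hot views stay resident while
// rarely used ones are evicted, keeping memory under a fixed cap.
public class ViewMetaCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public ViewMetaCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true gives LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once over capacity.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        ViewMetaCache<String, String> cache = new ViewMetaCache<>(2);
        cache.put("VIEW_A", "metaA");
        cache.put("VIEW_B", "metaB");
        cache.get("VIEW_A");          // touch A so it is most recent
        cache.put("VIEW_C", "metaC"); // evicts VIEW_B, the LRU entry
        System.out.println(cache.containsKey("VIEW_B")); // false
        System.out.println(cache.containsKey("VIEW_A")); // true
    }
}
```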

> Write performance severely degrades with large number of views 
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-2995
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2995
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Mujtaba Chohan
>            Assignee: James Taylor
>              Labels: Argus
>         Attachments: upsert_rate.png
>
>
> Write performance for each 1K batch degrades significantly when there are *10K* views
being written to at random with the default {{phoenix.client.maxMetaDataCacheSize}}. With
all views created, the upsert rate stays around 25 seconds per 1K batch, i.e. an upsert rate
of ~2K rows/min.

> When {{phoenix.client.maxMetaDataCacheSize}} is increased to 100MB+, views do not need
to be re-resolved and the upsert rate returns to the normal ~60K rows/min.
> With *100K* views and {{phoenix.client.maxMetaDataCacheSize}} set to 1GB, I wasn't able
to create all 100K views, as the upsert time for each 1K batch keeps steadily increasing.
> The following graph shows the 1K batch upsert rate over time as the number of views varies.
Rows are upserted to random views; {{CREATE VIEW IF NOT EXISTS ... APPEND_ONLY_SCHEMA = true,
UPDATE_CACHE_FREQUENCY=900000}} is executed before each upsert statement.
> !upsert_rate.png!
> The base table is also created with {{APPEND_ONLY_SCHEMA = true, UPDATE_CACHE_FREQUENCY =
900000, AUTO_PARTITION_SEQ}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
