phoenix-dev mailing list archives

From "Maryann Xue (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-4666) Add a subquery cache that persists beyond the life of a query
Date Fri, 13 Apr 2018 07:55:00 GMT


Maryann Xue commented on PHOENIX-4666:

Thank you very much for your work, [~ortutay]! Here are a few high-level comments:
1. First of all, I think it's important that we have an option to enable and disable the persistent
cache, making sure that users can still run join queries in the default temp-cache way.
2. Regarding your change [2], can you explain what exactly the problem with key-range
generation is? It looks like checkCache() and addCache() are doing redundant work, and
CachedSubqueryResultIterator should be unnecessary. We do not want to read the cache on the
client side and then add it back again.
3. We need to be aware that the string representation of the sub-query statement is not reliable:
the same join-tables or sub-queries do not necessarily map to the same string representation,
and so they can end up with different generated cache-ids. Ideally we would have some
normalization here. We can consider leaving this as a future improvement, but at this point
we should at least have some test cases (counter cases as well) to cover it.
4. Is there a way for us to update the cache content if tables have been updated? This might
be related to what approach we take to add and re-validate cache in (2).
5. A rather minor point, as it just occurred to me: can we have CacheEntry implement Closeable?
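
To make points 3 and 5 a bit more concrete, here is a minimal Java sketch. It is not taken from the patch: SubqueryCacheSketch, normalize(), cacheId() and the CacheEntry shown here are hypothetical names, and hashing the text with SHA-256 is only an assumption about how a cache-id might be derived. The normalization collapses whitespace outside quoted regions and nothing more, so it also shows a counter case: two statements that differ only in keyword case still get different ids.

```java
import java.io.Closeable;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; none of these names are Phoenix APIs.
final class SubqueryCacheSketch {

    // Collapse runs of whitespace outside quoted regions into a single space
    // and drop leading/trailing whitespace. Quoted literals and quoted
    // identifiers are copied verbatim so that 'a  b' and 'a b' stay distinct.
    static String normalize(String sql) {
        StringBuilder out = new StringBuilder();
        char quote = 0;               // 0 while outside a quoted region
        boolean pendingSpace = false; // whitespace seen, not yet emitted
        for (char c : sql.toCharArray()) {
            if (quote != 0) {
                out.append(c);
                if (c == quote) {
                    quote = 0;
                }
            } else if (Character.isWhitespace(c)) {
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                }
                pendingSpace = false;
                if (c == '\'' || c == '"') {
                    quote = c;
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    // Assumed id scheme: hex-encoded SHA-256 of the normalized text.
    static String cacheId(String sql) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(normalize(sql).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // A cache entry that can be used in try-with-resources. close() would be
    // the place to release whatever the entry holds; here it only sets a flag.
    static final class CacheEntry implements Closeable {
        private final String id;
        private boolean closed;

        CacheEntry(String id) {
            this.id = id;
        }

        String id() {
            return id;
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;
        }
    }
}
```

With something like this, a caller could write try (CacheEntry e = new CacheEntry(cacheId(sql))) { ... } and rely on close() being invoked. A real normalization would presumably have to work on the parsed statement rather than on the raw text.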
Lastly, I understand that it's work in progress, but as we move on, could you please do a
little clean-up so that discussions and code reviews are easier? For example, correct
the indentation (make sure there are no tabs); instead of commenting out a line of code,
just remove it; and get rid of all "System.out.println" calls or replace them with logging if

> Add a subquery cache that persists beyond the life of a query
> -------------------------------------------------------------
>                 Key: PHOENIX-4666
>                 URL:
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Marcell Ortutay
>            Assignee: Marcell Ortutay
>            Priority: Major
> The user list thread for additional context is here: []
> ----
> A Phoenix query may contain expensive subqueries, and moreover those expensive subqueries
> may be used across multiple different queries. While whole result caching is possible at the
> application level, it is not possible to cache subresults in the application. This can cause
> bad performance for queries in which the subquery is the most expensive part of the query,
> and the application is powerless to do anything at the query level. It would be good if Phoenix
> provided a way to cache subquery results, as it would provide a significant performance gain.
> An illustrative example:
>     SELECT * FROM table1 JOIN (SELECT id_1 FROM large_table WHERE x = 10) expensive_result
>     ON table1.id_1 = expensive_result.id_1 AND table1.id_1 = {id}
> In this case, the subquery "expensive_result" is expensive to compute, but it doesn't
> change between queries. The rest of the query does because of the {id} parameter. This means
> the application can't cache it, but it would be good if there was a way to cache expensive_result.
> Note that there is currently a coprocessor based "server cache", but the data in this
> "cache" is not persisted across queries. It is deleted after a TTL expires (30sec by default),
> or when the query completes.
> This issue is fairly high priority for us at 23andMe and we'd be happy to provide
> a patch with some guidance from Phoenix maintainers. We are currently putting together a design
> document for a solution, and we'll post it to this Jira ticket for review in a few days.
