phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Josh Elser <>
Subject Re: [DISCUSS] Design for a "query log"
Date Fri, 02 Mar 2018 22:07:13 GMT
My gut reaction would be to avoid storing queries over the system tables 
as they would have more noise than value, but I think this is something 
that could be entertained!

On 3/2/18 8:02 AM, Artem Ervits wrote:
> +1
> Is idea here to collect information on all types of queries even against
> system tables? It would be nice to keep counts of how many times a query
> was executed over time. In my mssql days we also had ability to check when
> was the last time update stats ran and all kinds of impact information from
> that.
> On Mar 2, 2018 1:39 AM, "김영우 (YoungWoo Kim)" <> wrote:
>> Hi Josh,
>> Thanks for starting this discussion. Overall the design looks good to me!
>> Actually I have an internal implementation for logging user queries to text
>> file over PQS. Let me explain our use cases. We wanted to find out
>> following facts using our ugly hacks:
>> - Running queries against PQS
>> - Find out "WRONG" queries
>> On our systems and applications, I've realized that It's hard to control
>> unexpected large scans, full scans or running queries without indexes
>> against very large table on HBase/Phoenix. Sometimes users forgot to add
>> the "WHERE" clause for queries :-) And also we could not find an effective
>> way to stop or cancel the wrong queries immediately. Therefore, we should
>> find out the users and heavy queries from the query logs  and then we
>> noticed users about the impacts of their queries -- Please try this at
>> home, not production cluster :-)
>> Definitely it would be nice to have query logging out of the box! And if
>> the features are implemented like your proposal, we can replace an ugly
>> hacks with built-in feature from our cluster.
>> Thanks,
>> - Youngwoo
>> On Fri, Mar 2, 2018 at 10:05 AM, Josh Elser <> wrote:
>>> Any feedback from folks? Not sure if the silence should be interpreted
>>> as ambivalence or plain old being busy :)
>>> On Mon, Feb 26, 2018 at 4:57 PM, Josh Elser <> wrote:
>>>> Hiya,
>>>> I wanted to share this little design doc with you about some feature
>> work
>>>> we've been thinking about. The following is a Google doc in which
>> anyone
>>>> should be allowed to comment. Feel free to comment there, or here on
>> the
>>>> thread.
>>>> The high-level goal is to create a construct in which Phoenix clients
>>> will
>>>> automatically serialize information about the queries they run to a
>> table
>>>> for retrospective analysis. Ideally, this information would be stored
>> in
>>> a
>>>> Phoenix table. We want this data to help answer questions like:
>>>> * What queries are running against my system
>>>> * What specific queries started between 535AM and 620AM two days ago
>>>> * What queries are user "bob" running
>>>> * Are my user's queries effectively using the indexes in the system
>>>> Anti-goals for include:
>>>> * Cluster impact (computation/memory) usage of a query
>>>> * Query performance may be slowed to ensure all data is serialized
>>>> * A third-party service dedicated to ensuring query info is serialized
>>> (in
>>>> the event of client failure)
>>>> Take a look at the document and let us know what you think please. I'm
>>> happy
>>>> to try to explain this in greater detail.
>>>> - Josh (on behalf of myself, Ankit, Rajeshbabu, and Sergey)

View raw message