lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ramkumar Aiyengar (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-6275) Improve accuracy of QTime reporting
Date Mon, 28 Jul 2014 04:29:39 GMT

    [ https://issues.apache.org/jira/browse/SOLR-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075884#comment-14075884
] 

Ramkumar Aiyengar edited comment on SOLR-6275 at 7/28/14 4:28 AM:
------------------------------------------------------------------

bq. Hmmm, I'm not sure I agree. QTime is useful for knowing how long the lower level stuff
took. Thus it's useful as it stands.

Can you elaborate what you mean by "lower level stuff"? If you are intending QTime to reflect
time taken by Lucene, i.e. roughly the time taken by the top-level collector to collect the
results needed, QTime covers way more than that already, including time taken for query parsing
and finishing up just in the non-distributed case. In the distributed case, it also covers
time taken waiting on the shard handler factory pool, network latency, any servlet container
pooling time at the processing shards, time taken waiting for the federating node to take
all responses, and time taken to merge all responses.

QTime as it stands is currently defined from the time the {{SolrQueryRequest}} object is created
till the response is rendered, which is hard to associate any semantic meaning to, and it
roughly is all steps except the logic required for resolving the core to send the request
to. All I am doing is adding that as well, to logically mean (again roughly, but less roughly)
"time taken by the servlet to service the request".

bq. I mean we're talking human-time here. When I'm asking how long a query took, my frail
human system is not capable of noticing a difference of a millisecond or two. nanoTime is
not necessarily all that accurate either. Sure, it gives a lot or precision, but that's different
from accuracy.

The problem is not because of millisecond vs. nanosecond accuracy. {{currentTimeMillis}} represents
the wall clock of the system and is subject to issues like clock skew. For example, if NTP
resets the time or if the sys admin changes the time for some reason or if some other action
changes the wall time, the difference between two such measurements can be totally incorrect
(including being negative). Which is why {{nanoTime}} is preferred for timing measurements
as it's guaranteed to be monotonic (where the OS supports it, i.e. everywhere except some
older versions of Windows). See SOLR-5734 for some context where we changed all references
of this thing, QTime was left out.

bq. Frankly, this doesn't seem worth the change to me. And especially if we return the full
nanoTime string which would render any current tools for reporting performance metrics invalid
by a factor 100,000, all for extremely questionable utility.

We are still reporting as milliseconds, no change in resolution here. The {{RTimer}} utility
class used for this purpose already does the conversion.



was (Author: andyetitmoves):
bq. Hmmm, I'm not sure I agree. QTime is useful for knowing how long the lower level stuff
took.
Thus it's useful as it stands.

Can you elaborate what you mean by "lower level stuff"? If you are intending QTime to reflect
time taken by Lucene, i.e. roughly the time taken by the top-level collector to collect the
results needed, QTime covers way more than that already, including time taken for query parsing
and finishing up just in the non-distributed case. In the distributed case, it also covers
time taken waiting on the shard handler factory pool, network latency, any servlet container
pooling time at the processing shards, time taken waiting for the federating node to take
all responses, and time taken to merge all responses.

QTime as it stands is currently defined from the time the {{SolrQueryRequest}} object is created
till the response is rendered, which is hard to associate any semantic meaning to, and it
roughly is all steps except the logic required for resolving the core to send the request
to. All I am doing is adding that as well, to logically mean (again roughly, but less roughly)
"time taken by the servlet to service the request".

bq. I mean we're talking human-time here. When I'm asking how long a query took, my frail
human system is not capable of noticing a difference of a millisecond or two. nanoTime is
not necessarily all that accurate either. Sure, it gives a lot or precision, but that's different
from accuracy.

The problem is not because of millisecond vs. nanosecond accuracy. {{currentTimeMillis}} represents
the wall clock of the system and is subject to issues like clock skew. For example, if NTP
resets the time or if the sys admin changes the time for some reason or if some other action
changes the wall time, the difference between two such measurements can be totally incorrect
(including being negative). Which is why {{nanoTime}} is preferred for timing measurements
as it's guaranteed to be monotonic (where the OS supports it, i.e. everywhere except some
older versions of Windows). See SOLR-5734 for some context where we changed all references
of this thing, QTime was left out.

bq. Frankly, this doesn't seem worth the change to me. And especially if we return the full
nanoTime string which would render any current tools for reporting performance metrics invalid
by a factor 100,000, all for extremely questionable utility.

We are still reporting as milliseconds, no change in resolution here. The {{RTimer}} utility
class used for this purpose already does the conversion.


> Improve accuracy of QTime reporting
> -----------------------------------
>
>                 Key: SOLR-6275
>                 URL: https://issues.apache.org/jira/browse/SOLR-6275
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Ramkumar Aiyengar
>            Priority: Minor
>
> Currently, {{QTime}} uses {{currentTimeMillis}} instead of {{nanoTime}} and hence is
not suitable for time measurements. Further, it is really started after all the dispatch logic
in {{SolrDispatchFilter}} (same with the top level timing reported by {{debug=timing}}) which
may or may not be expensive, and hence may not fully represent the time taken by the search.
This is to remedy both cases.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message