lucene-solr-user mailing list archives

From Arturas Mazeika <maze...@gmail.com>
Subject Re: top 10 query overall vs shard
Date Fri, 22 Jun 2018 16:07:03 GMT
Hi Shawn et al,

Thanks a lot for the prompt answer.

It looks to me that I made quite a few mistakes in formulating those Solr
queries. Setting shards.qt to the name of the core was completely wrong. I
tried to search for shards.qt in http://lucene.apache.org/solr/guide/7_3/
but it did not give any answers. Googling for shards.qt was more successful
(I found an explanation of what it means in two books, plus pointers and
numerous usage examples in the top 40 results). So I would suggest that
adding a sentence saying 'use shards.qt as q' somewhere in the
documentation would not hurt :-)

Recomputing the queries:

http://localhost:9999/solr/de_wiki_all_shard1_replica_n1/terms?terms.limit=10&terms.fl=text&wt=json&distrib=false
returns
{

  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":3400},
  "terms":{
    "text":[
      "8",671396,
      "application",671396,
      "articles",671396,
      "charset",671396,
      "de",671396,
      "f",671396,
      "utf",671396,
      "wiki",671396,
      "xhtml",671396,
      "xml",671396]}}

http://localhost:9999/solr/de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json

returns

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":2584},
  "terms":{
    "text":[
      "8",670564,
      "application",670564,
      "articles",670564,
      "charset",670564,
      "de",670564,
      "f",670564,
      "utf",670564,
      "wiki",670564,
      "xhtml",670564,
      "xml",670564]}}

LOG in CORE1:

INFO  -
2018-06-22 15:27:40.779; [c:de_wiki_all s:shard1 r:core_node3
x:de_wiki_all_shard1_replica_n1] org.apache.solr.core.SolrCore;
[de_wiki_all_shard1_replica_n1]  webapp=/solr path=/terms
params={distrib=false&terms.fl=text&terms.limit=10&wt=json}
status=0 QTime=3027
INFO  - 2018-06-22 15:27:42.059; [c:de_wiki_all
s:shard3 r:core_node11 x:de_wiki_all_shard3_replica_n8]
org.apache.solr.core.SolrCore; [de_wiki_all_shard3_replica_n8]
webapp=/solr path=/terms
params={distrib=false&terms.fl=text&terms.limit=10&wt=json}
status=0 QTime=2608
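
The two requests above differ only in their target (a single core vs. the
collection) and in the distrib flag. A minimal Python sketch of how such
requests could be built and their flat term/count lists read back; the helper
names (terms_url, flat_to_pairs, fetch_terms) are made up for illustration,
and fetch_terms assumes a running Solr that answers in the list form shown
above:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def terms_url(base, target, field="text", limit=10, distrib=None):
    """Build a /terms URL for a core or a collection (hypothetical helper)."""
    params = {"terms.limit": limit, "terms.fl": field, "wt": "json"}
    if distrib is not None:
        # Only add distrib when explicitly requested, as in the URLs above.
        params["distrib"] = "true" if distrib else "false"
    return f"{base}/{target}/terms?{urlencode(params)}"


def flat_to_pairs(flat):
    """Turn ["8", 671396, "application", 671396, ...] into (term, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def fetch_terms(url, field="text"):
    """Fetch a /terms response that uses the flat list form (needs a live Solr)."""
    with urlopen(url) as resp:
        body = json.load(resp)
    return flat_to_pairs(body["terms"][field])
```

For example, terms_url("http://localhost:9999/solr",
"de_wiki_all_shard1_replica_n1", distrib=False) reproduces the first URL of
this message.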

The numbers also did not change after

http://localhost:9999/solr/de_wiki_all/update?commit=true

(you correctly assumed that the collection is not getting any updates).


After I fired this query:

http://localhost:9999/solr/de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json&distrib=true

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":70245},
  "terms":{
    "text":{
      "8":2681402,
      "application":2681402,
      "articles":2681402,
      "charset":2681402,
      "de":2681402,
      "f":2681402,
      "utf":2681402,
      "wiki":2681402,
      "xhtml":2681402,
      "xml":2681402}}}

with the log line:

INFO  - 2018-06-22 15:32:54.805; [c:de_wiki_all s:shard1 r:core_node3
x:de_wiki_all_shard1_replica_n1] org.apache.solr.core.SolrCore;
[de_wiki_all_shard1_replica_n1]  webapp=/solr path=/terms
params={distrib=true&terms.fl=text&terms.limit=10&wt=json} status=0
QTime=70245

even the 1st query started returning the same results (shouldn't the
query be faster in the distributed settings?):

http://localhost:9999/solr/de_wiki_all_shard1_replica_n1/terms?terms.limit=10&terms.fl=text&wt=json&distrib=false

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":3438},
  "terms":{
    "text":[
      "8",671396,
      "application",671396,
      "articles",671396,
      "charset",671396,
      "de",671396,
      "f",671396,
      "utf",671396,
      "wiki",671396,
      "xhtml",671396,
      "xml",671396]}}


http://localhost:9999/solr/de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":3325},
  "terms":{
    "text":[
      "8",671396,
      "application",671396,
      "articles",671396,
      "charset",671396,
      "de",671396,
      "f",671396,
      "utf",671396,
      "wiki",671396,
      "xhtml",671396,
      "xml",671396]}}

Also

http://localhost:9997/solr/de_wiki_all_shard2_replica_n4/terms?terms.limit=10&terms.fl=text&wt=json&distrib=false
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":2637},
  "terms":{
    "text":[
      "8",670221,
      "application",670221,
      "articles",670221,
      "charset",670221,
      "de",670221,
      "f",670221,
      "utf",670221,
      "wiki",670221,
      "xhtml",670221,
      "xml",670221]}}

http://localhost:9997/solr/de_wiki_all_shard4_replica_n12/terms?terms.limit=10&terms.fl=text&wt=json&distrib=false

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":2536},
  "terms":{
    "text":[
      "8",669221,
      "application",669221,
      "articles",669221,
      "charset",669221,
      "de",669221,
      "f",669221,
      "utf",669221,
      "wiki",669221,
      "xhtml",669221,
      "xml",669221]}}

http://localhost:9997/solr/de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":2405},
  "terms":{
    "text":[
      "8",669221,
      "application",669221,
      "articles",669221,
      "charset",669221,
      "de",669221,
      "f",669221,
      "utf",669221,
      "wiki",669221,
      "xhtml",669221,
      "xml",669221]}}

which means that the de_wiki_all/terms query is being redirected to only one
of the cores and computed locally.
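
As a cross-check, the per-core counts can be added up and compared with the
distrib=true result. A small Python sketch using only the numbers pasted
above (within each response every listed term has the same count, so one
number per response is enough). The attribution of 670564 to shard3 is an
assumption based on the shard3 log line, not something the response itself
states; source_of is a made-up helper name:

```python
# Per-core counts copied from the distrib=false responses in this message.
# Assumption: 670564 came from the shard3 replica named in the log.
per_shard = {
    "shard1": 671396,
    "shard2": 670221,
    "shard3": 670564,
    "shard4": 669221,
}

# Count reported by the distrib=true request.
distributed_total = 2681402


def source_of(count):
    """Guess which shard (or all of them) produced a given count."""
    if count == distributed_total:
        return "all shards"
    matches = [shard for shard, c in per_shard.items() if c == count]
    return matches[0] if matches else "unknown"


# The four per-core counts add up exactly to the distributed count.
print(sum(per_shard.values()) == distributed_total)
# The collection-level request on port 9997 matches a single shard.
print(source_of(669221))
```

The sum of the four per-core counts equals the distributed count, which is
consistent with the collection-level /terms request without distrib=true
being answered by a single core.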

On the performance part: the PC has 32GB of RAM with some 10GB left for the
OS to cache things. The complete index is ~40GB (the complete collection as
text documents was ~40GB); each replica is around 3.5GB (shown, e.g.,
in http://172.16.203.123:9999/solr/#/de_wiki_all_shard1_replica_n1). What
would be the easiest way to list all indexes/replicas with their
corresponding sizes in bytes?




What is the complexity of this terms query? Does Solr need to go through
the individual inverted indexes, or does it only need to scan the list of
terms (does every list have the number of IDs in the inverted index
precomputed)?
This part of the question is particularly interesting as I was not able to
compute
de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json&distrib=true with
2GB of memory per core (due to "unable to allocate memory in java heap" I
had to increase it to 3GB for every instance).


Cheers,

Arturas



On Fri, Jun 22, 2018 at 4:28 PM, Shawn Heisey <apache@elyograg.org> wrote:

> On 6/22/2018 8:12 AM, Shawn Heisey wrote:
>
>> I wonder if having an invalid handler contributed to the speed.
>>
>
> Further thought about this:
>
> I can't say whether having an invalid handler name would cause speed
> problems, but based on my limited understanding of the code involved, I
> don't think it would.
>
> I'm guessing that with a shards.qt value that doesn't start with a slash,
> that the request gets sent to /select, with a qt parameter set to the
> value.  Solr would most likely ignore any qt value, because the
> handleSelect setting on requestDispatcher in solrconfig.xml has defaulted
> to false for many versions.
>
> Another possibility is that the OS had cached the information in a
> different replica for the full distributed query, and this made that query
> fast, but when the query directed to a specific shard replica was made,
> that data wasn't cached, and so Solr had to read the disk to satisfy the
> query, which is going to REALLY slow it down.  I would imagine that if you
> repeated the single-shard query multiple times, especially using the
> different URL that I gave you, the speed discrepancy might disappear.
>
> Thanks,
> Shawn
>
>
