lucene-dev mailing list archives

From "Erick Erickson (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-1931) Schema Browser does not scale with large indexes
Date Tue, 03 Jan 2012 03:20:22 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178608#comment-13178608 ]

Erick Erickson commented on SOLR-1931:
--------------------------------------

bq: why is it still 39 seconds?

Histograms and collecting the top N terms by frequency. Still gotta spin through all the terms
to collect either statistic. Take that bit out and the response is less than 0.5 seconds.

39 seconds isn't bad at all for an index this size, and one can still specify particular fields
of interest if the index is more complex than this one. I can probably be argued out of their
importance although it'll take a little doing. This is really for, from my perspective, troubleshooting
at a high level and that information is valuable.

Besides, I *told* you I had to look it over after a while. I just saw something horribly trivial
that cuts it down to 15 seconds. There's a loop where, after the histo stuff is collected,
we test to see if the current term frequency is above the threshold of the already-collected
items.... and changing it from

if (freq < tiq.minfreq) continue;
to, essentially, 
if (freq <= tiq.minfreq) continue;

means that the pathological case of inserting every last <uniqueKey> in the tracking
priority queue doesn't happen. Siiigggh.
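
For anyone who wants to see why that one character matters, here is a minimal sketch. It is not the Luke handler code: the class TopTermsSketch, the method countInsertions, and the skipTies flag are hypothetical names made up for illustration, and it assumes n >= 1 and positive frequencies. It tracks the top n frequencies in a min-heap and counts how many times something is actually inserted, with the strict and the non-strict threshold test.

```java
import java.util.PriorityQueue;

/** Hypothetical illustration only; not the actual Solr/Luke handler code. */
class TopTermsSketch {

    /**
     * Keeps the n largest frequencies in a min-heap and returns how many
     * insertions into the heap were performed. Assumes n >= 1.
     *
     * @param skipTies true mimics "freq <= minFreq", false mimics "freq < minFreq"
     */
    static int countInsertions(int[] freqs, int n, boolean skipTies) {
        PriorityQueue<Integer> queue = new PriorityQueue<>(); // min-heap by natural order
        int minFreq = 0;      // smallest frequency retained once the queue has overflowed
        int insertions = 0;

        for (int freq : freqs) {
            // The one-character difference discussed above.
            boolean skip = skipTies ? freq <= minFreq : freq < minFreq;
            if (skip) continue;

            queue.add(freq);
            insertions++;

            if (queue.size() > n) {
                queue.poll();            // drop the current smallest entry
                minFreq = queue.peek();  // new threshold for later terms
            }
        }
        return insertions;
    }
}
```

Feed it a field where every term has frequency 1 (which is what a <uniqueKey> field looks like): the strict test inserts every single term and pops it straight back out, while the non-strict test stops inserting once the queue has filled and overflowed once.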

Oh, and the patch I'll attach in a couple of minutes actually compiles. I half cleaned up
the stupid recordDocCount parameter by removing the definition, but not getting it from the
parameters. Fella has to go to sleep more often.

Also, this index is a little peculiar in that many of the fields have only a very few values
so YMMV.


                
> Schema Browser does not scale with large indexes
> ------------------------------------------------
>
>                 Key: SOLR-1931
>                 URL: https://issues.apache.org/jira/browse/SOLR-1931
>             Project: Solr
>          Issue Type: Improvement
>          Components: web gui
>    Affects Versions: 3.6, 4.0
>            Reporter: Lance Norskog
>            Assignee: Erick Erickson
>            Priority: Minor
>         Attachments: SOLR-1931-3x.patch, SOLR-1931-3x.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch
>
>
> The Schema Browser JSP by default causes the Luke handler to "scan the world". In large indexes this makes the UI useless.
> On an index with 64m documents & 8gb of disk space, the Schema Browser took 6 minutes to open and hogged all disk I/O, making Solr useless.


        
