lucene-dev mailing list archives

From "Erick Erickson (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-1931) Schema Browser does not scale with large indexes
Date Tue, 03 Jan 2012 02:26:21 GMT

     [ https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-1931:
---------------------------------

    Attachment: SOLR-1931-3x.patch
                SOLR-1931-trunk.patch

Thanks, Robert and Yonik, for pointing me at the new 4x capabilities; they make a huge difference.
But you knew that.

The killer for 3.x was getting the document counts via a range query. I don't think there's
a good way to get the counts without paying that penalty, so there's a new parameter, recordDocCounts.

Here's my latest and close-to-last cut at this, both for 3x and 4x.

The data set is 89M documents; times are in seconds.

3.5
  637  Getting doc counts.

3x with this patch
  552  Getting doc counts.
   53  Stats without doc counts, but with histograms etc.
       There was no option to do this before.

4x, original
  450  Or so, as I remember: getting doc counts, histograms, etc.

4x with patch (histograms still work)
  158  Getting the doc counts the old way (span queries). I mean,
       you guys *said* ranges were going to be faster.
   39  Getting the doc counts with terms.getDocCount()
       (including histograms).
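
To make the relative gains easier to read, here is a small sketch that turns the timings above
into speedup ratios. The numbers are copied from this message (the 450 is only "or so"); the
labels and the speedup() helper are made up for illustration and are not part of either patch.

```python
# Timings in seconds, copied from the message (89M-document index).
# The dictionary keys are illustrative labels, not names from the patch.
timings = {
    "3.5, doc counts": 637,
    "3x patch, doc counts": 552,
    "3x patch, no doc counts": 53,
    "4x original": 450,  # approximate ("or so") in the message
    "4x patch, old-style doc counts": 158,
    "4x patch, terms.getDocCount()": 39,
}


def speedup(before: str, after: str) -> float:
    """Return how many times faster the `after` run is than the `before` run."""
    return timings[before] / timings[after]


comparisons = [
    ("3.5, doc counts", "3x patch, doc counts"),
    ("3x patch, doc counts", "3x patch, no doc counts"),
    ("4x original", "4x patch, terms.getDocCount()"),
    ("4x patch, old-style doc counts", "4x patch, terms.getDocCount()"),
]

for before, after in comparisons:
    print(f"{before} -> {after}: {speedup(before, after):.1f}x")
```

On these figures, skipping the doc counts on 3x is worth roughly a factor of ten, and on 4x
terms.getDocCount() is about four times faster than the old way of counting.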
 
 
Here's my proposal. I'll probably commit this next weekend at the latest unless there are
objections:

1> I'll let these stew for a couple of days and look them over again. Anyone who wants
to look too, please feel free.

2> Live with the 4x doc counts including deleted docs, and remove the reportDocCounts
parameter there (it'll live on in 3.6 and other 3x versions). I think the performance is fine without
carrying that kind of kludgy option forward. I could be persuaded otherwise, but optimizing the
index will take care of the deleted-document counting problem if anyone really cares.
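
For anyone wondering why the fast count includes deleted docs, here is a toy model of the
trade-off. It is not Lucene code: ToySegment and its methods are invented for illustration, and
a real index stores the per-field statistic rather than scanning for it. The point is only that
a count kept as a statistic does not see deletions until the segments are rewritten.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToySegment:
    """Toy stand-in for an index segment (hypothetical, not Lucene's structure)."""

    # doc id -> names of the fields that have at least one term in that doc
    docs: Dict[int, Set[str]]
    # ids of docs marked deleted but still physically present
    deleted: Set[int] = field(default_factory=set)

    def stats_doc_count(self, fld: str) -> int:
        # Models a stored per-field statistic: deletions are ignored.
        return sum(1 for fields in self.docs.values() if fld in fields)

    def live_doc_count(self, fld: str) -> int:
        # Models counting by visiting docs: deletions are respected,
        # but every doc has to be checked.
        return sum(
            1
            for doc_id, fields in self.docs.items()
            if fld in fields and doc_id not in self.deleted
        )

    def optimize(self) -> "ToySegment":
        # Rewriting the segment drops deleted docs for good.
        live = {d: f for d, f in self.docs.items() if d not in self.deleted}
        return ToySegment(live)


seg = ToySegment({0: {"title"}, 1: {"title", "body"}, 2: {"body"}})
seg.deleted.add(1)

print(seg.stats_doc_count("title"), seg.live_doc_count("title"))
merged = seg.optimize()
print(merged.stats_doc_count("title"), merged.live_doc_count("title"))
```

In this toy, the two counts disagree while doc 1 is merely marked deleted and agree again after
optimize(), which is the behavior point 2 above relies on.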

                
> Schema Browser does not scale with large indexes
> ------------------------------------------------
>
>                 Key: SOLR-1931
>                 URL: https://issues.apache.org/jira/browse/SOLR-1931
>             Project: Solr
>          Issue Type: Improvement
>          Components: web gui
>    Affects Versions: 3.6, 4.0
>            Reporter: Lance Norskog
>            Assignee: Erick Erickson
>            Priority: Minor
>         Attachments: SOLR-1931-3x.patch, SOLR-1931-3x.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch
>
>
> The Schema Browser JSP by default causes the Luke handler to "scan the world". In large
> indexes this makes the UI useless.
> On an index with 64m documents & 8gb of disk space, the Schema Browser took 6 minutes
> to open and hogged all disk I/O, making Solr useless.
