lucene-solr-dev mailing list archives

From "Jason Rutherglen (JIRA)" <>
Subject [jira] Commented: (SOLR-1308) Cache docsets at the SegmentReader level
Date Fri, 04 Dec 2009 23:35:20 GMT


Jason Rutherglen commented on SOLR-1308:

{quote} Yeah... that's a pain. We could easily do per-segment
faceting for non-string types though (int, long, etc) since they
don't need to be merged. {quote}
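The quoted point is that a numeric value means the same thing in every segment. Per-segment facet counts for an int or long field can therefore be combined by plain addition. String facets counted by per-segment term ordinals would first need those ordinals mapped back to terms. The following is a minimal sketch in plain Java. It assumes each segment's counts are already available as a map from field value to count, and the class and method names are hypothetical rather than Solr APIs:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: merges per-segment facet counts for a numeric field.
// A long value means the same thing in every segment, so counts for the same
// key are simply added; no per-segment ordinal remapping is needed.
class NumericFacetMerge {
    static Map<Long, Integer> merge(List<Map<Long, Integer>> perSegment) {
        Map<Long, Integer> total = new HashMap<>();
        for (Map<Long, Integer> segmentCounts : perSegment) {
            for (Map.Entry<Long, Integer> e : segmentCounts.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Two segments reporting counts for a hypothetical "year" field.
        Map<Long, Integer> seg0 = Map.of(2009L, 3, 2010L, 1);
        Map<Long, Integer> seg1 = Map.of(2009L, 2, 2008L, 4);
        System.out.println(merge(List.of(seg0, seg1)));
    }
}
```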

I opened SOLR-1617 for this. I think doc sets can be handled
with a multi doc set (hopefully). Facets, however, argh:
FacetComponent is really hairy, though I think it boils down to
simply adding up the counts for the same field values? Then there
seem to be edge cases, which I'm scared of. At least it's easy to
test whether we're preserving today's functionality by running
randomized unit tests of per-segment and multi-segment faceting
side by side (i.e. if the results of one differ from the results
of the other, we know there's something to fix).
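A "multi doc set" in the sense above could be a thin view over per-segment doc sets. It would translate a top-level doc id into a (segment, local id) pair using each segment's doc base, the same offsetting a multi-segment reader applies. The sketch below uses java.util.BitSet as a stand-in for Solr's DocSet, and all names are hypothetical. It also includes the kind of randomized side-by-side check proposed here: the same random matches are built per segment and as one flat set, and the two must agree.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Random;

// Hypothetical multi doc set: a read-only view over per-segment bit sets.
// Segment i covers top-level doc ids [docBase[i], docBase[i] + maxDoc[i]).
class MultiDocSet {
    private final List<BitSet> segments;
    private final int[] docBase;

    MultiDocSet(List<BitSet> segments, int[] maxDocs) {
        this.segments = segments;
        this.docBase = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            docBase[i] = base;
            base += maxDocs[i];
        }
    }

    // Find the last segment whose doc base is <= doc, then test the local id.
    boolean exists(int doc) {
        int seg = segments.size() - 1;
        while (seg > 0 && docBase[seg] > doc) {
            seg--;
        }
        return segments.get(seg).get(doc - docBase[seg]);
    }

    int size() {
        int n = 0;
        for (BitSet bits : segments) {
            n += bits.cardinality();
        }
        return n;
    }

    // Randomized side-by-side check: per-segment view vs. one flat bit set.
    static boolean agreesWithFlat(long seed) {
        Random rnd = new Random(seed);
        int numSegments = 1 + rnd.nextInt(5);
        int[] maxDocs = new int[numSegments];
        List<BitSet> perSegment = new ArrayList<>();
        BitSet flat = new BitSet();
        int base = 0;
        for (int s = 0; s < numSegments; s++) {
            maxDocs[s] = 1 + rnd.nextInt(50);
            BitSet bits = new BitSet(maxDocs[s]);
            for (int d = 0; d < maxDocs[s]; d++) {
                if (rnd.nextBoolean()) {
                    bits.set(d);         // segment-local id
                    flat.set(base + d);  // same doc as a top-level id
                }
            }
            perSegment.add(bits);
            base += maxDocs[s];
        }
        MultiDocSet multi = new MultiDocSet(perSegment, maxDocs);
        if (multi.size() != flat.cardinality()) {
            return false;
        }
        for (int doc = 0; doc < base; doc++) {
            if (multi.exists(doc) != flat.get(doc)) {
                return false;
            }
        }
        return true;
    }
}
```

The linear scan in exists() is fine for a handful of segments. With many segments, a binary search over docBase would be the more usual choice.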

Perhaps we can initially add up the counts per field value, test
that (which is enough for my project), and move on from there.
I'd still like to genericize all of the distributed processes to
work over multiple segments (the way Lucene's distributed search
is built on MultiSearcher, which also works locally), so that
local and distributed search are the same API-wise. However,
I've had trouble figuring out the existing distributed code
(SOLR-1477 ran into a wall). Maybe as part of SolrCloud we can
rework the distributed APIs to be more user friendly (e.g.
*MultiSearcher is really easy to understand). If Solr is going
to work well in the cloud, distributed search probably needs to
be easy to arrange in multiple tiers for scaling (e.g. instead of
1 proxy server in front of 100 nodes, we could have 1 top proxy
and 1 proxy per 10 nodes).

> Cache docsets at the SegmentReader level
> ----------------------------------------
>                 Key: SOLR-1308
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 1.5
>   Original Estimate: 504h
>  Remaining Estimate: 504h
> Solr caches docsets at the top-level Multi*Reader level. After a
> commit, the filter/docset caches are flushed. Reloading the
> cache in near real time (e.g. with commits every 1 s to 2 min)
> unnecessarily consumes IO resources when reloading the filters,
> especially for largish indexes.
> We'll cache docsets at the SegmentReader level. The cache key
> will include the reader.
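The proposal keys cached docsets by segment reader. After a commit, only segments that are new since the last searcher need their filters recomputed, and entries for unchanged segments are reused. The following is a minimal sketch of that idea in plain Java. It assumes a segment can be identified by some stable key object and that a filter can be evaluated against one segment at a time; the class, its methods, and the evaluator are all hypothetical. The use of a WeakHashMap is also an assumption, not necessarily what the patch does. It lets entries for segments that have been merged away be collected once nothing references their key.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.BiFunction;

// Hypothetical per-segment filter cache. S identifies a segment (e.g. a reader
// or a key derived from it), Q is the filter query; the cached value is that
// segment's matching docs as a BitSet of segment-local ids.
class PerSegmentFilterCache<S, Q> {
    // Weak keys: once a segment is gone and unreferenced, its entries can be GC'd.
    private final Map<S, Map<Q, BitSet>> cache = new WeakHashMap<>();
    private final BiFunction<S, Q, BitSet> evaluator;
    private int misses = 0;

    PerSegmentFilterCache(BiFunction<S, Q, BitSet> evaluator) {
        this.evaluator = evaluator;
    }

    synchronized BitSet get(S segment, Q query) {
        Map<Q, BitSet> perQuery = cache.computeIfAbsent(segment, s -> new HashMap<>());
        BitSet bits = perQuery.get(query);
        if (bits == null) {
            misses++; // only segments not seen before pay the evaluation (IO) cost
            bits = evaluator.apply(segment, query);
            perQuery.put(query, bits);
        }
        return bits;
    }

    // Evaluate a filter over every segment of the current (post-commit) index view.
    List<BitSet> filter(List<S> segments, Q query) {
        List<BitSet> result = new ArrayList<>();
        for (S segment : segments) {
            result.add(get(segment, query));
        }
        return result;
    }

    synchronized int misses() {
        return misses;
    }

    public static void main(String[] args) {
        PerSegmentFilterCache<String, String> cache = new PerSegmentFilterCache<>((seg, q) -> {
            BitSet bits = new BitSet();
            bits.set(0); // stand-in for actually running the filter on this segment
            return bits;
        });
        String s0 = "_0", s1 = "_1", s2 = "_2";
        cache.filter(List.of(s0, s1), "type:book");     // before commit: 2 misses
        cache.filter(List.of(s0, s1, s2), "type:book"); // after commit: only _2 misses
        System.out.println(cache.misses()); // 3
    }
}
```

After the second call, the two existing segments are served from the cache and only the newly committed segment is evaluated. Saving that work is the point of the issue.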

This message is automatically generated by JIRA.
