lucene-solr-user mailing list archives

From Dmitry Kan <solrexp...@gmail.com>
Subject Re: Partial Counts in SOLR
Date Thu, 13 Mar 2014 12:38:54 GMT
1. What is your Solr version? In the 4.x family, proximity searches have
been optimized, along with other query types.
2. Do you use filter queries? What do the cache hit ratios look like?
Tune the caches (i.e. bump up the respective cache sizes) if you see low
hit ratios and many evictions.
3. Can you avoid storing some fields and only index them? When a field is
stored and retrieved in the result, there are a couple of disk seeks
per field, which slows the search down. Consider SSDs.
4. Do you monitor your system in terms of RAM / cache stats / GC? Do you
observe stop-the-world (STW) GC pauses?
5. How often do you commit, and do you have autowarming / external warming
configured?
6. If you use faceting, consider storing DocValues for facet fields.
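
To make point 2 concrete, here is a minimal sketch of the check I have in mind. It assumes you copy the lookups, hits and evictions counters for a cache from the Solr admin stats by hand. The function name, the sample numbers and the 0.75 threshold are illustrative assumptions, not Solr defaults or recommendations.

```python
def cache_health(lookups, hits, evictions, min_hit_ratio=0.75):
    """Return (hit_ratio, needs_bigger_cache) for one cache.

    min_hit_ratio is an assumed threshold for illustration only;
    pick a value that suits your own workload.
    """
    # Guard against a freshly started cache with no lookups yet.
    hit_ratio = hits / lookups if lookups else 0.0
    # A low hit ratio only argues for a larger cache when entries are
    # actually being evicted; otherwise the cache is simply not reused.
    needs_bigger_cache = hit_ratio < min_hit_ratio and evictions > 0
    return hit_ratio, needs_bigger_cache


# Hypothetical counters, as they might be read off the stats page.
filter_cache = {"lookups": 20000, "hits": 9000, "evictions": 7500}
ratio, grow = cache_health(**filter_cache)
print(f"filterCache hit ratio: {ratio:.2f}, consider larger size: {grow}")
```

A low ratio without evictions usually means the queries rarely repeat, so a bigger cache would not help much in that case.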

Some Solr wiki docs:
https://wiki.apache.org/solr/SolrPerformanceProblems?highlight=%28%28SolrPerformanceFactors%29%29





On Thu, Mar 13, 2014 at 8:52 AM, Salman Akram <
salman.akram@northbaysolutions.net> wrote:

> Well some of the searches take minutes.
>
> Below are some stats about this particular index that I am talking about:
>
> Index size = 400GB (Using CommonGrams so without that the index is around
> 180GB)
> Position File = 280GB
> Total Docs = 170 million (just indexed for searching - for highlighting
> contents are stored in another index)
> Avg Doc Size = Few hundred KBs
> RAM = 384GB (it has other indexes too but still OS cache can have 60-80% of
> the total index cached)
>
> Phrase queries run pretty fast with CG but complex versions of wildcard and
> proximity queries can be really slow. I know using CG will make them slow
> but they just take too long. By default sorting is on date but users have
> a few other parameters on which they can sort.
>
> I wanted to avoid creating multiple indexes (maybe based on years) but
> it seems that to search on partial data that's the only feasible way.
>
>
>
>
> On Wed, Mar 12, 2014 at 2:47 PM, Dmitry Kan <solrexpert@gmail.com> wrote:
>
> > As Hoss pointed out above, different projects have different
> > requirements. Some want to sort by date of ingestion reverse, which
> > means that having posting lists organized in a reverse order with the
> > early termination is the way to go (no such feature in Solr directly).
> > Some other projects want to collect all docs matching a query, and then
> > sort by rank, but you cannot guarantee, that the most recently inserted
> > document is the most relevant in terms of your ranking.
> >
> >
> > Do your current searches take too long?
> >
> >
> > On Tue, Mar 11, 2014 at 11:51 AM, Salman Akram <
> > salman.akram@northbaysolutions.net> wrote:
> >
> > > It's a long video and I will definitely go through it, but it seems
> > > this is not possible with SOLR as it is?
> > >
> > > I just thought it would be quite a common issue; I mean generally for
> > > search engines it's more important to show the first page results,
> > > rather than using timeAllowed which might not even return a single
> > > result.
> > >
> > > Thanks!
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Salman Akram
> > >
> >
> >
> >
> > --
> > Dmitry
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> >
>
>
>
> --
> Regards,
>
> Salman Akram
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
