lucene-solr-user mailing list archives

From Salman Akram <salman.ak...@northbaysolutions.net>
Subject Re: Partial Counts in SOLR
Date Thu, 13 Mar 2014 06:52:13 GMT
Well, some of the searches take minutes.

Below are some stats about this particular index that I am talking about:

Index size = 400GB (using CommonGrams; without it the index is around
180GB)
Position File = 280GB
Total Docs = 170 million (indexed for searching only; for highlighting, the
contents are stored in another index)
Avg Doc Size = a few hundred KB
RAM = 384GB (the server hosts other indexes too, but the OS cache can still
hold 60-80% of the total index)
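To illustrate why CommonGrams grows the index, here is a simplified Python model of the idea. It is an illustration only, not Lucene's actual CommonGramsFilter, which also tracks positions and offsets, and the common-words set is hypothetical. Every token is kept, and each adjacent pair involving a common word is also indexed as one joined term. Those extra terms and their positions are what enlarge the index, and they let a phrase such as "the quick" match a single term instead of intersecting two long posting lists.

```python
from typing import List, Set


def common_grams(tokens: List[str], common_words: Set[str]) -> List[str]:
    """Simplified model of CommonGrams-style indexing (not Lucene's code).

    Keeps every token and, for each adjacent pair where either word is in
    the (hypothetical) common_words set, also emits the pair joined by '_'.
    Positions and offsets, which the real filter tracks, are omitted.
    """
    out: List[str] = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # unigrams are always kept
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")  # extra bigram term
    return out


print(common_grams(["the", "quick", "fox"], {"the"}))
# ['the', 'the_quick', 'quick', 'fox']
```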

Phrase queries run pretty fast with CG, but complex wildcard and proximity
queries can be really slow. I know CG will make those slower, but they
simply take too long. By default results are sorted by date, but users
have a few other parameters they can sort on too.
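A minimal sketch of early termination for the default date sort, assuming (hypothetically) that matching candidates can be visited newest first, e.g. because the index is ordered by date. Once the first page is full, every remaining document is older and cannot outrank the collected ones, so the scan stops. The shortcut only holds for the key the index is ordered by; for the other sort parameters all matches still have to be collected and sorted. The function name and input shape are illustrative, not a Solr API.

```python
from typing import Callable, List, Sequence, Tuple


def newest_first_top_k(
    docs_newest_first: Sequence[Tuple[str, int]],
    matches: Callable[[str], bool],
    k: int,
) -> Tuple[List[str], int]:
    """Collect the k newest matching docs, stopping as soon as the page is full.

    Hypothetical input: (doc_id, date) pairs already ordered newest first.
    Returns the collected ids and how many docs were examined.
    """
    hits: List[str] = []
    examined = 0
    if k <= 0:
        return hits, examined
    for doc_id, _date in docs_newest_first:
        examined += 1
        if matches(doc_id):
            hits.append(doc_id)
            if len(hits) == k:
                break  # every remaining doc is older, so it cannot rank higher
    return hits, examined


docs = [("d5", 20140312), ("d4", 20140311), ("d3", 20140310),
        ("d2", 20140309), ("d1", 20140308)]
print(newest_first_top_k(docs, lambda d: d in {"d4", "d3", "d1"}, 2))
# (['d4', 'd3'], 3)
```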

I wanted to avoid creating multiple indexes (maybe one per year), but it
seems that's the only feasible way to search on partial data.
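With per-year indexes, a date-sorted first page can often be served from the newest index alone. The sketch below is hypothetical: each shard is modeled as a function returning its matches sorted newest first, and older years are queried only while the page is still not full. Because the years do not overlap, concatenating results in year order keeps them globally sorted by date.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, int]  # (doc_id, date as yyyymmdd), hypothetical
Shard = Callable[[str], List[Hit]]  # returns matches sorted newest first


def search_year_shards(
    shards: Dict[int, Shard], query: str, rows: int
) -> Tuple[List[Hit], List[int]]:
    """Query hypothetical per-year indexes, newest year first, until `rows` hits are found.

    Returns the hits and the years that were actually queried.
    """
    hits: List[Hit] = []
    queried: List[int] = []
    for year in sorted(shards, reverse=True):
        if len(hits) >= rows:
            break  # page is full: older years cannot hold newer documents
        queried.append(year)
        hits.extend(shards[year](query)[: rows - len(hits)])
    return hits, queried


shards = {
    2012: lambda q: [("c1", 20121201)],
    2013: lambda q: [("b1", 20131105), ("b2", 20130302)],
    2014: lambda q: [("a1", 20140310)],
}
print(search_year_shards(shards, "some query", 2))
# ([('a1', 20140310), ('b1', 20131105)], [2014, 2013])
```

Here the 2012 index is never touched because the first two years already fill the page.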




On Wed, Mar 12, 2014 at 2:47 PM, Dmitry Kan <solrexpert@gmail.com> wrote:

> As Hoss pointed out above, different projects have different requirements.
> Some want to sort in reverse order of ingestion date, which means that
> organizing posting lists in reverse order and using early termination is
> the way to go (Solr has no such feature directly). Other projects want to
> collect all docs matching a query and then sort by rank, but you cannot
> guarantee that the most recently inserted document is the most relevant in
> terms of your ranking.
>
>
> Do your current searches take too long?
>
>
> On Tue, Mar 11, 2014 at 11:51 AM, Salman Akram <
> salman.akram@northbaysolutions.net> wrote:
>
> > It's a long video and I will definitely go through it, but it seems this
> > is not possible with SOLR as it is?
> >
> > I just thought it would be quite a common issue; generally, for search
> > engines it's more important to show the first page of results than to
> > use timeAllowed, which might not even return a single result.
> >
> > Thanks!
> >
> >
> > --
> > Regards,
> >
> > Salman Akram
> >
>
>
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
>



-- 
Regards,

Salman Akram
