lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Very high memory and CPU utilization.
Date Mon, 02 Nov 2015 16:47:00 GMT
To back up a bit, how many documents are in this 90GB index? You might not need to shard at
all.

Why are you sending a query with a trailing wildcard? Are you matching the prefix of words,
for query completion? If so, look at the suggester, which is designed to solve exactly that.
Or you can use the EdgeNGramFilterFactory to index prefixes. That will make your index larger, but
prefix searches will be very fast.
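
To illustrate why indexing prefixes makes prefix searches fast, here is a minimal Python sketch of the idea. It is a toy model, not Solr's implementation: the function names, the sample documents, and the min_gram/max_gram defaults are all hypothetical. Each token is expanded into its leading substrings at index time, so a query for a prefix becomes a single exact term lookup instead of a wildcard expansion at query time.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the leading substrings of token, min_gram to max_gram chars long."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_index(docs, min_gram=1, max_gram=10):
    """Map every edge n-gram to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            for gram in edge_ngrams(token, min_gram, max_gram):
                index.setdefault(gram, set()).add(doc_id)
    return index


# Hypothetical sample documents, for illustration only.
docs = {1: "network security", 2: "network services", 3: "neural nets"}
index = build_index(docs)

# The prefix query "se*" is now a plain dictionary lookup on the term "se".
matches = index.get("se", set())
print(sorted(matches))
```

The cost is visible in the sketch as well: every token contributes several index entries instead of one, which is the larger index mentioned above.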

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 2, 2015, at 5:17 AM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> 
> On Mon, 2015-11-02 at 17:27 +0530, Modassar Ather wrote:
> 
>> The query q=network se* is quick enough in our system too. It takes
>> around 3-4 seconds for around 8 million records.
>> 
>> The problem is with the same query as phrase. q="network se*".
> 
> I misunderstood your query then. I tried replicating it with
> q="der se*"
> 
> http://rosalind:52300/solr/collection1/select?q=%22der+se*%22&wt=json&indent=true&facet=false&group=true&group.field=domain
> 
> gets expanded to
> 
> "parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" |
> author:kan svane* | text:\"kan svane\" | title:\"kan svane\" | url:kan
> svane* | description:\"kan svane\")) ())/no_coord"
> 
> The result was 1,043,258,271 hits in 15,211 ms
> 
> 
> Interestingly enough, a search for 
> q="kan svane*"
> resulted in 711 hits in 12,470 ms. Maybe because 'kan' alone matches 1
> billion+ documents. On that note,
> q=se*
> resulted in -951812427 hits in 194,276 ms.
> 
> Now this is interesting. The negative number seems to be caused by
> grouping, but I finally got the response time up in the minutes. Still
> no memory problems though. Hits without grouping were 3,343,154,869.
> 
> For comparison,
> q=http
> resulted in -1527418054 hits in 87,464 ms. Without grouping the hit
> count was 7,062,516,538. Twice the hits of 'se*' in half the time.
> 
>> I changed my SolrCloud setup from 12 shards to 8 shards and gave each
>> shard 30 GB of RAM, on the same machine with the same index size
>> (re-indexed), but could not see any significant improvement for the
>> given query.
> 
> Strange. I would have expected the extra free memory for disk cache to
> help performance.
> 
>> Also, can you please share your experience with RAM, GC, Solr cache
>> setup, etc.? It seems from your comment that the SolrCloud
>> environment you have is similar to the one I work on.
>> 
> There is a short write-up at
> https://sbdevel.wordpress.com/net-archive-search/
> 
> - Toke Eskildsen, State and University Library, Denmark
> 
> 
> 
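
On the negative hit counts quoted above: both are arithmetically consistent with the ungrouped totals being wrapped into a signed 32-bit integer. The Python sketch below only checks that arithmetic against the numbers in the quoted message. It does not show where in Solr the truncation happens (the quoted message attributes it to grouping), and wrap_int32 is a hypothetical helper written for this check.

```python
def wrap_int32(n):
    """Interpret n modulo 2**32 as a signed 32-bit two's-complement integer."""
    n &= 0xFFFFFFFF  # keep only the low 32 bits
    return n - 0x100000000 if n >= 0x80000000 else n


# Ungrouped hit counts reported in the quoted message.
print(wrap_int32(3_343_154_869))  # -951812427, the grouped count for q=se*
print(wrap_int32(7_062_516_538))  # -1527418054, the grouped count for q=http
```

Any total above 2,147,483,647 would be affected the same way, which fits an index where single terms match billions of documents.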

