Subject: Re: Very high memory and CPU utilization.
From: Toke Eskildsen
Date: Mon, 2 Nov 2015 14:17:00 +0100
Organization: State and University Library, Denmark

On Mon, 2015-11-02 at 17:27 +0530, Modassar Ather wrote:
> The query q=network se* is quick enough in our system too. It takes
> around 3-4 seconds for around 8 million records.
>
> The problem is with the same query as phrase. q="network se*".

I misunderstood your query then. I tried replicating it with
q="der se*"

http://rosalind:52300/solr/collection1/select?q=%22der+se*%22&wt=json&indent=true&facet=false&group=true&group.field=domain

gets expanded to

"parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" | author:kan svane* | text:\"kan svane\" | title:\"kan svane\" | url:kan svane* | description:\"kan svane\")) ())/no_coord"

The result was 1,043,258,271 hits in 15,211 ms.

Interestingly enough, a search for q="kan svane*" resulted in 711 hits in 12,470 ms. Maybe because 'kan' alone matches 1 billion+ documents. On that note, q=se* resulted in -951812427 hits in 194,276 ms.

Now this is interesting. The negative number seems to be caused by grouping, but I finally got the response time up in the minutes. Still no memory problems though. Hits without grouping were 3,343,154,869.

For comparison, q=http resulted in -1527418054 hits in 87,464 ms. Without grouping the hit count was 7,062,516,538. Twice the hits of 'se*' in half the time.
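
A note on the negative counts: both are exactly what the ungrouped hit counts become when truncated to a signed 32-bit integer, which is consistent with the grouped count overflowing somewhere along the way. Where that happens inside Solr is not established here. A minimal sketch of the arithmetic, where wrap_int32 is a hypothetical helper written for illustration and not part of Solr:

```python
def wrap_int32(n: int) -> int:
    """Reinterpret the low 32 bits of n as a signed two's-complement value.

    Hypothetical helper for illustration only; it mimics what a narrowing
    cast from a 64-bit to a 32-bit signed integer would produce.
    """
    n &= 0xFFFFFFFF                      # keep only the low 32 bits
    return n - 0x100000000 if n >= 0x80000000 else n


# Ungrouped hit counts reported in the message above.
hits_se_wildcard = 3_343_154_869         # q=se*
hits_http = 7_062_516_538                # q=http

print(wrap_int32(hits_se_wildcard))      # -951812427
print(wrap_int32(hits_http))             # -1527418054
```

Both printed values match the grouped hit counts quoted above, so the ungrouped figures are most likely the correct ones.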

> I changed my SolrCloud setup from 12 shard to 8 shard and given each
> shard 30 GB of RAM on the same machine with same index size
> (re-indexed) but could not see the significant improvement for the
> query given.

Strange. I would have expected the extra free memory for disk cache to help performance.

> Also can you please share your experiences with respect to RAM, GC,
> solr cache setup etc as it seems by your comment that the SolrCloud
> environment you have is kind of similar to the one I work on?

There is a short write up at
https://sbdevel.wordpress.com/net-archive-search/

- Toke Eskildsen, State and University Library, Denmark