From: Erick Erickson
To: solr-user@lucene.apache.org
Date: Thu, 7 Nov 2013 07:40:23 -0500
Subject: Re: Does solr supports Federated search, if not what framework
In-Reply-To: <6920F599E2987B40BE502AA9478C825220DAA020@DAGN02B-E6.exg6.exghost.com>

First, please start a new thread when changing topics, see "thread
hijacking" here
http://people.apache.org/~hossman/#threadhijack

But do be aware that scores are NOT comparable between different
queries on the _same_ corpus. A score of .75 on one query has no
relation to a score of .75 on another. So "federated search" is hard,
you usually have to figure out a way to group the results in a way
that's meaningful to a user.

Don't quite know how carrot handles that one...

FWIW,
Erick

On Mon, Nov 4, 2013 at 11:09 PM, Susheel Kumar <susheel.kumar@thedigitalgroup.net> wrote:

> Hello,
>
> We have a scenario where we present results to users one from solr and
> other from real time web site search. The solr data we have locally
> available that we are able to index but other website search, we don't host
> data and it is real time.
>
> We are wondering if we can use some federated search framework which can
> unify the results into single set with relevancy and all.
>
> Any thoughts?
>
> Thanks & appreciate your help.
> Susheel
>
> -----Original Message-----
> From: Patanachai Tangchaisin [mailto:patanachai.tangchaisin@wizecommerce.com]
> Sent: Monday, November 04, 2013 7:38 PM
> To: solr-user@lucene.apache.org
> Subject: Disjuctive Queries (OR queries) and FilterCache
>
> Hello,
>
> We are running our search system on Apache Solr 4.2.1 using a
> Master/Slave model.
> Our index has ~100M documents. The index size is ~20 GB.
> The machine has 24 CPUs and 48 GB of RAM.
>
> Our response time is pretty bad: the median is ~4 seconds at 25
> queries/second.
>
> We noticed a couple of things:
> - Our machine always uses 100% CPU.
> - There is a lot of room in the Java heap. We assign Xms12g and Xmx16g, but
> the size of the heap is still only 12g.
> - Solr's filterCache hit ratio is only 0.76, and the numbers of insertions
> and evictions are almost equal.
>
> The weird thing is:
> - Most items in Solr's filterCache (first 100 only) are specific to only
> one field, which we filter on using an OR query. Note that
> every request will have this field constraint.
>
> For example, if the field name is x:
> fq=x:(1 OR 2 OR 3)&fq=y:'a'
> fq=x:(3 OR 2 OR 1)&fq=y:'b'
> fq=x:(2 OR 1 OR 3)&fq=y:'c'
>
> The order of the items differs because it comes as input from a different
> system.
>
> To me, it seems that Solr caches this field under a different entry if
> the order of the items is different, e.g. "(1 OR 2)" and "(2 OR 1)" are
> going to be different cache entries.
>
> Question:
> Is there another way to create an fq parameter using 'OR' and make Solr
> cache them as the same entry?
>
> Thanks,
> Patanachai Tangchaisin
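
On the filterCache question quoted above: if differently ordered OR
clauses really do end up as separate cache entries, as the poster
observes, one client-side workaround is to put the values into a
canonical order before building the fq string. The sketch below is only
an illustration of that idea; the helper name canonical_or_filter is
hypothetical, and it assumes the values are plain tokens that need no
query-syntax escaping.

```python
def canonical_or_filter(field, values):
    """Build an fq clause with the OR terms de-duplicated and sorted.

    Hypothetical helper: it only normalizes the string sent to Solr and
    does not escape special query characters.
    """
    # De-duplicate, then sort as strings so that every permutation of the
    # same set of values yields exactly the same clause text.
    terms = sorted({str(v) for v in values})
    if not terms:
        raise ValueError("at least one value is required")
    return f"{field}:({' OR '.join(terms)})"


# The three orderings from the example above collapse to one string.
for vals in ([1, 2, 3], [3, 2, 1], [2, 1, 3]):
    print("fq=" + canonical_or_filter("x", vals))
```

Sorting is done on the string form, so the order is lexicographic
rather than numeric; that is fine here because the only goal is that the
same set of values always produces the same text.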