lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Does solr supports Federated search, if not what framework
Date Thu, 07 Nov 2013 12:40:23 GMT
First, please start a new thread when changing
topics; see "thread hijacking" here:
http://people.apache.org/~hossman/#threadhijack

But do be aware that scores are NOT comparable
between different queries, even on the _same_ corpus.
A score of .75 on one query has no relation to a
score of .75 on another. So "federated search"
is hard: you usually have to figure out a way to
group the results that's meaningful to a user.

I don't quite know how Carrot2 handles that one...

FWIW,
Erick
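To illustrate the point about scores above: one way to avoid comparing raw scores across sources at all is to merge by rank position only. The following is a minimal Python sketch, assuming each source (e.g. Solr and the remote site search) already returns its own best-first list of (doc_id, score) pairs. The function name `interleave_by_rank` and the data shape are hypothetical, and round-robin interleaving is a swapped-in technique, not something proposed in this thread.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: each source returns (doc_id, score) pairs already
# sorted best-first by that source's own relevance model.
Hit = Tuple[str, float]


def interleave_by_rank(sources: Dict[str, List[Hit]]) -> List[Tuple[str, str]]:
    """Round-robin merge of per-source rankings.

    Raw scores are deliberately ignored: a 0.75 from Solr and a 0.75 from
    a remote site search mean different things, so only each source's
    rank order is used. Returns (source_name, doc_id) pairs; a doc_id that
    already appeared from an earlier source is skipped.
    """
    merged: List[Tuple[str, str]] = []
    seen = set()
    names = list(sources)  # dict insertion order decides which source goes first
    depth = max((len(hits) for hits in sources.values()), default=0)
    for rank in range(depth):
        for name in names:
            hits = sources[name]
            if rank < len(hits):
                doc_id, _score = hits[rank]  # score intentionally unused
                if doc_id not in seen:
                    seen.add(doc_id)
                    merged.append((name, doc_id))
    return merged


if __name__ == "__main__":
    results = {
        "solr": [("s1", 12.3), ("s2", 8.1), ("s3", 0.4)],
        "web": [("w1", 0.99), ("w2", 0.75)],
    }
    for source, doc in interleave_by_rank(results):
        print(source, doc)
```

Keeping the source name on each merged hit also makes it easy to group or label results by origin in the UI instead of pretending they share one relevance scale.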


On Mon, Nov 4, 2013 at 11:09 PM, Susheel Kumar <susheel.kumar@thedigitalgroup.net> wrote:

> Hello,
>
> We have a scenario where we present users with results from two sources:
> one from Solr and the other from a real-time website search. The Solr data
> is available locally, so we can index it, but for the other website search
> we don't host the data, and it is real time.
>
> We are wondering if we can use some federated search framework that can
> unify the results into a single set, with relevancy and so on.
>
> Any thoughts?
>
> Thanks, and I appreciate your help.
> Susheel
>
> -----Original Message-----
> From: Patanachai Tangchaisin [mailto:patanachai.tangchaisin@wizecommerce.com]
> Sent: Monday, November 04, 2013 7:38 PM
> To: solr-user@lucene.apache.org
> Subject: Disjunctive Queries (OR queries) and FilterCache
>
> Hello,
>
> We are running our search system on Apache Solr 4.2.1 with a
> master/slave model.
> Our index has ~100M documents and is ~20 GB in size.
> The machine has 24 CPUs and 48 GB of RAM.
>
> Our response time is pretty bad: the median is ~4 seconds at 25
> queries/second.
>
> We noticed a few things:
> - Our machine is always at 100% CPU.
> - There is plenty of room left in the Java heap. We set -Xms12g and
> -Xmx16g, but the heap size stays at only 12 GB.
> - Solr's filterCache hit ratio is only 0.76, and the numbers of insertions
> and evictions are almost equal.
>
> The weird thing is:
> - Most items in Solr's filterCache (looking only at the first 100) are for
> a single field, which we filter on using an OR query. Note that every
> request includes a constraint on this field.
>
> For example, if the field name is x:
> fq=x:(1 OR 2 OR 3)&fq=y:'a'
> fq=x:(3 OR 2 OR 1)&fq=y:'b'
> fq=x:(2 OR 1 OR 3)&fq=y:'c'
>
> The order of the values differs because they come as input from another
> system.
>
> To me, it seems that Solr caches this filter in a different entry whenever
> the order of the values differs, e.g. "(1 OR 2)" and "(2 OR 1)" become
> different cache entries.
>
> Question:
> Is there another way to build an fq parameter using 'OR' so that Solr
> caches these as the same entry?
>
>
> Thanks,
> Patanachai Tangchaisin
>
>
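Regarding the filterCache question in the quoted message: since the poster observes that permutations of the same OR list end up as separate cache entries, one client-side workaround is to put the values into a fixed order before building the fq string, so every permutation produces identical text. The following Python sketch is illustrative only; the function name `canonical_or_fq` is hypothetical, and it assumes the values are plain tokens that need no Lucene query-syntax escaping. The y filter is kept as a separate fq because Solr caches each fq parameter independently.

```python
from typing import Iterable
from urllib.parse import urlencode


def canonical_or_fq(field: str, values: Iterable[str]) -> str:
    """Return an OR filter on `field` whose text ignores input order.

    Duplicates are dropped and the values are sorted (plain string order,
    so "10" sorts before "2"; any fixed order works as long as every
    client applies it consistently). Values are assumed to be simple
    tokens that need no escaping.
    """
    unique_sorted = sorted(set(values))
    if not unique_sorted:
        raise ValueError("at least one value is required")
    return f"{field}:({' OR '.join(unique_sorted)})"


if __name__ == "__main__":
    # The three orderings from the message all collapse to one filter string.
    for vals in (["1", "2", "3"], ["3", "2", "1"], ["2", "1", "3"]):
        print(canonical_or_fq("x", vals))  # x:(1 OR 2 OR 3) each time

    # Hypothetical request parameters; the y filter stays a separate fq.
    params = [("q", "*:*"), ("fq", canonical_or_fq("x", ["3", "1", "2"])), ("fq", "y:a")]
    print(urlencode(params))
```

This does not change how Solr builds its cache keys; it only makes sure the client never sends two differently ordered spellings of the same filter.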
