Date: Wed, 19 Nov 2014 20:26:01 +0000
Subject: Re: Handling intersection facets of many values
From: Peter Sturge
To: solr-user@lucene.apache.org

Hi Toke,

Thanks for your input. I guess you mean take the 1k or so values and build a boolean query from them? If that's not what you mean, my apologies.

I'd thought of doing that. The trouble I had was that the number of unique values could be 20k, or 15,167, or any arbitrary and potentially high-ish number; it's not really known and can/will change over time. I believe a boolean query with more than 1024 clauses can blow up the query, so scalability is a concern.

The other issue is how this would yield the unique facet values, e.g. dest=8.8.8.8 (17) [i.e. 8.8.8.8 is in the 'addr' list and occurs 17 times in entries with a 'dest' field]. In fact, I need the unique value(s) ('8.8.8.8') more than I need the count ('17').

I could get the facet list of 'dest' values, then trawl through each one, but this would be a complicated and time-consuming client-side operation.

I'm also looking at creating a custom QueryParser that would build the relevant DocLists, then intersect them and return the values, but I'd rather not reinvent the wheel if possible. Given that facets already build unique term lists, this seems so close. I guess it's like taking two facet lists (one for addr, one for dest), intersecting them and returning the result:

List 1: a b c d e f
List 2: a a g z c c c e

Resultant intersection:
a (2)
c (3)
e (1)

Thanks,
Peter

On Wed, Nov 19, 2014 at 7:16 PM, Toke Eskildsen wrote:

> Peter Sturge [peter.sturge@gmail.com] wrote:
> > [addr 7M unique, dest 1K unique]
>
> > What is the best/only/most efficient way to construct a search whereby I
> > get back an (ideally faceted) list of values for 'dest' that occur in
> > 'addr'?
>
> I assume the actual values are defined by a query? As the number of
> possible values in dest is not that large, extracting those first and then
> using them as a filter when searching for addr seems like a fairly
> efficient way of solving the problem.
>
> - Toke Eskildsen
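
For illustration only, the facet-list intersection described in Peter's message (List 1 for addr, List 2 for dest) can be sketched client-side in plain Python. This is a minimal sketch, not Solr code: the function name `intersect_facets` and the in-memory lists are assumptions made for the example, and it presumes both value lists have already been fetched from the index by some other means.

```python
from collections import Counter


def intersect_facets(addr_values, dest_values):
    """Return {value: count} for each dest value that also occurs in addr.

    The count is the number of times the value appears in dest_values,
    matching the 'a (2) c (3) e (1)' example from the message.
    """
    addr_set = set(addr_values)          # unique addr terms, O(1) lookups
    dest_counts = Counter(dest_values)   # occurrences of each dest term
    return {value: count
            for value, count in dest_counts.items()
            if value in addr_set}


# The two lists from the message (hypothetical stand-ins for facet output).
list1 = "a b c d e f".split()
list2 = "a a g z c c c e".split()

print(intersect_facets(list1, list2))  # {'a': 2, 'c': 3, 'e': 1}
```

With 7M unique addr values the set would be large, so this only shows the shape of the result Peter is after; it does not address the server-side cost he raises.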