From: Shai Erera
Date: Tue, 9 Apr 2013 19:50:56 +0300
Subject: Re: FacetedSearch and MultiReader
To: "java-user@lucene.apache.org", nbuso@ebi.ac.uk
In-Reply-To: <1365521990.2044.1.camel@linux.scoobydoo>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a11c37cdca1b7f804d9f05c35

--001a11c37cdca1b7f804d9f05c35
Content-Type: text/plain; charset=ISO-8859-1

Hello Nicola,

I think it would be good if you start a new thread to discuss this problem,
as I don't think it's related to the issue in this thread.

Also, I did not understand what problem you're running into. What used to
work before 4.2 and doesn't work now?

Shai


On Tue, Apr 9, 2013 at 6:39 PM, Nicola Buso wrote:

> Hi,
>
> I'm trying to use Lucene 4.2, but this merge of multiple taxonomy indexes
> seems to no longer work.
>
> Do you have any idea why it would not work in Lucene 4.2?
> Normal faceted search on a single index is working correctly.
>
>
> Nicola.
>
> On Thu, 2013-01-24 at 16:53 +0000, Nicola Buso wrote:
> > Hi Shai,
> >
> > I'd just like to confirm that your solution is working after the tests
> > I did.
> >
> > Thanks again for the useful hints.
> >
> >
> > Nicola.
> >
> > On Tue, 2013-01-22 at 06:20 +0200, Shai Erera wrote:
> > > Hi Nicola,
> > >
> > > What I had in mind is something similar to this, which is possible
> > > starting with Lucene 4.1, due to changes done to facets (per-segment
> > > faceting):
> > >
> > > DirectoryTaxonomyWriter master = new DirectoryTaxonomyWriter(masterDir);
> > > // open the Directories and store them in this array
> > > Directory[] origTaxoDirs = new Directory[numTaxoDirs];
> > > // initialize the OrdinalMaps and store them in this array
> > > OrdinalMap[] ordinalMaps = new OrdinalMap[numTaxoDirs];
> > >
> > > // now do the merge
> > > for (int i = 0; i < origTaxoDirs.length; i++) {
> > >   master.addTaxonomy(origTaxoDirs[i], ordinalMaps[i]);
> > > }
> > >
> > > // now open your readers, and create the important map
> > > Map<AtomicReader,OrdinalMap> readerOrdinals =
> > >     new HashMap<AtomicReader,OrdinalMap>();
> > > DirectoryReader[] readers = new DirectoryReader[origTaxoDirs.length];
> > > for (int i = 0; i < origTaxoDirs.length; i++) {
> > >   DirectoryReader r = DirectoryReader.open(contentDirectories[i]);
> > >   readers[i] = r;
> > >   OrdinalMap ordMap = ordinalMaps[i];
> > >   for (AtomicReaderContext ctx : r.leaves()) {
> > >     readerOrdinals.put(ctx.reader(), ordMap);
> > >   }
> > > }
> > >
> > > MultiReader mr = new MultiReader(readers);
> > >
> > > // create your FacetRequest (CountFacetRequest) with a custom Aggregator
> > > FacetRequest fr = new CountFacetRequest(cp, topK) {
> > >   @Override
> > >   public Aggregator createAggregator(...) {
> > >     // OrdinalMappingAggregator stands for your own Aggregator base
> > >     // class, which holds the int[] counts array
> > >     return new OrdinalMappingAggregator() {
> > >       int[] ordMap;
> > >
> > >       @Override
> > >       public void setNextReader(AtomicReaderContext context) {
> > >         ordMap = readerOrdinals.get(context.reader()).getMap();
> > >       }
> > >
> > >       @Override
> > >       public void aggregate(int docID, float score, IntsRef ordinals) {
> > >         int upto = ordinals.offset + ordinals.length;
> > >         for (int i = ordinals.offset; i < upto; i++) {
> > >           // original ordinal, read for the AtomicReader given to
> > >           // setNextReader
> > >           int ordinal = ordinals.ints[i];
> > >           // mapped ordinal, following the taxonomy merge
> > >           int mappedOrdinal = ordMap[ordinal];
> > >           // count the mapped ordinal instead, so all AtomicReaders
> > >           // count that ordinal
> > >           counts[mappedOrdinal]++;
> > >         }
> > >       }
> > >     };
> > >   }
> > > };
> > >
> > > While it may look like I wrote actual code to do it, I didn't :). So I
> > > guess it should work, but I haven't tried it.
> > > That way, you don't touch the content indexes at all, just the taxonomy
> > > ones.
> > >
> > > Note however that you'll need to do this step every time the taxonomy
> > > index is updated and you refresh the TaxoReader instance.
> > > Also, this will only work if all your indexes are opened in the same JVM
> > > (which I assume is the case, since you use MultiReader).
> > >
> > > If you still don't want to do that, then what Denis wrote above is
> > > another way to do distributed faceted search, either inside the same JVM
> > > or across multiple JVMs.
> > > You obtain the FacetResult from each search and merge the results
> > > (unfortunately, there's still no tool in Lucene to do that for you).
> > > Just make sure to ask for a larger K, to ensure that the correct top-K
> > > is returned (see my previous notes).
> > >
> > > Shai
> > >
> > >
> > >
> > >
> > > On Tue, Jan 22, 2013 at 4:32 AM, Denis Bazhenov wrote:
> > >
> > > > We have a similar distributed search system and we ended up with the
> > > > following scheme.
> > > > Search replicas (machines where the index resides) build FacetResults
> > > > based on their index chunk (top N categories with document counts).
> > > > Later on, the results are merged "by hand" by summing the counts of
> > > > matching categories from the different replicas.
> > > >
> > > > On Jan 22, 2013, at 3:08 AM, Nicola Buso wrote:
> > > >
> > > > > Hi Shai,
> > > > >
> > > > > I was thinking of that too, but I'm building all the indexes in a
> > > > > custom distributed environment, so at the moment I can't have a
> > > > > single categories index for all the content indexes at indexing
> > > > > time.
> > > > > A solution would be to merge all the categories indexes into a
> > > > > single index and use your solution, but the merge code I see in the
> > > > > examples also merges the content indexes, and I can't do that.
> > > > >
> > > > > I could share the taxonomy if it is possible to merge (I see the
> > > > > resulting categories indexes are not that big currently), but I
> > > > > would prefer a solution where I can collect the facets over multiple
> > > > > categories indexes; that way I will be sure the solution scales
> > > > > better.
> > > > >
> > > > >
> > > > > Nicola.
> > > > >
> > > > >
> > > > > On Mon, 2013-01-21 at 17:54 +0200, Shai Erera wrote:
> > > > >> Hi Nicola,
> > > > >>
> > > > >>
> > > > >> I think that what you're describing corresponds to distributed
> > > > >> faceted search. I.e., you have N content indexes, alongside N
> > > > >> taxonomy indexes.
> > > > >>
> > > > >> The information that's indexed in each of those sub-indexes does not
> > > > >> correlate with the other ones.
> > > > >> For example, say that you index the category "Movie/Drama"; it may
> > > > >> receive ordinal 12 in index1 and 23 in index2.
> > > > >>
> > > > >> If you try to count ordinals using MultiReader, you'll just mess up
> > > > >> everything.
> > > > >>
> > > > >>
> > > > >> If you can share a single taxonomy index for all N content indexes,
> > > > >> then you'll be in a super-simple position:
> > > > >>
> > > > >> 1) Open one TaxonomyReader
> > > > >>
> > > > >> 2) Execute search with MultiReader and FacetsCollector
> > > > >>
> > > > >>
> > > > >>
> > > > >> It doesn't get simpler than that! :)
> > > > >>
> > > > >>
> > > > >> Before I go into great length describing what you should do if you
> > > > >> cannot share the taxonomy, let me know if that's not an option for
> > > > >> you.
> > > > >>
> > > > >> Shai
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Mon, Jan 21, 2013 at 5:39 PM, Nicola Buso wrote:
> > > > >>         Thanks for the reply Uwe,
> > > > >>
> > > > >>         we can currently search with MultiReader over all the
> > > > >>         indexes we have.
> > > > >>         Now I want to add faceted search, so I created a categories
> > > > >>         index for every index I currently have.
> > > > >>         To accumulate the faceted results I now have a MultiReader
> > > > >>         pointing to all the indexes, and I can create a
> > > > >>         TaxonomyReader for every categories index I have; the only
> > > > >>         ways I see to obtain FacetResults are:
> > > > >>         1 - FacetsCollector
> > > > >>         2 - a FacetsAccumulator implementation
> > > > >>
> > > > >>         Suppose I use the second option. I should:
> > > > >>         - search as usual using the MultiReader
> > > > >>         - then try to collect all the FacetResults by iterating over
> > > > >>           my TaxonomyReaders; at every iteration:
> > > > >>           - I create a FacetsAccumulator using the MultiReader and a
> > > > >>             TaxonomyReader
> > > > >>           - I get a list of FacetResult from the accumulator.
> > > > >>         - when I finish, I should in some way merge all the
> > > > >>           List<FacetResult> I have.
> > > > >>
> > > > >>         I think this solution is not correct, because the doc IDs
> > > > >>         from the search refer to the MultiReader, whereas each
> > > > >>         TaxonomyReader points to the categories index of a single
> > > > >>         reader.
> > > > >>         I also don't like having to merge all the lists of
> > > > >>         FacetResult I retrieve from the accumulators.
> > > > >>
> > > > >>         Probably I'm missing something; can somebody clarify how I
> > > > >>         should collect the facets in this case?
> > > > >>
> > > > >>
> > > > >>         Nicola.
> > > > >>
> > > > >>
> > > > >>
> > > > >>         On Mon, 2013-01-21 at 16:22 +0100, Uwe Schindler wrote:
> > > > >>> Just use MultiReader; it extends IndexReader, so you can pass it
> > > > >>> anywhere an IndexReader can be passed.
> > > > >>>
> > > > >>> -----
> > > > >>> Uwe Schindler
> > > > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> > > > >>> http://www.thetaphi.de
> > > > >>> eMail: uwe@thetaphi.de
> > > > >>>
> > > > >>>> -----Original Message-----
> > > > >>>> From: Nicola Buso [mailto:nbuso@ebi.ac.uk]
> > > > >>>> Sent: Monday, January 21, 2013 3:59 PM
> > > > >>>> To: java-user@lucene.apache.org
> > > > >>>> Subject: FacetedSearch and MultiReader
> > > > >>>>
> > > > >>>> Hi all,
> > > > >>>>
> > > > >>>> I'm trying to develop faceted search using the Lucene 4.0 faceting
> > > > >>>> framework. In our project we are searching on multiple indexes
> > > > >>>> using Lucene's MultiReader. How should we use the faceting
> > > > >>>> framework to obtain FacetResults starting from a MultiReader? All
> > > > >>>> the examples I see use a "single" IndexReader.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Nicola.
> > > > >>>>
> > > > >>>> ---------------------------------------------------------------------
> > > > >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > > ---
> > > > Denis Bazhenov

--001a11c37cdca1b7f804d9f05c35--
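
Editor's note: the "merge by hand" approach that Denis describes in the
thread (each replica returns its top categories with counts, and the counts
of matching categories are summed) can be sketched in plain Java without any
Lucene types. This is only a sketch under stated assumptions: the class
FacetMergeSketch, the methods mergeCounts and topK, and the use of a Map from
category path to count in place of FacetResult are hypothetical names chosen
for illustration, not Lucene API. As Shai notes, each replica should be asked
for more than K categories, otherwise a category that just misses one
replica's list can be undercounted in the merged top-K.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FacetMergeSketch {

    // Sum the counts that every replica reported for the same category path.
    // Category paths (not ordinals) are the merge key, because ordinals are
    // only meaningful inside the taxonomy index that assigned them.
    static Map<String, Integer> mergeCounts(List<Map<String, Integer>> replicaResults) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> replica : replicaResults) {
            for (Map.Entry<String, Integer> e : replica.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    // Return the k categories with the highest merged count.
    // Ties are broken by category path so the order is deterministic.
    static List<Map.Entry<String, Integer>> topK(Map<String, Integer> merged, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(merged.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // descending
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }

    public static void main(String[] args) {
        // Hypothetical per-replica results (category path -> document count).
        Map<String, Integer> replica1 = new HashMap<>();
        replica1.put("Movie/Drama", 5);
        replica1.put("Movie/Comedy", 3);

        Map<String, Integer> replica2 = new HashMap<>();
        replica2.put("Movie/Drama", 2);
        replica2.put("Movie/Comedy", 6);
        replica2.put("Movie/Action", 4);

        Map<String, Integer> merged = mergeCounts(Arrays.asList(replica1, replica2));
        for (Map.Entry<String, Integer> e : topK(merged, 2)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Merging on the category path avoids the ordinal mismatch discussed in the
thread ("Movie/Drama" being ordinal 12 in one index and 23 in another), at
the cost of resolving ordinals to paths on every replica before merging.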