lucene-java-user mailing list archives

From Nicola Buso <nb...@ebi.ac.uk>
Subject Re: FacetedSearch and MultiReader
Date Tue, 09 Apr 2013 15:39:50 GMT
Hi,

I'm trying to use Lucene 4.2, but this approach of merging multiple taxonomy
indexes no longer seems to work.

Do you have any idea why it doesn't work in Lucene 4.2?
Normal faceted search on a single index is working correctly.


Nicola.

On Thu, 2013-01-24 at 16:53 +0000, Nicola Buso wrote:
> Hi Shai,
> 
> I just wanted to confirm that your solution works, based on the tests
> I did.
> 
> Thanks again for the useful hints.
> 
> 
> Nicola.
> 
> On Tue, 2013-01-22 at 06:20 +0200, Shai Erera wrote:
> > Hi Nicola,
> > 
> > What I had in mind is something similar to this, which is possible starting
> > with Lucene 4.1, due to changes done to facets (per-segment faceting):
> > 
> > DirectoryTaxonomyWriter master = new DirectoryTaxonomyWriter(masterDir);
> > Directory[] origTaxoDirs = new Directory[numTaxoDirs]; // open Directories and store in that array
> > OrdinalMap[] ordinalMaps = new OrdinalMap[numTaxoDirs]; // initialize OrdinalMap (e.g. MemoryOrdinalMap) and store in that array
> > 
> > // now do the merge
> > for (int i = 0; i < origTaxoDirs.length; i++) {
> >   master.addTaxonomy(origTaxoDirs[i], ordinalMaps[i]);
> > }
> > master.commit(); // commit so a TaxonomyReader opened later sees the merged taxonomy
> > 
> > // now open your readers, and create the important map
> > // (declared final so the anonymous Aggregator below can capture it)
> > final Map<AtomicReader,OrdinalMap> readerOrdinals = new HashMap<AtomicReader,OrdinalMap>();
> > DirectoryReader[] readers = new DirectoryReader[origTaxoDirs.length];
> > for (int i = 0; i < origTaxoDirs.length; i++) {
> >   DirectoryReader r = DirectoryReader.open(contentDirectories[i]);
> >   readers[i] = r; // keep it for the MultiReader
> >   OrdinalMap ordMap = ordinalMaps[i];
> >   for (AtomicReaderContext ctx : r.leaves()) {
> >     readerOrdinals.put(ctx.reader(), ordMap);
> >   }
> > }
> > 
> > MultiReader mr = new MultiReader(readers);
> > 
> > // create your FacetRequest (CountFacetRequest) with a custom Aggregator
> > FacetRequest fr = new CountFacetRequest(cp, topK) {
> >   @Override
> >   public Aggregator createAggregator(...) {
> >     return new OrdinalMappingAggregator() {
> >       int[] ordMap;
> > 
> >       @Override
> >       public void setNextReader(AtomicReaderContext context) {
> >         ordMap = readerOrdinals.get(context.reader()).getMap();
> >       }
> > 
> >       @Override
> >       public void aggregate(int docID, float score, IntsRef ordinals) {
> >         int upto = ordinals.offset + ordinals.length;
> >         for (int i = ordinals.offset; i < upto; i++) {
> >           int ordinal = ordinals.ints[i]; // original ordinal read for the AtomicReader given to setNextReader
> >           int mappedOrdinal = ordMap[ordinal]; // mapped ordinal, following the taxonomy merge
> >           counts[mappedOrdinal]++; // count the mapped ordinal instead, so all AtomicReaders count that ordinal
> >         }
> >       }
> >     };
> >   }
> > };
> > 
> > While it may look like I wrote actual code to do it, I didn't :). So I
> > guess it should work, but I haven't tried it.
> > That way, you don't touch the content indexes at all, just the taxonomy
> > ones.
> > 
> > Note however that you'll need to do this step every time the taxonomy index
> > is updated, and you refresh the TaxoReader instance.
> > Also, this will only work if all your indexes are opened in the same JVM
> > (which I assume is the case, since you use MultiReader).
> > 
> > If you still don't want to do that, then what Denis wrote above is another
> > way to do distributed faceted search, either inside the same JVM or across
> > multiple JVMs.
> > You obtain the FacetResult from each search and merge the results
> > (unfortunately, there's still no tool in Lucene to do that for you).
> > Just make sure to ask for a larger K, to ensure that the correct top-K is
> > returned (see my previous notes).
> > 
> > Shai
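To see what the OrdinalMap translation in the sketch above does, without pulling in Lucene, here is a toy model in plain Java. It is an illustration under stated assumptions, not Lucene code: `ToyTaxonomy`, `addTaxonomy` and `count` are hypothetical names, categories are flat strings (a real taxonomy also gives ordinals to the root and to parent categories such as "Movie"), and each document's ordinals are supplied directly instead of being read from the index. It shows the same path ("Movie/Drama") receiving different ordinals in different taxonomies, and how counting through a per-index map into the merged taxonomy's ordinal space makes the counts line up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrdinalRemapSketch {

    /** Hypothetical stand-in for a taxonomy index: ordinals are handed out in insertion order. */
    static final class ToyTaxonomy {
        final List<String> paths = new ArrayList<>();
        final Map<String, Integer> ordinals = new HashMap<>();

        /** Returns the existing ordinal of path, or assigns the next free one. */
        int add(String path) {
            Integer ord = ordinals.get(path);
            if (ord == null) {
                ord = paths.size();
                paths.add(path);
                ordinals.put(path, ord);
            }
            return ord;
        }
    }

    /** Adds every category of source to master; returns map[sourceOrdinal] = masterOrdinal. */
    static int[] addTaxonomy(ToyTaxonomy master, ToyTaxonomy source) {
        int[] map = new int[source.paths.size()];
        for (int ord = 0; ord < map.length; ord++) {
            map[ord] = master.add(source.paths.get(ord));
        }
        return map;
    }

    /**
     * docOrdinals[i][d] lists the ordinals of document d in index i, in index i's own taxonomy;
     * each ordinal is translated through ordinalMaps[i] before it is counted.
     */
    static int[] count(ToyTaxonomy master, int[][][] docOrdinals, int[][] ordinalMaps) {
        int[] counts = new int[master.paths.size()];
        for (int i = 0; i < docOrdinals.length; i++) {
            int[] ordMap = ordinalMaps[i];
            for (int[] doc : docOrdinals[i]) {
                for (int ord : doc) {
                    counts[ordMap[ord]]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ToyTaxonomy taxo1 = new ToyTaxonomy();
        ToyTaxonomy taxo2 = new ToyTaxonomy();
        int drama1 = taxo1.add("Movie/Drama");   // ordinal 0 in index1
        int comedy2 = taxo2.add("Movie/Comedy"); // ordinal 0 in index2
        int drama2 = taxo2.add("Movie/Drama");   // ordinal 1 in index2

        ToyTaxonomy master = new ToyTaxonomy();
        int[] map1 = addTaxonomy(master, taxo1);
        int[] map2 = addTaxonomy(master, taxo2);

        int[][][] docs = {
            { { drama1 }, { drama1 } },              // index1: two Drama documents
            { { comedy2 }, { drama2 }, { drama2 } }  // index2: one Comedy, two Drama documents
        };
        int[] counts = count(master, docs, new int[][] { map1, map2 });
        for (int ord = 0; ord < counts.length; ord++) {
            System.out.println(master.paths.get(ord) + " = " + counts[ord]);
        }
    }
}
```

Running `main` prints `Movie/Drama = 4` and `Movie/Comedy = 1`. Adding up the raw per-index ordinals instead would credit the Comedy document from index2 to ordinal 0, which is the kind of mix-up you get when counting ordinals over a plain MultiReader with separate taxonomies.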
> > 
> > 
> > 
> > 
> > On Tue, Jan 22, 2013 at 4:32 AM, Denis Bazhenov <dotsid@gmail.com> wrote:
> > 
> > > We have a similar distributed search system, and we ended up with the
> > > following scheme. Search replicas (the machines where the index resides)
> > > build FacetResults based on their own index chunk (top N categories with
> > > document counts). Later on, the results are merged "by hand" by summing
> > > the relevant categories from the different replicas.
> > >
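The "merge by hand" step Denis describes can be sketched in plain Java. This assumes each replica's FacetResult has already been flattened into a map from category path (e.g. "Movie/Drama") to count; `ShardFacetMerge` and `mergeTopK` are hypothetical names, not a Lucene API (as Shai notes, Lucene has no built-in tool for this). Because each replica returns only its own top N, a category that falls just outside one replica's top N is undercounted in the merged total, which is why Shai suggests asking each replica for a larger K than you finally display.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShardFacetMerge {

    /** Sums each category's count across replicas and returns the K largest totals. */
    static List<Map.Entry<String, Integer>> mergeTopK(List<Map<String, Integer>> replicaCounts, int k) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> replica : replicaCounts) {
            for (Map.Entry<String, Integer> e : replica.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(totals.entrySet());
        // Highest total first; ties broken by category path so the order is deterministic.
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> replica1 = new HashMap<>();
        replica1.put("Movie/Drama", 5);
        replica1.put("Movie/Comedy", 3);
        Map<String, Integer> replica2 = new HashMap<>();
        replica2.put("Movie/Comedy", 4);
        replica2.put("Movie/Horror", 2);

        List<Map<String, Integer>> replicas = new ArrayList<>();
        replicas.add(replica1);
        replicas.add(replica2);
        for (Map.Entry<String, Integer> e : mergeTopK(replicas, 2)) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

With the sample data this prints `Movie/Comedy = 7` followed by `Movie/Drama = 5`. The sketch needs Java 8 or later (`Map.merge`, lambdas).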
> > > On Jan 22, 2013, at 3:08 AM, Nicola Buso <nbuso@ebi.ac.uk> wrote:
> > >
> > > > Hi Shai,
> > > >
> > > > I was thinking about that too, but I'm indexing all the indexes in a
> > > > custom distributed environment, so at the moment I can't have a single
> > > > categories index for all the content indexes at indexing time.
> > > > One solution would be to merge all the categories indexes into a single
> > > > index and use your solution, but the merge code I see in the examples
> > > > also merges the content index, and I can't do that.
> > > >
> > > > I could share the taxonomy if it is possible to merge it (the resulting
> > > > categories indexes are not that big currently), but I would prefer a
> > > > solution where I can collect the facets over multiple categories
> > > > indexes; that way I'll be sure the solution scales better.
> > > >
> > > >
> > > > Nicola.
> > > >
> > > >
> > > > On Mon, 2013-01-21 at 17:54 +0200, Shai Erera wrote:
> > > >> Hi Nicola,
> > > >>
> > > >>
> > > >> I think that what you're describing corresponds to distributed faceted
> > > >> search. I.e., you have N content indexes, alongside N taxonomy
> > > >> indexes.
> > > >>
> > > >> The information that's indexed in each of those sub-indexes does not
> > > >> correlate with the other ones.
> > > >> For example, say that you index the category "Movie/Drama", it may
> > > >> receive ordinal 12 in index1 and 23 in index2.
> > > >>
> > > >> If you try to count ordinals using MultiReader, you'll just mess up
> > > >> everything.
> > > >>
> > > >>
> > > >> If you can share a single taxonomy index for all N content indexes,
> > > >> then you'll be in a super-simple position:
> > > >>
> > > >> 1) Open one TaxonomyReader
> > > >>
> > > >> 2) Execute search with MultiReader and FacetsCollector
> > > >>
> > > >>
> > > >>
> > > >> It doesn't get simpler than that! :)
> > > >>
> > > >>
> > > >> Before I go to great lengths describing what you should do if you
> > > >> cannot share the taxonomy, let me know if that's not an option for
> > > >> you.
> > > >>
> > > >> Shai
> > > >>
> > > >>
> > > >>
> > > >> On Mon, Jan 21, 2013 at 5:39 PM, Nicola Buso <nbuso@ebi.ac.uk> wrote:
> > > >>        Thanks for the reply, Uwe.
> > > >>
> > > >>        We can currently search with a MultiReader over all the
> > > >>        indexes we have. Now I want to add faceted search, so I
> > > >>        created a categories index for every index I currently have.
> > > >>        To accumulate the faceted results I now have a MultiReader
> > > >>        pointing to all the indexes, and I can create a
> > > >>        TaxonomyReader for every categories index I have. The ways I
> > > >>        see to obtain FacetResults are:
> > > >>        1 - FacetsCollector
> > > >>        2 - a FacetsAccumulator implementation
> > > >>
> > > >>        Suppose I use the second option. I should:
> > > >>        - search as usual using the MultiReader
> > > >>        - then try to collect all the FacetResults by iterating over
> > > >>          my TaxonomyReaders; at every iteration:
> > > >>          - I create a FacetsAccumulator using the MultiReader and a
> > > >>            TaxonomyReader
> > > >>          - I get a list of FacetResult from the accumulator.
> > > >>        - when I finish, I should somehow merge all the
> > > >>          List<FacetResult> I have.
> > > >>
> > > >>        I think this solution is not correct, because the docids from
> > > >>        the search refer to the MultiReader, whereas each
> > > >>        TaxonomyReader points to the categories index of a single
> > > >>        reader. I also don't like having to merge all the lists of
> > > >>        FacetResult I retrieve from the accumulators.
> > > >>
> > > >>        Probably I'm missing something; can somebody clarify how I
> > > >>        should collect the facets in this case?
> > > >>
> > > >>
> > > >>        Nicola.
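Nicola's concern about docids can be made concrete. A MultiReader presents its sub-readers as one contiguous docID space: each sub-reader's documents are shifted by a doc base equal to the total maxDoc of the sub-readers before it. The plain-Java sketch below (hypothetical names, a standalone re-implementation rather than Lucene's own code) maps a global hit back to (sub-index, local docID), which is the translation needed before looking anything up in a per-index categories index. Shai's OrdinalMap approach earlier in this thread avoids doing this by hand, because with per-segment faceting each AtomicReaderContext is handed to the aggregator separately.

```java
public class DocBaseSketch {

    /** starts[i] = first global docID of sub-index i; starts[n] = total number of docs. */
    static int[] docStarts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
        return starts;
    }

    /** Sub-index holding globalDoc: the largest i (below n) with starts[i] <= globalDoc. */
    static int subIndex(int globalDoc, int[] starts) {
        if (globalDoc < 0 || globalDoc >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("docID out of range: " + globalDoc);
        }
        int lo = 0;
        int hi = starts.length - 2; // last real sub-index
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so lo always advances
            if (starts[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo; // picking the largest match skips empty sub-indexes that share a start
    }

    /** docID of globalDoc inside its own sub-index. */
    static int localDoc(int globalDoc, int[] starts) {
        return globalDoc - starts[subIndex(globalDoc, starts)];
    }

    public static void main(String[] args) {
        int[] starts = docStarts(new int[] {3, 0, 2}); // the middle index is empty
        for (int doc = 0; doc < starts[starts.length - 1]; doc++) {
            System.out.println("global " + doc + " -> index " + subIndex(doc, starts)
                + ", local " + localDoc(doc, starts));
        }
    }
}
```

The binary search looks for the last start that is not greater than the docID, so an empty sub-index (whose start equals the next one's) is never returned for a real document.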
> > > >>
> > > >>
> > > >>
> > > >>        On Mon, 2013-01-21 at 16:22 +0100, Uwe Schindler wrote:
> > > >>> Just use MultiReader; it extends IndexReader, so you can pass it
> > > >>> anywhere an IndexReader can be passed.
> > > >>>
> > > >>> -----
> > > >>> Uwe Schindler
> > > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> > > >>> http://www.thetaphi.de
> > > >>> eMail: uwe@thetaphi.de
> > > >>>
> > > >>>> -----Original Message-----
> > > >>>> From: Nicola Buso [mailto:nbuso@ebi.ac.uk]
> > > >>>> Sent: Monday, January 21, 2013 3:59 PM
> > > >>>> To: java-user@lucene.apache.org
> > > >>>> Subject: FacetedSearch and MultiReader
> > > >>>>
> > > >>>> Hi all,
> > > >>>>
> > > >>>> I'm trying to develop faceted search using the Lucene 4.0
> > > >>>> faceting framework. In our project we are searching on multiple
> > > >>>> indexes using Lucene's MultiReader. How should we use the faceted
> > > >>>> framework to obtain FacetResults starting from a MultiReader? All
> > > >>>> the examples I see use a "single" IndexReader.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Nicola.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>
> > > >>>> ---------------------------------------------------------------------
> > > >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >>>
> > > >>
> > >
> > > ---
> > > Denis Bazhenov <dotsid@gmail.com>
> > >
> 


