lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: FacetedSearch and MultiReader
Date Tue, 09 Apr 2013 16:50:56 GMT
Hello Nicola,

I think it would be good if you started a new thread to discuss this problem,
as I don't think it's related to the issue in this thread.
Also, I did not understand what problem you're running into. What
used to work before 4.2 and doesn't work now?

Shai


On Tue, Apr 9, 2013 at 6:39 PM, Nicola Buso <nbuso@ebi.ac.uk> wrote:

> Hi,
>
> I'm trying to use Lucene 4.2, but this merge of multiple taxonomy indexes
> no longer seems to work.
>
> Do you have any idea why it would not work in Lucene 4.2?
> Normal faceted search on a single index is working correctly.
>
>
> Nicola.
>
> On Thu, 2013-01-24 at 16:53 +0000, Nicola Buso wrote:
> > Hi Shai,
> >
> > I just want to confirm that your solution works in the tests I ran.
> >
> > Thanks again for the useful hints.
> >
> >
> > Nicola.
> >
> > On Tue, 2013-01-22 at 06:20 +0200, Shai Erera wrote:
> > > Hi Nicola,
> > >
> > > > What I had in mind is something similar to this, which is possible
> > > > starting with Lucene 4.1, due to changes done to facets (per-segment
> > > > faceting):
> > >
> > > > DirectoryTaxonomyWriter master = new DirectoryTaxonomyWriter(masterDir);
> > > > // open the original taxonomy Directories and store them in this array
> > > > Directory[] origTaxoDirs = new Directory[numTaxoDirs];
> > > > // initialize one OrdinalMap per taxonomy and store it in this array
> > > > OrdinalMap[] ordinalMaps = new OrdinalMap[numTaxoDirs];
> > > >
> > > > // now do the merge
> > > > for (int i = 0; i < origTaxoDirs.length; i++) {
> > > >   master.addTaxonomy(origTaxoDirs[i], ordinalMaps[i]);
> > > > }
> > >
> > > // now open your readers, and create the important map
> > > > Map<AtomicReader,OrdinalMap> readerOrdinals = new HashMap<AtomicReader,OrdinalMap>();
> > > > DirectoryReader[] readers = new DirectoryReader[origTaxoDirs.length];
> > > > for (int i = 0; i < origTaxoDirs.length; i++) {
> > > >   DirectoryReader r = DirectoryReader.open(contentDirectories[i]);
> > > >   readers[i] = r; // keep the reader so the MultiReader below can wrap it
> > > >   OrdinalMap ordMap = ordinalMaps[i];
> > > >   for (AtomicReaderContext ctx : r.leaves()) {
> > > >     readerOrdinals.put(ctx.reader(), ordMap);
> > > >   }
> > > > }
> > >
> > > MultiReader mr = new MultiReader(readers);
> > >
> > > > // create your FacetRequest (CountFacetRequest) with a custom Aggregator
> > > > FacetRequest fr = new CountFacetRequest(cp, topK) {
> > > >   @Override
> > > >   public Aggregator createAggregator(...) {
> > > >     return new OrdinalMappingAggregator() {
> > > >       int[] ordMap;
> > > >
> > > >       @Override
> > > >       public void setNextReader(AtomicReaderContext context) {
> > > >         ordMap = readerOrdinals.get(context.reader()).getMap();
> > > >       }
> > > >
> > > >       @Override
> > > >       public void aggregate(int docID, float score, IntsRef ordinals) {
> > > >         int upto = ordinals.offset + ordinals.length;
> > > >         for (int i = ordinals.offset; i < upto; i++) {
> > > >           // original ordinal read for the AtomicReader given to setNextReader
> > > >           int ordinal = ordinals.ints[i];
> > > >           // mapped ordinal, following the taxonomy merge
> > > >           int mappedOrdinal = ordMap[ordinal];
> > > >           // count the mapped ordinal instead, so all AtomicReaders count that ordinal
> > > >           counts[mappedOrdinal]++;
> > > >         }
> > > >       }
> > > >     };
> > > >   }
> > > > };
> > >
> > > While it may look like I wrote actual code to do it, I didn't :). So I
> > > guess it should work, but I haven't tried it.
> > > That way, you don't touch the content indexes at all, just the taxonomy
> > > ones.
> > >
> > > > Note however that you'll need to do this step every time the taxonomy
> > > > index is updated, and you refresh the TaxoReader instance.
> > > > Also, this will only work if all your indexes are opened in the same
> > > > JVM (which I assume is the case, since you use MultiReader).
> > >
> > > > If you still don't want to do that, then what Denis wrote above is
> > > > another way to do distributed faceted search, either inside the same
> > > > JVM or across multiple JVMs.
> > > > You obtain the FacetResult from each search and merge the results
> > > > (unfortunately, there's still no tool in Lucene to do that for you).
> > > > Just make sure to ask for a larger K, to ensure that the correct top-K
> > > > is returned (see my previous notes).
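
A minimal sketch of the by-hand merge described here, assuming each shard's facet results have already been flattened into a map from category path to count. It uses plain Java collections rather than Lucene's FacetResult classes; the class name ShardFacetMerger, the method mergeTopK and the sample counts are hypothetical, and the code relies on Java 8 features (Map.merge, lambdas).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShardFacetMerger {
    /** Sums per-shard counts by category path, then returns the k highest-count entries. */
    static List<Map.Entry<String, Integer>> mergeTopK(List<Map<String, Integer>> shardCounts, int k) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> shard : shardCounts) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(totals.entrySet());
        // Highest count first; ties are broken by path so the order is deterministic.
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        // Full per-shard counts, i.e. what each shard returns when asked for a large enough K.
        Map<String, Integer> shard1 = new HashMap<>();
        shard1.put("Movie/Drama", 5);
        shard1.put("Movie/Comedy", 4);
        Map<String, Integer> shard2 = new HashMap<>();
        shard2.put("Movie/Drama", 1);
        shard2.put("Movie/Action", 6);
        shard2.put("Movie/Comedy", 3);
        System.out.println(mergeTopK(Arrays.asList(shard1, shard2), 1));

        // The same shards if each had returned only its own top-1 category.
        Map<String, Integer> shard1Top1 = Collections.singletonMap("Movie/Drama", 5);
        Map<String, Integer> shard2Top1 = Collections.singletonMap("Movie/Action", 6);
        System.out.println(mergeTopK(Arrays.asList(shard1Top1, shard2Top1), 1));
    }
}
```

With these made-up numbers, the first merge ranks Movie/Comedy first with a total of 7, while the second reports Movie/Action with 6, because Movie/Comedy was not the top category on either shard. That is the reason for asking each shard for a larger K than the K you finally display.
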
> > >
> > > Shai
> > >
> > >
> > >
> > >
> > > > On Tue, Jan 22, 2013 at 4:32 AM, Denis Bazhenov <dotsid@gmail.com> wrote:
> > >
> > > > > We have a similar distributed search system, and we ended up with
> > > > > the following scheme. The search replicas (the machines where the
> > > > > index resides) build FacetResults from their index chunk (top N
> > > > > categories with document counts). The results are later merged by
> > > > > hand, summing the counts of matching categories from different
> > > > > replicas.
> > > >
> > > > On Jan 22, 2013, at 3:08 AM, Nicola Buso <nbuso@ebi.ac.uk> wrote:
> > > >
> > > > > Hi Shai,
> > > > >
> > > > > > I was thinking of that too, but I build all the indexes in a custom
> > > > > > distributed environment, so at the moment I can't have a single
> > > > > > categories index for all the content indexes at indexing time.
> > > > > > One solution would be to merge all the categories indexes into a
> > > > > > single index and use your solution, but the merge code I see in the
> > > > > > examples also merges the content indexes, and I can't do that.
> > > > > >
> > > > > > I could share the taxonomy if it is possible to merge (the resulting
> > > > > > categories indexes are currently not that big), but I would prefer a
> > > > > > solution where I can collect the facets over multiple categories
> > > > > > indexes, so that I can be sure the solution scales better.
> > > > >
> > > > >
> > > > > Nicola.
> > > > >
> > > > >
> > > > > On Mon, 2013-01-21 at 17:54 +0200, Shai Erera wrote:
> > > > >> Hi Nicola,
> > > > >>
> > > > >>
> > > > > >> I think that what you're describing corresponds to distributed
> > > > > >> faceted search. I.e., you have N content indexes, alongside N
> > > > > >> taxonomy indexes.
> > > > > >>
> > > > > >> The information that's indexed in each of those sub-indexes does
> > > > > >> not correlate with the other ones.
> > > > > >> For example, say that you index the category "Movie/Drama", it may
> > > > > >> receive ordinal 12 in index1 and 23 in index2.
> > > > > >>
> > > > > >> If you try to count ordinals using MultiReader, you'll just mess
> > > > > >> up everything.
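
To make the ordinal mismatch concrete, here is a small sketch that counts the same documents twice: once with each index's raw ordinals, and once after translating them through a per-index ordinal map into a merged taxonomy. It does not use Lucene; the class OrdinalRemapDemo, the method countMapped, the tiny taxonomies and the ordinal values are all made up for illustration (the ordinals 12 and 23 from the example above are replaced by smaller numbers).

```java
import java.util.Arrays;

class OrdinalRemapDemo {
    /** Translates each per-index ordinal through ordMap and increments the matching slot in counts. */
    static void countMapped(int[] docOrdinals, int[] ordMap, int[] counts) {
        for (int ordinal : docOrdinals) {
            counts[ordMap[ordinal]]++;
        }
    }

    public static void main(String[] args) {
        // Assumed merged taxonomy: 0=root, 1=Movie, 2=Movie/Drama, 3=Movie/Comedy
        int[] ordMap1 = {0, 1, 2};    // index1: 0=root, 1=Movie, 2=Movie/Drama
        int[] ordMap2 = {0, 1, 3, 2}; // index2: 0=root, 1=Movie, 2=Movie/Comedy, 3=Movie/Drama

        int[] index1Ordinals = {2, 2};    // two Drama documents
        int[] index2Ordinals = {3, 3, 2}; // two Drama documents, one Comedy document

        // Counting raw ordinals treats ordinal 2 as one category in both indexes.
        int[] identity = {0, 1, 2, 3};
        int[] raw = new int[4];
        countMapped(index1Ordinals, identity, raw);
        countMapped(index2Ordinals, identity, raw);

        // Counting mapped ordinals puts every document in the merged category slot.
        int[] mapped = new int[4];
        countMapped(index1Ordinals, ordMap1, mapped);
        countMapped(index2Ordinals, ordMap2, mapped);

        System.out.println("raw:    " + Arrays.toString(raw));
        System.out.println("mapped: " + Arrays.toString(mapped));
    }
}
```

In this toy data there are four Drama documents and one Comedy document in total. The raw count credits slot 2 with three documents and slot 3 with two, mixing the two categories, whereas the mapped count gives four for Movie/Drama and one for Movie/Comedy.
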
> > > > >>
> > > > >>
> > > > > >> If you can share a single taxonomy index for all N content indexes,
> > > > > >> then you'll be in a super-simple position:
> > > > >>
> > > > >> 1) Open one TaxonomyReader
> > > > >>
> > > > >> 2) Execute search with MultiReader and FacetsCollector
> > > > >>
> > > > >>
> > > > >>
> > > > >> It doesn't get simpler than that ! :)
> > > > >>
> > > > >>
> > > > > >> Before I go into great length describing what you should do if you
> > > > > >> cannot share the taxonomy, let me know if that's not an option for
> > > > > >> you.
> > > > >>
> > > > >> Shai
> > > > >>
> > > > >>
> > > > >>
> > > > > >> On Mon, Jan 21, 2013 at 5:39 PM, Nicola Buso <nbuso@ebi.ac.uk> wrote:
> > > > >>        Thanks for the reply Uwe,
> > > > >>
> > > > >>        we currently can search with MultiReader over all the
> indexes
> > > > >>        we have.
> > > > >>        Now I want to add the faceting search, than I created
a
> > > > >>        categories index
> > > > >>        for every index I currently have.
> > > > >>        To accumulate the faceted results now I have a MultiReader
> > > > >>        pointing all
> > > > >>        the indexes and I can create a TaxonomyReader for every
> > > > >>        categories index
> > > > >>        I have; all the way I see to obtain FacetResults are:
> > > > >>        1 - FacetsCollector
> > > > >>        2 - a FacetsAccumulator implementation
> > > > >>
> > > > >>        suppose I use the second option. I should:
> > > > >>        - search as usual using the MultiReader
> > > > >>        - than try to collect all the facetresults iterating over
> my
> > > > >>        TaxonomyReaders; at every iteration:
> > > > >>          - I create a FacetsAccumulator using the MultiReader
and
> a
> > > > >>        TaxonomyReader
> > > > >>          - I get a list of FacetResult from the accumulator.
> > > > >>        - as I finish I should in some way merge all the
> > > > >>        List<FacetResult> I
> > > > >>        have.
> > > > >>
> > > > >>        I think this solution is not correct because the docsids
> from
> > > > >>        the search
> > > > >>        are pointing the multireader instead the taxonomyreader
is
> > > > >>        pointing to
> > > > >>        the categories index of a single reader.
> > > > >>        I neither like to merge all the List of FacetResult I
> retrieve
> > > > >>        from the
> > > > >>        Accumulators.
> > > > >>
> > > > >>        Probably I'm missing something, can somebody clarify to
me
> how
> > > > >>        I should
> > > > >>        collect the facets in this case?
> > > > >>
> > > > >>
> > > > >>        Nicola.
> > > > >>
> > > > >>
> > > > >>
> > > > >>        On Mon, 2013-01-21 at 16:22 +0100, Uwe Schindler wrote:
> > > > > >>> Just use MultiReader, it extends IndexReader, so you can pass it
> > > > > >>> anywhere an IndexReader can be passed.
> > > > >>>
> > > > >>> -----
> > > > >>> Uwe Schindler
> > > > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> > > > >>> http://www.thetaphi.de
> > > > >>> eMail: uwe@thetaphi.de
> > > > >>>
> > > > >>>> -----Original Message-----
> > > > >>>> From: Nicola Buso [mailto:nbuso@ebi.ac.uk]
> > > > >>>> Sent: Monday, January 21, 2013 3:59 PM
> > > > >>>> To: java-user@lucene.apache.org
> > > > >>>> Subject: FacetedSearch and MultiReader
> > > > >>>>
> > > > >>>> Hi all,
> > > > >>>>
> > > > > >>>> I'm trying to develop faceted search using the Lucene 4.0
> > > > > >>>> faceting framework.
> > > > > >>>> In our project we search multiple indexes using Lucene's
> > > > > >>>> MultiReader. How should we use the faceting framework to obtain
> > > > > >>>> FacetResults starting from a MultiReader? All the examples I
> > > > > >>>> have seen use a "single" IndexReader.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Nicola.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>
> > > >
>  ---------------------------------------------------------------------
> > > > >>>> To unsubscribe, e-mail:
> > > > >>        java-user-unsubscribe@lucene.apache.org
> > > > >>>> For additional commands, e-mail:
> > > > >>        java-user-help@lucene.apache.org
> > > > >>>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > >
>  ---------------------------------------------------------------------
> > > > >>        To unsubscribe, e-mail:
> > > > >>        java-user-unsubscribe@lucene.apache.org
> > > > >>        For additional commands, e-mail:
> > > > >>        java-user-help@lucene.apache.org
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > > ---
> > > > Denis Bazhenov <dotsid@gmail.com>
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> >
> >
> >
> >
>
>
>
>
>
