lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Faceted Search using Lucene
Date Sun, 01 Mar 2009 14:24:02 GMT

OK, here's a new version of SearcherManager that fixes maybeReopen() so
that it can be called from multiple threads.

NOTE: it's still untested!

Mike

package lia.admin;

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;

/** Utility class to get/refresh searchers when you are
  *  using multiple threads. */

public class SearcherManager {

   private IndexSearcher currentSearcher;                         //A
   private Directory dir;

   public SearcherManager(Directory dir) throws IOException {
     this.dir = dir;
     currentSearcher = new IndexSearcher(IndexReader.open(dir));  //B
   }

   public void warm(IndexSearcher searcher) {}                    //C

   private boolean reopening;

   private synchronized void startReopen()                        //D
     throws InterruptedException {
     while (reopening) {
       wait();
     }
     reopening = true;
   }

   private synchronized void doneReopen() {                       //E
     reopening = false;
     notifyAll();
   }

   public void maybeReopen() throws InterruptedException, IOException { //F

     startReopen();

     try {
       final IndexSearcher searcher = get();
       try {
         long currentVersion = currentSearcher.getIndexReader().getVersion();  //G
         if (IndexReader.getCurrentVersion(dir) != currentVersion) {           //G
           IndexReader newReader = currentSearcher.getIndexReader().reopen();  //G
           assert newReader != currentSearcher.getIndexReader();               //G
           IndexSearcher newSearcher = new IndexSearcher(newReader);           //G
           warm(newSearcher);                                                  //G
           swapSearcher(newSearcher);                                          //G
         }
       } finally {
         release(searcher);
       }
     } finally {
       doneReopen();
     }
   }

   public synchronized IndexSearcher get() {                      //H
     currentSearcher.getIndexReader().incRef();
     return currentSearcher;
   }

   public synchronized void release(IndexSearcher searcher)       //I
     throws IOException {
     searcher.getIndexReader().decRef();
   }

   private synchronized void swapSearcher(IndexSearcher newSearcher) //J
       throws IOException {
     release(currentSearcher);
     currentSearcher = newSearcher;
   }
}

/*
#A Current IndexSearcher
#B Create initial searcher
#C Implement in subclass to warm new searcher
#D Pauses until no other thread is reopening
#E Finish reopen and notify other threads
#F Reopen searcher if there are changes
#G Check index version and reopen, warm, swap if needed
#H Returns current searcher
#I Release searcher
#J Swaps currentSearcher to new searcher
*/
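
For illustration, here is a minimal sketch of the get()/release() pattern a
search thread would follow, and of why the old searcher survives a swap until
its last user releases it. It deliberately avoids Lucene so it runs on its
own: CountedResource, ResourceManager and GetReleaseDemo are hypothetical
stand-ins, and the assumption that the count starts at 1 (the manager's own
reference) and that the resource closes at 0 mirrors how IndexReader
reference counting is understood to behave, not code from this thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an IndexSearcher plus its reader's ref count. */
class CountedResource {
    final String name;
    // Assumption: the count starts at 1, representing the manager's own reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed;

    CountedResource(String name) { this.name = name; }

    void incRef() { refCount.incrementAndGet(); }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            closed = true; // a real reader would free its files here
        }
    }

    boolean isClosed() { return closed; }
}

/** Same get/release/swap shape as SearcherManager, minus Lucene. */
class ResourceManager {
    private CountedResource current;

    ResourceManager(CountedResource initial) { current = initial; }

    synchronized CountedResource get() {
        current.incRef();
        return current;
    }

    synchronized void release(CountedResource resource) {
        resource.decRef();
    }

    synchronized void swap(CountedResource next) {
        release(current); // drop the manager's reference to the old resource
        current = next;
    }
}

class GetReleaseDemo {
    public static void main(String[] args) {
        ResourceManager manager = new ResourceManager(new CountedResource("v1"));

        CountedResource inUse = manager.get();        // a search starts on v1
        try {
            manager.swap(new CountedResource("v2"));  // a reopen lands mid-search
            System.out.println("v1 closed during search: " + inUse.isClosed());
        } finally {
            manager.release(inUse);                   // always release, even on error
        }
        System.out.println("v1 closed after release: " + inUse.isClosed());
    }
}
```

Running main prints false for the first line and true for the second: the
swap only drops the manager's reference, so a search already in flight keeps
a usable resource until its own release() in the finally block.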

Mike

On Mar 1, 2009, at 8:27 AM, Amin Mohammed-Coleman wrote:

> just a quick point:
> public void maybeReopen() throws IOException {                 //D
>   long currentVersion = currentSearcher.getIndexReader().getVersion();
>   if (IndexReader.getCurrentVersion(dir) != currentVersion) {
>     IndexReader newReader = currentSearcher.getIndexReader().reopen();
>     assert newReader != currentSearcher.getIndexReader();
>     IndexSearcher newSearcher = new IndexSearcher(newReader);
>     warm(newSearcher);
>     swapSearcher(newSearcher);
>   }
> }
>
> should the above be synchronised?
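
On the synchronisation question, one reading of the newer version at the top
of this message is that maybeReopen() is not simply declared synchronized
because get() and release() lock the same object, so they would block for the
whole reopen and warm. The startReopen()/doneReopen() pair holds the lock only
briefly and still lets a single thread reopen at a time. A minimal,
Lucene-free sketch of that gate follows; ReopenGate, ReopenGateDemo and
maxConcurrent are hypothetical names, and the sleep is an assumed stand-in
for reopen + warm.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical gate with the same shape as startReopen()/doneReopen(). */
class ReopenGate {
    private boolean reopening;

    synchronized void start() throws InterruptedException {
        while (reopening) {
            wait();          // gives up the monitor while another thread reopens
        }
        reopening = true;
    }

    synchronized void done() {
        reopening = false;
        notifyAll();         // wake any threads parked in start()
    }
}

class ReopenGateDemo {
    /** Sends `threads` workers through the gate; returns the most seen inside at once. */
    static int maxConcurrent(int threads) {
        final ReopenGate gate = new ReopenGate();
        final AtomicInteger inside = new AtomicInteger();
        final AtomicInteger max = new AtomicInteger();
        final CountDownLatch finished = new CountDownLatch(threads);

        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                try {
                    gate.start();
                    try {
                        max.accumulateAndGet(inside.incrementAndGet(), Math::max);
                        Thread.sleep(5);   // assumed stand-in for reopen + warm
                        inside.decrementAndGet();
                    } finally {
                        gate.done();       // always reopen the gate
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    finished.countDown();
                }
            }).start();
        }

        try {
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println("max threads inside gate: " + maxConcurrent(8));
    }
}
```

The monitor is held only while the flag is read or written, not during the
sleep, which is the property that would keep searches from stalling behind a
slow warm-up.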
>
> On Sun, Mar 1, 2009 at 1:25 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>
>> thanks.  i will rewrite..in between giving my baby her feed and playing
>> with the other child and my wife who wants me to do several other things!
>>
>>
>>
>> On Sun, Mar 1, 2009 at 1:20 PM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>>>
>>> Amin Mohammed-Coleman wrote:
>>>
>>>> Hi
>>>> Thanks for your input.  I would like to have a go at doing this myself
>>>> first, Solr may be an option.
>>>>
>>>> * You are creating a new Analyzer & QueryParser every time, also
>>>> creating unnecessary garbage; instead, they should be created once
>>>> & reused.
>>>>
>>>> -- I can move the code out so that it is only created once and reused.
>>>>
>>>>
>>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>> when nothing has changed.  This just generates unnecessary garbage
>>>> which GC then must sweep up.
>>>>
>>>> -- This was something I thought about.  I could move it out so that it's
>>>> created once.  However I presume inside my code i need to check whether
>>>> the indexreaders are up to date.  This needs to be synchronized as well
>>>> I guess(?)
>>>>
>>>
>>> Yes you should synchronize the check for whether the IndexReader is
>>> current.
>>>
>>>> * I don't see any synchronization -- it looks like two search
>>>> requests are allowed into this method at the same time?  Which is
>>>> dangerous... eg both (or, more) will wastefully reopen the
>>>> readers.
>>>> --  So i need to extract the logic for reopening and provide a
>>>> synchronisation mechanism.
>>>>
>>>
>>> Yes.
>>>
>>>
>>>> Ok.  So I have some work to do.  I'll refactor the code and see if I can
>>>> get in line with your recommendations.
>>>>
>>>>
>>>> On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <
>>>> lucene@mikemccandless.com> wrote:
>>>>
>>>>
>>>>> On a quick look, I think there are a few problems with the code:
>>>>>
>>>>> * I don't see any synchronization -- it looks like two search
>>>>> requests are allowed into this method at the same time?  Which is
>>>>> dangerous... eg both (or, more) will wastefully reopen the
>>>>> readers.
>>>>>
>>>>> * You are over-incRef'ing (the reader.incRef inside the loop) -- I
>>>>> don't see a corresponding decRef.
>>>>>
>>>>> * You reopen and warm your searchers "live" (vs with BG thread);
>>>>> meaning the unlucky search request that hits a reopen pays the
>>>>> cost.  This might be OK if the index is small enough that
>>>>> reopening & warming takes very little time.  But if index gets
>>>>> large, making a random search pay that warming cost is not nice to
>>>>> the end user.  It erodes their trust in you.
>>>>>
>>>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>>> when nothing has changed.  This just generates unnecessary garbage
>>>>> which GC then must sweep up.
>>>>>
>>>>> * You are creating a new Analyzer & QueryParser every time, also
>>>>> creating unnecessary garbage; instead, they should be created once
>>>>> & reused.
>>>>>
>>>>> You should consider simply using Solr -- it handles all this logic for
>>>>> you and has been well debugged with time...
>>>>>
>>>>> Mike
>>>>>
>>>>> Amin Mohammed-Coleman wrote:
>>>>>
>>>>>> The reason for the indexreader.reopen is because I have a webapp which
>>>>>> enables users to upload files and then search for the documents.  If I
>>>>>> don't reopen i'm concerned that the facet hit counter won't be updated.
>>>>>>
>>>>>> On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> I have been able to get the code working for my scenario, however I
>>>>>>> have a question and I was wondering if I could get some help.  I have
>>>>>>> a list of IndexSearchers which are used in a MultiSearcher class.  I
>>>>>>> use the indexsearchers to get each indexreader and put them into a
>>>>>>> MultiIndexReader.
>>>>>>>
>>>>>>> IndexReader[] readers = new IndexReader[searchables.length];
>>>>>>>
>>>>>>> for (int i = 0; i < searchables.length; i++) {
>>>>>>>   IndexSearcher indexSearcher = (IndexSearcher) searchables[i];
>>>>>>>   readers[i] = indexSearcher.getIndexReader();
>>>>>>>   IndexReader newReader = readers[i].reopen();
>>>>>>>   if (newReader != readers[i]) {
>>>>>>>     readers[i].close();
>>>>>>>   }
>>>>>>>   readers[i] = newReader;
>>>>>>> }
>>>>>>>
>>>>>>> multiReader = new MultiReader(readers);
>>>>>>>
>>>>>>> OpenBitSetFacetHitCounter facetHitCounter =
>>>>>>>     new OpenBitSetFacetHitCounter();
>>>>>>>
>>>>>>> IndexSearcher indexSearcher = new IndexSearcher(multiReader);
>>>>>>>
>>>>>>>
>>>>>>> I then use the indexsearcher to do the facet stuff.  I end the code
>>>>>>> with closing the multireader.  This is causing problems in another
>>>>>>> method where I do some other search as the indexreaders are closed.
>>>>>>> Is it ok to not close the multiindexreader or should I do some
>>>>>>> additional checks in the other method to see if the indexreader is
>>>>>>> closed?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>>
>>>>>>> P.S. Hope that made sense...!
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 23, 2009 at 7:20 AM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Thanks just what I needed!
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>> Amin
>>>>>>>>
>>>>>>>>
>>>>>>>> On 22 Feb 2009, at 16:11, Marcelo Ochoa <marcelo.ochoa@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Amin:
>>>>>>>>>
>>>>>>>>> Please take a look at this blog post:
>>>>>>>>>
>>>>>>>>> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
>>>>>>>>>
>>>>>>>>> Best regards, Marcelo.
>>>>>>>>>
>>>>>>>>> On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> Sorry to re send this email but I was wondering if I could get some
>>>>>>>>>> advice on this.
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>>
>>>>>>>>>> Amin
>>>>>>>>>>
>>>>>>>>>> On 16 Feb 2009, at 20:37, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>> I am looking at building a faceted search using Lucene.  I know that
>>>>>>>>>>> Solr comes with this built in, however I would like to try this by
>>>>>>>>>>> myself (something to add to my CV!).  I have been looking around and
>>>>>>>>>>> I found that you can use the IndexReader and use TermVectors.  This
>>>>>>>>>>> looks ok but I'm not sure how to filter the results so that a
>>>>>>>>>>> particular user can only see a subset of results.  The next option I
>>>>>>>>>>> was looking at was something like
>>>>>>>>>>>
>>>>>>>>>>> Term term1 = new Term("brand", "ford");
>>>>>>>>>>> Term term2 = new Term("brand", "vw");
>>>>>>>>>>> Term[] termsArray = new Term[] { term1, term2 };
>>>>>>>>>>> int[] docFreqs = indexSearcher.docFreqs(termsArray);
>>>>>>>>>>>
>>>>>>>>>>> The only problem here is that I have to provide the brand type each
>>>>>>>>>>> time a new brand is created.  Again I'm not sure how I can filter the
>>>>>>>>>>> results here.  It may be that I'm using the wrong api methods to do
>>>>>>>>>>> this.
>>>>>>>>>>>
>>>>>>>>>>> I would be grateful if I could get some advice on this.
>>>>>>>>>>>
>>>>>>>>>>> Cheers
>>>>>>>>>>> Amin
>>>>>>>>>>>
>>>>>>>>>>> P.S.  I am basically trying to do something that displays the
>>>>>>>>>>> following
>>>>>>>>>>>
>>>>>>>>>>> Personal Contact (23) Business Contact (45) and so on..
>>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Marcelo F. Ochoa
>>>>>>>>> http://marceloochoa.blogspot.com/
>>>>>>>>> http://marcelo.ochoa.googlepages.com/home
>>>>>>>>> ______________
>>>>>>>>> Want to integrate Lucene and Oracle?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
>>>>>>>>> Is Oracle 11g REST ready?
>>>>>>>>>
>>>>>>>>> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>
>>

