lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Faceted Search using Lucene
Date Sun, 01 Mar 2009 13:36:51 GMT

I was wondering the same thing ;)

It's best to call this method from a single BG "warming" thread, in  
which case it would not need its own synchronization.

But, to be safe, I'll add internal synchronization to it.  You can't  
simply put synchronized in front of the method, since you don't want  
this to block searching.
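
A minimal sketch of what that internal synchronization might look like, not the final code.  ReopenGuard and its method names are made up for illustration, and S stands in for IndexSearcher so the sketch compiles without Lucene on the classpath (the real reopen/warm calls also throw IOException, which is left out here).  Reopens are serialized on a private lock, while searches only read a volatile field and so never wait on a reopen:

```java
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical holder for the current searcher; S stands in for IndexSearcher. */
final class ReopenGuard<S> {
    // Only reopening threads take this lock; searching threads never touch it.
    private final Object reopenLock = new Object();

    // volatile so a swapped-in searcher is visible to searching threads.
    private volatile S current;

    ReopenGuard(S initial) {
        current = initial;
    }

    /** Called by searching threads; never blocks on a reopen in progress. */
    S get() {
        return current;
    }

    /**
     * Reopens at most once at a time. A second caller waits on reopenLock,
     * then re-checks staleness and usually finds nothing left to do.
     */
    void maybeReopen(BooleanSupplier isStale, Supplier<S> reopen, Consumer<S> warm) {
        synchronized (reopenLock) {
            if (!isStale.getAsBoolean()) {
                return;              // index unchanged, keep the current searcher
            }
            S next = reopen.get();
            warm.accept(next);       // slow part; searches continue on the old one
            current = next;          // swap only after warming has finished
        }
    }
}
```

The sketch leaves out reference counting and closing of the old searcher, which still has to wait until in-flight searches are done with it.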

Mike

Amin Mohammed-Coleman wrote:

> just a quick point:
> public void maybeReopen() throws IOException {                 //D
>   long currentVersion = currentSearcher.getIndexReader().getVersion();
>   if (IndexReader.getCurrentVersion(dir) != currentVersion) {
>     IndexReader newReader = currentSearcher.getIndexReader().reopen();
>     assert newReader != currentSearcher.getIndexReader();
>     IndexSearcher newSearcher = new IndexSearcher(newReader);
>     warm(newSearcher);
>     swapSearcher(newSearcher);
>   }
> }
>
> should the above be synchronised?
>
> On Sun, Mar 1, 2009 at 1:25 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>
>> thanks.  i will rewrite..in between giving my baby her feed and playing
>> with the other child and my wife who wants me to do several other things!
>>
>>
>>
>> On Sun, Mar 1, 2009 at 1:20 PM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>>>
>>> Amin Mohammed-Coleman wrote:
>>>
>>>> Hi
>>>> Thanks for your input.  I would like to have a go at doing this myself
>>>> first, Solr may be an option.
>>>>
>>>> * You are creating a new Analyzer & QueryParser every time, also
>>>> creating unnecessary garbage; instead, they should be created once
>>>> & reused.
>>>>
>>>> -- I can move the code out so that it is only created once and reused.
>>>>
>>>>
>>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>> when nothing has changed.  This just generates unnecessary garbage
>>>> which GC then must sweep up.
>>>>
>>>> -- This was something I thought about.  I could move it out so that it's
>>>> created once.  However I presume inside my code i need to check whether the
>>>> indexreaders are up to date.  This needs to be synchronized as well I
>>>> guess(?)
>>>>
>>>
>>> Yes you should synchronize the check for whether the IndexReader is
>>> current.
>>>
>>>> * I don't see any synchronization -- it looks like two search
>>>> requests are allowed into this method at the same time?  Which is
>>>> dangerous... eg both (or, more) will wastefully reopen the
>>>> readers.
>>>> --  So i need to extract the logic for reopening and provide a
>>>> synchronisation mechanism.
>>>>
>>>
>>> Yes.
>>>
>>>
>>>> Ok.  So I have some work to do.  I'll refactor the code and see if I can
>>>> get in line with your recommendations.
>>>>
>>>>
>>>> On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <
>>>> lucene@mikemccandless.com> wrote:
>>>>
>>>>
>>>>> On a quick look, I think there are a few problems with the code:
>>>>>
>>>>> * I don't see any synchronization -- it looks like two search
>>>>> requests are allowed into this method at the same time?  Which is
>>>>> dangerous... eg both (or, more) will wastefully reopen the
>>>>> readers.
>>>>>
>>>>> * You are over-incRef'ing (the reader.incRef inside the loop) -- I
>>>>> don't see a corresponding decRef.
>>>>>
>>>>> * You reopen and warm your searchers "live" (vs with BG thread);
>>>>> meaning the unlucky search request that hits a reopen pays the
>>>>> cost.  This might be OK if the index is small enough that
>>>>> reopening & warming takes very little time.  But if index gets
>>>>> large, making a random search pay that warming cost is not nice to
>>>>> the end user.  It erodes their trust in you.
>>>>>
>>>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>>> when nothing has changed.  This just generates unnecessary garbage
>>>>> which GC then must sweep up.
>>>>>
>>>>> * You are creating a new Analyzer & QueryParser every time, also
>>>>> creating unnecessary garbage; instead, they should be created once
>>>>> & reused.
>>>>>
>>>>> You should consider simply using Solr -- it handles all this logic for
>>>>> you and has been well debugged with time...
>>>>>
>>>>> Mike
>>>>>
>>>>> Amin Mohammed-Coleman wrote:
>>>>>
>>>>>> The reason for the indexreader.reopen is because I have a webapp which
>>>>>> enables users to upload files and then search for the documents.  If I
>>>>>> don't reopen i'm concerned that the facet hit counter won't be updated.
>>>>>>
>>>>>> On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> I have been able to get the code working for my scenario, however I have a
>>>>>>> question and I was wondering if I could get some help.  I have a list of
>>>>>>> IndexSearchers which are used in a MultiSearcher class.  I use the
>>>>>>> indexsearchers to get each indexreader and put them into a
>>>>>>> MultiIndexReader.
>>>>>>>
>>>>>>> IndexReader[] readers = new IndexReader[searchables.length];
>>>>>>>
>>>>>>> for (int i = 0; i < searchables.length; i++) {
>>>>>>>   IndexSearcher indexSearcher = (IndexSearcher) searchables[i];
>>>>>>>   readers[i] = indexSearcher.getIndexReader();
>>>>>>>   IndexReader newReader = readers[i].reopen();
>>>>>>>   if (newReader != readers[i]) {
>>>>>>>     readers[i].close();
>>>>>>>   }
>>>>>>>   readers[i] = newReader;
>>>>>>> }
>>>>>>>
>>>>>>> multiReader = new MultiReader(readers);
>>>>>>>
>>>>>>> OpenBitSetFacetHitCounter facetHitCounter =
>>>>>>>     new OpenBitSetFacetHitCounter();
>>>>>>>
>>>>>>> IndexSearcher indexSearcher = new IndexSearcher(multiReader);
>>>>>>>
>>>>>>>
>>>>>>> I then use the indexsearcher to do the facet stuff.  I end the code with
>>>>>>> closing the multireader.  This is causing problems in another method where I
>>>>>>> do some other search as the indexreaders are closed.  Is it ok to not close
>>>>>>> the multiindexreader or should I do some additional checks in the other
>>>>>>> method to see if the indexreader is closed?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>>
>>>>>>> P.S. Hope that made sense...!
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 23, 2009 at 7:20 AM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Thanks just what I needed!
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>> Amin
>>>>>>>>
>>>>>>>>
>>>>>>>> On 22 Feb 2009, at 16:11, Marcelo Ochoa <marcelo.ochoa@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Amin:
>>>>>>>>>
>>>>>>>>> Please take a look at this blog post:
>>>>>>>>>
>>>>>>>>> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
>>>>>>>>> Best regards, Marcelo.
>>>>>>>>>
>>>>>>>>> On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> Sorry to resend this email but I was wondering if I could get some
>>>>>>>>>> advice on this.
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>>
>>>>>>>>>> Amin
>>>>>>>>>>
>>>>>>>>>> On 16 Feb 2009, at 20:37, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>> I am looking at building a faceted search using Lucene.  I know that Solr
>>>>>>>>>>> comes with this built in, however I would like to try this by myself
>>>>>>>>>>> (something to add to my CV!).  I have been looking around and I found that
>>>>>>>>>>> you can use the IndexReader and use TermVectors.  This looks ok but I'm not
>>>>>>>>>>> sure how to filter the results so that a particular user can only see a
>>>>>>>>>>> subset of results.  The next option I was looking at was something like
>>>>>>>>>>>
>>>>>>>>>>> Term term1 = new Term("brand", "ford");
>>>>>>>>>>> Term term2 = new Term("brand", "vw");
>>>>>>>>>>> Term[] termsArray = new Term[] { term1, term2 };
>>>>>>>>>>> int[] docFreqs = indexSearcher.docFreqs(termsArray);
>>>>>>>>>>>
>>>>>>>>>>> The only problem here is that I have to provide the brand type each time a
>>>>>>>>>>> new brand is created.  Again I'm not sure how I can filter the results here.
>>>>>>>>>>> It may be that I'm using the wrong api methods to do this.
>>>>>>>>>>>
>>>>>>>>>>> I would be grateful if I could get some advice on this.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Cheers
>>>>>>>>>>> Amin
>>>>>>>>>>>
>>>>>>>>>>> P.S.  I am basically trying to do something that displays the following
>>>>>>>>>>>
>>>>>>>>>>> Personal Contact (23) Business Contact (45) and so on..
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Marcelo F. Ochoa
>>>>>>>>> http://marceloochoa.blogspot.com/
>>>>>>>>> http://marcelo.ochoa.googlepages.com/home
>>>>>>>>> ______________
>>>>>>>>> Want to integrate Lucene and Oracle?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
>>>>>>>>> Is Oracle 11g REST ready?
>>>>>>>>>
>>>>>>>>> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>
>>



