lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Faceted Search using Lucene
Date Sun, 01 Mar 2009 13:20:32 GMT

Amin Mohammed-Coleman wrote:

> Hi
> Thanks for your input.  I would like to have a go at doing this myself
> first; Solr may be an option.
>
> * You are creating a new Analyzer & QueryParser every time, also
>   creating unnecessary garbage; instead, they should be created once
>   & reused.
>
> -- I can move the code out so that it is only created once and
> reused.
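Here is a minimal sketch of "create once, then reuse". It assumes that a query parser instance should not be shared between concurrent requests (check the QueryParser javadoc for your Lucene version), so it keeps one parser per thread with `ThreadLocal`. `SimpleParser` and `SearchService` are hypothetical stand-ins, not Lucene classes. The analyzer is left out because it is typically safe to share as a single instance.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a parser that is not safe to share across threads.
class SimpleParser {
    static final AtomicInteger CREATED = new AtomicInteger(); // counts constructions
    private final String defaultField;

    SimpleParser(String defaultField) {
        this.defaultField = defaultField;
        CREATED.incrementAndGet();
    }

    // Toy "parse": prefixes bare terms with the default field.
    String parse(String text) {
        return text.contains(":") ? text : defaultField + ":" + text;
    }
}

class SearchService {
    // One parser per thread, created lazily on first use and then reused for
    // every later request on that thread, instead of one per search call.
    private final ThreadLocal<SimpleParser> parser =
            ThreadLocal.withInitial(() -> new SimpleParser("contents"));

    String toQuery(String userInput) {
        return parser.get().parse(userInput);
    }
}
```

Repeated calls to `toQuery` on the same thread reuse one parser, so no per-request garbage is created for it.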
>
>
> * You always make a new IndexSearcher and a new MultiSearcher even
>   when nothing has changed.  This just generates unnecessary garbage
>   which GC then must sweep up.
>
> -- This was something I thought about.  I could move it out so that it's
> created once.  However, I presume that inside my code I need to check
> whether the IndexReaders are up to date.  This needs to be synchronized as
> well, I guess(?)

Yes, you should synchronize the check for whether the IndexReader is
current.
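Here is a minimal sketch of that synchronized check. `FakeIndex`, `Snapshot` and `SearcherHolder` are hypothetical classes: a version number stands in for both the index and an `IndexReader.isCurrent()`-style check, and no real reader is opened. The staleness check and the reopen run under one lock, so two concurrent requests cannot both reopen. When the index has not changed, every caller gets the same snapshot back instead of a freshly built searcher.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical index whose version moves forward whenever a document is added.
class FakeIndex {
    private volatile long version = 1;
    void addDocument() { version++; }
    long currentVersion() { return version; }
}

// Hypothetical point-in-time view (stands in for a reader plus its searcher).
class Snapshot {
    final long version;
    Snapshot(long version) { this.version = version; }
}

class SearcherHolder {
    private final FakeIndex index;
    private Snapshot current;
    final AtomicInteger reopens = new AtomicInteger();

    SearcherHolder(FakeIndex index) {
        this.index = index;
        this.current = new Snapshot(index.currentVersion());
    }

    // Check-then-reopen happens atomically under this object's lock.
    synchronized Snapshot get() {
        if (current.version != index.currentVersion()) {
            current = new Snapshot(index.currentVersion()); // the "reopen"
            reopens.incrementAndGet();
        }
        return current; // unchanged index: reuse the existing snapshot
    }
}
```

With real readers, the swap would be something like `reopen()` plus a new IndexSearcher. The old reader should then be released through reference counting rather than closed while other searches may still be using it.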

> * I don't see any synchronization -- it looks like two search
>   requests are allowed into this method at the same time?  Which is
>   dangerous, e.g. both (or more) will wastefully reopen the
>   readers.
> -- So I need to extract the logic for reopening and provide a
> synchronisation mechanism.

Yes.

> OK.  So I have some work to do.  I'll refactor the code and see if I can
> get it in line with your recommendations.
>
>
> On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <lucene@mikemccandless.com> wrote:
>
>>
>> On a quick look, I think there are a few problems with the code:
>>
>> * I don't see any synchronization -- it looks like two search
>>   requests are allowed into this method at the same time?  Which is
>>   dangerous, e.g. both (or more) will wastefully reopen the
>>   readers.
>>
>> * You are over-incRef'ing (the reader.incRef inside the loop) -- I
>>   don't see a corresponding decRef.
>>
>> * You reopen and warm your searchers "live" (vs. in a background
>>   thread), meaning the unlucky search request that hits a reopen pays
>>   the cost.  This might be OK if the index is small enough that
>>   reopening & warming takes very little time.  But if the index gets
>>   large, making a random search pay that warming cost is not nice to
>>   the end user.  It erodes their trust in you.
>>
>> * You always make a new IndexSearcher and a new MultiSearcher even
>>   when nothing has changed.  This just generates unnecessary garbage
>>   which GC then must sweep up.
>>
>> * You are creating a new Analyzer & QueryParser every time, also
>>   creating unnecessary garbage; instead, they should be created once
>>   & reused.
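The incRef/decRef point usually comes down to pairing every acquire with exactly one release in a `finally` block. The sketch below shows that pattern. `RefCountedReader` and `SearchRequest` are hypothetical; they only mimic the idea behind IndexReader's `incRef()`/`decRef()` and are not Lucene code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reader that closes itself when its reference count reaches zero.
class RefCountedReader {
    private final AtomicInteger refCount = new AtomicInteger(1); // owner's reference
    private volatile boolean closed;

    void incRef() {
        // CAS loop so an incRef can never revive a reader that already hit zero.
        while (true) {
            int c = refCount.get();
            if (c <= 0) throw new IllegalStateException("reader already closed");
            if (refCount.compareAndSet(c, c + 1)) return;
        }
    }

    void decRef() {
        int c = refCount.decrementAndGet();
        if (c == 0) closed = true; // last reference released: close
        else if (c < 0) throw new IllegalStateException("decRef without incRef");
    }

    boolean isClosed() { return closed; }
    int getRefCount() { return refCount.get(); }
}

class SearchRequest {
    // Exactly one decRef per incRef, even when the search throws.
    static int search(RefCountedReader reader, boolean fail) {
        reader.incRef();
        try {
            if (fail) throw new RuntimeException("simulated search failure");
            return reader.getRefCount(); // 2 while this search holds a reference
        } finally {
            reader.decRef();
        }
    }
}
```

After each search, the count is back to the owner's single reference. A later `decRef()` by the owner (for example after swapping in a reopened reader) is what actually closes it.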
>>
>> You should consider simply using Solr -- it handles all this logic for
>> you and has been well debugged over time...
>>
>> Mike
>>
>> Amin Mohammed-Coleman wrote:
>>
>>> The reason for the IndexReader.reopen is that I have a webapp which
>>> enables users to upload files and then search for the documents.  If I
>>> don't reopen, I'm concerned that the facet hit counter won't be updated.
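Keeping facet counts fresh after uploads does not have to mean reopening inside a search request. One alternative, in line with the background-thread suggestion above, is to refresh on a schedule from a separate thread and let searches read whatever snapshot is current. This is a hypothetical sketch: an `AtomicLong` version number stands in for the index, and the comment marks where a real reopen and warm-up would go.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical refresher: a background thread updates what searches see.
class BackgroundRefresher implements AutoCloseable {
    private final AtomicLong indexVersion; // stand-in for the index on disk
    private volatile long visibleVersion;  // what search requests currently see
    private final ScheduledExecutorService exec =
            Executors.newSingleThreadScheduledExecutor(task -> {
                Thread t = new Thread(task, "index-refresher");
                t.setDaemon(true); // do not keep the JVM alive just for refreshing
                return t;
            });

    BackgroundRefresher(AtomicLong indexVersion, long periodMillis) {
        this.indexVersion = indexVersion;
        this.visibleVersion = indexVersion.get();
        exec.scheduleWithFixedDelay(this::refresh, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
    }

    // Normally runs on the background thread; can also be called directly.
    synchronized void refresh() {
        long latest = indexVersion.get();
        if (latest != visibleVersion) {
            // A real implementation would reopen and warm the new reader here,
            // then publish it and release the old one.
            visibleVersion = latest;
        }
    }

    // Search path: no reopen or warm-up work, just read the current snapshot.
    long searchVersion() { return visibleVersion; }

    @Override
    public void close() { exec.shutdownNow(); }
}
```

The trade-off is staleness. A document uploaded just after a refresh stays invisible for up to one period. Calling `refresh()` right after an upload is one way to narrow that window.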
>>>
>>> On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>
>>>> Hi
>>>> I have been able to get the code working for my scenario; however, I
>>>> have a question and I was wondering if I could get some help.  I have a
>>>> list of IndexSearchers which are used in a MultiSearcher.  I use the
>>>> IndexSearchers to get each IndexReader and put them into a MultiReader.
>>>>
>>>> IndexReader[] readers = new IndexReader[searchables.length];
>>>> for (int i = 0; i < searchables.length; i++) {
>>>>     IndexSearcher indexSearcher = (IndexSearcher) searchables[i];
>>>>     readers[i] = indexSearcher.getIndexReader();
>>>>     // reopen() returns the same instance if the index has not changed
>>>>     IndexReader newReader = readers[i].reopen();
>>>>     if (newReader != readers[i]) {
>>>>         // closes the old reader, although searchables[i] still wraps it
>>>>         readers[i].close();
>>>>     }
>>>>     readers[i] = newReader;
>>>> }
>>>> // closing this MultiReader later also closes every reader in readers[]
>>>> multiReader = new MultiReader(readers);
>>>> OpenBitSetFacetHitCounter facetHitCounter = new OpenBitSetFacetHitCounter();
>>>> IndexSearcher indexSearcher = new IndexSearcher(multiReader);
>>>>
>>>>
>>>> I then use the IndexSearcher to do the facet stuff.  I end the code by
>>>> closing the MultiReader.  This is causing problems in another method
>>>> where I do some other search, as the IndexReaders are closed.  Is it OK
>>>> not to close the MultiReader, or should I do some additional checks in
>>>> the other method to see if the IndexReader is closed?
>>>>
>>>>
>>>>
>>>> Cheers
>>>>
>>>>
>>>> P.S. Hope that made sense...!
>>>>
>>>>
>>>> On Mon, Feb 23, 2009 at 7:20 AM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> Thanks just what I needed!
>>>>>
>>>>> Cheers
>>>>> Amin
>>>>>
>>>>>
>>>>> On 22 Feb 2009, at 16:11, Marcelo Ochoa <marcelo.ochoa@gmail.com> wrote:
>>>>>
>>>>>> Hi Amin:
>>>>>>
>>>>>> Please take a look at this blog post:
>>>>>>
>>>>>>
>>>>>> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
>>>>>> Best regards, Marcelo.
>>>>>>
>>>>>> On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> Sorry to resend this email, but I was wondering if I could get some
>>>>>>> advice on this.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> Amin
>>>>>>>
>>>>>>> On 16 Feb 2009, at 20:37, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> I am looking at building a faceted search using Lucene.  I know that
>>>>>>>> Solr comes with this built in; however, I would like to try this by
>>>>>>>> myself (something to add to my CV!).  I have been looking around and I
>>>>>>>> found that you can use the IndexReader and use TermVectors.  This looks
>>>>>>>> OK, but I'm not sure how to filter the results so that a particular
>>>>>>>> user can only see a subset of results.  The next option I was looking
>>>>>>>> at was something like:
>>>>>>>>
>>>>>>>> Term term1 = new Term("brand", "ford");
>>>>>>>> Term term2 = new Term("brand", "vw");
>>>>>>>> Term[] termsArray = new Term[] { term1, term2 };
>>>>>>>> // index-wide document frequencies: not restricted to one query's
>>>>>>>> // hits or to the documents a particular user may see
>>>>>>>> int[] docFreqs = indexSearcher.docFreqs(termsArray);
>>>>>>>>
>>>>>>>> The only problem here is that I have to provide the brand type each
>>>>>>>> time a new brand is created.  Again, I'm not sure how I can filter the
>>>>>>>> results here.  It may be that I'm using the wrong API methods to do
>>>>>>>> this.
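Hard-coding each brand can be avoided by reading the values from the index itself. In Lucene 2.x, this is typically done by walking the term dictionary from `IndexReader.terms(new Term("brand", ""))` and stopping once the field name changes. The sketch below models that walk with a sorted `TreeMap`. `TermDictionary` is a hypothetical in-memory stand-in, not Lucene's TermEnum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical term dictionary sorted by field, then term text, mirroring how
// an inverted index lets you enumerate all terms of one field in order.
class TermDictionary {
    // Key is field + '\u0000' + text; '\u0000' sorts below every other char,
    // so all terms of a field form one contiguous run in the sorted map.
    private final TreeMap<String, Integer> termCounts = new TreeMap<>();

    void add(String field, String text) {
        termCounts.merge(field + '\u0000' + text, 1, Integer::sum);
    }

    // Discover every value of a field (e.g. all brands) from the index itself.
    List<String> valuesOf(String field) {
        String prefix = field + '\u0000';
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termCounts.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // walked past this field
            values.add(e.getKey().substring(prefix.length()));
        }
        return values;
    }
}
```

New brands then show up automatically the next time the values are enumerated. The per-user filtering is a separate step applied when counting.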
>>>>>>>>
>>>>>>>> I would be grateful if I could get some advice on this.
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>> Amin
>>>>>>>>
>>>>>>>> P.S.  I am basically trying to do something that displays the
>>>>>>>> following:
>>>>>>>>
>>>>>>>> Personal Contact (23) Business Contact (45) and so on.
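One common way to produce output like this is bitset intersection. Keep one bitset of documents per facet value, AND it with the query's hits and with the documents the current user may see, then count the set bits. The minimal sketch below uses `java.util.BitSet` in place of Lucene's OpenBitSet. `FacetCounter` and its method names are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bitset-based facet counter.
class FacetCounter {
    static Map<String, Integer> count(Map<String, BitSet> docsPerValue,
                                      BitSet queryHits, BitSet visibleToUser) {
        BitSet allowedHits = (BitSet) queryHits.clone();
        allowedHits.and(visibleToUser); // per-user filtering
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : docsPerValue.entrySet()) {
            BitSet both = (BitSet) e.getValue().clone();
            both.and(allowedHits); // hits that also carry this facet value
            counts.put(e.getKey(), both.cardinality());
        }
        return counts;
    }

    // Renders counts as "Value (n) Value (n) ...".
    static String render(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.getKey()).append(" (").append(e.getValue()).append(')');
        }
        return sb.toString();
    }
}
```

For example, suppose the query hits documents 0 to 4, the user may see documents 0, 1 and 3, "Personal Contact" is on 0, 1 and 2, and "Business Contact" is on 3 and 4. Then `render(count(...))` gives `Personal Contact (2) Business Contact (1)`.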
>>>>>>
>>>>>> --
>>>>>> Marcelo F. Ochoa
>>>>>> http://marceloochoa.blogspot.com/
>>>>>> http://marcelo.ochoa.googlepages.com/home
>>>>>> ______________
>>>>>> Want to integrate Lucene and Oracle?
>>>>>>
>>>>>>
>>>>>> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
>>>>>> Is Oracle 11g REST ready?
>>>>>> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>
>>

