lucene-java-user mailing list archives

From Amin Mohammed-Coleman <ami...@gmail.com>
Subject Re: Faceted Search using Lucene
Date Sun, 01 Mar 2009 13:25:55 GMT
Thanks.  I will rewrite it, in between giving my baby her feed, playing with
the other child, and my wife, who wants me to do several other things!


On Sun, Mar 1, 2009 at 1:20 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

>
> Amin Mohammed-Coleman wrote:
>
>  Hi
>> Thanks for your input.  I would like to have a go at doing this myself
>> first, Solr may be an option.
>>
>> * You are creating a new Analyzer & QueryParser every time, also
>>  creating unnecessary garbage; instead, they should be created once
>>  & reused.
>>
>> -- I can move the code out so that it is only created once and reused.
>>
>>
>> * You always make a new IndexSearcher and a new MultiSearcher even
>>  when nothing has changed.  This just generates unnecessary garbage
>>  which GC then must sweep up.
>>
>> -- This was something I thought about.  I could move it out so that it's
>> created once.  However, I presume that inside my code I need to check
>> whether the IndexReaders are up to date.  This needs to be synchronized
>> as well, I guess(?)
>>
>
> Yes you should synchronize the check for whether the IndexReader is
> current.
>
>> * I don't see any synchronization -- it looks like two search
>>  requests are allowed into this method at the same time?  Which is
>>  dangerous... eg both (or, more) will wastefully reopen the
>>  readers.
>>
>> -- So I need to extract the logic for reopening and provide a
>> synchronisation mechanism.
>>
>
> Yes.
>
>
>> Ok.  So I have some work to do.  I'll refactor the code and see if I can
>> get in line with your recommendations.
>>
>>
>> On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>>
>>> On a quick look, I think there are a few problems with the code:
>>>
>>> * I don't see any synchronization -- it looks like two search
>>>  requests are allowed into this method at the same time?  Which is
>>>  dangerous... eg both (or, more) will wastefully reopen the
>>>  readers.
>>>
>>> * You are over-incRef'ing (the reader.incRef inside the loop) -- I
>>>  don't see a corresponding decRef.
>>>
>>> * You reopen and warm your searchers "live" (vs with BG thread);
>>>  meaning the unlucky search request that hits a reopen pays the
>>>  cost.  This might be OK if the index is small enough that
>>>  reopening & warming takes very little time.  But if index gets
>>>  large, making a random search pay that warming cost is not nice to
>>>  the end user.  It erodes their trust in you.
>>>
>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>  when nothing has changed.  This just generates unnecessary garbage
>>>  which GC then must sweep up.
>>>
>>> * You are creating a new Analyzer & QueryParser every time, also
>>>  creating unnecessary garbage; instead, they should be created once
>>>  & reused.
>>>
>>> You should consider simply using Solr -- it handles all this logic for
>>> you and has been well debugged with time...
>>>
>>> Mike
>>>
>>> Amin Mohammed-Coleman wrote:
>>>
>>>> The reason for the indexreader.reopen is that I have a webapp which
>>>> enables users to upload files and then search for the documents.  If I
>>>> don't reopen, I'm concerned that the facet hit counter won't be updated.
>>>>
>>>> On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman
>>>> <aminmc@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I have been able to get the code working for my scenario, however I
>>>>> have a question and I was wondering if I could get some help.  I have
>>>>> a list of IndexSearchers which are used in a MultiSearcher class.  I
>>>>> use the IndexSearchers to get each IndexReader and put them into a
>>>>> MultiReader.
>>>>>
>>>>> IndexReader[] readers = new IndexReader[searchables.length];
>>>>>
>>>>> for (int i = 0; i < searchables.length; i++) {
>>>>>     IndexSearcher indexSearcher = (IndexSearcher) searchables[i];
>>>>>     readers[i] = indexSearcher.getIndexReader();
>>>>>     IndexReader newReader = readers[i].reopen();
>>>>>     if (newReader != readers[i]) {
>>>>>         readers[i].close();
>>>>>     }
>>>>>     readers[i] = newReader;
>>>>> }
>>>>>
>>>>> multiReader = new MultiReader(readers);
>>>>> OpenBitSetFacetHitCounter facetHitCounter = new OpenBitSetFacetHitCounter();
>>>>> IndexSearcher indexSearcher = new IndexSearcher(multiReader);
>>>>>
>>>>>
>>>>> I then use the IndexSearcher to do the facet stuff.  I end the code with
>>>>> closing the MultiReader.  This is causing problems in another method
>>>>> where I do some other search, as the IndexReaders are closed.  Is it ok
>>>>> to not close the MultiReader, or should I do some additional checks in
>>>>> the other method to see if the IndexReader is closed?
>>>>>
>>>>>
>>>>>
>>>>> Cheers
>>>>>
>>>>>
>>>>> P.S. Hope that made sense...!
>>>>>
>>>>>
>>>>> On Mon, Feb 23, 2009 at 7:20 AM, Amin Mohammed-Coleman
>>>>> <aminmc@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> Thanks just what I needed!
>>>>>>
>>>>>> Cheers
>>>>>> Amin
>>>>>>
>>>>>>
>>>>>> On 22 Feb 2009, at 16:11, Marcelo Ochoa <marcelo.ochoa@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Amin:
>>>>>>
>>>>>>  Please take a look at this blog post:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
>>>>>>> Best regards, Marcelo.
>>>>>>>
>>>>>>> On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman <
>>>>>>> aminmc@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>>>
>>>>>>>> Sorry to resend this email, but I was wondering if I could get some
>>>>>>>> advice on this.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> Amin
>>>>>>>>
>>>>>>>> On 16 Feb 2009, at 20:37, Amin Mohammed-Coleman <aminmc@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>>
>>>>>>>>> I am looking at building a faceted search using Lucene.  I know that
>>>>>>>>> Solr comes with this built in, however I would like to try this by
>>>>>>>>> myself (something to add to my CV!).  I have been looking around and
>>>>>>>>> I found that you can use the IndexReader and use TermVectors.  This
>>>>>>>>> looks ok but I'm not sure how to filter the results so that a
>>>>>>>>> particular user can only see a subset of results.  The next option I
>>>>>>>>> was looking at was something like
>>>>>>>>>
>>>>>>>>> Term term1 = new Term("brand", "ford");
>>>>>>>>> Term term2 = new Term("brand", "vw");
>>>>>>>>> Term[] termsArray = new Term[] { term1, term2 };
>>>>>>>>> int[] docFreqs = indexSearcher.docFreqs(termsArray);
>>>>>>>>>
>>>>>>>>> The only problem here is that I have to provide the brand type each
>>>>>>>>> time a new brand is created.  Again I'm not sure how I can filter the
>>>>>>>>> results here.  It may be that I'm using the wrong api methods to do
>>>>>>>>> this.
>>>>>>>>>
>>>>>>>>> I would be grateful if I could get some advice on this.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>> Amin
>>>>>>>>>
>>>>>>>>> P.S.  I am basically trying to do something that displays the
>>>>>>>>> following
>>>>>>>>>
>>>>>>>>> Personal Contact (23) Business Contact (45) and so on..
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
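For the counting step behind a display like "Personal Contact (23) Business Contact (45)", here is a minimal sketch using only java.util.BitSet, not Lucene's own classes.  It assumes each facet value, and the subset of documents a particular user may see, has already been turned into a bit set of document ids (for example by running one filter per value).  The class name FacetCounts and the method count are hypothetical and not part of Lucene.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

final class FacetCounts {

    /**
     * For each facet value, counts the documents that are also in the
     * permitted set.  The input bit sets are not modified.
     */
    static Map<String, Integer> count(Map<String, BitSet> docsByFacetValue,
                                      BitSet permittedDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : docsByFacetValue.entrySet()) {
            // Work on a copy so the cached per-value bit set stays intact.
            BitSet hits = (BitSet) entry.getValue().clone();
            hits.and(permittedDocs);  // keep only documents the user may see
            counts.put(entry.getKey(), hits.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        BitSet personal = new BitSet();
        personal.set(0);
        personal.set(1);
        personal.set(4);

        BitSet business = new BitSet();
        business.set(2);
        business.set(3);

        Map<String, BitSet> docsByFacetValue = new LinkedHashMap<>();
        docsByFacetValue.put("Personal Contact", personal);
        docsByFacetValue.put("Business Contact", business);

        // Hypothetical per-user filter: this user may see documents 0, 2 and 4.
        BitSet permitted = new BitSet();
        permitted.set(0);
        permitted.set(2);
        permitted.set(4);

        System.out.println(count(docsByFacetValue, permitted));
    }
}
```

Because the per-user restriction is just another bit set, the same intersection handles both the facet count and the "only a subset of results" requirement, and new brand or contact types only need a new map entry.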
>>>>>>> --
>>>>>>> Marcelo F. Ochoa
>>>>>>> http://marceloochoa.blogspot.com/
>>>>>>> http://marcelo.ochoa.googlepages.com/home
>>>>>>> ______________
>>>>>>> Want to integrate Lucene and Oracle?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
>>>>>>> Is Oracle 11g REST ready?
>>>>>>>
>>>>>>> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>
>
>
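The "create once, reuse, and synchronize the check" advice in this thread can be sketched roughly as below.  This is only an illustration under stated assumptions: SharedHolder is a hypothetical class, and the two callbacks are stand-ins for the "is the reader current?" check and for reopen(), so the sketch runs without Lucene on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Holds one shared instance and replaces it only when it is stale. */
final class SharedHolder<S> {
    private final Predicate<S> isCurrent;   // stand-in for the currency check
    private final UnaryOperator<S> reopen;  // stand-in for reopening a reader
    private S current;

    SharedHolder(S initial, Predicate<S> isCurrent, UnaryOperator<S> reopen) {
        this.current = initial;
        this.isCurrent = isCurrent;
        this.reopen = reopen;
    }

    /**
     * Synchronized so that concurrent search requests cannot both see a
     * stale instance and both reopen it.
     */
    synchronized S get() {
        if (!isCurrent.test(current)) {
            current = reopen.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // The "index" is modelled as a version number for illustration only.
        AtomicInteger indexVersion = new AtomicInteger(1);
        SharedHolder<Integer> holder = new SharedHolder<>(
                1, v -> v == indexVersion.get(), v -> indexVersion.get());

        System.out.println(holder.get());  // nothing changed, same instance
        indexVersion.incrementAndGet();    // simulate a newly uploaded file
        System.out.println(holder.get());  // stale, so reopened once
    }
}
```

The sketch deliberately does not close the replaced instance, since other threads may still be searching with it; that is the situation the incRef/decRef pairing mentioned above is meant to cover.  It also still reopens on the calling thread, so the background warming point is not addressed here.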
