lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dmitry" <dmitrytka...@hotmail.com>
Subject Re: Fine Tuning Lucene implementation
Date Wed, 25 Jul 2007 06:01:18 GMT
Askar,
why do you need to add +id:<idWeCareAbout>?
thanks,
dt,
www.ejinz.com
search engine news forms
----- Original Message ----- 
From: "Askar Zaidi" <askar.zaidi@gmail.com>
To: <java-user@lucene.apache.org>; <nhira@cognocys.com>
Sent: Wednesday, July 25, 2007 12:39 AM
Subject: Re: Fine Tuning Lucene implementation


> Hey Hira ,
>
> Thanks so much for the reply. Much appreciate it.
>
> Quote:
>
> Would it be possible to just include a query clause?
>   - i.e., instead of just contents:<userQuery>, also add
> +id:<idWeCareAbout>
>
> How can I do that ?
>
> I see my query as :
>
> +contents:harvard +contents:business +contents:review
>
> where the search phrase was: harvard business review
>
> Now how can I add +id:<idWeCareAbout>  ??
>
> This would give me that one exact document I am looking for , for that id. 
> I
> don't have to iterate through hits.
>
> thanks,
>
> Askar
>
>
>
> On 7/24/07, N. Hira <nhira@cognocys.com> wrote:
>>
>> I'm no expert on this (so please accept the comments in that context)
>> but 2 things seem weird to me:
>>
>> 1.  Iterating over each hit is an expensive proposition.  I've often
>> seen people recommending a HitCollector.
>>
>> 2.  It seems that doBodySearch() is essentially saying, do this search
>> and return the score pertinent to this ID (using an exhaustive loop).
>> Would it be possible to just include a query clause?
>>     - i.e., instead of just contents:<userQuery>, also add
>> +id:<idWeCareAbout>
>>
>> In general though, I think your algorithm seems inefficient (if I
>> understand it correctly):-- if I want to search for one term among 3 in
>> a "collection" of 300 documents (as defined by some external attribute),
>> I will wind up executing 300 x 3 searches, and for each search that is
>> executed, I will iterate over every Hit, even if I've already found the
>> one that I "care about".
>>
>> What would break if you:
>> 1.  Included "creator" in the Lucene index (or, filtered out the Hits
>> using a BitSet or something like it)
>> 2.  Executed 1 search
>> 3.  Collected the results of the first N Hits (where N is some
>> reasonable limit, like 100 or 500)
>>
>> -h
>>
>>
>> On Tue, 2007-07-24 at 20:14 -0400, Askar Zaidi wrote:
>>
>> > Sure.
>> >
>> >  public float doBodySearch(Searcher searcher,String query, int id){
>> >
>> >                  try{
>> >                                 score = search(searcher, query,id);
>> >                      }
>> >                       catch(IOException io){}
>> >                       catch(ParseException pe){}
>> >
>> >                       return score;
>> >
>> >                 }
>> >
>> >  private float search(Searcher searcher, String queryString, int id)
>> > throws ParseException, IOException {
>> >
>> >         // Build a Query object
>> >
>> >         QueryParser queryParser = new QueryParser("contents", new
>> > KeywordAnalyzer());
>> >
>> >         queryParser.setDefaultOperator(QueryParser.Operator.AND);
>> >
>> >         Query query = queryParser.parse(queryString);
>> >
>> >         // Search for the query
>> >
>> >         Hits hits = searcher.search(query);
>> >         Document doc = null;
>> >
>> >         // Examine the Hits object to see if there were any matches
>> >         int hitCount = hits.length();
>> >
>> >                 for(int i=0;i<hitCount;i++){
>> >                 doc = hits.doc(i);
>> >                 String str = doc.get("item");
>> >                 int tmp = Integer.parseInt(str);
>> >                 if(tmp==id)
>> >                 score = hits.score(i);
>> >                 }
>> >
>> >         return score;
>> >     }
>> >
>> > I really need to optimize doBodySearch(...) as this takes the most
>> > time.
>> >
>> > thanks guys,
>> > Askar
>> >
>> >
>> > On 7/24/07, N. Hira <nhira@cognocys.com> wrote:
>> >
>> >         Could you show us the relevant source from doBodySearch()?
>> >
>> >         -h
>> >
>> >         On Tue, 2007-07-24 at 19:58 -0400, Askar Zaidi wrote:
>> >         > I ran some tests and it seems that the slowness is from
>> >         Lucene calls when I
>> >         > do "doBodySearch", if I remove that call, Lucene gives me
>> >         results in 5
>> >         > seconds. otherwise it takes about 50 seconds.
>> >         >
>> >         > But I need to do Body search and that field contains lots of
>> >         text. The field
>> >         > is <contents>. How can I optimize that ?
>> >         >
>> >         > thanks,
>> >         > Askar
>> >         >
>> >         >
>>
>>
>>
>>
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message