lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "N. Hira" <nh...@cognocys.com>
Subject Re: Fine Tuning Lucene implementation
Date Wed, 25 Jul 2007 01:02:43 GMT
I'm no expert on this (so please accept the comments in that context)
but 2 things seem weird to me:

1.  Iterating over each hit is an expensive proposition.  I've often
seen people recommending a HitCollector.

2.  It seems that doBodySearch() is essentially saying, do this search
and return the score pertinent to this ID (using an exhaustive loop).
Would it be possible to just include a query clause?
    - i.e., instead of just contents:<userQuery>, also add
+id:<idWeCareAbout>

In general though, I think your algorithm seems inefficient (if I
understand it correctly):-- if I want to search for one term among 3 in
a "collection" of 300 documents (as defined by some external attribute),
I will wind up executing 300 x 3 searches, and for each search that is
executed, I will iterate over every Hit, even if I've already found the
one that I "care about".

What would break if you:
1.  Included "creator" in the Lucene index (or, filtered out the Hits
using a BitSet or something like it)
2.  Executed 1 search
3.  Collected the results of the first N Hits (where N is some
reasonable limit, like 100 or 500)

-h


On Tue, 2007-07-24 at 20:14 -0400, Askar Zaidi wrote:

> Sure.
> 
>  public float doBodySearch(Searcher searcher,String query, int id){
> 
>                  try{
>                                 score = search(searcher, query,id);
>                      }
>                       catch(IOException io){} 
>                       catch(ParseException pe){}
> 
>                       return score;
> 
>                 }
> 
>  private float search(Searcher searcher, String queryString, int id)
> throws ParseException, IOException { 
> 
>         // Build a Query object
> 
>         QueryParser queryParser = new QueryParser("contents", new
> KeywordAnalyzer());
> 
>         queryParser.setDefaultOperator(QueryParser.Operator.AND);
> 
>         Query query = queryParser.parse(queryString);
> 
>         // Search for the query
> 
>         Hits hits = searcher.search(query);
>         Document doc = null;
> 
>         // Examine the Hits object to see if there were any matches 
>         int hitCount = hits.length();
> 
>                 for(int i=0;i<hitCount;i++){
>                 doc = hits.doc(i);
>                 String str = doc.get("item");
>                 int tmp = Integer.parseInt(str);
>                 if(tmp==id)
>                 score = hits.score(i);
>                 }
> 
>         return score;
>     }
> 
> I really need to optimize doBodySearch(...) as this takes the most
> time. 
> 
> thanks guys,
> Askar 
> 
> 
> On 7/24/07, N. Hira <nhira@cognocys.com> wrote:
> 
>         Could you show us the relevant source from doBodySearch()?
>         
>         -h
>         
>         On Tue, 2007-07-24 at 19:58 -0400, Askar Zaidi wrote:
>         > I ran some tests and it seems that the slowness is from
>         Lucene calls when I
>         > do "doBodySearch", if I remove that call, Lucene gives me
>         results in 5 
>         > seconds. otherwise it takes about 50 seconds.
>         >
>         > But I need to do Body search and that field contains lots of
>         text. The field
>         > is <contents>. How can I optimize that ?
>         >
>         > thanks, 
>         > Askar
>         >
>         >




Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message