lucene-dev mailing list archives

From hg...@cswebmail.com
Subject Search Expansion - one step closer ... !
Date Sun, 04 Apr 2004 16:28:34 GMT
Hi Eric, all,

I did construct a boolean query. No more errors with
my search expansion now: I passed more than 800 terms
on to Lucene. Great - thanks!

One other problem though I would need your advice on:

Several of my terms are in fact keyphrases of two or
more words separated by whitespace, e.g. 'host
defense'.
They are obviously not handled properly during the
construction of the boolean query, because 'host
defense' is not found although it is in the field.
Replacing the whitespace between the words with an
underscore ('host_defense', which is recognised by the
query parser and yields similar results to double
quoting, e.g. "host defense") did not retrieve anything
either ...
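
A minimal sketch of the likely cause, assuming the field was split into
single-word tokens at index time: a term lookup for the whole string
'host defense' (or 'host_defense') then has nothing to match. The code
below is a toy model, not Lucene code; the class TermLookupSketch and its
tokenize() method are hypothetical stand-ins for an analyzer, and the
sample field text is made up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model of the terms an index holds for one analyzed field value. */
class TermLookupSketch {

    /** Stand-in analyzer (assumption): lowercase, split on non-letter/non-digit characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** The set of single-token terms stored for a field value. */
    static Set<String> indexedTerms(String fieldValue) {
        return new HashSet<String>(tokenize(fieldValue));
    }

    public static void main(String[] args) {
        Set<String> terms = indexedTerms("Role of host defense in infection");
        System.out.println(terms.contains("host"));          // single token: present
        System.out.println(terms.contains("host defense"));  // whole phrase as one term: absent
        System.out.println(terms.contains("host_defense"));  // underscore variant: absent
    }
}
```

In this model only the individual words exist as terms, so an exact-term
query built from the unanalyzed phrase cannot hit.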

I had to convert the terms to lowercase before sending
them to this function because - unlike in the
QueryParser call - no analyzer is used at the moment.
Indexing was done with StandardAnalyzer, so I would
prefer to use an analyzer at search time as well.
The terms are well formed because they are taken from a
domain ontology, but there could be inconsistencies in
spelling between what is in the ontology and what is in
the field, e.g. 'host-defense', which would need
equivalent handling to 'host defense'. I guess this will
be dealt with by the analyzer - but where do I put it
within the current code (see below) with the boolean
query generation?
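
One possible shape for this, sketched under assumptions: run every
ontology entry through the analyzer inside the loop, before the clause is
created, and let the number of resulting tokens decide the clause type.
The code below only models that decision; ClausePlanSketch, Clause, plan()
and tokenize() are hypothetical names, and tokenize() is a simple stand-in
rather than StandardAnalyzer. In Lucene itself a one-token entry would
presumably stay a TermQuery and a multi-token entry would become a
PhraseQuery with one Term per token (as far as I know via
Analyzer.tokenStream(String, Reader) and PhraseQuery.add(Term), which
should be checked against the Lucene version in use).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch of per-entry clause planning; not Lucene code. */
class ClausePlanSketch {

    /** One planned clause: the analyzed tokens of an entry, in order. */
    static final class Clause {
        final String field;
        final List<String> tokens;

        Clause(String field, List<String> tokens) {
            this.field = field;
            this.tokens = tokens;
        }

        /** More than one token means the entry needs a phrase-style clause. */
        boolean isPhrase() {
            return tokens.size() > 1;
        }
    }

    /** Stand-in analyzer (assumption): lowercase, split on non-letter/non-digit characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** Plans one clause per entry, skipping entries that analyze to nothing. */
    static List<Clause> plan(String field, String[] entries) {
        List<Clause> clauses = new ArrayList<Clause>();
        for (String entry : entries) {
            List<String> tokens = tokenize(entry);
            if (!tokens.isEmpty()) {
                clauses.add(new Clause(field, tokens));
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        String[] entries = {"host defense", "Host-Defense", "host_defense", "immunity"};
        for (Clause c : plan("subject", entries)) {
            System.out.println(c.field + ":" + c.tokens + (c.isPhrase() ? " (phrase)" : " (term)"));
        }
    }
}
```

With this stand-in, 'host defense', 'Host-Defense' and 'host_defense' all
reduce to the same two tokens, so spelling variants in the ontology would
be handled the same way, and the manual lowercasing step is no longer
needed.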

Any hints ?
Anyway - thanks a lot so far !

Holger


Code follows:

    public String[] doSearchBQ(String index_path, String[] myquery) {
      // Does query processing without QueryParser, by constructing
      // a boolean query directly.
      try {
        Searcher searcher = new IndexSearcher(index_path);
        Analyzer analyzer = new StandardAnalyzer();  // created but not used yet

        BooleanQuery query = new BooleanQuery();

        // For each term to add: an optional clause
        // (required = false, prohibited = false).
        for (int j = 0; j < myquery.length; j++) {
          query.add(new TermQuery(new Term("subject", myquery[j])), false, false);
        }

        Hits hits = searcher.search(query);

        // One output line per hit: filename|subject|message
        lucene_out = new String[hits.length()];
        for (int i = 0; i < hits.length(); i++) {
          Document doc = hits.doc(i);
          String name = doc.get("filename");
          lucene_out[i] = name + "|" + doc.get("subject") + "|" + doc.get("message");
        }
        searcher.close();

      } catch (Exception e) {
        System.out.println(" caught a " + e.getClass() +
                           "\n with message: " + e.getMessage());
      }
      return lucene_out;
    }
