From: markharw00d@yahoo.co.uk
To: lucene-dev@jakarta.apache.org
Date: Thu, 25 Sep 2003 20:52:55 GMT
Message-Id: <200309252052.h8PKqtit018097@server0027.freedom2surf.net>
Subject: RE : New highlighter package available

Thanks for the feedback on the highlighter package. Here are some responses to the issues raised:

>>what may be the performance implications seeing that
>>the method query.rewrite(reader) seems to be called twice, one for
>>querying, once for highlighting.

One solution is to rewrite the query yourself before calling the highlighter, so the same expanded query is used for both searching and highlighting:

    query = query.rewrite(reader); // turn into a primitive query
    Hits hits = searcher.search(query);
    QueryHighlightExtractor h = new QueryHighlightExtractor(reader, query, new StandardAnalyzer(), "", "");

Would you want the highlighter to enforce this optimisation by insisting that queries passed to it are not multi-term ones that require expansion? That way we would not need to pass an IndexReader to the Highlighter constructors, and could instead redefine them to throw a "QueryNotRewrittenException" if un-expanded queries are passed. It seems a bit heavy-handed to beat people over the head like this for not passing a pre-optimized query.

Maybe the best solution is to remove support for highlighting multi-term queries entirely from the highlighter - the caller must call rewrite() BEFORE calling the highlighter if they expect multi-terms to be highlighted.
I think that's my favoured approach - thoughts?

>>Is it possible to split the logic (2 classes ?) which :
>>a) handles highlighting
>>b) grabs Query terms (method getTerms and its dependencies)

The TextHighlighter class is already a class that purely handles highlighting (independent of query terms). The getTerms() function is made public in QueryHighlighter as I thought it might be of use to some people. I guess I could move it into a static function on a utility class somewhere, but I struggle to think of uses outside of text highlighting. Surely the query classes offer better metadata about a query (e.g. phrases, boosts etc.), so does this "Term[] getTerms(Query)" function warrant a specialised home anywhere?

>>Does anyone know if this package supports highlighting in MultiSearcher
>>environments?

This works but looks ugly:

    // set up index 1
    RAMDirectory ramDir1 = new RAMDirectory();
    IndexWriter writer1 = new IndexWriter(ramDir1, new StandardAnalyzer(), true);
    Document d = new Document();
    Field f = new Field(FIELD_NAME, "multiOne", true, true, true);
    d.add(f);
    writer1.addDocument(d);
    writer1.optimize();
    writer1.close();
    IndexReader reader1 = IndexReader.open(ramDir1);

    // set up index 2
    RAMDirectory ramDir2 = new RAMDirectory();
    IndexWriter writer2 = new IndexWriter(ramDir2, new StandardAnalyzer(), true);
    d = new Document();
    f = new Field(FIELD_NAME, "multiTwo", true, true, true);
    d.add(f);
    writer2.addDocument(d);
    writer2.optimize();
    writer2.close();
    IndexReader reader2 = IndexReader.open(ramDir2);

    // search across both indexes
    IndexSearcher[] searchers = new IndexSearcher[2];
    searchers[0] = new IndexSearcher(ramDir1);
    searchers[1] = new IndexSearcher(ramDir2);
    MultiSearcher multiSearcher = new MultiSearcher(searchers);
    query = QueryParser.parse("multi*", FIELD_NAME, new StandardAnalyzer());
    System.out.println("Searching for: " + query.toString(FIELD_NAME));
    hits = multiSearcher.search(query);

    // Now do some query expansion: rewrite against each reader in turn
    Query[] expandedQueries = new Query[2];
    expandedQueries[0] = query.rewrite(reader1);
    expandedQueries[1] = query.rewrite(reader2);
    Query combinedExpandedQuery = query.combine(expandedQueries);

    // NB The reader passed here is irrelevant as the query is expanded
    QueryHighlightExtractor highlighter = new QueryHighlightExtractor(this, reader2, combinedExpandedQuery, new StandardAnalyzer());

Thanks again
Mark
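
To make the "insist on a pre-rewritten query" option a little more concrete, here is a rough, self-contained sketch. It deliberately does not use the Lucene classes: Query, TermQuery, PrefixQuery, BooleanQuery and StrictHighlighter below are simplified stand-ins invented for illustration, QueryNotRewrittenException is the hypothetical exception floated earlier (it does not exist in Lucene), and rewrite() expands against a plain list of terms rather than an IndexReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RewriteGuardSketch {

    /** Stand-in for a query; NOT the Lucene Query class. */
    interface Query {
        /** Expands multi-term queries against the given index terms. */
        Query rewrite(List<String> indexTerms);

        /** True if the query needs no further expansion. */
        boolean isPrimitive();
    }

    /** A single-term query: already primitive, so rewrite() is a no-op. */
    static final class TermQuery implements Query {
        final String term;
        TermQuery(String term) { this.term = term; }
        public Query rewrite(List<String> indexTerms) { return this; }
        public boolean isPrimitive() { return true; }
    }

    /** A multi-term query such as "multi*": must be expanded before use. */
    static final class PrefixQuery implements Query {
        final String prefix;
        PrefixQuery(String prefix) { this.prefix = prefix; }
        public Query rewrite(List<String> indexTerms) {
            List<Query> clauses = new ArrayList<>();
            for (String t : indexTerms) {
                if (t.startsWith(prefix)) {
                    clauses.add(new TermQuery(t));
                }
            }
            return new BooleanQuery(clauses);
        }
        public boolean isPrimitive() { return false; }
    }

    /** A container of clauses; primitive only if every clause is. */
    static final class BooleanQuery implements Query {
        final List<Query> clauses;
        BooleanQuery(List<Query> clauses) { this.clauses = clauses; }
        public Query rewrite(List<String> indexTerms) {
            List<Query> rewritten = new ArrayList<>();
            for (Query q : clauses) {
                rewritten.add(q.rewrite(indexTerms));
            }
            return new BooleanQuery(rewritten);
        }
        public boolean isPrimitive() {
            for (Query q : clauses) {
                if (!q.isPrimitive()) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Hypothetical exception; not part of Lucene. */
    static final class QueryNotRewrittenException extends RuntimeException {
        QueryNotRewrittenException(String message) { super(message); }
    }

    /** Hypothetical highlighter that needs no reader but rejects unexpanded queries. */
    static final class StrictHighlighter {
        private final List<String> terms = new ArrayList<>();

        StrictHighlighter(Query query) {
            if (!query.isPrimitive()) {
                throw new QueryNotRewrittenException(
                        "call rewrite() before passing the query to the highlighter");
            }
            collectTerms(query, terms);
        }

        private static void collectTerms(Query q, List<String> out) {
            if (q instanceof TermQuery) {
                out.add(((TermQuery) q).term);
            } else if (q instanceof BooleanQuery) {
                for (Query clause : ((BooleanQuery) q).clauses) {
                    collectTerms(clause, out);
                }
            }
        }

        List<String> getTerms() { return terms; }
    }

    public static void main(String[] args) {
        List<String> indexTerms = Arrays.asList("multione", "multitwo", "other");
        Query raw = new PrefixQuery("multi");

        try {
            new StrictHighlighter(raw); // not rewritten yet, so this is rejected
        } catch (QueryNotRewrittenException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        Query expanded = raw.rewrite(indexTerms); // rewrite once, reuse everywhere
        System.out.println("Terms to highlight: " + new StrictHighlighter(expanded).getTerms());
    }
}
```

The point of the sketch is only the shape of the contract: the caller rewrites once and shares the expanded query between searching and highlighting, so the expansion is never repeated and the highlighter no longer needs a reader.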