lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Payloads
Date Sun, 20 Dec 2009 15:06:20 GMT
The problem was already solved in the #lucene IRC channel. PayloadTermQuery
behaved correctly when comparing the score of a document with an
even-position match against one with an odd-position match in the *same*
query.
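To make the same-query comparison concrete, here is a minimal plain-Java model (no Lucene dependency) of the position-based payload scheme from the quoted code further down. It assumes whitespace tokenization with one position per token, and it assumes that PayloadHelper.encodeFloat stores the float's IEEE 754 bits in big-endian order, which is what ByteBuffer does by default. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

public class EvenPositionPayloadModel {

    // Stores a float's IEEE 754 bits as 4 big-endian bytes
    // (assumed to match PayloadHelper.encodeFloat's layout).
    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    // Reads 4 big-endian bytes starting at offset back into a float.
    static float decodeFloat(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).getFloat();
    }

    // Like TermPositionPayloadTokenFilter: a 2.0f payload at even positions, none at odd ones.
    static byte[] payloadAt(int position) {
        return position % 2 == 0 ? encodeFloat(2.0f) : null;
    }

    // Like BoostingSimilarity.scorePayload: the decoded float, or 1.0f without a payload.
    static float scorePayload(byte[] payload) {
        return payload != null ? decodeFloat(payload, 0) : 1.0f;
    }

    // Payload score of the first occurrence of term in text, or -1f if it does not occur.
    // Tokenizes on whitespace, one position per token.
    static float payloadFactor(String text, String term) {
        String[] tokens = text.trim().split("\\s+");
        for (int position = 0; position < tokens.length; position++) {
            if (tokens[position].equals(term)) {
                return scorePayload(payloadAt(position));
            }
        }
        return -1f;
    }

    public static void main(String[] args) {
        // "warning" sits at position 2 (even) in the first document of the test...
        System.out.println(payloadFactor(
            "A hurricane warning was issued at 6 AM for the outer great banks", "warning"));
        // ...while "tornado" sits at position 3 (odd) in the third one.
        System.out.println(payloadFactor(
            "There is a tornado warning for Worcester county until 6 PM today", "tornado"));
    }
}
```

Within one query, the only difference between two otherwise identical matches is this factor (2.0 versus 1.0), which is why the payload effect is only meaningful when comparing hits of the same query.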

In general, you cannot compare scores across different queries or different
indexes.
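Why cross-query comparisons mislead can be sketched with a toy score: payload factor × sqrt(tf) × idf, with idf = 1 + ln(numDocs / (docFreq + 1)), which to my knowledge is the idf formula of DefaultSimilarity. This is a deliberate simplification (it ignores field norms, queryNorm and coord), the class name is hypothetical, and the document frequencies are made-up illustration values. It only shows that idf changes from query to query, while the 2x payload ratio holds within a single query.

```java
public class ScoreComparabilityModel {

    // idf as computed by DefaultSimilarity (to my knowledge): 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Toy score: payload factor * sqrt(termFreq) * idf.
    // Deliberately ignores norms, queryNorm and coord.
    static double toyScore(double payloadFactor, int termFreq, int docFreq, int numDocs) {
        return payloadFactor * Math.sqrt(termFreq) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 100;

        // Query A: a common term (docFreq 49). Within this one query, an
        // even-position match (payload 2.0) scores exactly twice an odd one (1.0).
        double evenA = toyScore(2.0, 1, 49, numDocs);
        double oddA = toyScore(1.0, 1, 49, numDocs);
        System.out.println("query A: even/odd = " + (evenA / oddA));

        // Query B: a rare term (docFreq 1). Its unboosted odd-position match
        // still outscores query A's boosted even-position match, because idf differs.
        double oddB = toyScore(1.0, 1, 1, numDocs);
        System.out.println("query B odd = " + oddB + ", query A even = " + evenA);
    }
}
```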

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Elias Khsheibun [mailto:elias3@gmail.com]
> Sent: Sunday, December 20, 2009 2:51 PM
> To: java-user@lucene.apache.org
> Subject: RE: Payloads
> 
> 
> I'm trying to run queries now. The problem is that BoostingTermQuery's
> scoring always gives double weight to terms at even positions, regardless
> of whether the query itself contains the term. Here is the code that I'm
> using:
> 
> 
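> import java.io.Reader;
> 
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.WhitespaceTokenizer;
> 
> // Index-time analyzer: split on whitespace, then attach payloads by position.
> public class DocumentAnalyzer extends Analyzer {
> 
>     @Override
>     public TokenStream tokenStream(String fieldName, Reader reader) {
>         TokenStream result = new WhitespaceTokenizer(reader);
>         result = new TermPositionPayloadTokenFilter(result);
>         return result;
>     }
> }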
> 
> 
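> import java.io.IOException;
> 
> import org.apache.lucene.analysis.TokenFilter;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.payloads.PayloadHelper;
> import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
> import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
> import org.apache.lucene.index.Payload;
> 
> // Attaches a 2.0f payload to every token at an even position (0, 2, 4, ...).
> public class TermPositionPayloadTokenFilter extends TokenFilter {
> 
>     protected PayloadAttribute payAtt;
>     protected PositionIncrementAttribute posIncrAtt;
> 
>     private static final Payload evenPayload =
>         new Payload(PayloadHelper.encodeFloat(2.0f));
> 
>     private int termPosition = 0;
> 
>     public TermPositionPayloadTokenFilter(TokenStream input) {
>         super(input);
>         payAtt = (PayloadAttribute) addAttribute(PayloadAttribute.class);
>         posIncrAtt = (PositionIncrementAttribute)
>             addAttribute(PositionIncrementAttribute.class);
>     }
> 
>     @Override
>     public final boolean incrementToken() throws IOException {
>         if (!input.incrementToken()) {
>             return false;
>         }
>         // Even positions get the 2.0f payload; odd positions explicitly get
>         // none, so no stale payload can leak over from a previous token.
>         payAtt.setPayload((termPosition % 2 == 0) ? evenPayload : null);
>         termPosition += posIncrAtt.getPositionIncrement();
>         return true;
>     }
> }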
> 
> 
> 
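> import org.apache.lucene.analysis.payloads.PayloadHelper;
> import org.apache.lucene.search.DefaultSimilarity;
> 
> // Uses the decoded payload float as the payload score; 1.0f when there is none.
> public class BoostingSimilarity extends DefaultSimilarity {
> 
>     @Override
>     public float scorePayload(String fieldName, byte[] payload, int offset,
>                               int length) {
>         if (payload != null) {
>             return PayloadHelper.decodeFloat(payload, offset);
>         }
>         return 1.0F;
>     }
> }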
> 
> And here is a test I've written. If you look at the scores, you will
> notice that BoostingTermQuery always gives double weight to terms at even
> positions, whether or not they appear in the query (this is my current
> problem):
> 
> import java.io.IOException;
> 
> import junit.framework.TestCase;
> 
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.ScoreDoc;
> import org.apache.lucene.search.TermQuery;
> import org.apache.lucene.search.payloads.BoostingTermQuery;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.RAMDirectory;
> 
> public class PayloadsTest extends TestCase {
>     Directory dir;
>     IndexWriter writer;
>     DocumentAnalyzer analyzer;
> 
>     protected void setUp() throws Exception {
>         super.setUp();
>         dir = new RAMDirectory();
>         analyzer = new DocumentAnalyzer();
>         writer = new IndexWriter(dir, analyzer,
>             IndexWriter.MaxFieldLength.UNLIMITED);
>     }
> 
>     protected void tearDown() throws Exception {
>         writer.close();
>         super.tearDown();
>     }
> 
>     void addDoc(String title, String contents) throws IOException {
>         Document doc = new Document();
>         // Title is stored for display only, not indexed.
>         doc.add(new Field("title", title, Field.Store.YES, Field.Index.NO));
>         // Contents go through DocumentAnalyzer, which attaches the payloads.
>         doc.add(new Field("contents", contents, Field.Store.NO,
>             Field.Index.ANALYZED));
>         writer.addDocument(doc);
>     }
> 
>     // Prints the title and score of each hit so the payload effect is visible.
>     void printHits(IndexSearcher searcher, Query query) throws IOException {
>         ScoreDoc[] hits = searcher.search(query, 10).scoreDocs;
>         for (int i = 0; i < hits.length; i++) {
>             Document hitDoc = searcher.doc(hits[i].doc);
>             System.out.println(hitDoc.get("title") + " score=" + hits[i].score);
>         }
>     }
> 
>     public void testBoostingTermQuery() throws Throwable {
>         addDoc("Hurricane warning", "A hurricane warning was issued at 6 AM "
>             + "for the outer great banks");
>         addDoc("Warning label maker", "The warning label maker is a "
>             + "delightful toy for your precocious six year old's warning needs");
>         addDoc("Tornado warning", "There is a tornado warning for Worcester "
>             + "county until 6 PM today");
>         writer.commit();
> 
>         IndexSearcher searcher = new IndexSearcher(dir);
>         searcher.setSimilarity(new BoostingSimilarity());
>         Term term = new Term("contents", "tornado");
> 
>         System.out.println("\nTermQuery results:");
>         printHits(searcher, new TermQuery(term));
> 
>         System.out.println("\nBoostingTermQuery results:");
>         printHits(searcher, new BoostingTermQuery(term));
> 
>         searcher.close();
>     }
> }
> 
> 
> -----Original Message-----
> From: AHMET ARSLAN [mailto:iorixxx@yahoo.com]
> Sent: Saturday, December 19, 2009 11:19 PM
> To: java-user@lucene.apache.org
> Subject: RE: Payloads
> 
> 
> > If I need to override the QueryParser to return PayloadTermQuery, which
> > PayloadFunction should I pass to the constructor? (An example would help,
> > if you can show me one.)
> 
> I am not sure about that. Maybe a custom one.
> 
> > In your code I didn't see an indexer. Will this work with the regular
> > IndexWriter, but with the new Analyzer subclass that you wrote?
> 
> No, at index time [IndexWriter] you are going to use a new analyzer that
> uses WhitespaceTokenizer + TermPositionPayloadTokenFilter.
> 
> PayloadAnalyzer will be used at query time [QueryParser].
> 
> You need to call setSimilarity(new CustomSimilarity()) on both the indexer
> and the searcher.
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
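On the PayloadFunction question in the thread: as far as I know, Lucene 2.9 ships AveragePayloadFunction, MaxPayloadFunction and MinPayloadFunction in org.apache.lucene.search.payloads, and any of them can be passed to the PayloadTermQuery(Term, PayloadFunction) constructor; a custom subclass would only be needed for a different aggregation. The function decides how the payload scores of all occurrences of the term in one document are combined into a single factor. The plain-Java model below (hypothetical class, no Lucene dependency) shows the three aggregations; the fallback of 1.0f when no payload was seen is an assumption about the built-ins' behaviour.

```java
public class PayloadAggregationModel {

    // Average of the payload scores seen in one document; 1.0f if none were seen.
    static float average(float... payloadScores) {
        if (payloadScores.length == 0) {
            return 1.0f;
        }
        float sum = 0f;
        for (float score : payloadScores) {
            sum += score;
        }
        return sum / payloadScores.length;
    }

    // Largest payload score seen in one document; 1.0f if none were seen.
    static float max(float... payloadScores) {
        if (payloadScores.length == 0) {
            return 1.0f;
        }
        float result = payloadScores[0];
        for (float score : payloadScores) {
            result = Math.max(result, score);
        }
        return result;
    }

    // Smallest payload score seen in one document; 1.0f if none were seen.
    static float min(float... payloadScores) {
        if (payloadScores.length == 0) {
            return 1.0f;
        }
        float result = payloadScores[0];
        for (float score : payloadScores) {
            result = Math.min(result, score);
        }
        return result;
    }

    public static void main(String[] args) {
        // With the even-position filter, "warning" occurs at positions 1 (odd, 1.0)
        // and 14 (even, 2.0) in the "Warning label maker" document from the test.
        float[] warningScores = {1.0f, 2.0f};
        System.out.println("average = " + average(warningScores)); // average = 1.5
        System.out.println("max     = " + max(warningScores));     // max     = 2.0
        System.out.println("min     = " + min(warningScores));     // min     = 1.0
    }
}
```

Which aggregation fits depends on whether one boosted occurrence should dominate (max), be diluted by unboosted ones (average), or be ignored unless every occurrence is boosted (min).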


