lucene-java-user mailing list archives

From Erik Hatcher
Subject Re: Indexing of virtual "made up" documents
Date Wed, 27 Apr 2005 01:50:08 GMT
On Apr 26, 2005, at 4:46 PM, Paul Libbrecht wrote:
> On Apr 26, 2005, at 15:00, Erik Hatcher wrote:
>>> I am not sure how Lucene uses the placement information, but in the
>>> described case, where I concatenate all my features into a
>>> whitespace-delimited text, I fear that Lucene uses the placement of
>>> features in this made-up text and comes to some wrong conclusions
>>> (after all, the placement is arbitrary in the "made-up" text).
>> What wrong conclusions do you fear here?  Again, the position 
>> information is used for phrase queries, but in your situation you 
>> wouldn't be using phrase queries so no need to concern yourself with 
>> the position stuff at all.
> Some information retrieval approaches give a higher score to terms 
> that appear early in the document. Is there nothing like that in 
> Lucene's scoring?

No, Lucene doesn't have that feature, at least not explicitly. It could 
be hacked, sort of, by injecting multiple copies of the same term at the 
same position for the earlier terms, which gives them a higher term 
frequency. Back to the original question: the position information will 
not adversely affect scoring.
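
A rough sketch of the "same term, same position" hack described above, in plain Java with no Lucene dependency. Everything here (the EarlyTermBoost class, its Token type, the boostedCount and copies parameters) is hypothetical and only models the idea; in a real analyzer the equivalent step would presumably live in a custom token filter that emits the extra copies with a position increment of 0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the hack; this is NOT Lucene's API.
public class EarlyTermBoost {

    /** A term plus its gap from the previous token (0 = same position). */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Splits whitespace-delimited text into tokens. Each of the first
     * boostedCount terms is emitted copies times in total; the extra
     * copies use increment 0, so they stack on the original position.
     */
    static List<Token> boostEarlyTerms(String text, int boostedCount, int copies) {
        List<Token> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] terms = trimmed.split("\\s+");
        for (int i = 0; i < terms.length; i++) {
            out.add(new Token(terms[i], 1));          // normal token, next position
            if (i < boostedCount) {
                for (int c = 1; c < copies; c++) {
                    out.add(new Token(terms[i], 0));  // extra copy, same position
                }
            }
        }
        return out;
    }

    /** Counts how often each term occurs, which is what term frequency sees. */
    static Map<String, Integer> termFrequencies(List<Token> tokens) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (Token t : tokens) {
            tf.merge(t.term, 1, Integer::sum);
        }
        return tf;
    }

    /** Position of the last token, counting from 0; -1 if there are none. */
    static int lastPosition(List<Token> tokens) {
        int pos = -1;
        for (Token t : tokens) {
            pos += t.positionIncrement;
        }
        return pos;
    }

    public static void main(String[] args) {
        List<Token> tokens = boostEarlyTerms("red round small", 1, 3);
        System.out.println(termFrequencies(tokens));
        System.out.println(lastPosition(tokens));
    }
}
```

Because the extra copies use increment 0, the positions of the later terms do not shift; only the term frequency of the early terms goes up.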

