lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Indexing of virtual "made up" documents
Date Wed, 27 Apr 2005 01:50:08 GMT
On Apr 26, 2005, at 4:46 PM, Paul Libbrecht wrote:
>
> On 26 Apr 2005, at 15:00, Erik Hatcher wrote:
>>> I am not sure how Lucene uses the placement information, but in the
>>> described case where I concatenate all my features into a
>>> whitespace-delimited text, I fear that Lucene uses the placement of
>>> features in this made-up text and comes to some wrong conclusions
>>> (after all, the placement is arbitrary in the "made-up" text).
>> What wrong conclusions do you fear here?  Again, the position 
>> information is used for phrase queries, but in your situation you 
>> wouldn't be using phrase queries so no need to concern yourself with 
>> the position stuff at all.
>
> Some information retrieval approaches hold that terms appearing early
> in a document should receive a greater score... is there anything like
> that in Lucene's scoring?

No, Lucene doesn't have that feature, at least not explicitly... It
could be hacked, sort of, by injecting multiple copies of the same term
at the same position (to get a higher term frequency) for the earlier
terms. Back to the original question: the position information will not
adversely affect scoring.
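To make the "inject duplicates at the same position" hack concrete, here is a minimal plain-Java sketch with no Lucene dependency. The `Token` record and the `tokenize`, `termFrequencies` and `positions` helpers are hypothetical names chosen for illustration, not Lucene API. In a real Lucene analyzer the same effect would come from a token filter that emits the extra copies with a position increment of 0. The sketch shows two things: the duplicates raise the term frequency of the early features (which is what a plain term query's score depends on), and because they share the original token's position, the positions of all later features are unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EarlyTermBoost {

    /** Hypothetical token: the term text plus its position increment. */
    record Token(String text, int positionIncrement) {}

    /**
     * Hypothetical helper: emits each whitespace-delimited feature once and,
     * for the first boostedCount features, adds extraCopies duplicates with a
     * position increment of 0 so they sit at the same position as the original.
     */
    static List<Token> tokenize(String features, int boostedCount, int extraCopies) {
        List<Token> tokens = new ArrayList<>();
        String[] parts = features.trim().split("\\s+");
        for (int i = 0; i < parts.length; i++) {
            tokens.add(new Token(parts[i], 1));
            if (i < boostedCount) {
                for (int c = 0; c < extraCopies; c++) {
                    tokens.add(new Token(parts[i], 0));
                }
            }
        }
        return tokens;
    }

    /** Term frequencies per term; positions play no part in this count. */
    static Map<String, Integer> termFrequencies(List<Token> tokens) {
        Map<String, Integer> tf = new TreeMap<>();
        for (Token t : tokens) {
            tf.merge(t.text(), 1, Integer::sum);
        }
        return tf;
    }

    /** Absolute positions, the data that phrase queries consult. */
    static List<Integer> positions(List<Token> tokens) {
        List<Integer> result = new ArrayList<>();
        int pos = -1;
        for (Token t : tokens) {
            pos += t.positionIncrement();
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        // Boost only the first feature with two extra copies.
        List<Token> tokens = tokenize("red round small", 1, 2);
        System.out.println(termFrequencies(tokens)); // {red=3, round=1, small=1}
        System.out.println(positions(tokens));       // [0, 0, 0, 1, 2]
    }
}
```

Using a position increment of 0 rather than simply repeating the term is the point of the hack: the boosted term's frequency goes up, but "round" and "small" keep positions 1 and 2, so any phrase matching on the later features behaves as before.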

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

