lucene-java-user mailing list archives

From "Wu, Stephen T., Ph.D." <Wu.Step...@mayo.edu>
Subject Re: what is the offsets and payload in DocsAndPositionsEnum for ??
Date Tue, 27 Nov 2012 15:59:52 GMT
I think we're looking at doing something related. I haven't explored the
Enums, and I don't know how to make a postings codec... but what is "flexible
indexing" in Lucene 4.0 if not the ability to make new postings codecs?

We're trying to attach attributes to terms/spans in our indexes. We'd
also like to try out some interesting ways of scoring that go beyond
individual tokens.

We were considering using Attributes instead of Payloads, because it seems
like using Payloads ties you to a particular kind of scoring -- just a
weight on a token.  Can Payloads be used for more general scoring functions?
E.g., considering a span of text alongside multiple Payloads?
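For illustration only, here is a minimal sketch of a scoring function that looks at several payloads across a span instead of treating each payload as a single per-token weight. It assumes, hypothetically, that each position's payload is a 4-byte big-endian float weight. `SpanPayloadScorer`, `encodeWeight`, `decodeWeight` and `scoreSpan` are illustrative names, not Lucene API, and the combination rule (mean weight scaled by how much of the span carries a payload) is an arbitrary example:

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical span-level scorer: combines the payloads of several positions
// rather than using each payload only as a per-token weight.
final class SpanPayloadScorer {

    // Encode a float weight as a 4-byte payload
    // (ByteBuffer's default byte order is big-endian).
    static byte[] encodeWeight(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    // Decode a payload produced by encodeWeight.
    static float decodeWeight(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // Score a span of spanLength positions, given the payloads found at
    // positions inside it: the mean payload weight, multiplied by the
    // fraction of the span's positions that carry a payload.
    static float scoreSpan(List<byte[]> payloads, int spanLength) {
        if (payloads.isEmpty() || spanLength <= 0) {
            return 0f;
        }
        float sum = 0f;
        for (byte[] p : payloads) {
            sum += decodeWeight(p);
        }
        float mean = sum / payloads.size();
        float coverage = Math.min(1f, (float) payloads.size() / spanLength);
        return mean * coverage;
    }
}
```

In Lucene 4.x the per-position bytes would come from `DocsAndPositionsEnum.getPayload()` after each `nextPosition()` call; it returns a `BytesRef` (or null when the position has no payload), so its `bytes`/`offset`/`length` range would need to be copied into a `byte[]` before calling `scoreSpan`.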

Does it make sense to move outside of Payloads here?

Thanks!

stephen




On 11/19/12 8:14 AM, "Michael McCandless" <lucene@mikemccandless.com> wrote:

> A new postings format would be tricky because you have new attributes
> you want to index.
> 
> The DocsAndPositionsEnum does have an attributes source, but this is
> not well explored, and there are known problems (they can't be easily
> merged in the composite reader case).
> 
> So that's why I suggested packing your information into a payload ...
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> On Sun, Nov 18, 2012 at 8:33 PM, wgggfiy <wuqiu.reg@qq.com> wrote:
>> Thanks, Mike.
>> About the 3rd question: is "encode them all into the payload" better than
>> "a new postings format with the codec"?
>> I mean, replacing the original posting item (position, startOffset, endOffset,
>> payload) with my own inverted item, such as
>> class TestPostingItem
>> {
>>         int termId;
>>         long startOffset;
>>         long endOffset;
>>         float score;
>>         int segId;
>>         long timeStamp;
>> }
>> ?
>> 
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/what-is-the-offsets-and-payload-in-DocsAndPositionsEnum-for-tp4020933p4020968.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
> 
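As a rough illustration of "packing your information into a payload", the sketch below serializes the extra fields from the `TestPostingItem` example (`score`, `segId`, `timeStamp`) into a fixed 16-byte array that could serve as a per-position payload. `termId` and the offsets are left out on the assumption that the term itself and Lucene 4's native offset indexing already cover them. `PostingPayload` and its byte layout are hypothetical, not a Lucene format:

```java
import java.nio.ByteBuffer;

// Hypothetical per-position payload holding the extra fields from the
// TestPostingItem example; the layout is an assumption, not a Lucene format.
final class PostingPayload {
    // float score (4 bytes) + int segId (4 bytes) + long timeStamp (8 bytes)
    static final int SIZE = 4 + 4 + 8;

    final float score;
    final int segId;
    final long timeStamp;

    PostingPayload(float score, int segId, long timeStamp) {
        this.score = score;
        this.segId = segId;
        this.timeStamp = timeStamp;
    }

    // Serialize to a fixed-size big-endian byte array usable as payload bytes.
    byte[] encode() {
        return ByteBuffer.allocate(SIZE)
                .putFloat(score)
                .putInt(segId)
                .putLong(timeStamp)
                .array();
    }

    // Deserialize from a byte range, e.g. the bytes/offset/length of a BytesRef.
    static PostingPayload decode(byte[] bytes, int offset, int length) {
        if (length != SIZE) {
            throw new IllegalArgumentException("expected " + SIZE + " bytes, got " + length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        return new PostingPayload(buf.getFloat(), buf.getInt(), buf.getLong());
    }
}
```

On the indexing side, a custom TokenFilter could set these bytes on each token's `PayloadAttribute` (in Lucene 4.x, `setPayload(new BytesRef(bytes))`). A fixed-size layout keeps decoding trivial, at the cost of storing 16 extra bytes at every position, which grows the index.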

