lucene-java-user mailing list archives

From David Causse <dcau...@spotter.com>
Subject Re: what is the offsets and payload in DocsAndPositionsEnum for ??
Date Tue, 27 Nov 2012 17:48:21 GMT
Hi,

We use payloads, but we can't use the whole Lucene API with them.
For example, we use them to run relation queries such as:

@quote(@speaker(obama) @discourse(health))

This searches for all documents that contain a quote by Obama talking
about health.
We encode linguistic information (standoff annotations) inside payloads
and use a custom search API to query the index.
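
To make the relation query above concrete, here is a minimal sketch in plain
Java of the containment test it implies. It is not our search API and uses no
Lucene calls: the Annotation class, the quotesBy helper, and the assumption
that the speaker and discourse annotations are nested inside the quote span
are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical standoff annotation: a typed span over token positions.
final class Annotation {
    final String type;   // e.g. "quote", "speaker", "discourse"
    final String value;  // e.g. "obama"; null when the type carries no value
    final int start;     // first token position, inclusive
    final int end;       // last token position, exclusive

    Annotation(String type, String value, int start, int end) {
        this.type = type;
        this.value = value;
        this.start = start;
        this.end = end;
    }

    // True when the other span lies entirely inside this one.
    boolean contains(Annotation other) {
        return start <= other.start && other.end <= end;
    }
}

final class RelationQuerySketch {
    // Returns the "quote" spans that contain both a matching speaker
    // annotation and a matching discourse annotation (assumed nesting).
    static List<Annotation> quotesBy(List<Annotation> doc, String speaker, String topic) {
        List<Annotation> hits = new ArrayList<>();
        for (Annotation q : doc) {
            if (!q.type.equals("quote")) {
                continue;
            }
            boolean hasSpeaker = false;
            boolean hasTopic = false;
            for (Annotation a : doc) {
                if (!q.contains(a)) {
                    continue;
                }
                if (a.type.equals("speaker") && speaker.equals(a.value)) {
                    hasSpeaker = true;
                }
                if (a.type.equals("discourse") && topic.equals(a.value)) {
                    hasTopic = true;
                }
            }
            if (hasSpeaker && hasTopic) {
                hits.add(q);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Annotation> doc = Arrays.asList(
            new Annotation("quote", null, 0, 10),
            new Annotation("speaker", "obama", 0, 1),
            new Annotation("discourse", "health", 4, 6));
        System.out.println(quotesBy(doc, "obama", "health").size());
    }
}
```

In a real index the annotations of one document would be decoded from the
payloads while walking the postings, rather than held in a list like this.
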
I haven't found a convenient way to plug my code into Lucene's
Query/Scorer/Weight API. As with SpanQuery, you have to rewrite the whole
query stack.
In short, if you want payloads to do more than boost a term, there is a
good chance that you'll need to rewrite a large part of the query
stack.
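
As an illustration of what "packing your information into a payload" can look
like, here is a minimal sketch that serializes a few of the extra per-position
fields from the quoted TestPostingItem (score, segId, timeStamp) into a
fixed-width byte array and reads them back. The class name, field selection,
and layout are assumptions made for this example. It uses only java.nio; in
Lucene 4.0 the resulting bytes would, as far as I know, be wrapped in a
BytesRef and set through PayloadAttribute at analysis time, but that step is
left out here.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-width payload layout (big-endian, ByteBuffer default):
//   bytes 0..3   float score
//   bytes 4..7   int   segId
//   bytes 8..15  long  timeStamp
final class PayloadPackingSketch {
    static final int SIZE = 4 + 4 + 8;

    // Serializes the three fields into a new 16-byte array.
    static byte[] encode(float score, int segId, long timeStamp) {
        return ByteBuffer.allocate(SIZE)
            .putFloat(score)
            .putInt(segId)
            .putLong(timeStamp)
            .array();
    }

    // The decoders take an offset because a payload is typically handed
    // back as a slice (bytes, offset, length) of a larger shared buffer.
    static float score(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).getFloat(offset);
    }

    static int segId(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).getInt(offset + 4);
    }

    static long timeStamp(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).getLong(offset + 8);
    }

    public static void main(String[] args) {
        byte[] payload = encode(1.5f, 7, 1353951201000L);
        System.out.println(payload.length + " bytes, segId=" + segId(payload, 0));
    }
}
```

Position and offsets are left out of the payload on purpose, since the
postings already carry them. A fixed layout keeps decoding cheap, at the cost
of 16 bytes per position.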


On 27/11/2012 16:59, Wu, Stephen T., Ph.D. wrote:
> I think we're looking at doing something related.  I haven't explored the
> Enums and don't know how to make a postings codec... But what is "flexible
> indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
>
> We're trying to incorporate attributes onto terms/spans in indexes.  We'd
> also like to try out some interesting ways to score things that go beyond
> just tokens.
>
> We were considering using Attributes instead of Payloads, because it seems
> like using Payloads ties you to a particular kind of scoring -- just a
> weight on a token.  Can Payloads be used for more general scoring functions?
> E.g., considering a span of text alongside multiple Payloads?
>
> Does it make sense to move outside of Payloads here?
>
> Thanks!
>
> stephen
>
>
>
>
> On 11/19/12 8:14 AM, "Michael McCandless" <lucene@mikemccandless.com> wrote:
>
>> A new postings format would be tricky because you have new attributes
>> you want to index.
>>
>> The DocsAndPositionsEnum does have an attributes source, but this is
>> not well explored, and there are known problems (they can't be easily
>> merged in the composite reader case).
>>
>> So that's why I suggested packing your information into a payload ...
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Sun, Nov 18, 2012 at 8:33 PM, wgggfiy <wuqiu.reg@qq.com> wrote:
>>> thx, mike.
>>> About the third question: is "encode them all into the payload" better than
>>> "a new postings format with the codec"?
>>> I mean replacing the original posting item (position, startOffset, endOffset,
>>> payload) with my own inverted item, such as
>>> class TestPostingItem
>>> {
>>>          int termId;
>>>          long startOffset;
>>>          long endOffset;
>>>          float score;
>>>          int segId;
>>>          long timeStamp;
>>> }
>>> ?
>>>
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>
>
>
>


-- 
David Causse
Spotter
http://www.spotter.com/



