lucene-java-user mailing list archives

From "Wu, Stephen T., Ph.D." <Wu.Step...@mayo.edu>
Subject Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
Date Fri, 30 Nov 2012 17:25:47 GMT
Is there any (preliminary) code checked in somewhere that I can look at,
that would help me understand the practical issues that would need to be
addressed?

If I understand you correctly, it's a little different from what's happening
in your blog posts:
http://blog.mikemccandless.com/2012/07/building-new-lucene-postings-format.html
http://blog.mikemccandless.com/2012/08/lucenes-new-blockpostingsformat-thanks.html
Those posts deal with making your own codec, but not with changing what's
stored in the postings?  I guess I misunderstood "postings format" before.

stephen

> Flexible indexing is the ability to make your own codec, which
> controls the reading and writing of all index parts (postings, stored
> fields, term vectors, deleted docs, etc.).
> 
> So for example if you want to store some postings as a bit set instead
> of the block format that's the default coming up in 4.1, that's easy
> to do.
> 
> But what is less easy (as I described below) is changing what is
> actually stored in the postings, eg adding a new per-position
> attribute.
> 
> The original goal was to allow arbitrary attributes beyond the known
> docs/freqs/positions/offsets that Lucene supports today, so that you
> could easily make new application-dependent per-term, per-doc,
> per-position things, pull them from the analyzer, save them to the
> index, and access them from an IndexReader / query, but while some
> APIs do expose this, it's not very well explored yet (eg, you'd have
> to make a custom indexing chain to get the attributes "through"
> IndexWriter down to your codec).  It would be great to make progress
> making this easier, so ideas are very welcome :)
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> On Tue, Nov 27, 2012 at 3:37 PM, Wu, Stephen T., Ph.D.
> <Wu.Stephen@mayo.edu> wrote:
>> Following up on a previous question...
>> What is "flexible indexing" in Lucene 4.0?  We assumed it was the ability to
>> easily make new postings formats/codecs -- but a response below says that
>> would be "tricky"?
>> 
>> stephen
>> 
>> 
>> On 11/27/12 11:48 AM, "David Causse" <dcausse@spotter.com> wrote:
>> 
>>> Hi,
>>> 
>>> We use payloads, but we can't use the whole Lucene API.
>>> For example, we use them for relation queries such as:
>>> 
>>> @quote(@speaker(obama) @discourse(health))
>>> 
>>> This searches for all documents that contain a quote by Obama talking about
>>> health.
>>> We encode linguistic information (standoff annotations) inside payloads
>>> and use a custom search API to query the index.
>>> I didn't find a convenient way to attach my code to the Lucene
>>> Query/Scorer/Weight API. Like SpanQuery, you have to rewrite the whole
>>> Query stack.
>>> In short, if you want to use payloads for more than boosting a
>>> term, there's a good chance that you'll need to rewrite a big part of the
>>> query stack.
>>> 
>>> 
>>> On 27/11/2012 16:59, Wu, Stephen T., Ph.D. wrote:
>>>> I think we're looking at doing something related.  I haven't explored the
>>>> Enums and don't know how to make a postings codec... But what is "flexible
>>>> indexing" in Lucene 4.0 if it's not the ability to make new postings
>>>> codecs?
>>>> 
>>>> We're trying to incorporate attributes onto terms/spans in indexes.  We'd
>>>> also like to try out some interesting ways to score things that go beyond
>>>> just tokens.
>>>> 
>>>> We were considering using Attributes instead of Payloads, because it seems
>>>> like using Payloads ties you to a particular kind of scoring -- just a
>>>> weight on a token.  Can Payloads be used for more general scoring
>>>> functions?
>>>> E.g., considering a span of text alongside multiple Payloads?
>>>> 
>>>> Does it make sense to move outside of Payloads here?
>>>> 
>>>> Thanks!
>>>> 
>>>> stephen
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 11/19/12 8:14 AM, "Michael McCandless" <lucene@mikemccandless.com>
>>>> wrote:
>>>> 
>>>>> A new postings format would be tricky because you have new attributes
>>>>> you want to index.
>>>>> 
>>>>> The DocsAndPositionsEnum does have an attributes source, but this is
>>>>> not well explored, and there are known problems (they can't be easily
>>>>> merged in the composite reader case).
>>>>> 
>>>>> So that's why I suggested packing your information into a payload ...
>>>>> 
>>>>> Mike McCandless
>>>>> 
>>>>> http://blog.mikemccandless.com
>>>>> 
>>>>> On Sun, Nov 18, 2012 at 8:33 PM, wgggfiy <wuqiu.reg@qq.com> wrote:
>>>>>> thx, mike.
>>>>>> About the 3rd question: is "encode them all into the payload" better than
>>>>>> "a new postings format with the codec"?
>>>>>> I mean replacing the original posting item (position, startOffset,
>>>>>> endOffset, payload) with my own inverted item such as
>>>>>> class TestPostingItem
>>>>>> {
>>>>>>          int termId;
>>>>>>          long startOffset;
>>>>>>          long endOffset;
>>>>>>          float score;
>>>>>>          int segId;
>>>>>>          long timeStamp;
>>>>>> }
>>>>>> ?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://lucene.472066.n3.nabble.com/what-is-the-offsets-and-payload-in-DocsAndPositionsEnum-for-tp4020933p4020968.html
>>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
> 
> 
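
Editorial sketch of the "pack your information into a payload" suggestion
quoted above. It is a minimal illustration, not code from the thread or from
Lucene: the class name PayloadPacking, the nested Item class, and the fixed
16-byte big-endian layout are all assumptions made for the example. Only the
field names (score, segId, timeStamp) come from the TestPostingItem class
quoted above; termId and the offsets are left out because Lucene already
stores terms and offsets itself. It uses only java.nio, so it runs without
Lucene on the classpath.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: packs extra per-position fields into one payload byte[]. */
public final class PayloadPacking {

    /** Decoded payload; field names follow TestPostingItem from the thread. */
    public static final class Item {
        public final float score;
        public final int segId;
        public final long timeStamp;

        Item(float score, int segId, long timeStamp) {
            this.score = score;
            this.segId = segId;
            this.timeStamp = timeStamp;
        }
    }

    // Assumed fixed layout: 4 bytes score + 4 bytes segId + 8 bytes timeStamp.
    public static final int PAYLOAD_BYTES = Float.BYTES + Integer.BYTES + Long.BYTES;

    private PayloadPacking() {}

    /** Encodes the three fields in a fixed order (ByteBuffer defaults to big-endian). */
    public static byte[] encode(float score, int segId, long timeStamp) {
        return ByteBuffer.allocate(PAYLOAD_BYTES)
                .putFloat(score)
                .putInt(segId)
                .putLong(timeStamp)
                .array();
    }

    /**
     * Decodes a payload from a slice of a larger array. The (bytes, offset, length)
     * shape is chosen so a slice of a shared buffer can be decoded without copying.
     */
    public static Item decode(byte[] bytes, int offset, int length) {
        if (length != PAYLOAD_BYTES) {
            throw new IllegalArgumentException("unexpected payload length: " + length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        // Read back in the same order the fields were written.
        float score = buf.getFloat();
        int segId = buf.getInt();
        long timeStamp = buf.getLong();
        return new Item(score, segId, timeStamp);
    }

    public static void main(String[] args) {
        byte[] payload = encode(0.75f, 3, 1353951420000L);
        Item item = decode(payload, 0, payload.length);
        System.out.println(item.score + " " + item.segId + " " + item.timeStamp);
    }
}
```

In an actual index, the encoded bytes would presumably be attached to each
token on the analysis side (for example via PayloadAttribute in a custom
TokenFilter) and read back at search time from the payload that
DocsAndPositionsEnum exposes. As the replies above point out, scoring on these
values beyond a simple per-term boost may still require custom query code.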



