lucene-java-user mailing list archives

From "Wu, Stephen T., Ph.D." <Wu.Step...@mayo.edu>
Subject More about storing NLP-type stuff in the index
Date Thu, 03 Jan 2013 23:16:06 GMT
I think we've been saying that if we put something in a Payload, it will be
indexed.  From what I understand of the indexing format, that really means
that what you put in the Payload will be stored in the Lucene index,
alongside the term it is attached to... But it won't *itself* be indexed &
optimized for search.
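
A minimal sketch of that distinction, using a toy in-memory model in plain
Java rather than Lucene's actual API (the class PayloadScanDemo and its
methods are hypothetical names made up for illustration). Terms are keys of
the inverted index, so finding a term is a single lookup; payload bytes
merely ride along with each posting, so finding a payload value means
visiting every posting:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of term -> postings with opaque payloads. NOT the Lucene API. */
final class PayloadScanDemo {

    /** One occurrence of a term, with payload bytes stored next to it. */
    static final class Posting {
        final int docId;
        final int position;
        final byte[] payload; // stored with the posting, but never used as a lookup key

        Posting(int docId, int position, String payload) {
            this.docId = docId;
            this.position = position;
            this.payload = payload.getBytes(StandardCharsets.UTF_8);
        }
    }

    private final Map<String, List<Posting>> postingsByTerm = new LinkedHashMap<>();

    public void add(String term, int docId, int position, String payload) {
        postingsByTerm.computeIfAbsent(term, k -> new ArrayList<>())
                      .add(new Posting(docId, position, payload));
    }

    /** Fast path: the term is a key of the inverted index, so this is one map lookup. */
    public List<Posting> lookupTerm(String term) {
        return postingsByTerm.getOrDefault(term, Collections.<Posting>emptyList());
    }

    /** Slow path: nothing is keyed by payload, so every posting of every term is visited. */
    public List<String> termsWithPayload(String wanted) {
        byte[] target = wanted.getBytes(StandardCharsets.UTF_8);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<Posting>> entry : postingsByTerm.entrySet()) {
            for (Posting p : entry.getValue()) {
                if (Arrays.equals(p.payload, target)) {
                    hits.add(entry.getKey() + "@" + p.docId + ":" + p.position);
                }
            }
        }
        return hits;
    }
}
```

In this toy, termsWithPayload("ARG0") is a full scan, which is the cost that
a separate inverted index over the labels would be meant to avoid.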

That's good, but can we build inverted indices on the contents of the
Payloads (or the Attributes) as well?
 Ex1: Say I put semantic role labels like ARG0 into my index. Say my search
is looking for all instances of ARG0.
 Ex2: Say I add payloads to terms indicating that they're named entities
belonging to a semantic group.  Then say my query looks for all instances of
the "Medications" semantic group.

It's almost like just putting these things in different fields, with the
exception that the things in different fields need to be linked so you know
what the original text was.  Maybe the linking can be done via Payloads
(offsets in the original text)?  If I want to store multiple things at the
same startOffset then I just use something like SynonymFilter?
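
As a rough illustration of the stacked-token idea, the following toy
positional index (plain Java, not Lucene's API; the class name
StackedAnnotationIndex and the label strings are made up for the example)
posts each annotation label at the same position as the token it describes,
which is the effect a position increment of 0 has. A lookup on the label then
yields positions that lead straight back to the original text:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy positional inverted index with stacked annotation labels. NOT the Lucene API. */
final class StackedAnnotationIndex {

    // key (term or label) -> list of {docId, position} pairs
    private final Map<String, List<int[]>> postings = new HashMap<>();
    // docId -> original tokens, kept so a position can be mapped back to the text
    private final Map<Integer, List<String>> docTokens = new HashMap<>();

    /**
     * Indexes one document. labels.get(i) holds the annotations for tokens.get(i);
     * each label is posted at the same position as its token.
     */
    public void addDocument(int docId, List<String> tokens, List<List<String>> labels) {
        docTokens.put(docId, new ArrayList<>(tokens));
        for (int pos = 0; pos < tokens.size(); pos++) {
            post(tokens.get(pos), docId, pos);
            for (String label : labels.get(pos)) {
                post(label, docId, pos); // stacked: same position as the token
            }
        }
    }

    private void post(String key, int docId, int position) {
        postings.computeIfAbsent(key, k -> new ArrayList<>())
                .add(new int[] {docId, position});
    }

    /** Returns the original tokens at every position where the given label or term occurs. */
    public List<String> surfaceFormsFor(String key) {
        List<String> out = new ArrayList<>();
        for (int[] p : postings.getOrDefault(key, Collections.<int[]>emptyList())) {
            out.add(docTokens.get(p[0]).get(p[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        StackedAnnotationIndex index = new StackedAnnotationIndex();
        index.addDocument(1,
            Arrays.asList("patient", "took", "aspirin"),
            Arrays.asList(
                Arrays.asList("ARG0"),
                Collections.<String>emptyList(),
                Arrays.asList("ARG1", "SEMGROUP:Medications")));
        System.out.println(index.surfaceFormsFor("SEMGROUP:Medications")); // prints [aspirin]
        System.out.println(index.surfaceFormsFor("ARG0"));                 // prints [patient]
    }
}
```

Because labels and tokens share one position space here, no separate linking
step is needed; with labels in a different field, something like the offsets
mentioned above would have to carry that link instead.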

stephen


On 12/21/12 6:45 AM, "Michael McCandless" <lucene@mikemccandless.com> wrote:

> On Thu, Dec 20, 2012 at 3:54 PM, Wu, Stephen T., Ph.D.
> <Wu.Stephen@mayo.edu> wrote:
>>> If you stuff the end of the span into the payload you'd have to create
>>> a custom variant of PhraseQuery to properly match based on the end
>>> span.
>> 
>> How different is this from the functionality already available through
>> SpanQuery?
> 
> Good question!
> 
> I think the difference would be index-time (payload encoding span-end
> + new Query) vs search time (SpanQuery)?
> 
> Ie, with the former (index-time) you'd have a TokenFilter spotting the
> spans and encoding them into the index, and with the latter all
> spotting happens at search time?
> 
> So net/net I guess (?) the results would be the same, but performance
> should be faster if you do it index-time?
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 



