lucene-java-user mailing list archives

From Glen Newton <glen.new...@gmail.com>
Subject Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
Date Thu, 13 Dec 2012 22:08:50 GMT
Cool! Sounds great!  :-)

Any pointers to a (Lucene) example that attaches a payload to a
start..end span that is more than one token?
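
To make the question concrete, here is a minimal sketch in plain Java of
what such a payload might carry, assuming the span is recorded on its
first token and the payload stores the position of the last token it
covers plus a label. SpanPayload and its fields are hypothetical names
for illustration, not Lucene classes, and nothing here calls the Lucene
API; in Lucene 4.x the resulting bytes would presumably be wrapped in a
BytesRef and set through PayloadAttribute inside a TokenFilter.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Lucene): a span annotation packed into payload bytes. */
class SpanPayload {
    final int endPosition; // position of the last token covered by the span (assumed convention)
    final String label;    // annotation label, e.g. "NP" for a noun-phrase chunk

    SpanPayload(int endPosition, String label) {
        this.endPosition = endPosition;
        this.label = label;
    }

    /** Layout: 4-byte big-endian end position, followed by the UTF-8 label. */
    byte[] encode() {
        byte[] labelBytes = label.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + labelBytes.length)
                .putInt(endPosition)
                .put(labelBytes)
                .array();
    }

    /** Reverses encode(): reads the end position, then treats the rest as the label. */
    static SpanPayload decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int end = buf.getInt();
        byte[] labelBytes = new byte[buf.remaining()];
        buf.get(labelBytes);
        return new SpanPayload(end, new String(labelBytes, StandardCharsets.UTF_8));
    }
}
```

With this layout, a reader that finds the payload on the token at
position 3 and decodes an end position of 5 would know the annotation
covers tokens 3..5, which is the start..end information a single-token
payload alone does not give.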

thanks,
-Glen

On Thu, Dec 13, 2012 at 5:03 PM, Lance Norskog <goksron@gmail.com> wrote:
> I should not have added that note. The OpenNLP patch gives a concrete
> example of adding an annotation to text.
>
>
> On 12/13/2012 01:54 PM, Glen Newton wrote:
>>
>> It is not clear this is exactly what is needed/being discussed.
>>
>> From the issue:
>> "We are also planning a Tokenizer/TokenFilter that can put parts of
>> speech as either payloads (PartOfSpeechAttribute?) on a token or at
>> the same position."
>>
>> This adds it to a token, not a span. 'same position' does not suggest
>> it also records the end position.
>>
>> -Glen
>>
>> On Thu, Dec 13, 2012 at 4:45 PM, Lance Norskog <goksron@gmail.com> wrote:
>>>
>>> Parts-of-speech is available now, in the indexer.
>>>
>>> LUCENE-2899 adds OpenNLP to the Lucene&Solr codebase. It does
>>> parts-of-speech, chunking and Named Entity Recognition. OpenNLP is an
>>> Apache project for natural-language processing.
>>>
>>> Some parts are in Solr that could be in Lucene.
>>>
>>> https://issues.apache.org/jira/browse/lucene-2899
>>>
>>>
>>> On 12/12/2012 12:02 PM, Wu, Stephen T., Ph.D. wrote:
>>>>>>
>>>>>> Is there any (preliminary) code checked in somewhere that I can look
>>>>>> at, that would help me understand the practical issues that would
>>>>>> need to be addressed?
>>>>>
>>>>> Maybe we can make this more concrete: what new attribute are you
>>>>> needing to record in the postings and access at search time?
>>>>
>>>> For example:
>>>>    - part of speech of a token.
>>>>    - syntactic parse subtree (over a span).
>>>>    - semantically normalized phrase (to canonical text or
>>>>      ontological code).
>>>>    - semantic group (of a span).
>>>>    - coreference link.
>>>>
>>>> stephen
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>
>>
>
>



-- 
-
http://zzzoot.blogspot.com/
-


