lucene-java-user mailing list archives

From "T. Kuro Kurosaka" <k...@basistech.com>
Subject Re: Lucene 4 - POS and Syntactic Tagging
Date Tue, 10 Apr 2012 06:57:43 GMT
Please disregard this suggestion. It is a bad idea. Almost every text 
contains a verb, a noun, etc., so searching a field that holds only POS 
tags would match nearly every document and won't make sense.  Maybe the 
parallel field should instead hold the lemma (dictionary form) combined 
with its part-of-speech tag as a single token, like "like_verb" or 
"lemming_propernoun"?
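
To make the idea concrete, here is a rough sketch in plain Java of how the 
combined tokens could be built before they are handed to the parallel 
field. It uses no Lucene classes. The Tagged holder, the method names, and 
the lowercase "lemma_pos" convention are assumptions made up for this 
illustration; the lemmas and tags are assumed to come from whatever tagger 
you already run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LemmaPosTokens {

    /** Hypothetical holder for one tagged word: its lemma and its POS tag. */
    static final class Tagged {
        final String lemma;
        final String pos;

        Tagged(String lemma, String pos) {
            this.lemma = lemma;
            this.pos = pos;
        }
    }

    /** Joins lemma and tag into one token, e.g. ("like", "VERB") -> "like_verb". */
    static String combine(String lemma, String pos) {
        return lemma.toLowerCase(Locale.ROOT) + "_" + pos.toLowerCase(Locale.ROOT);
    }

    /** Builds the token list for the parallel field, one token per tagged word. */
    static List<String> toFieldTokens(List<Tagged> tagged) {
        List<String> tokens = new ArrayList<>();
        for (Tagged t : tagged) {
            tokens.add(combine(t.lemma, t.pos));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Tagged> words = new ArrayList<>();
        words.add(new Tagged("like", "VERB"));
        words.add(new Tagged("lemming", "PROPERNOUN"));
        // Prints: like_verb lemming_propernoun
        System.out.println(String.join(" ", toFieldTokens(words)));
    }
}
```

A term query against the parallel field would then look for a token such 
as "like_verb" rather than a bare tag, so it only matches documents where 
that lemma occurs with that part of speech.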

On 4/9/12 1:10 PM, T. Kuro Kurosaka wrote:
> If you want to search on part-of-speech tag, I'd just make a parallel 
> field ("text_pos" for the field "text", for example) and search on 
> that field (text_pos:noun).
>
> Kuro
>
> On 3/14/12 9:37 AM, Mark McGuire wrote:
>> I'm working on a project where I need to tag both the part of speech 
>> and other syntactic information on tokens so that this information is 
>> searchable.  I have read the threads on the mailing list regarding 
>> part of speech tagging here 
>> <http://mail-archives.apache.org/mod_mbox/lucene-java-user/201105.mbox/%3CBANLkTimwqcQ_GF2pxE8Hyc_R75NcWDRWbQ@mail.gmail.com%3E>
>> and the many responses to similar questions.  To me, inserting 0 
>> increment tokens seems rather clunky, especially when TypeAttributes 
>> appear to be what one would want to use.  Does Lucene do anything 
>> extra when the Type is set to or not set to its default, "word"?  Is 
>> it possible to write a search that uses multiple attributes from 
>> TokenAttributes (i.e., a search that searches for CharTermAttribute 
>> "dog" followed by a TypeAttribute of verb)?
>>
>> Also if I were to use 0 increment tokens for tagging, would data like 
>> document length or sumTotalTermFreq be different from a document 
>> indexed without these tags?  How would I counteract these differences 
>> if any occur?
>>
>> Thanks,
>> Mark McGuire
>>
>



