lucene-java-user mailing list archives

From "T. Kuro Kurosaka" <k...@basistech.com>
Subject Re: Lucene 4 - POS and Syntactic Tagging
Date Mon, 09 Apr 2012 20:10:02 GMT
If you want to search on part-of-speech tags, I'd just make a parallel 
field ("text_pos" for the field "text", for example), index the tags 
there, and search on that field (text_pos:noun).

Kuro
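
To make the parallel-field idea concrete, here is a minimal sketch in plain Java of how the two field values might be produced. It is only an illustration, not Kuro's code and not Lucene API: the class ParallelPosFieldSketch, the TaggedToken and Fields types, and the tag names ("det", "noun", "verb") are all hypothetical, and it assumes a tagger has already paired each token with a tag and that both fields would be indexed with a whitespace-style analyzer so that token i in "text" lines up with tag i in "text_pos".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParallelPosFieldSketch {

    /** A token paired with its part-of-speech tag (hypothetical tagger output). */
    static final class TaggedToken {
        final String term;
        final String pos;

        TaggedToken(String term, String pos) {
            this.term = term;
            this.pos = pos;
        }
    }

    /** Whitespace-joined values intended for the "text" and "text_pos" fields. */
    static final class Fields {
        final String text;
        final String textPos;

        Fields(String text, String textPos) {
            this.text = text;
            this.textPos = textPos;
        }
    }

    /** Builds both field values so that the i-th tag describes the i-th term. */
    static Fields buildFields(List<TaggedToken> tokens) {
        StringBuilder text = new StringBuilder();
        StringBuilder pos = new StringBuilder();
        for (TaggedToken t : tokens) {
            if (text.length() > 0) {
                text.append(' ');
                pos.append(' ');
            }
            text.append(t.term);
            pos.append(t.pos);
        }
        return new Fields(text.toString(), pos.toString());
    }

    /** Returns the zero-based token positions at which the given value occurs. */
    static List<Integer> positionsOf(String fieldValue, String wanted) {
        List<Integer> hits = new ArrayList<>();
        String[] parts = fieldValue.split(" ");
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].equals(wanted)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<TaggedToken> sentence = Arrays.asList(
                new TaggedToken("the", "det"),
                new TaggedToken("dog", "noun"),
                new TaggedToken("barks", "verb"));

        Fields f = buildFields(sentence);
        System.out.println("text     = " + f.text);
        System.out.println("text_pos = " + f.textPos);
        System.out.println("noun positions = " + positionsOf(f.textPos, "noun"));
    }
}
```

The two strings would then be added to the same document as the "text" and "text_pos" fields, and a query such as text_pos:noun would match documents containing at least one noun. The sketch assumes terms contain no spaces. It keeps positions aligned across the two fields, but it does not by itself show how to express a query like "dog" followed by a verb, which mixes the two fields positionally.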

On 3/14/12 9:37 AM, Mark McGuire wrote:
> I'm working on a project where I need to tag both the part of speech 
> and other syntactic information on tokens so that this information is 
> searchable.  I have read the threads on the mailing list regarding 
> part of speech tagging here 
> <http://mail-archives.apache.org/mod_mbox/lucene-java-user/201105.mbox/%3CBANLkTimwqcQ_GF2pxE8Hyc_R75NcWDRWbQ@mail.gmail.com%3E>
> and the many responses to similar questions.  To me, inserting 0 
> increment tokens seems rather clunky, especially when TypeAttributes 
> appear to be what one would want to use.  Does Lucene do anything 
> extra when the Type is set to or not set to its default, "word"?  Is 
> it possible to write a search that uses multiple attributes from 
> TokenAttributes (i.e., a search that searches for CharTermAttribute "dog" 
> followed by a TypeAttribute of verb)?
>
> Also if I were to use 0 increment tokens for tagging, would data like 
> document length or sumTotalTermFreq be different from a document 
> indexed without these tags?  How would I counteract these differences 
> if any occur?
>
> Thanks,
> Mark McGuire
>

