lucene-java-user mailing list archives

From Amit Kumar <ami...@uiuc.edu>
Subject Storing Part of Speech information in Lucene Indices
Date Wed, 12 Jul 2006 05:36:24 GMT
Hi,

A new project that I am evaluating Lucene for needs part-of-speech
information for the tokens. I can get that information with NLP tools
(GATE etc.) by preprocessing the documents, but I would like to store
it in the indices. Something along the lines of

TermVectorOffsetInfo[?].getPartofSpeech();

I am writing to ask for your advice: you can tell me I am bonkers, or
let me know where I should start digging :).
Is that a good idea? Or would it be less trouble for me to store the
offset information, along with the parts of speech, outside Lucene?
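
A minimal sketch of the second option (keeping the tags outside
Lucene), assuming each token can be identified by a document id plus
its start offset. The class PosSideStore and its methods are
hypothetical names used for illustration only; they are not part of
Lucene, and the in-memory map stands in for whatever storage would
really be used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical side table for part-of-speech tags; not a Lucene class.
class PosSideStore {
    // Document id -> (token start offset -> POS tag).
    private final Map<String, Map<Integer, String>> tagsByDoc = new HashMap<>();

    // Record the tag the NLP preprocessing step assigned to one token.
    void put(String docId, int startOffset, String posTag) {
        tagsByDoc.computeIfAbsent(docId, k -> new HashMap<>())
                 .put(startOffset, posTag);
    }

    // Look up the tag for a token; returns null if none was recorded.
    String getPartOfSpeech(String docId, int startOffset) {
        Map<Integer, String> tags = tagsByDoc.get(docId);
        return tags == null ? null : tags.get(startOffset);
    }
}
```

At search time the offset could presumably come from
TermVectorOffsetInfo.getStartOffset(), provided the field was indexed
with offsets and the preprocessing step saw exactly the same text as
the analyzer.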

Has anyone else done that?

Best,
Amit


PS: Thank you for putting the LuceneInAction source online; it was a
great help to see CategorizerTest.java.
I am ordering my copy of the book tomorrow :)

---------------------------------------------------------
Amit Kumar
Research Programmer
The Graduate School of Library and Information Science
University of Illinois, Urbana Champaign IL, 61820
phone: 217-333-4118 fax: 217-244-3302
---------------------------------------------------------






