lucene-java-user mailing list archives

From Grant Ingersoll <>
Subject Re: question about grouping text
Date Wed, 25 Mar 2009 16:18:35 GMT

This comes down to a preprocessing step that you would have to do
before putting the content into Lucene, although I suppose you might
be able to identify the pieces during analysis and use the
TeeTokenFilter and the SinkTokenizer.  Once you have done this, you can
add them as fields on a Document.  I know that's not a great help, but
there is not much Lucene can do because the grouping is application
specific.
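
To make the preprocessing idea concrete, here is a minimal sketch in plain Java, with no Lucene calls. It assumes the table rows have already been pulled out of the file (for example with POI) as two-cell arrays of question text and answer text. The class QaPairing, the methods pairRows and answersFor, and the field names "question" and "answer" are all hypothetical names chosen for illustration; the substring match is only a stand-in for a real search on the question field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: groups table rows into
// question/answer records before anything is handed to the indexer.
class QaPairing {

    // Assumes each row was extracted as {questionText, answerText}.
    // Rows of any other shape (or with null cells) are skipped.
    static List<Map<String, String>> pairRows(List<String[]> rows) {
        List<Map<String, String>> records = new ArrayList<>();
        for (String[] row : rows) {
            if (row == null || row.length != 2 || row[0] == null || row[1] == null) {
                continue;
            }
            // One map per pair; each entry would become one field on a Document.
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("question", row[0].trim());
            fields.put("answer", row[1].trim());
            records.add(fields);
        }
        return records;
    }

    // Stand-in for a search on the question field: a case-insensitive
    // substring match that returns the answer stored with each hit.
    static List<String> answersFor(List<Map<String, String>> records, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> answers = new ArrayList<>();
        for (Map<String, String> fields : records) {
            if (fields.get("question").toLowerCase(Locale.ROOT).contains(needle)) {
                answers.add(fields.get("answer"));
            }
        }
        return answers;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"What is Lucene?", "A search library."},
            new String[] {"How do I index a PDF?", "Extract the text first."});
        List<Map<String, String>> records = pairRows(rows);
        System.out.println(answersFor(records, "pdf"));
    }
}
```

The point of the sketch is only that the question and its answer travel together in one record, so a hit on the question text gives you the answer without any further pattern matching at search time.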

Document/field wise, I would probably have:

Then, when you search in the question field, you can also retrieve the  


On Mar 24, 2009, at 4:04 PM, MFM wrote:

> I have been able to successfully index and search text from structured
> documents like PDF and MS Word. I am having a real hard time trying to
> figure out how to group the index strings together e.g. if my document
> had a question and answer in a table, the search will produce the text
> with the question based on the keyword. How would I group or associate
> the question and answer as part of the indexing? I have tried using POI
> to read thru the MS Word file and try and group them, but then it gets
> really intense into pattern matching.
> Thanks
