lucene-java-user mailing list archives

From Amin Mohammed-Coleman <ami...@gmail.com>
Subject Re: question about grouping text
Date Thu, 26 Mar 2009 07:54:59 GMT
Hi

I was wondering if something like LingPipe or GATE (for text extraction)
might be an idea?  I've started looking at it and I'm just thinking it may
be applicable (I may be wrong).

Cheers
Amin

On Wed, Mar 25, 2009 at 4:18 PM, Grant Ingersoll <gsingers@apache.org> wrote:

> Hi MFM,
>
> This comes down to a preprocessing step that you would have to do before
> putting the text into Lucene, although I suppose you might be able to
> identify the pairs during analysis using the TeeTokenFilter and the
> SinkTokenizer.  Once you have done this, you can add them as fields on a
> Document.  I know that's not a great help, but there is not much Lucene can
> do because the grouping is application specific.
>
> Document/field wise, I would probably have:
> Document
>   question
>   answer
>
> Then, when you search in the question field, you can also retrieve the
> answer.
>
> -Grant
>
>
> On Mar 24, 2009, at 4:04 PM, MFM wrote:
>
>
>> I have been able to successfully index and search text from structured
>> documents like PDF and MS Word. I am having a really hard time trying to
>> figure out how to group the indexed strings together. For example, if my
>> document has a question and answer in a table, the search returns the text
>> of the question that matches the keyword. How would I group or associate
>> the question and answer as part of the indexing? I have tried using POI to
>> read through the MS Word file and group them, but then it quickly turns
>> into intense pattern matching.
>>
>> Thanks
>> MFM
>> --
>> View this message in context:
>> http://www.nabble.com/question-about-grouping-text-tp22682433p22682433.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
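
For illustration, the layout Grant suggests in the quoted reply (one document
per question/answer pair, searched on the question field) can be sketched in
plain Java. This is only a rough sketch under stated assumptions, not Lucene
code: QaGrouping, QaPair, groupRows and answersFor are hypothetical names, the
rows are assumed to be already extracted from the table as [question, answer]
cells, and the substring match merely stands in for a real query on the
question field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java stand-in: no Lucene classes are used here.
class QaGrouping {

    /** Hypothetical holder mirroring a Document with "question" and "answer" fields. */
    static final class QaPair {
        final String question;
        final String answer;

        QaPair(String question, String answer) {
            this.question = question;
            this.answer = answer;
        }
    }

    /**
     * Preprocessing step: turn table rows (assumed to be [question, answer]
     * cells already extracted from the Word or PDF file) into one pair per row.
     */
    static List<QaPair> groupRows(List<String[]> rows) {
        List<QaPair> pairs = new ArrayList<>();
        for (String[] row : rows) {
            if (row.length < 2) {
                continue; // skip rows that do not have both cells
            }
            pairs.add(new QaPair(row[0].trim(), row[1].trim()));
        }
        return pairs;
    }

    /** Naive keyword match on the question field only; returns the stored answers. */
    static List<String> answersFor(List<QaPair> pairs, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> answers = new ArrayList<>();
        for (QaPair pair : pairs) {
            if (pair.question.toLowerCase(Locale.ROOT).contains(needle)) {
                answers.add(pair.answer);
            }
        }
        return answers;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"What is an index?", "A searchable structure built from documents."});
        rows.add(new String[] {"What is a field?", "A named part of a document."});

        List<QaPair> pairs = groupRows(rows);
        // Search the question side, print the answers that travel with it.
        System.out.println(answersFor(pairs, "field"));
    }
}
```

In a real index each pair would presumably become one Document carrying both
fields, so a hit on the question brings the stored answer along with it. The
hard part MFM describes, recognising which table cells form a pair, still has
to happen before this step.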
