lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: question about grouping text
Date Thu, 26 Mar 2009 15:26:19 GMT

Hi,

I'm not aware of anything in LingPipe that would do the Q&A part, though LingPipe (and GATE)
may have the building blocks for what you need. For example, both should have sentence
boundary detection (sentence chunking), which might be one of the first sub-tasks you'd need
to tackle before you can begin finding and evaluating questions and answers.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
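To illustrate the sentence-boundary sub-task Otis mentions, here is a minimal sketch that uses the JDK's own java.text.BreakIterator as a stand-in for LingPipe or GATE (it is not either toolkit's API). It splits text into sentences and flags any sentence ending in "?" as a candidate question. The class name SentenceQuestions and the "ends with ?" heuristic are assumptions for illustration only.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceQuestions {

    /** Splits text into sentences using the JDK's rule-based sentence BreakIterator. */
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    /** Naive heuristic (an assumption): a sentence ending in '?' is a candidate question. */
    static boolean looksLikeQuestion(String sentence) {
        return sentence.endsWith("?");
    }

    public static void main(String[] args) {
        String text = "What is Lucene? It is a search library. How do I index PDFs? Extract the text first.";
        for (String s : sentences(text)) {
            System.out.println((looksLikeQuestion(s) ? "Q: " : "   ") + s);
        }
    }
}
```

The JDK's rules are fairly simple and may split after abbreviations such as "Mr.", which is one reason a dedicated toolkit's sentence model could do better on real documents.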



----- Original Message ----
> From: Amin Mohammed-Coleman <aminmc@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Thursday, March 26, 2009 3:54:59 AM
> Subject: Re: question about grouping text
> 
> Hi
> 
> I was wondering if something like LingPipe or GATE (for text extraction)
> might be an option? I've started looking at them, and I think they may
> be applicable (I may be wrong).
> 
> Cheers
> Amin
> 
> On Wed, Mar 25, 2009 at 4:18 PM, Grant Ingersoll wrote:
> 
> > Hi MFM,
> >
> > This comes down to a preprocessing step that you would have to do before
> > putting the text into Lucene, although I suppose you might be able to identify
> > the pairs during analysis and use the TeeTokenFilter and the SinkTokenizer. Once
> > you have done this, you can add them as fields on a Document. I know that's not
> > a great help, but there's not much Lucene can do here, because it is
> > application-specific.
> >
> > Document/field wise, I would probably have:
> > Document
> >   question
> >   answer
> >
> > Then, when you search in the question field, you can also retrieve the
> > answer.
> >
> > -Grant
> >
> >
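Grant's question/answer layout can be mimicked in plain Java to show why it works: each record keeps both strings together, the keyword is matched only against the question, and the answer comes back with the hit. This is a hypothetical stand-in, not the Lucene API; the class QaIndex and its crude split-on-non-word-characters tokenizing are assumptions. In Lucene itself the equivalent would be one Document per pair, with the question field searched and the answer field stored so it can be read from each hit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class QaIndex {

    /** One stand-in "document": the question and answer are kept together. */
    static final class QaDoc {
        final String question;
        final String answer;

        QaDoc(String question, String answer) {
            this.question = question;
            this.answer = answer;
        }
    }

    private final List<QaDoc> docs = new ArrayList<>();

    /** Adds one question/answer pair as a single unit. */
    void add(String question, String answer) {
        docs.add(new QaDoc(question, answer));
    }

    /** Returns the answers of all pairs whose question contains the keyword as a whole word. */
    List<String> searchQuestions(String keyword) {
        String term = keyword.toLowerCase(Locale.ROOT);
        List<String> answers = new ArrayList<>();
        for (QaDoc doc : docs) {
            if (containsTerm(doc.question, term)) {
                answers.add(doc.answer); // the answer travels with the matching question
            }
        }
        return answers;
    }

    /** Crude tokenizer: lowercase, split on non-word characters, compare whole tokens. */
    private static boolean containsTerm(String text, String term) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.equals(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        QaIndex index = new QaIndex();
        index.add("How do I index PDF files?", "Extract the text first, then index it.");
        index.add("What is a stored field?", "A field whose original value is kept in the index.");
        System.out.println(index.searchQuestions("pdf"));
    }
}
```

A search for "text" returns nothing here even though the word appears in an answer, because only the question field is consulted.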
> > On Mar 24, 2009, at 4:04 PM, MFM wrote:
> >
> >
> >> I have been able to successfully index and search text from structured
> >> documents like PDF and MS Word. I am having a really hard time figuring out
> >> how to group the indexed strings together. For example, if my document has
> >> a question and answer in a table, the search returns the text containing the
> >> question that matches the keyword. How would I group or associate the
> >> question and answer as part of the indexing? I have tried using POI to read
> >> through the MS Word file and group them, but then it gets really heavy on
> >> pattern matching.
> >>
> >> Thanks
> >> MFM
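If the questions and answers really do sit in a two-column table, one way to sidestep free-text pattern matching is to let the table structure do the grouping: take each row's cell texts and pair column 0 with column 1. This sketch assumes the cells have already been read out of the Word file (for example with POI, which is not shown); the class TablePairs, the column assignment, and the skip rules are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TablePairs {

    /** A question and its answer taken from the same table row. */
    static final class QaPair {
        final String question;
        final String answer;

        QaPair(String question, String answer) {
            this.question = question;
            this.answer = answer;
        }
    }

    /**
     * Hypothetical helper: each row is a list of already-extracted cell texts,
     * with the question assumed to be in column 0 and the answer in column 1.
     */
    static List<QaPair> pairRows(List<List<String>> rows) {
        List<QaPair> pairs = new ArrayList<>();
        for (List<String> row : rows) {
            if (row.size() < 2) {
                continue; // not a question/answer row (e.g. a merged heading cell)
            }
            String q = row.get(0).trim();
            String a = row.get(1).trim();
            if (q.isEmpty() || a.isEmpty()) {
                continue; // skip rows with a blank cell
            }
            pairs.add(new QaPair(q, a));
        }
        return pairs;
    }
}
```

Each resulting pair can then be indexed as one unit, so a hit on the question brings its answer along.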
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >

