lucene-java-user mailing list archives

From Ivan Provalov <iprov...@yahoo.com>
Subject Re: Calculate Term Co-occurrence Matrix
Date Sun, 22 Aug 2010 13:56:50 GMT
Ahmed,

Instead of storing the score in the index, I would use the score that comes out of the
CollocationSearcher class. I changed the class slightly so that it returns a LinkedHashMap
of collocated terms and their scores relative to the term used in the query. I have
attached the new version.
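
To make "scores relative to the term used in the query" concrete, here is a minimal sketch in plain Java with no Lucene dependency. It is not the attached CollocationSearcher code, and the scoring is an assumption made for illustration: a term's score is the fraction of documents containing the query term that also contain that term. The class name CooccurrenceSketch, the helper doc(), and the sample documents are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CooccurrenceSketch {

    // Hypothetical helper: a "document" is just the set of distinct terms in it.
    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    // Assumed scoring: (docs containing both terms) / (docs containing queryTerm).
    // Entries are ordered by score, highest first; ties are broken alphabetically.
    static LinkedHashMap<String, Double> collocatedTerms(List<Set<String>> docs,
                                                         String queryTerm) {
        Map<String, Integer> together = new HashMap<>();
        int queryDocs = 0;
        for (Set<String> d : docs) {
            if (!d.contains(queryTerm)) {
                continue; // only documents with the query term contribute
            }
            queryDocs++;
            for (String term : d) {
                if (!term.equals(queryTerm)) {
                    together.merge(term, 1, Integer::sum);
                }
            }
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(together.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // LinkedHashMap preserves the sorted insertion order for the caller.
        LinkedHashMap<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            scores.put(e.getKey(), e.getValue() / (double) queryDocs);
        }
        return scores;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = Arrays.asList(
                doc("shipment", "damaged"),
                doc("shipment", "damaged", "box"),
                doc("shipment", "arrived"),
                doc("shipment", "arrived"),
                doc("shipment", "arrived"),
                doc("invoice", "damaged"));
        System.out.println(collocatedTerms(docs, "shipment"));
    }
}
```

In the sample data, "damaged" appears in 2 of the 5 documents that contain "shipment", so its score relative to "shipment" is 0.4. The document that contains "damaged" without "shipment" does not affect that score.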

Thanks,

IP



--- On Sun, 8/22/10, ahmed algohary <algoharyalex@gmail.com> wrote:

> From: ahmed algohary <algoharyalex@gmail.com>
> Subject: Re: Calculate Term Co-occurrence Matrix
> To: java-user@lucene.apache.org
> Date: Sunday, August 22, 2010, 9:27 AM
> I think I got it.
> 
> In the CollectionIndexer class, I have added the co-occurrence score to the
> index document:
> 
>  doc.add(new Field("score", collocation.getScore() + "",
>                    Field.Store.YES, Field.Index.NOT_ANALYZED));
> 
> then in the CollectionSearcher, the scores can be retrieved:
> 
>  d.get("score")
> 
> Is that correct ??
> 
> On Sun, Aug 22, 2010 at 2:47 PM, ahmed algohary <algoharyalex@gmail.com> wrote:
> 
> > Thanks! It is exactly what I need. But, isn't there a way to get the
> > matching score ?
> >
> > for example, "damaged" co-occurs with "shipment" with a probability = 0.4
> > ??
> >
> >
> > On Sun, Aug 22, 2010 at 5:35 AM, Ivan Provalov <iprovalo@yahoo.com> wrote:
> >
> >> Ahmed,
> >>
> >> FYI, I updated the term collocations package I mentioned earlier with a
> >> few fixes and changes which will make it work for Lucene 3.0.2.  This may
> >> help your task.
> >>
> >> See:
> >> https://issues.apache.org/jira/browse/LUCENE-474
> >>
> >> Thanks,
> >>
> >> Ivan Provalov
> >>
> >>
> >> --- On Sat, 8/21/10, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> >>
> >> > From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
> >> > Subject: Re: Calculate Term Co-occurrence Matrix
> >> > To: java-user@lucene.apache.org
> >> > Date: Saturday, August 21, 2010, 8:05 AM
> >> > Ahmed,
> >> >
> >> > That's what that KPE (link in my previous email, below) will do for you.
> >> > It's not open source at this time, but that is exactly one of the things
> >> > it does.  I think Mahout collocations stuff might work for you, too.
> >> >
> >> > Otis
> >> > ----
> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >> > Lucene ecosystem search :: http://search-lucene.com/
> >> >
> >> >
> >> >
> >> > ----- Original Message ----
> >> > > From: ahmed algohary <algoharyalex@gmail.com>
> >> > > To: java-user@lucene.apache.org
> >> > > Sent: Sat, August 21, 2010 7:20:03 AM
> >> > > Subject: Re: Calculate Term Co-occurrence Matrix
> >> > >
> >> > > Thanks for all your answers!
> >> > >
> >> > > it seems like I did not make my question clear. I have a text corpus and I
> >> > > need to determine the pairs of words that occur together in many documents.
> >> > > I need to do that to be able to measure the semantic proximity between
> >> > > words. This method is expanded
> >> > > here<http://forums.searchenginewatch.com/showthread.php?t=48>.
> >> > > I hope to find some code that given a text corpus, generate all the words
> >> > > pairs with their probability of occurring together.
> >> > >
> >> > >
> >> > > On Sat, Aug 21, 2010 at 1:46 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> >> > >
> >> > > > There is also a non-Mahout Key Phrase Extractor for Collocations, SIPs, and
> >> > > > a few other things:
> >> > > > http://sematext.com/products/key-phrase-extractor/index.html
> >> > > >
> >> > > > One of the demos that uses news data is at
> >> > > > http://sematext.com/demo/kpe/index.html
> >> > > >
> >> > > > Otis
> >> > > > ----
> >> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >> > > > Lucene ecosystem search :: http://search-lucene.com/
> >> > > >
> >> > > >
> >> > > >
> >> > > > ----- Original Message ----
> >> > > > > From: Grant Ingersoll <gsingers@apache.org>
> >> > > > > To: java-user@lucene.apache.org
> >> > > > > Sent: Fri, August 20, 2010 8:52:17 AM
> >> > > > > Subject: Re: Calculate Term Co-occurrence Matrix
> >> > > > >
> >> > > > > You might also be interested in Mahout's collocations package:
> >> > > > > http://cwiki.apache.org/confluence/display/MAHOUT/Collocations
> >> > > > >
> >> > > > > -Grant
> >> > > > > On Aug 19, 2010, at 11:39 AM, ahmed algohary wrote:
> >> > > > >
> >> > > > > > Hi all,
> >> > > > > >
> >> > > > > > I need to know if there is a Lucene plug-in or a Lucene-based API for
> >> > > > > > calculating the term co-occurrence matrix for a given text corpus.
> >> > > > > >
> >> > > > > > Thanks!
> >> > > > > >
> >> > > > > > --
> >> > > > > > Ahmed
> >> > > > >
> >> > > > > --------------------------
> >> > > > > Grant Ingersoll
> >> > > > > http://www.lucidimagination.com/
> >> > > > >
> >> > > > > Search the Lucene ecosystem using Solr/Lucene:
> >> > > > > http://www.lucidimagination.com/search
> >> > > > >
> >> > > > > ---------------------------------------------------------------------
> >> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> >> > > >  >
> >> > > > >
> >> > > >
> >> > > >
> >> >
> >> > > >
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >
> 


      

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

