lucene-java-user mailing list archives

From Aida Hota <hota.a...@gmail.com>
Subject Re: Calculate Term Co-occurrence Matrix
Date Mon, 23 Aug 2010 17:36:09 GMT
Hi Ivan, thanks a lot for this. I only just found time to look at it and
reply. Sorry to bother you again; I already appreciate what you uploaded. I
would also like to ask one question, if you don't mind: is it possible to get
from this a unified list of frequently occurring unigrams, bigrams, and
trigrams with their frequencies?
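
To make the request concrete, here is a rough plain-Java sketch of what such a
unified list could look like: every 1-, 2-, and 3-gram is counted into a single
map and then filtered by a minimum frequency. This is only an illustration and
not part of the LUCENE-474 collocations package. The class name NGramCounter,
its methods, and the simple letter/digit tokenizer are all hypothetical; a real
setup would more likely reuse the analyzer used at index time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene or the LUCENE-474 patch.
class NGramCounter {

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Count every n-gram for n = 1..maxN across all documents into one map.
    static Map<String, Integer> countNGrams(List<String> docs, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            List<String> tokens = tokenize(doc);
            for (int n = 1; n <= maxN; n++) {
                for (int i = 0; i + n <= tokens.size(); i++) {
                    String gram = String.join(" ", tokens.subList(i, i + n));
                    counts.merge(gram, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Keep n-grams seen at least minCount times, most frequent first,
    // with ties broken alphabetically.
    static List<Map.Entry<String, Integer>> frequent(Map<String, Integer> counts, int minCount) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) out.add(e);
        }
        out.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "The damaged shipment arrived late",
            "A damaged shipment was returned");
        for (Map.Entry<String, Integer> e : frequent(countNGrams(docs, 3), 2)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

With these two sample sentences and a minimum count of 2, the list contains
"damaged", "damaged shipment", and "shipment", each with frequency 2. Keeping
all n-gram orders in one map is what makes the list "unified"; for a large
corpus the map would probably need pruning or an on-disk store.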

Thank you very much


On Mon, Aug 23, 2010 at 3:22 PM, Ivan Provalov <iprovalo@yahoo.com> wrote:

> Ahmed, if you want the raw score, you can do it the way you describe below.
>
>
>
> --- On Sun, 8/22/10, ahmed algohary <algoharyalex@gmail.com> wrote:
>
> > From: ahmed algohary <algoharyalex@gmail.com>
> > Subject: Re: Calculate Term Co-occurrence Matrix
> > To: java-user@lucene.apache.org
> > Date: Sunday, August 22, 2010, 9:27 AM
> > I think I got it.
> >
> > In the CollectionIndexer class, I have added the co-occurrence score to
> > the index document:
> >
> >  doc.add(new Field("score", collocation.getScore() + "",
> >      Field.Store.YES, Field.Index.NOT_ANALYZED));
> >
> > Then, in the CollectionSearcher, the scores can be retrieved:
> >
> >  d.get("score")
> >
> > Is that correct?
> >
> > On Sun, Aug 22, 2010 at 2:47 PM, ahmed algohary <algoharyalex@gmail.com> wrote:
> >
> > > Thanks! It is exactly what I need. But isn't there a way to get the
> > > matching score?
> > >
> > > For example, that "damaged" co-occurs with "shipment" with a
> > > probability of 0.4?
> > >
> > >
> > > On Sun, Aug 22, 2010 at 5:35 AM, Ivan Provalov <iprovalo@yahoo.com> wrote:
> > >
> > >> Ahmed,
> > >>
> > >> FYI, I updated the term collocations package I mentioned earlier with a
> > >> few fixes and changes which will make it work for Lucene 3.0.2. This may
> > >> help your task.
> > >>
> > >> See:
> > >> https://issues.apache.org/jira/browse/LUCENE-474
> > >>
> > >> Thanks,
> > >>
> > >> Ivan Provalov
> > >>
> > >>
> > >> --- On Sat, 8/21/10, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> > >>
> > >> > From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
> > >> > Subject: Re: Calculate Term Co-occurrence Matrix
> > >> > To: java-user@lucene.apache.org
> > >> > Date: Saturday, August 21, 2010, 8:05 AM
> > >> > Ahmed,
> > >> >
> > >> > That's what that KPE (link in my previous email, below) will do for you.
> > >> > It's not open source at this time, but that is exactly one of the things
> > >> > it does. I think Mahout collocations stuff might work for you, too.
> > >> >
> > >> > Otis
> > >> > ----
> > >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > >> > Lucene ecosystem search :: http://search-lucene.com/
> > >> >
> > >> >
> > >> >
> > >> > ----- Original Message ----
> > >> > > From: ahmed algohary <algoharyalex@gmail.com>
> > >> > > To: java-user@lucene.apache.org
> > >> > > Sent: Sat, August 21, 2010 7:20:03 AM
> > >> > > Subject: Re: Calculate Term Co-occurrence Matrix
> > >> > >
> > >> > > Thanks for all your answers!
> > >> > >
> > >> > > It seems I did not make my question clear. I have a text corpus and I
> > >> > > need to determine the pairs of words that occur together in many
> > >> > > documents. I need to do that to be able to measure the semantic
> > >> > > proximity between words. This method is expanded on here:
> > >> > > <http://forums.searchenginewatch.com/showthread.php?t=48>.
> > >> > > I hope to find some code that, given a text corpus, generates all the
> > >> > > word pairs with their probability of occurring together.
> > >> > >
> > >> > >
> > >> > > On Sat, Aug 21, 2010 at 1:46 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> > >> > >
> > >> > > > There is also a non-Mahout Key Phrase Extractor for Collocations, SIPs,
> > >> > > > and a few other things:
> > >> > > > http://sematext.com/products/key-phrase-extractor/index.html
> > >> > > >
> > >> > > > One of the demos that uses news data is at
> > >> > > > http://sematext.com/demo/kpe/index.html
> > >> > > >
> > >> > > > Otis
> > >> > > > ----
> > >> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > >> > > > Lucene ecosystem search :: http://search-lucene.com/
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > ----- Original Message ----
> > >> > > > > From: Grant Ingersoll <gsingers@apache.org>
> > >> > > > > To: java-user@lucene.apache.org
> > >> > > > > Sent: Fri, August 20, 2010 8:52:17 AM
> > >> > > > > Subject: Re: Calculate Term Co-occurrence Matrix
> > >> > > > >
> > >> > > > > You might also be interested in Mahout's collocations package:
> > >> > > > > http://cwiki.apache.org/confluence/display/MAHOUT/Collocations
> > >> > > > >
> > >> > > > > -Grant
> > >> > > > > On Aug 19, 2010, at 11:39 AM, ahmed algohary wrote:
> > >> > > > >
> > >> > > > > > Hi all,
> > >> > > > > >
> > >> > > > > > I need to know if there is a Lucene plug-in or a Lucene-based API for
> > >> > > > > > calculating the term co-occurrence matrix for a given text corpus.
> > >> > > > > >
> > >> > > > > > Thanks!
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Ahmed
> > >> > > > >
> > >> > > > > --------------------------
> > >> > > > > Grant Ingersoll
> > >> > > > > http://www.lucidimagination.com/
> > >> > > > >
> > >> > > > > Search the Lucene ecosystem using Solr/Lucene:
> > >> > > > > http://www.lucidimagination.com/search
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > ---------------------------------------------------------------------
> > >> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >
> >
>
>
>
>
>
>
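
The word-pair probabilities discussed in the quoted thread (for example,
"damaged" co-occurring with "shipment" with a probability of 0.4) can be
sketched without Lucene as follows. This is not the LUCENE-474 package or the
Mahout collocations code. It assumes one particular definition, the fraction
of documents that contain both terms, and the class name CooccurrenceMatrix,
the "a|b" key format, and the simple tokenizer are all hypothetical. Other
scores (conditional probability, log-likelihood ratio) would need different
formulas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene, Mahout, or the LUCENE-474 patch.
class CooccurrenceMatrix {

    // Distinct lowercase terms of one document, in sorted order.
    static List<String> distinctTerms(String doc) {
        TreeSet<String> terms = new TreeSet<>();
        for (String t : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return new ArrayList<>(terms);
    }

    // Map "a|b" (with a < b alphabetically) to the fraction of documents
    // that contain both a and b. Assumed definition of "probability".
    static Map<String, Double> pairProbabilities(List<String> docs) {
        Map<String, Integer> pairDocCounts = new TreeMap<>();
        for (String doc : docs) {
            List<String> terms = distinctTerms(doc);
            for (int i = 0; i < terms.size(); i++) {
                for (int j = i + 1; j < terms.size(); j++) {
                    pairDocCounts.merge(terms.get(i) + "|" + terms.get(j), 1, Integer::sum);
                }
            }
        }
        Map<String, Double> probs = new TreeMap<>();
        for (Map.Entry<String, Integer> e : pairDocCounts.entrySet()) {
            probs.put(e.getKey(), e.getValue() / (double) docs.size());
        }
        return probs;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "damaged shipment arrived",
            "shipment damaged again",
            "new shipment arrived",
            "order confirmed",
            "order shipped");
        // Two of the five documents contain both terms.
        System.out.println(pairProbabilities(docs).get("damaged|shipment"));
    }
}
```

Only the upper triangle of the matrix is stored, since the pair (a, b) is the
same as (b, a). The inner loops are quadratic in the number of distinct terms
per document, so for long documents a sliding window or a stop-word filter
would likely be needed.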
