lucene-java-user mailing list archives

From Aida Hota <hota.a...@gmail.com>
Subject Re: Calculate Term Co-occurrence Matrix
Date Thu, 26 Aug 2010 09:03:11 GMT
ok, thank you Ivan!!

On Tue, Aug 24, 2010 at 5:13 PM, Ivan Provalov <iprovalo@yahoo.com> wrote:

> Aida,
>
> Right now it will do two term collocation only.
>
> Ivan
>
>
> --- On Mon, 8/23/10, Aida Hota <hota.aida@gmail.com> wrote:
>
> > From: Aida Hota <hota.aida@gmail.com>
> > Subject: Re: Calculate Term Co-occurrence Matrix
> > To: java-user@lucene.apache.org
> > Date: Monday, August 23, 2010, 1:36 PM
> > Hi Ivan, thanks a lot for this. I just found time to look at it and reply.
> > Sorry for bugging you again; I already appreciate what you uploaded. I would
> > also like to ask one question, if you don't mind: is it possible to get from
> > this a unified list of frequently occurring unigrams, bigrams and trigrams
> > with their frequencies?
> >
> > Thank you very much
> >
> >
> > On Mon, Aug 23, 2010 at 3:22 PM, Ivan Provalov <iprovalo@yahoo.com> wrote:
> >
> > > Ahmed, if you want the raw score, you can do it the way you describe below.
> > >
> > >
> > >
> > > --- On Sun, 8/22/10, ahmed algohary <algoharyalex@gmail.com> wrote:
> > >
> > > > From: ahmed algohary <algoharyalex@gmail.com>
> > > > Subject: Re: Calculate Term Co-occurrence Matrix
> > > > To: java-user@lucene.apache.org
> > > > Date: Sunday, August 22, 2010, 9:27 AM
> > > > I think I got it.
> > > >
> > > > In the CollectionIndexer class, I have added the co-occurrence score to
> > > > the index document:
> > > >
> > > >  doc.add(new Field("score", collocation.getScore() + "",
> > > >      Field.Store.YES, Field.Index.NOT_ANALYZED));
> > > >
> > > > Then, in the CollectionSearcher, the scores can be retrieved:
> > > >
> > > >  d.get("score")
> > > >
> > > > Is that correct?
> > > >
> > > > On Sun, Aug 22, 2010 at 2:47 PM, ahmed algohary <algoharyalex@gmail.com> wrote:
> > > >
> > > > > Thanks! It is exactly what I need. But isn't there a way to get the
> > > > > matching score?
> > > > >
> > > > > For example, "damaged" co-occurs with "shipment" with a probability = 0.4?
> > > > >
> > > > >
> > > > > On Sun, Aug 22, 2010 at 5:35 AM, Ivan Provalov <iprovalo@yahoo.com> wrote:
> > > > >
> > > > >> Ahmed,
> > > > >>
> > > > >> FYI, I updated the term collocations package I mentioned earlier with a
> > > > >> few fixes and changes which will make it work for Lucene 3.0.2. This may
> > > > >> help your task.
> > > > >>
> > > > >> See:
> > > > >> https://issues.apache.org/jira/browse/LUCENE-474
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Ivan Provalov
> > > > >>
> > > > >>
> > > > >> --- On Sat, 8/21/10, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> > > > >>
> > > > >> > From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
> > > > >> > Subject: Re: Calculate Term Co-occurrence Matrix
> > > > >> > To: java-user@lucene.apache.org
> > > > >> > Date: Saturday, August 21, 2010, 8:05 AM
> > > > >> > Ahmed,
> > > > >> >
> > > > >> > That's what that KPE (link in my previous email, below) will do for you.
> > > > >> > It's not open source at this time, but that is exactly one of the things
> > > > >> > it does. I think Mahout collocations stuff might work for you, too.
> > > > >> >
> > > > >> > Otis
> > > > >> > ----
> > > > >> > Sematext :: http://sematext.com/ :: Solr - Lucene -
> > > > Nutch
> > > > >> > Lucene ecosystem search :: http://search-lucene.com/
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > ----- Original Message ----
> > > > >> > > From: ahmed algohary <algoharyalex@gmail.com>
> > > > >> > > To: java-user@lucene.apache.org
> > > > >> > > Sent: Sat, August 21, 2010 7:20:03 AM
> > > > >> > > Subject: Re: Calculate Term Co-occurrence Matrix
> > > > >> > >
> > > > >> > > Thanks for all your answers!
> > > > >> > >
> > > > >> > > It seems I did not make my question clear. I have a text corpus and I
> > > > >> > > need to determine the pairs of words that occur together in many
> > > > >> > > documents. I need to do that to be able to measure the semantic
> > > > >> > > proximity between words. This method is described here
> > > > >> > > <http://forums.searchenginewatch.com/showthread.php?t=48>.
> > > > >> > > I hope to find some code that, given a text corpus, generates all the
> > > > >> > > word pairs with their probability of occurring together.
> > > > >> > >
> > > > >> > >
> > > > >> > > On Sat, Aug 21, 2010 at 1:46 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> > > > >> > >
> > > > >> > > > There is also a non-Mahout Key Phrase Extractor for Collocations, SIPs,
> > > > >> > > > and a few other things:
> > > > >> > > > http://sematext.com/products/key-phrase-extractor/index.html
> > > > >> > > >
> > > > >> > > > One of the demos that uses news data is at
> > > > >> > > > http://sematext.com/demo/kpe/index.html
> > > > >> > > >
> > > > >> > > > Otis
> > > > >> > > > ----
> > > > >> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > > >> > > > Lucene ecosystem search :: http://search-lucene.com/
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > ----- Original Message ----
> > > > >> > > > > From: Grant Ingersoll <gsingers@apache.org>
> > > > >> > > > > To: java-user@lucene.apache.org
> > > > >> > > > > Sent: Fri, August 20, 2010 8:52:17 AM
> > > > >> > > > > Subject: Re: Calculate Term Co-occurrence Matrix
> > > > >> > > > >
> > > > >> > > > > You might also be interested in Mahout's collocations package:
> > > > >> > > > > http://cwiki.apache.org/confluence/display/MAHOUT/Collocations
> > > > >> > > > >
> > > > >> > > > > -Grant
> > > > >> > > > > On Aug 19, 2010, at 11:39 AM, ahmed algohary wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi all,
> > > > >> > > > > >
> > > > >> > > > > > I need to know if there is a Lucene plug-in or a Lucene-based API for
> > > > >> > > > > > calculating the term co-occurrence matrix for a given text corpus.
> > > > >> > > > > >
> > > > >> > > > > > Thanks!
> > > > >> > > >  > >
> > > > >> > > > > > --
> > > > >> > > > > >  Ahmed
> > > > >> > > >  >
> > > > >> > > > >
> > > > >> > > > > --------------------------
> > > > >> > > > > Grant Ingersoll
> > > > >> > > > > http://www.lucidimagination.com/
> > > > >> > > > >
> > > > >> > > > > Search the Lucene ecosystem using Solr/Lucene:
> > > > >> > > > > http://www.lucidimagination.com/search
> > > > >> > > > >
> > > > >> > > >  >
> > > > >> > > > > ---------------------------------------------------------------------
> > > > >> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > >> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >> > > >  >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> >
> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > >
> > > > >> >
> > > > >> >
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > >
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
>
>
>
>
>
>
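
For readers following the thread: the "probability" Ahmed asks about (for example, "damaged" co-occurring with "shipment") can be estimated at the document level as the number of documents containing both terms divided by the number of documents containing the first term. The following is a minimal sketch in plain Java, assuming that definition and a naive lowercase tokenizer. It does not use Lucene or the LUCENE-474 collocations package, and the class and method names (CooccurrenceSketch, addDocument, probability) are hypothetical, chosen only for illustration.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Lucene or LUCENE-474.
class CooccurrenceSketch {
    // term -> number of documents that contain it
    final Map<String, Integer> docFreq = new TreeMap<>();
    // term -> (other term -> number of documents that contain both)
    final Map<String, Map<String, Integer>> pairFreq = new TreeMap<>();

    // Adds one document; each distinct term counts once per document.
    void addDocument(String text) {
        Set<String> terms = new TreeSet<>();
        // Assumption: split on anything that is not a-z or 0-9.
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!tok.isEmpty()) {
                terms.add(tok);
            }
        }
        for (String a : terms) {
            docFreq.merge(a, 1, Integer::sum);
            for (String b : terms) {
                if (!a.equals(b)) {
                    pairFreq.computeIfAbsent(a, k -> new TreeMap<>())
                            .merge(b, 1, Integer::sum);
                }
            }
        }
    }

    // Estimates P(other | term) = docs containing both / docs containing term.
    double probability(String term, String other) {
        int df = docFreq.getOrDefault(term, 0);
        if (df == 0) {
            return 0.0;
        }
        Map<String, Integer> row =
                pairFreq.getOrDefault(term, Collections.<String, Integer>emptyMap());
        return (double) row.getOrDefault(other, 0) / df;
    }

    public static void main(String[] args) {
        CooccurrenceSketch m = new CooccurrenceSketch();
        m.addDocument("The shipment arrived damaged");
        m.addDocument("Damaged box, refund requested");
        m.addDocument("Shipment delayed by weather");
        System.out.println(m.probability("damaged", "shipment"));
    }
}
```

The nested loop is quadratic in the number of distinct terms per document, so for a large corpus one would probably restrict the vocabulary (stop words, minimum document frequency) before building the matrix. Other normalizations, such as pointwise mutual information, could be swapped in for the conditional probability.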

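
On Aida's question about a unified list of frequent unigrams, bigrams and trigrams: Ivan notes that the package handles two-term collocations only. Outside that package, a unified n-gram frequency list can be built by sliding a window of 1 to 3 tokens over each document. This is a minimal sketch in plain Java under that assumption; it is independent of Lucene and of LUCENE-474, and the names NgramCountSketch, countNgrams and mostFrequent are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of Lucene or LUCENE-474.
class NgramCountSketch {
    // Assumption: naive tokenizer, lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> toks = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                toks.add(t);
            }
        }
        return toks;
    }

    // Counts every n-gram for n = 1..maxN across all documents.
    static Map<String, Integer> countNgrams(List<String> docs, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            List<String> toks = tokenize(doc);
            for (int n = 1; n <= maxN; n++) {
                for (int i = 0; i + n <= toks.size(); i++) {
                    String gram = String.join(" ", toks.subList(i, i + n));
                    counts.merge(gram, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Keeps n-grams seen at least minCount times, most frequent first,
    // ties broken alphabetically.
    static List<Map.Entry<String, Integer>> mostFrequent(
            Map<String, Integer> counts, int minCount) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                out.add(e);
            }
        }
        out.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        docs.add("damaged shipment arrived");
        docs.add("damaged shipment returned");
        for (Map.Entry<String, Integer> e : mostFrequent(countNgrams(docs, 3), 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Raw frequency favors n-grams made of common words, so a real collocation extractor would typically add a significance score on top of these counts.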