lucene-java-user mailing list archives

From Ivan Provalov <iprov...@yahoo.com>
Subject Re: Calculate Term Co-occurrence Matrix
Date Mon, 23 Aug 2010 18:41:31 GMT
Aida,

Are you talking about letter n-grams or term n-grams?
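Here is a minimal Java sketch of the two readings of the question. The class `NGramKinds` and its methods are hypothetical names. They are not part of Lucene or of the LUCENE-474 package. The tokenizer is a naive lowercase split on non-alphanumeric characters, standing in for a real Lucene Analyzer. The sketch assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: letter (character) n-grams vs term (word) n-grams.
// Tokenization is a naive lowercase split on non-alphanumerics, not a Lucene Analyzer.
final class NGramKinds {

    // Letter n-grams: every run of n consecutive characters.
    static List<String> letterNGrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // Term n-grams: every run of n consecutive tokens, joined by single spaces.
    static List<String> termNGrams(String text, int n) {
        List<String> tokens = tokenize(text);
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams;
    }

    // Naive tokenizer: lowercase, split on runs of non-letter/non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(letterNGrams("damaged", 3));
        // prints [dam, ama, mag, age, ged]
        System.out.println(termNGrams("the damaged shipment arrived", 2));
        // prints [the damaged, damaged shipment, shipment arrived]
    }
}
```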

Thanks,

Ivan

--- On Mon, 8/23/10, Aida Hota <hota.aida@gmail.com> wrote:

> From: Aida Hota <hota.aida@gmail.com>
> Subject: Re: Calculate Term Co-occurrence Matrix
> To: java-user@lucene.apache.org
> Date: Monday, August 23, 2010, 1:36 PM
> Hi Ivan, thanks a lot for this. I only just found time to look at it and
> reply. Sorry for bothering you again; I already appreciate what you
> uploaded. I would also like to ask one question, if you don't mind: is it
> possible to get from this a unified list of frequently occurring unigrams,
> bigrams and trigrams, with their frequencies?
> 
> Thank you very much
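If term n-grams are what is meant, a unified list of frequent unigrams, bigrams and trigrams could be produced roughly as follows. This is a plain-Java sketch, not the LUCENE-474 code. `NGramFrequencies`, `count` and `frequent` are hypothetical names. Frequencies are raw occurrence counts, and n-grams are not allowed to cross document boundaries. Tokenization is a naive lowercase split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the LUCENE-474 API: a unified frequency list of term
// unigrams, bigrams and trigrams (or any sizes 1..maxN) over a small corpus.
final class NGramFrequencies {

    // Counts every term n-gram of length 1..maxN; n-grams never span two documents.
    static Map<String, Integer> count(List<String> documents, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : documents) {
            // Naive tokenizer standing in for a Lucene Analyzer.
            List<String> tokens = new ArrayList<>();
            for (String t : doc.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            for (int n = 1; n <= maxN; n++) {
                for (int i = 0; i + n <= tokens.size(); i++) {
                    counts.merge(String.join(" ", tokens.subList(i, i + n)), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // All sizes in one list: n-grams seen at least minCount times,
    // sorted by descending frequency, then alphabetically.
    static List<Map.Entry<String, Integer>> frequent(Map<String, Integer> counts, int minCount) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                result.add(e);
            }
        }
        result.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "the damaged shipment arrived",
                "a damaged shipment was returned");
        System.out.println(frequent(count(docs, 3), 2));
        // prints [damaged=2, damaged shipment=2, shipment=2]
    }
}
```

Stop words and their combinations tend to dominate raw counts. In practice one would probably filter them out, or rank by a collocation statistic instead of plain frequency.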
> 
> 
> On Mon, Aug 23, 2010 at 3:22 PM, Ivan Provalov <iprovalo@yahoo.com> wrote:
> 
> > Ahmed, if you want the raw score, you can do it the way you describe below.
> >
> >
> >
> > --- On Sun, 8/22/10, ahmed algohary <algoharyalex@gmail.com> wrote:
> >
> > > From: ahmed algohary <algoharyalex@gmail.com>
> > > Subject: Re: Calculate Term Co-occurrence Matrix
> > > To: java-user@lucene.apache.org
> > > Date: Sunday, August 22, 2010, 9:27 AM
> > > I think I got it.
> > >
> > > In the CollectionIndexer class, I have added the co-occurrence score to
> > > the index document:
> > >
> > >   // stored, so it can be read back from search results
> > >   doc.add(new Field("score", String.valueOf(collocation.getScore()),
> > >           Field.Store.YES, Field.Index.NOT_ANALYZED));
> > >
> > > Then, in the CollectionSearcher, the scores can be retrieved:
> > >
> > >   d.get("score")  // returns the stored value as a String
> > >
> > > Is that correct?
> > >
> > > On Sun, Aug 22, 2010 at 2:47 PM, ahmed algohary <algoharyalex@gmail.com> wrote:
> > >
> > > > Thanks! It is exactly what I need. But isn't there a way to get the
> > > > matching score?
> > > >
> > > > For example, "damaged" co-occurs with "shipment" with a probability of 0.4?
> > > >
> > > >
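One possible reading of "probability = 0.4" is a conditional document co-occurrence probability, P(shipment | damaged) = df(damaged, shipment) / df(damaged). That is the fraction of documents containing "damaged" that also contain "shipment". This definition is an assumption and is not necessarily the score the LUCENE-474 package computes. `CooccurrenceProbability` is a hypothetical helper, terms are expected in lowercase, and the six-document corpus is invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch. Assumed definition (not necessarily LUCENE-474's score):
// P(b | a) = (#documents containing both a and b) / (#documents containing a).
final class CooccurrenceProbability {

    // Distinct terms of one document (naive tokenizer, not a Lucene Analyzer).
    static Set<String> terms(String doc) {
        Set<String> set = new HashSet<>();
        for (String t : doc.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                set.add(t);
            }
        }
        return set;
    }

    // Returns 0.0 when term a occurs in no document.
    static double conditional(List<String> documents, String a, String b) {
        int withA = 0;
        int withBoth = 0;
        for (String doc : documents) {
            Set<String> t = terms(doc);
            if (t.contains(a)) {
                withA++;
                if (t.contains(b)) {
                    withBoth++;
                }
            }
        }
        return withA == 0 ? 0.0 : (double) withBoth / withA;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "damaged shipment returned",
                "damaged box",
                "damaged screen",
                "damaged shipment refunded",
                "damaged phone",
                "shipment delayed");
        // "damaged" is in 5 documents, 2 of which also contain "shipment"
        System.out.println(conditional(docs, "damaged", "shipment")); // prints 0.4
    }
}
```

The measure is asymmetric. On the same toy corpus, P(damaged | shipment) is 2/3, because "shipment" occurs in three documents and two of them also contain "damaged".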
> > > > On Sun, Aug 22, 2010 at 5:35 AM, Ivan Provalov <iprovalo@yahoo.com> wrote:
> > > >
> > > >> Ahmed,
> > > >>
> > > >> FYI, I updated the term collocations package I mentioned earlier with a
> > > >> few fixes and changes that make it work with Lucene 3.0.2. This may
> > > >> help with your task.
> > > >>
> > > >> See:
> > > >> https://issues.apache.org/jira/browse/LUCENE-474
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Ivan Provalov
> > > >>
> > > >>
> > > >> --- On Sat, 8/21/10, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> > > >>
> > > >> > From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
> > > >> > Subject: Re: Calculate Term Co-occurrence Matrix
> > > >> > To: java-user@lucene.apache.org
> > > >> > Date: Saturday, August 21, 2010, 8:05 AM
> > > >> > Ahmed,
> > > >> >
> > > >> > That's what that KPE (link in my previous email, below) will do for
> > > >> > you. It's not open source at this time, but that is exactly one of
> > > >> > the things it does. I think Mahout collocations stuff might work for
> > > >> > you, too.
> > > >> >
> > > >> > Otis
> > > >> > ----
> > > >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > >> > Lucene ecosystem search :: http://search-lucene.com/
> > > >> >
> > > >> >
> > > >> >
> > > >> > ----- Original Message ----
> > > >> > > From: ahmed algohary <algoharyalex@gmail.com>
> > > >> > > To: java-user@lucene.apache.org
> > > >> > > Sent: Sat, August 21, 2010 7:20:03 AM
> > > >> > > Subject: Re: Calculate Term Co-occurrence Matrix
> > > >> > >
> > > >> > > Thanks for all your answers!
> > > >> > >
> > > >> > > It seems I did not make my question clear. I have a text corpus and I
> > > >> > > need to determine the pairs of words that occur together in many
> > > >> > > documents, so that I can measure the semantic proximity between
> > > >> > > words. This method is described in more detail
> > > >> > > here <http://forums.searchenginewatch.com/showthread.php?t=48>.
> > > >> > > I hope to find some code that, given a text corpus, generates all
> > > >> > > word pairs with their probability of occurring together.
> > > >> > >
> > > >> > >
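This plain-Java sketch covers the corpus-wide version of the question: all word pairs that occur together, and how often. It builds a sparse co-occurrence matrix under three assumptions. "Together" means "in the same document" rather than within a word window. Tokenization is a naive lowercase split instead of a Lucene Analyzer. `CooccurrenceMatrix.build` and its "a|b" pair keys are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: a sparse, symmetric, document-level term co-occurrence
// matrix. Each unordered pair of distinct terms is stored once under the key
// "a|b" (a < b) and maps to the number of documents that contain both terms.
final class CooccurrenceMatrix {

    static Map<String, Integer> build(List<String> documents) {
        Map<String, Integer> pairCounts = new HashMap<>();
        for (String doc : documents) {
            // Distinct, sorted terms, so each pair is counted at most once per document.
            // Naive tokenizer standing in for a Lucene Analyzer.
            TreeSet<String> terms = new TreeSet<>();
            for (String t : doc.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (!t.isEmpty()) {
                    terms.add(t);
                }
            }
            List<String> sorted = new ArrayList<>(terms);
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    pairCounts.merge(sorted.get(i) + "|" + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        return pairCounts;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "the damaged shipment",
                "damaged shipment returned",
                "shipment delayed");
        Map<String, Integer> matrix = build(docs);
        System.out.println(matrix.get("damaged|shipment")); // prints 2
        System.out.println(matrix.size());                   // prints 6
    }
}
```

The inner loop is quadratic in the number of distinct terms per document. On a large corpus, the collocation tools mentioned in this thread (LUCENE-474, Mahout) are likely a better starting point than this sketch.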
> > > >> > > On Sat, Aug 21, 2010 at 1:46 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> > > >> > >
> > > >> > > > There is also a non-Mahout Key Phrase Extractor for Collocations, SIPs,
> > > >> > > > and a few other things:
> > > >> > > > http://sematext.com/products/key-phrase-extractor/index.html
> > > >> > > >
> > > >> > > > One of the demos that uses news data is at
> > > >> > > > http://sematext.com/demo/kpe/index.html
> > > >> > > >
> > > >> > > > Otis
> > > >> > > > ----
> > > >> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > >> > > > Lucene ecosystem search :: http://search-lucene.com/
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > ----- Original Message ----
> > > >> > > > > From: Grant Ingersoll <gsingers@apache.org>
> > > >> > > > > To: java-user@lucene.apache.org
> > > >> > > > > Sent: Fri, August 20, 2010 8:52:17 AM
> > > >> > > > > Subject: Re: Calculate Term Co-occurrence Matrix
> > > >> > > > >
> > > >> > > > > You might also be interested in Mahout's collocations package:
> > > >> > > > > http://cwiki.apache.org/confluence/display/MAHOUT/Collocations
> > > >> > > > >
> > > >> > > > > -Grant
> > > >> > > > > On Aug 19, 2010, at 11:39 AM, ahmed algohary wrote:
> > > >> > > > >
> > > >> > > > > > Hi all,
> > > >> > > > > >
> > > >> > > > > > I need to know if there is a Lucene plug-in or a Lucene-based API for
> > > >> > > > > > calculating the term co-occurrence matrix for a given text corpus.
> > > >> > > > > >
> > > >> > > > > > Thanks!
> > > >> > > > > >
> > > >> > > > > > --
> > > >> > > > > > Ahmed
> > > >> > > > >
> > > >> > > > > --------------------------
> > > >> > > > > Grant Ingersoll
> > > >> > > > > http://www.lucidimagination.com/
> > > >> > > > >
> > > >> > > > > Search the Lucene ecosystem using Solr/Lucene:
> > > >> > > > > http://www.lucidimagination.com/search
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > ---------------------------------------------------------------------
> > > >> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > >> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >> > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >
> > >
> >
> >
> >
> >
> >
> >
> >
> 


      


