incubator-lucy-dev mailing list archives

From Marvin Humphrey <>
Subject Re: MoreLikeThisQuery
Date Tue, 16 Mar 2010 14:02:54 GMT
On Tue, Mar 16, 2010 at 09:01:05AM -0400, Robert Muir wrote:
> On Tue, Mar 16, 2010 at 1:17 AM, Marvin Humphrey <> wrote:
> > What I'd like to do is identify the cluster that best represents the document,
> > and exclude any terms outside of that cluster when building the
> > MoreLikeThisQuery.
> >
> > What kind of a data structure would we need to achieve that?

> Marvin, I use this for query expansion purposes, so if you have any
> ideas (even very slow ones) you want to test, I'd be happy to help
> with some relevance-testing gruntwork.

Even very slow ones, eh?  How about one that requires gobs of RAM?

This idea actually came out of a conversation I had with someone at a San
Diego Ruby Users meeting who used to work on the OpenCyc classification
engine.  From what I understand, the Cyc project is an AI project that sits on
top of something like a Yahoo directory or DMOZ, but for words.  Apparently it
has a Java API and requires several GB of RAM to load.

His suggestion was to use OpenCyc to classify terms.

That's similar to what we'd do with topic vectors generated by an indexing
component, except that the Cyc topic vectors were built laboriously by hand
rather than generated through automatic dimension reduction.
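For illustration only, here is a minimal Python sketch of the "pick the best cluster, drop everything else" step from the quoted question. It assumes each term already has a topic vector (hand-built as with Cyc, or produced by dimension reduction) and that the document's terms come with weights such as tf-idf scores. The function names `dominant_topic` and `filter_terms_by_topic` and the sample data are hypothetical and are not part of Lucy's API.

```python
from collections import defaultdict


def dominant_topic(doc_terms, topic_vectors):
    """Return the topic index with the largest weighted score for the document.

    doc_terms: dict mapping term -> weight (e.g. tf-idf); hypothetical input.
    topic_vectors: dict mapping term -> list of per-topic scores; hypothetical.
    """
    totals = defaultdict(float)
    for term, weight in doc_terms.items():
        vec = topic_vectors.get(term)
        if vec is None:
            continue  # terms with no topic information contribute nothing
        for topic, score in enumerate(vec):
            totals[topic] += weight * score
    if not totals:
        return None  # no term had a topic vector
    return max(totals, key=totals.get)


def filter_terms_by_topic(doc_terms, topic_vectors):
    """Keep only terms whose strongest topic matches the document's dominant topic."""
    topic = dominant_topic(doc_terms, topic_vectors)
    if topic is None:
        return dict(doc_terms)  # nothing to filter on; keep every term
    kept = {}
    for term, weight in doc_terms.items():
        vec = topic_vectors.get(term)
        if vec is None:
            continue  # unclassified terms fall outside the chosen cluster
        best = max(range(len(vec)), key=lambda i: vec[i])
        if best == topic:
            kept[term] = weight
    return kept


# Made-up data: topic 0 ~ animals, topic 1 ~ cars.
topic_vectors = {
    "jaguar": [0.9, 0.1],
    "habitat": [0.8, 0.0],
    "engine": [0.1, 0.9],
    "rainforest": [0.7, 0.2],
}
doc = {"jaguar": 3.0, "habitat": 2.0, "engine": 1.0, "rainforest": 1.5}
print(filter_terms_by_topic(doc, topic_vectors))
# prints {'jaguar': 3.0, 'habitat': 2.0, 'rainforest': 1.5}
```

The surviving terms would then be the only candidates fed into the MoreLikeThisQuery. Assigning each term to its single strongest topic is the simplest possible rule; a softer variant could keep terms whose score on the dominant topic exceeds some threshold.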

Marvin Humphrey
