lucene-solr-user mailing list archives

From Hannes Carl Meyer <>
Subject Re: Text classification with Solr
Date Wed, 28 Jan 2009 10:22:36 GMT
From my past projects, our Lucene classification corpus looked like this:

0|document text...|categoryA
1|document text...|categoryB
2|document text...|categoryA
3|document text...|categoryA
800|document text...|categoryC

With Solr's faceting capabilities it is now possible to model additional
dimensions of categories/taxonomies in a corpus with minimal impact (?) on
computation time. Synonyms can also be set up in the Solr configuration.
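
As a rough illustration of the corpus layout above, here is a minimal Python
sketch that parses the pipe-delimited lines and tallies documents per
category, which is roughly the kind of count a facet on a category field
would report. The helper names (parse_corpus, category_counts) are made up
for this example and are not part of Lucene or Solr; the sample rows are the
ones shown above.

```python
from collections import Counter


def parse_corpus(lines):
    """Parse 'id|document text|category' lines into (id, text, category) tuples."""
    docs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first and the last pipe so that any pipes inside
        # the document text itself are preserved.
        doc_id, rest = line.split("|", 1)
        text, category = rest.rsplit("|", 1)
        docs.append((int(doc_id), text, category))
    return docs


def category_counts(docs):
    """Count documents per category, similar in spirit to a facet count."""
    return Counter(category for _, _, category in docs)


corpus = [
    "0|document text...|categoryA",
    "1|document text...|categoryB",
    "2|document text...|categoryA",
    "3|document text...|categoryA",
    "800|document text...|categoryC",
]

docs = parse_corpus(corpus)
counts = category_counts(docs)
print(counts.most_common())
```

Adding a second taxonomy would then amount to one more column per line and
one more counter, which is the "more dimensions" idea in miniature.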

I like the idea of using Solr!

On Wed, Jan 28, 2009 at 7:57 AM, Neal Richter <> wrote:

> On Tue, Jan 27, 2009 at 2:21 PM, Grant Ingersoll <> wrote:
> > One of the things I am interested in is the marriage of Solr and Mahout
> > (which has some Genetic Algorithms support) and other ML (Weka, etc.)
> > tools.
>  [snip]
> I love it, good to know you are thinking big here.  Here's another big
> thought:
> .. but assume we want
> to extract this type of structure from the full text of Wikipedia
> rather than the narrow categories DB.
> > Things that can help with all this:  LukeReqHandler, TermVectorComponent,
> > TermsComponent, others
> >
> [snip]
> > Neal, what did you have in mind for a JIRA issue?  I'd love to see a
> > patch.
> More research is needed, but the initial idea would be to allow
> passing in a weighted term vector as a query and running a
> more-like-this type search on it.  Has anyone attempted this yet?
> An interesting point about faceting here is that it would give
> feedback on which /new/ words (not in the initial query) would, if
> added to the query, result in additional discrimination between the
> matched categories.
> So Solr outputs a set of categories for a document, and also emits a
> set of related words to the initial query!  Categorization and
> recommendation in one.
> - Neal
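
One way to approximate the weighted-term-vector idea with the existing query
syntax is to turn each term and weight into a boosted clause (term^boost is
standard Lucene query parser syntax). The sketch below only builds such a
query string; it is not the patch discussed above and it does not call Solr.
The function name to_boosted_query, the field name "text", and the sample
weights are assumptions made for illustration.

```python
def to_boosted_query(term_weights, field="text"):
    """Build a Lucene-style query string from a {term: weight} mapping.

    Each term becomes a quoted clause with a ^boost suffix. The field name
    is a hypothetical placeholder; use whatever field holds the document text.
    """
    clauses = []
    # Highest-weighted terms first, purely for readability of the output.
    for term, weight in sorted(term_weights.items(), key=lambda kv: -kv[1]):
        if weight <= 0:
            continue  # skip terms that carry no positive weight
        # Escape backslashes and double quotes so the quoted term stays valid.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{field}:"{escaped}"^{weight:g}')
    return " ".join(clauses)


# Made-up weights, standing in for a term vector taken from a document.
vector = {"solr": 2.5, "lucene": 1.0, "faceting": 0.5, "the": 0.0}
query = to_boosted_query(vector)
print(query)
```

The resulting string could be sent as an ordinary query, with a facet on the
category field then summarizing which categories the matching documents fall
into.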
