lucene-solr-user mailing list archives

From Neal Richter <nrich...@gmail.com>
Subject Re: Text classification with Solr
Date Mon, 26 Jan 2009 18:24:11 GMT
Thanks for the link, Shalin. I played with that a while back; it could
be useful here, though only indirectly.

On Mon, Jan 26, 2009 at 10:46 AM, Hannes Carl Meyer <mail@hcmeyer.com> wrote:
> I didn't understand: is the corpus of documents you want to use to classify
> fixed?

Assume the 'documents' are not stored in the same index, and that I want
to store only the taxonomy or ontology in this index.

Instead of indexing documents about 'sports' and searching for hits
based on 'basketball', 'football', etc., I simply want to index the
taxonomy and classify documents into it. This is an ancient
AI/data-mining discipline, but the standard methods of 'indexing' the
taxonomy are (or were) primitive compared to what one /could/ do with
something like Lucene.
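To make the idea concrete, here is a minimal sketch in plain Java, assuming each taxonomy node is stored as a short bag of descriptive terms. It uses a hand-rolled inverted index and a simple TF-IDF sum in place of Lucene itself, so it illustrates the shape of "index the taxonomy, query with the document" rather than any Lucene or Solr API. The class, method names, and sample categories are all hypothetical.

```java
import java.util.*;

/**
 * Hypothetical toy "taxonomy index": each category is indexed as a bag of
 * descriptive terms, and a document is classified by using its words as a
 * query and ranking categories by a simple TF-IDF sum. Not a Lucene API.
 */
public class TaxonomyIndex {
    // term -> (category label -> frequency of the term in that category's description)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();
    private final Set<String> categories = new HashSet<>();

    /** Lowercases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Indexes one taxonomy node (not a document) under its category label. */
    public void addCategory(String label, String description) {
        categories.add(label);
        for (String term : tokenize(description)) {
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(label, 1, Integer::sum);
        }
    }

    /** Scores every category against the document's terms; returns the best label, or null if nothing matches. */
    public String classify(String documentText) {
        Map<String, Double> scores = new HashMap<>();
        int n = categories.size();
        for (String term : tokenize(documentText)) {
            Map<String, Integer> postingList = postings.get(term);
            if (postingList == null) continue; // term is not in the taxonomy vocabulary
            // Rarer terms (appearing in fewer categories) get a higher weight.
            double idf = Math.log(1.0 + (double) n / postingList.size());
            for (Map.Entry<String, Integer> e : postingList.entrySet()) {
                scores.merge(e.getKey(), e.getValue() * idf, Double::sum);
            }
        }
        return scores.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        TaxonomyIndex idx = new TaxonomyIndex();
        idx.addCategory("sports/basketball", "basketball hoop dribble court nba");
        idx.addCategory("sports/football", "football touchdown quarterback field nfl");
        idx.addCategory("science/astronomy", "telescope galaxy star planet orbit");
        System.out.println(idx.classify("The quarterback threw a late touchdown pass"));
    }
}
```

Running `main` prints `sports/football`. The index holds only the taxonomy; the document is never stored, it just supplies the query terms.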

Here's a 2007 research paper that used Lucene directly for
classification, but doing the inverse of what I described:
http://www.cs.ucl.ac.uk/staff/R.Hirsch/papers/gecco_HHS.pdf

>>>previously suggested procedure of 1) store document 2) execute
>>>more-like-this and 3) delete document would be too slow.
> Do you mean the document to classify?
> Why do you then want to put it into the index (very expensive), you just
> need the contents of it to build a query!

Exactly. In the December Taxonomy thread, Walter Underwood outlined a
store/classify/delete procedure. That is too slow if you have no need
to index the document itself.
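On the point that the document only needs to be turned into a query, here is a hedged sketch, assuming the taxonomy index can report per-term document frequencies. It ranks the document's terms by tf*idf entirely in memory, keeps the top few, and joins them into an OR query. This is roughly the spirit of how Lucene's MoreLikeThis selects "interesting terms", but it is not that class's API; `QueryFromText`, `interestingTerms`, `toQueryString`, and the sample frequencies are hypothetical.

```java
import java.util.*;

/**
 * Hypothetical helper: builds a query directly from a document's text
 * without ever adding the document to the index. Not a Lucene API.
 */
public class QueryFromText {

    /** Lowercases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /**
     * Returns up to maxTerms document terms ranked by tf * idf, where idf comes from
     * the existing taxonomy index's statistics (docFreq, numIndexedDocs).
     * Terms unknown to the index are skipped because they cannot match any category.
     */
    public static List<String> interestingTerms(String documentText, Map<String, Integer> docFreq,
                                                int numIndexedDocs, int maxTerms) {
        // Term frequencies are computed in memory only; nothing is written to the index.
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(documentText)) tf.merge(t, 1, Integer::sum);

        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            Integer df = docFreq.get(e.getKey());
            if (df == null || df == 0) continue;
            double idf = Math.log(1.0 + (double) numIndexedDocs / df);
            scored.add(new AbstractMap.SimpleEntry<String, Double>(e.getKey(), e.getValue() * idf));
        }
        scored.sort((a, b) -> {
            int c = Double.compare(b.getValue(), a.getValue()); // higher score first
            return c != 0 ? c : a.getKey().compareTo(b.getKey()); // deterministic tie-break
        });

        List<String> terms = new ArrayList<>();
        for (int i = 0; i < Math.min(maxTerms, scored.size()); i++) {
            terms.add(scored.get(i).getKey());
        }
        return terms;
    }

    /** Joins the terms into an OR query in Lucene's classic query-parser syntax. */
    public static String toQueryString(List<String> terms) {
        return String.join(" OR ", terms);
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("quarterback", 1);
        docFreq.put("touchdown", 1);
        docFreq.put("court", 1);
        docFreq.put("the", 3);
        List<String> terms = interestingTerms(
                "The quarterback threw a touchdown, then another touchdown", docFreq, 3, 2);
        System.out.println(toQueryString(terms));
    }
}
```

Running `main` prints `touchdown OR quarterback`. Because nothing is stored, there is no commit and no delete step, which is exactly the overhead the store/classify/delete approach pays per document.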

- Neal
