lucene-dev mailing list archives

From "Tommaso Teofili (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4345) Create a Classification module
Date Tue, 11 Sep 2012 13:05:08 GMT


Tommaso Teofili commented on LUCENE-4345:

Thanks, Lance, for your useful insights; I'll definitely have a look :)

bq. If you use index data which is already analyzed with the same analyzer as your test (unseen)
documents, you can use a lot more documents as input. More is better. As the training data
increases, signal drives out noise.

I agree, we could leverage this for sure.

bq. Once you add the ability to store & load models, training speed becomes less important.

Regarding storing and loading models, the basic intuition (at least my intuition :P) in the
case of Lucene is that the index itself plays that role.
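
To make that intuition concrete, here is a minimal sketch in Python rather than Lucene code. It is not the implementation in the attached patches: `ToyIndex`, `analyze` and `classify` are hypothetical names, and the whitespace analyzer and add-one smoothing are assumptions made for illustration. The point is only that a Naive Bayes classifier can read per-class term statistics straight from the index at classification time, so no separate model needs to be stored or loaded.

```python
import math
from collections import Counter, defaultdict


def analyze(text):
    # Hypothetical analyzer: lowercase and split on whitespace.
    # The same function is used at index time and at classification time.
    return text.lower().split()


class ToyIndex:
    """Toy stand-in for an index: it keeps only per-class statistics."""

    def __init__(self):
        self.doc_count = Counter()              # class -> number of documents
        self.term_count = defaultdict(Counter)  # class -> term -> occurrences
        self.vocabulary = set()                 # all terms seen so far

    def add_document(self, text, label):
        terms = analyze(text)
        self.doc_count[label] += 1
        self.term_count[label].update(terms)
        self.vocabulary.update(terms)


def classify(index, text):
    """Multinomial Naive Bayes with add-one smoothing, computed from index stats."""
    total_docs = sum(index.doc_count.values())
    vocab_size = max(len(index.vocabulary), 1)
    terms = analyze(text)
    best_label, best_score = None, -math.inf
    for label, n_docs in index.doc_count.items():
        class_terms = index.term_count[label]
        class_total = sum(class_terms.values())
        # log prior of the class
        score = math.log(n_docs / total_docs)
        # log likelihood of each term given the class
        for term in terms:
            score += math.log((class_terms[term] + 1) / (class_total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Usage: "training" is just indexing; there is no separate model to persist.
index = ToyIndex()
index.add_document("lucene index search query", "tech")
index.add_document("index analyzer token search", "tech")
index.add_document("pasta tomato basil recipe", "food")
label = classify(index, "search the index")
```

In this sketch, adding more documents to the index immediately changes the statistics the classifier sees, which is why more indexed data helps without any retraining step.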

> Create a Classification module
> ------------------------------
>                 Key: LUCENE-4345
>                 URL:
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>            Priority: Minor
>         Attachments: LUCENE-4345_2.patch, LUCENE-4345.patch, SOLR-3700_2.patch, SOLR-3700.patch
> Lucene/Solr can host huge sets of documents containing lots of information in fields,
> so these can be used as training examples (with features) to create classifiers very
> quickly, for use on new documents and/or to provide an additional service.
> The idea is to create a contrib module (called 'classification') to host a ClassificationComponent
> that will use already seen data (the indexed documents / fields) to classify new documents
> / text fragments.
> The first version will contain a (simplistic) Lucene-based Naive Bayes classifier, but
> more implementations should be added in the future.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators