lucene-general mailing list archives

From Mark Bennett <mark.benn...@lucidworks.com>
Subject Re: Train Lucene with topic-defined files
Date Tue, 17 Jun 2014 16:40:55 GMT
Benglish,

I'm not sure I understand your requirements, but perhaps you could use a Naive Bayes classifier?
https://en.wikipedia.org/wiki/Naive_Bayes_classifier

A typical Naive Bayes classifier separates items into two classes (spam vs. not spam, for example), but it can be extended to N categories.

Lucene provides access to the words it has indexed in your documents.  You could feed those
to a classifier for training.
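
As a rough illustration of that idea, here is a minimal Python sketch of a multinomial Naive Bayes tagger with add-one smoothing. It is not Lucene's implementation; the class name `NaiveBayesTagger`, its methods, and the sample token lists are all hypothetical, and it assumes the words of each document have already been extracted (from a Lucene index or anywhere else) into plain lists.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # tag -> number of training docs
        self.word_counts = defaultdict(Counter)  # tag -> word -> occurrences
        self.vocabulary = set()                  # every word seen in training

    def train(self, tokens, tag):
        """Record one pre-tagged document, given as a list of words."""
        self.doc_counts[tag] += 1
        self.word_counts[tag].update(tokens)
        self.vocabulary.update(tokens)

    def classify(self, tokens):
        """Return the most probable tag, or None if nothing was trained."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_tag, best_score = None, float("-inf")
        for tag, n_docs in self.doc_counts.items():
            counts = self.word_counts[tag]
            total_words = sum(counts.values())
            # log P(tag) + sum over words of log P(word | tag)
            score = math.log(n_docs / total_docs)
            for word in tokens:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


# Made-up training data: N = 3 categories rather than a yes/no split.
tagger = NaiveBayesTagger()
tagger.train(["goal", "match", "team", "score"], "sports")
tagger.train(["team", "coach", "season"], "sports")
tagger.train(["election", "vote", "senate"], "politics")
tagger.train(["vote", "policy", "minister"], "politics")
tagger.train(["stock", "market", "shares"], "finance")

predicted = tagger.classify(["team", "season", "goal"])
print(predicted)
```

Working in log space avoids numeric underflow on long documents, and the add-one smoothing keeps a word that was never seen under a tag from zeroing out that tag's score.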

A quick Google search turned up this, which might get you started:
http://lucene.apache.org/core/4_8_1/classification/org/apache/lucene/classification/SimpleNaiveBayesClassifier.html

The classification module also has a KNearestNeighbor version; see the list of implementing classes here:
http://lucene.apache.org/core/4_8_1/classification/org/apache/lucene/classification/Classifier.html
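
For comparison, the nearest-neighbor approach can be sketched in a few lines of Python. This is only an illustration of the general technique and makes no claim about how Lucene's classifier measures similarity; the function names `jaccard` and `knn_classify`, the choice of Jaccard overlap as the similarity measure, and the sample data are assumptions made for the example. It also assumes the training list is not empty.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap between two word collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_classify(tokens, training, k=3):
    """Tag a document by majority vote among its k most similar neighbors.

    `training` is a non-empty list of (words, tag) pairs.
    """
    ranked = sorted(training, key=lambda ex: jaccard(tokens, ex[0]), reverse=True)
    votes = Counter(tag for _, tag in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up pre-tagged documents, each reduced to a list of words.
training = [
    (["goal", "match", "team", "score"], "sports"),
    (["team", "coach", "season"], "sports"),
    (["election", "vote", "senate"], "politics"),
    (["vote", "policy", "minister"], "politics"),
    (["stock", "market", "shares"], "finance"),
]

predicted = knn_classify(["vote", "senate", "policy"], training, k=3)
print(predicted)
```

Unlike Naive Bayes, this keeps every training document around and does all of its work at classification time, so it needs no separate training step but gets slower as the tagged collection grows.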

You might also want to consider Solr, which is a layer on top of Lucene.

--
Mark Bennett / LucidWorks: Search & Big Data / mark.bennett@lucidworks.com
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On Jun 15, 2014, at 10:37 PM, benglish <behzadrezaie69@yahoo.com> wrote:

> Hi pals,
> 
> I have a large number of text files that are already tagged with topics. I
> want to tag a set of test files based on those pre-tagged files.
> Searching the Net, I couldn't find an answer: is it possible to train
> Lucene on the tagged files so that it then tags the test files according to
> those pre-defined tags?
> 
> Yours Sincerely,
> benglish
> 
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Train-Lucene-with-topic-defined-files-tp4141979.html
> Sent from the Lucene - General mailing list archive at Nabble.com.

