lucene-general mailing list archives

From Koji Sekiguchi <k...@r.email.ne.jp>
Subject Re: Train Lucene with topic-defined files
Date Tue, 24 Jun 2014 09:04:23 GMT
Hi benglish,

> 1. When making the index file (according to my previous post), and running
> the code for the first time, I can see that in the line:
>
>     ClassificationResult<BytesRef> result = classifier.assignClass(doc.get("content"));
>     String classified = result.getAssignedClass().utf8ToString();
>
> "classified" is set to "write.block" and it causes the algorithm to find
> many non-matching pairs!!! Could you tell me what I can do to overcome this
> issue? I made the index for the second time and the issues got solved, but I
> want to know why it does not work by the first index file!!!!

I don't know why "write.block" is returned.
Are you sure the Lucene index was built correctly?
If you are not sure, why not use Solr to create the index?

> 2. As far as I have understood, your test dataset is just your training
> dataset, am I right? If not, should I make an index file for the test
> dataset, too?

Yes, your understanding is correct. But you don't need to index your
test set to categorize it. Once you have called train(), you can call
assignClass("your test content string here") on any text to get its
category tag.
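To make the train-once, classify-raw-strings workflow concrete, here is a
minimal plain-Java sketch. It is not Lucene's classifier: ToyNaiveBayes, its
fields, and its tokenizer are hypothetical, and only the method names train()
and assignClass() mirror the calls discussed above. It assumes a simple
multinomial naive Bayes with add-one smoothing, just to show that only the
training data has to be prepared in advance while the test text is passed in
as an ordinary string.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy classifier for illustration; NOT Lucene's implementation.
class ToyNaiveBayes {
    // label -> (term -> number of occurrences in training text of that label)
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    // label -> total number of tokens seen for that label
    private final Map<String, Integer> totalTerms = new HashMap<>();
    // label -> number of training documents with that label
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Very rough tokenizer: lowercase, split on non-word characters.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("\\W+");
    }

    // Each training example is a two-element array: {label, text}.
    void train(List<String[]> labeledDocs) {
        for (String[] example : labeledDocs) {
            String label = example[0];
            docCounts.merge(label, 1, Integer::sum);
            totalDocs++;
            Map<String, Integer> counts =
                termCounts.computeIfAbsent(label, k -> new HashMap<>());
            for (String term : tokenize(example[1])) {
                if (term.isEmpty()) continue; // split() can yield an empty first token
                counts.merge(term, 1, Integer::sum);
                totalTerms.merge(label, 1, Integer::sum);
                vocabulary.add(term);
            }
        }
    }

    // Classifies arbitrary text; the text does not need to be stored anywhere.
    // Returns null if train() has not been called with any examples.
    String assignClass(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // log prior of the label
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = termCounts.get(label);
            int total = totalTerms.getOrDefault(label, 0);
            for (String term : tokenize(text)) {
                if (term.isEmpty()) continue;
                int c = counts.getOrDefault(term, 0);
                // add-one (Laplace) smoothed log likelihood of the term
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Usage follows the same two steps: build a list of {label, text} pairs from
the training set, call train() once, then call assignClass() with each test
string. With Lucene the training data comes from the index instead of an
in-memory list, but the test content is still passed in as a plain string.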

Koji
-- 
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
