mahout-user mailing list archives

From Yuval Feinstein <>
Subject Re: Classifying documents in database
Date Mon, 14 Nov 2011 14:56:24 GMT
Here's one way, albeit an indirect one:

a. Index your DB into Solr, using the DataImportHandler:

b. Now you will have a Lucene index, which you can import into Mahout:

c. Train your classifier inside Mahout.

d. Run the classifier on the needed records and get an output file in the format
<record id> <label>

e. Use a script to insert the results into the database.
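Steps d and e could look roughly like this in Java, the language Mahout itself is written in. This is a minimal sketch, not part of Mahout. It assumes the classifier output is a plain-text file with one whitespace-separated `<record id> <label>` pair per line and that record ids are numeric primary keys. It also assumes a hypothetical table `documents` with `id` and `label` columns. Only standard JDK classes (`java.nio.file.Files`, `java.sql`) are used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: load "<record id> <label>" lines and write the labels back via JDBC. */
public class LabelWriteBack {

    /** Parses lines of the form "<record id> <label>"; blank lines are skipped. */
    public static Map<Long, String> parse(List<String> lines) {
        Map<Long, String> labels = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Split on the first run of whitespace: the id comes first, the label is the rest.
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
            // Assumption: record ids are numeric keys.
            labels.put(Long.parseLong(parts[0]), parts[1]);
        }
        return labels;
    }

    /** Reads the classifier output file, assumed to be UTF-8 text. */
    public static Map<Long, String> parseFile(Path output) throws IOException {
        return parse(Files.readAllLines(output, StandardCharsets.UTF_8));
    }

    /**
     * Writes all labels back in a single batched transaction.
     * The table "documents" and columns "id"/"label" are hypothetical; adapt them to your schema.
     * Returns the number of rows the driver reports as updated.
     */
    public static int writeBack(Connection conn, Map<Long, String> labels) throws SQLException {
        String sql = "UPDATE documents SET label = ? WHERE id = ?";
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Map.Entry<Long, String> e : labels.entrySet()) {
                ps.setString(1, e.getValue());
                ps.setLong(2, e.getKey());
                ps.addBatch();
            }
            int[] counts = ps.executeBatch();
            conn.commit();
            int updated = 0;
            for (int c : counts) {
                // Drivers may report Statement.SUCCESS_NO_INFO (-2); only positive counts are summed.
                if (c > 0) {
                    updated += c;
                }
            }
            return updated;
        } catch (SQLException ex) {
            // Undo the partial batch so the table is not left half-labelled.
            conn.rollback();
            throw ex;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Batching the UPDATEs inside one transaction keeps the write-back to a single commit, and a failed batch is rolled back as a whole. A plain script would do equally well; the point is only that the output file maps directly onto one UPDATE per record.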

I am a Mahout newbie, so there might be more efficient ways.


On Mon, Nov 14, 2011 at 5:47 AM, Sam Cunningham <> wrote:

> I have a database of documents. In other words, each tuple contains a
> document that needs to be classified. Does the Mahout API provide a way
> to connect to the DB, get each document, classify it, and write the
> label back to the database?
> I am aware I can connect to the DB separately, loop through the tuples,
> convert each tuple to a document, use the Mahout API to classify it, and
> write the label back to the database at the end. Is this the way to go?
> To be more specific, does BayesFileFormatter in the Mahout API come with a
> readerToDatabase method? Or is there a way to use the readerToDocument
> method with a database tuple instead of Files.newReader()?
> What is the best practice for reading from and writing to a DB from a
> Mahout classifier?
> --
> Sent from the Mahout User List mailing list archive at
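On the question of feeding a database tuple where Files.newReader() would go: any method that consumes a `java.io.Reader` can be fed directly from a column. You can either wrap the text in a `StringReader` or stream it with JDBC's `ResultSet.getCharacterStream`. The sketch below does not call any Mahout API. `ReaderClassifier` is a hypothetical interface standing in for whatever trained model or Reader-based helper you use. The `documents` table with `id` and `body` columns is likewise assumed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: use database rows as Reader input instead of files. */
public class DbDocumentSource {

    /** Hypothetical stand-in for a trained classifier that consumes a Reader. */
    public interface ReaderClassifier {
        String classify(Reader document) throws IOException;
    }

    /** Classifies text already fetched into memory by wrapping it in a StringReader. */
    public static String classifyText(String text, ReaderClassifier classifier) throws IOException {
        try (Reader reader = new StringReader(text)) {
            return classifier.classify(reader);
        }
    }

    /**
     * Streams each row's text column as a Reader via ResultSet.getCharacterStream.
     * The table "documents" and columns "id"/"body" are hypothetical.
     * Returns id -> label, ready to be written back to the database.
     */
    public static Map<Long, String> classifyAll(Connection conn, ReaderClassifier classifier)
            throws SQLException, IOException {
        Map<Long, String> labels = new LinkedHashMap<>();
        String sql = "SELECT id, body FROM documents";
        try (PreparedStatement ps = conn.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // Columns are read left to right, which is the most portable order in JDBC.
                long id = rs.getLong("id");
                try (Reader body = rs.getCharacterStream("body")) {
                    // getCharacterStream returns null for SQL NULL; such rows are skipped.
                    if (body == null) {
                        continue;
                    }
                    labels.put(id, classifier.classify(body));
                }
            }
        }
        return labels;
    }
}
```

This is the "loop through tuples" approach from the question, only without an intermediate file. The resulting map can then be written back with a batched UPDATE.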
