mahout-user mailing list archives

From Yuval Feinstein <yuv...@citypath.com>
Subject Re: Classifying documents in database
Date Mon, 14 Nov 2011 14:56:24 GMT
Here's one way, albeit indirect:

a. Index your DB into Solr, using the DataImportHandler:
http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS

b. Solr stores the imported records in a Lucene index, from which Mahout can create vectors:
https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html#CreatingVectorsfromText-FromLucene

c. Train your classifier inside Mahout.

d. Run the classifier on the records you need to label, producing an output
file in the format:
<record id> <label>

e. Use a script to insert the results into the database.
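
Step e might look like the following minimal Python sketch. It assumes the classifier output is a plain text file with one whitespace-separated <record id> <label> pair per line, as described in step d, and it uses Python's built-in sqlite3 module as a stand-in for your actual database. The documents table, its id and label columns, and the function names parse_labels, write_labels and main are all hypothetical; adapt them to your schema and database driver.

```python
import sqlite3


def parse_labels(lines):
    """Parse '<record id> <label>' lines into (record_id, label) pairs.

    Assumes whitespace-separated fields; blank lines are skipped.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the classifier output
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"malformed line: {line!r}")
        record_id, label = parts
        pairs.append((record_id, label))
    return pairs


def write_labels(conn, pairs):
    """Write each label back to its record.

    Hypothetical schema: documents(id, label). Values are passed as
    query parameters, never interpolated into the SQL string.
    """
    with conn:  # commits on success, rolls back if any UPDATE fails
        conn.executemany(
            "UPDATE documents SET label = ? WHERE id = ?",
            [(label, record_id) for record_id, label in pairs],
        )


def main(output_path, db_path):
    """Read classifier output from output_path and update db_path."""
    with open(output_path, encoding="utf-8") as f:
        pairs = parse_labels(f)
    conn = sqlite3.connect(db_path)
    try:
        write_labels(conn, pairs)
    finally:
        conn.close()
```

For example, main("classifier_output.txt", "documents.db") would label every record listed in the output file (both file names are placeholders). Doing all updates with executemany inside a single transaction keeps the write-back atomic: either every label is stored or, if an error occurs, none are.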

I am a Mahout newbie, so there might be more efficient ways.

Cheers,
Yuval


On Mon, Nov 14, 2011 at 5:47 AM, Sam Cunningham <sam_cunnin@yahoo.com> wrote:

> I have a database of documents. In other words, each tuple contains a
> document that needs to be classified. Does the Mahout API provide a way to
> connect to the DB, get each document, classify it, and write the label back
> to the database?
>
> I am aware that I can connect to the DB separately, loop through the tuples,
> convert each tuple to a document, use the Mahout API to classify it, and
> finally write the results back to the database. Is this the way to go?
>
> To be more specific, does BayesFileFormatter in the Mahout API come with a
> readerToDatabase method? Or is there a way to use the readerToDocument method
> with a database tuple instead of Files.newReader()?
>
> What is the best practice for reading from and writing to a DB from a Mahout
> classifier?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Classifying-documents-in-database-tp3505846p3505846.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
