lucene-solr-user mailing list archives

From "Jack Krupansky" <>
Subject Re: AW: Email classification with solr
Date Tue, 01 May 2012 17:41:23 GMT
If you have the code that does all of that analysis, then you could 
integrate it with Solr using one of the approaches I listed, but Solr itself 
would not provide any of that analysis.

-- Jack Krupansky

-----Original Message----- 
From: Ramo Karahasan
Sent: Tuesday, May 01, 2012 1:14 PM
Subject: AW: Email classification with solr

Hi Jack,

thanks for the feedback. I'm really new to this and not sure I have
fully understood it.

Currently I've split emails into their properties and saved them in
relational tables, for example the body part. Most of my e-mails are HTML
emails. Now I have, for example, three categories, and newsletter is one of
them. I would like to classify an incoming email as a newsletter if it
meets several criteria, e.g. the sender's email address contains
"newsletter" or a variant of this word AND the content (body) looks like a
newsletter.
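The rule described above (the sender address mentions "newsletter" AND the body looks like a newsletter) can be expressed in a few lines of code outside Solr. This is a minimal sketch, not a tested classifier: the sender regex, the body cues such as "unsubscribe", and the naive HTML tag stripping are all assumptions to tune against real mail.

```python
import re

# Assumed sender pattern: matches "newsletter", "news-letter", "news_letter",
# "news.letter" and plural forms anywhere in the address, case-insensitively.
SENDER_PATTERN = re.compile(r"news[-_.]?letters?", re.IGNORECASE)

# Assumed body cues typical of newsletters; tune these against real mail.
BODY_CUES = ("unsubscribe", "view in browser", "newsletter")

# Naive tag stripper for HTML bodies; not a real HTML parser.
TAG_PATTERN = re.compile(r"<[^>]+>")


def is_newsletter(sender: str, body: str, min_cues: int = 1) -> bool:
    """Return True only if BOTH the sender and the body look like a newsletter."""
    sender_ok = SENDER_PATTERN.search(sender) is not None
    text = TAG_PATTERN.sub(" ", body).lower()
    cue_hits = sum(1 for cue in BODY_CUES if cue in text)
    return sender_ok and cue_hits >= min_cues


print(is_newsletter("news-letter@shop.example", "<p>Deals</p><a href='#'>Unsubscribe</a>"))  # True
print(is_newsletter("alice@example.com", "Please unsubscribe me"))  # False
```

Requiring both conditions keeps an ordinary mail that merely says "unsubscribe" from being tagged, at the cost of missing newsletters sent from unremarkable addresses.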

Is it possible to do that with Solr alone? Or do I need other tools for
classifying on the basis of text analysis? Isn't it necessary to build up a
taxonomy for "newsletter emails" so that the classifier can match the mail
text against some ruleset (a defined taxonomy)?
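A taxonomy in the sense asked about can be as simple as a keyword list per category, with the message assigned to the category whose keywords occur most often. The sketch below uses hypothetical categories and keywords; a real setup would need curated lists or a trained statistical classifier, and Solr does not provide either of those by itself.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy: category name -> indicative keywords (lower case).
TAXONOMY: Dict[str, List[str]] = {
    "newsletter": ["unsubscribe", "newsletter", "weekly digest"],
    "invoice": ["invoice", "amount due", "payment"],
    "support": ["ticket", "error", "help"],
}


def classify(text: str, taxonomy: Dict[str, List[str]] = TAXONOMY,
             min_hits: int = 1) -> Optional[str]:
    """Return the category with the most keyword occurrences, or None if too few."""
    lowered = text.lower()
    scores = {cat: sum(lowered.count(kw) for kw in kws)
              for cat, kws in taxonomy.items()}
    if not scores:
        return None
    # Ties go to the category listed first in the taxonomy.
    best = max(scores, key=lambda cat: scores[cat])
    return best if scores[best] >= min_hits else None


print(classify("Click here to unsubscribe from this newsletter"))  # newsletter
print(classify("See you at lunch"))  # None
```

Returning None below the threshold leaves room for a fallback category or manual review instead of forcing every mail into one of the buckets.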


-----Original Message-----
From: Jack Krupansky []
Sent: Tuesday, May 01, 2012 6:49 PM
Subject: Re: Email classification with solr

There are a number of different routes you can go, one of which is to use
SolrCell (Tika) to parse mbox files and then add your own update processor
that does whatever mail classification analysis you desire and then
generates additional field values for the classification.
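In Solr, the update processor mentioned above would be a Java subclass of UpdateRequestProcessor whose processAdd method inspects the incoming document and adds a field before passing it on. The Python sketch below only illustrates that per-document logic on a plain dict; the field names `from`, `content` and `category`, the single rule, and the "other" default are hypothetical and are not necessarily what Tika produces for a mail message.

```python
from typing import Any, Callable, Dict

Doc = Dict[str, Any]
Rule = Callable[[Doc], bool]


def newsletter_rule(doc: Doc) -> bool:
    """Hypothetical rule: sender mentions 'newsletter' and content mentions 'unsubscribe'."""
    sender = str(doc.get("from", "")).lower()
    content = str(doc.get("content", "")).lower()
    return "newsletter" in sender and "unsubscribe" in content


def classification_processor(doc: Doc, rules: Dict[str, Rule],
                             default: str = "other") -> Doc:
    """Return a copy of doc with a 'category' field; the first matching rule wins."""
    enriched = dict(doc)
    enriched["category"] = next(
        (name for name, rule in rules.items() if rule(doc)), default)
    return enriched


RULES: Dict[str, Rule] = {"newsletter": newsletter_rule}
incoming = {"id": "msg-1", "from": "newsletter@shop.example",
            "content": "This week's deals. Unsubscribe here."}
print(classification_processor(incoming, RULES)["category"])  # newsletter
```

Copying the document rather than mutating it keeps the sketch side-effect free; an actual Solr processor would modify the document in the add command in place.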

A simpler approach is to do the analysis yourself outside of Solr and then
feed the mbox data for each message into SolrCell along with the specific
literal field values derived from your classification analysis. SolrCell
(Tika) would then parse the mail message and add your literal field values.
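For this second route, SolrCell's ExtractingRequestHandler accepts `literal.<fieldname>` parameters that set fixed field values on the document it builds. The sketch below only constructs the HTTP request and does not send it. The URL assumes a local Solr on the default port 8983 with the handler mapped at /update/extract, `category` is a hypothetical schema field, and posting one message at a time as message/rfc822 is an assumption about how you split the mbox.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: local Solr, default port, ExtractingRequestHandler path.
EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_request(doc_id: str, raw_message: bytes, category: str,
                          commit: bool = False) -> Request:
    """Build a POST to SolrCell carrying the raw mail plus literal field values."""
    params = {
        "literal.id": doc_id,          # unique key of the Solr document
        "literal.category": category,  # hypothetical field holding your classification
    }
    if commit:
        params["commit"] = "true"
    url = EXTRACT_URL + "?" + urlencode(params)
    # Assumption: one RFC 822 message per request so Tika can pick a mail parser.
    return Request(url, data=raw_message,
                   headers={"Content-Type": "message/rfc822"}, method="POST")


req = build_extract_request("msg-1", b"From: a@example.com\r\n\r\nhi\r\n", "newsletter")
print(req.full_url)
```

Once the URL matches your installation, `urllib.request.urlopen(req)` would send it; the classification itself still happens in your own code before this call.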

Or, you may want to consider fully parsing the mail messages outside of Solr
so that you have full control over what gets parsed and which schema fields
are used or not used, in addition to your content analysis field values.
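For this third route, Python's standard email package is one way (of many) to parse each message yourself and decide exactly which schema fields get filled. The field names in the returned dict (`id`, `from`, `subject`, `body`, `category`) are hypothetical and must match your own schema; the category is expected to come from your own classification code.

```python
import email
from email import policy
from typing import Any, Dict


def message_to_solr_doc(raw: bytes, doc_id: str, category: str) -> Dict[str, Any]:
    """Parse one raw RFC 822 message into a flat dict of (hypothetical) Solr fields."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    # Prefer the plain-text part; fall back to HTML (many of the mails are HTML).
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "id": doc_id,
        "from": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
        "body": body,
        "category": category,
    }


raw = (b"From: Shop <newsletter@shop.example>\r\n"
       b"Subject: Weekly deals\r\n"
       b"Content-Type: text/plain; charset=utf-8\r\n\r\n"
       b"Hello\r\n")
print(message_to_solr_doc(raw, "msg-1", "newsletter")["subject"])  # Weekly deals
```

The resulting dicts could then be serialized with json.dumps and posted to Solr's JSON update handler, or sent with any Solr client library.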

-- Jack Krupansky

-----Original Message-----
From: Ramo Karahasan
Sent: Tuesday, May 01, 2012 12:17 PM
Subject: Email classification with solr


Just a short question:

Is it possible to use Solr/Lucene as an e-mail classifier? I mean, analyzing
an e-mail and automatically assigning it to a category (four are available)?


