lucene-java-user mailing list archives

From Troy Wical <t...@wical.com>
Subject Re: Email Indexing
Date Thu, 28 Oct 2010 01:16:36 GMT
On Oct 27, 2010, at 3:57 PM, Hasan Diwan wrote:

> I'd like to provide myself with a searchable index of email. I'm
> familiar with the Javamail library, so will use this to fetch the
> mail. Anyone out there done any indexing of email? On Sourceforge,
> there's zoe[1], which hasn't had a release since 2004, and a couple of
> other projects. I'm also seeing something about sphinx, which reads
> like another indexing platform(?). Any advice regarding this is
> appreciated.

Depends on what you're trying to index, I suppose. Maildir or mbox? For some time now, off and
on, I have been working to index an ezmlm mailing list archive. In the end, I went with Swish-E
and have made quite a bit of progress, though I am still short of my complete goal. The issue is
that the search results do not include the message subject, and there is currently no excerpt
or phrase highlighting. My problem is that the flat text email files I am working with have no
XML or other markup from which the indexer could create fields. I've not yet figured out how
to convert the emails to XML.
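
One possible way to do that conversion, as a minimal sketch rather than a tested solution: parse each flat text message with Python's standard `email` module and write the headers and body out as XML elements. The element names (`email`, `subject`, `from`, `date`, `body`) and the helper names `body_text` and `email_to_xml` are assumptions made for illustration, not anything Swish-E requires.

```python
from email import message_from_string
from email.message import Message
import xml.etree.ElementTree as ET


def body_text(msg: Message) -> str:
    """Return the first text/plain part of the message, or "" if none."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue  # skips multipart containers and attachments
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        if payload is None:
            continue
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return ""


def email_to_xml(raw: str) -> str:
    """Convert one raw RFC 822 style message into a small XML document."""
    msg = message_from_string(raw)
    doc = ET.Element("email")
    # Hypothetical field names; rename to match the indexer's configuration.
    for tag, header in (("subject", "Subject"), ("from", "From"), ("date", "Date")):
        ET.SubElement(doc, tag).text = msg.get(header, "")
    ET.SubElement(doc, "body").text = body_text(msg)
    # ElementTree escapes &, < and > in the text for us.
    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "From: someone@example.com\n"
        "Subject: Email Indexing\n"
        "Date: Thu, 28 Oct 2010 01:16:36 +0000\n"
        "\n"
        "Flat text body goes here.\n"
    )
    print(email_to_xml(sample))
```

The element names would still have to be declared as fields in whatever indexer consumes the output. The sketch also does not decode encoded-word headers or strip control characters that are invalid in XML, so real archive data would likely need more care.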

Other than that, though, it's functional and very fast. That being said, I'm sure Sphinx or Lucene
could do the same thing, and I would love to hear from anyone out there who is using Lucene
to index a collection of emails in mbox format.

You can see my Swish-E implementation, in all its unfinished glory, at http://type2.com/search
It covers roughly 200,000 emails over the past 15 years.

Peace, Troy