lucene-java-user mailing list archives

From 刘庆志 <liuqingz...@gmail.com>
Subject Re: how to design Lucene Document and Field to indexing and searching email message and attachments
Date Thu, 29 Apr 2010 03:27:14 GMT
Erick:
    Thanks for the information.
    I searched the archives and found a very similar thread, "Best Practice: emails and
file-attachments", from 15 August 2006, in the mailing list archives
(http://mail-archives.apache.org/mod_mbox/lucene-java-user/200608.mbox/browser), but
that thread did not reach a final answer.
    
        


dazhi



----- Original Message ----- 
From: "Erick Erickson" <erickerickson@gmail.com>
To: <java-user@lucene.apache.org>
Sent: Wednesday, April 28, 2010 11:00 PM
Subject: Re: how to design Lucene Document and Field to indexing and searching email message and attachments


This problem has been discussed several times, although I can't
remember the answer. So I'd recommend searching the mail archive
first.

Lucid maintains a searchable archive, see:
http://www.lucidimagination.com/About-Search

HTH
Erick

On Wed, Apr 28, 2010 at 2:35 AM, 刘庆志 <liuqingzhi2@gmail.com> wrote:

> Hi all,
>
> Our business system generates data whose structure resembles an email
> message: one message has some attachments, so we can think of our data
> as email messages. I need to index and search each message and its
> attachments. When displaying hits, every hit must show two kinds of
> links: one for the message, and one for each attachment that matches
> the query criteria. So there is exactly one link of the former kind,
> but there may be zero to n of the latter.
>
> One possible design: for every message, create one Lucene Document with
> a field that records its id (call it id) and another field that holds
> the content of all its attachments (call it attachments). In addition,
> create one Lucene Document for every attachment of every message, with
> a field that records the id of its message (call it messageid).
>
> At query time, we retrieve the messages whose own content or
> attachment content matches the query criteria. To generate links for
> the attachments that match, we query again, this time searching only
> that message's attachments by adding the condition that messageid
> equals the id of the message found by the first query.
>
> This has two obvious disadvantages:
> 1. It indexes attachments twice: once in the message Document and once
>    in the Document for the attachment.
> 2. One user query becomes 1+n queries: 1 to query the messages together
>    with all their attachments, and n to requery the attachments of
>    each matching message.
>
> Is there a better solution?
>
>
> dazhi
>
> Thanks for any hints!!!
>
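
For readers following the thread, the layout described in the question can be sketched roughly as below. This is only a toy in-memory model written with plain Java collections, not the Lucene API. The class and method names (MessageIndexSketch, Doc, Hit, search, sampleIndex) and the type field are hypothetical; only id and messageid come from the question. It assumes every message and every attachment is stored once as its own document carrying messageid, and that the hits of a single query are grouped by messageid afterwards, which is one possible way to avoid both the duplicated attachment text and the n follow-up queries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy in-memory model, NOT the Lucene API: each Doc stands in for one Lucene Document.
class MessageIndexSketch {

    /** One indexed unit: either a message body or a single attachment. */
    static final class Doc {
        final String type;      // "message" or "attachment" (hypothetical field)
        final String id;        // id of this message or attachment
        final String messageId; // id of the owning message (a message points at itself)
        final String content;   // text that gets searched

        Doc(String type, String id, String messageId, String content) {
            this.type = type;
            this.id = id;
            this.messageId = messageId;
            this.content = content;
        }
    }

    /** What to display for one hit: the message link plus zero to n attachment links. */
    static final class Hit {
        final String messageId;
        boolean messageMatched;                        // true if the message body itself matched
        final List<String> attachmentIds = new ArrayList<>();

        Hit(String messageId) {
            this.messageId = messageId;
        }
    }

    /** One pass over all docs, standing in for a single query grouped by messageId. */
    static Map<String, Hit> search(List<Doc> index, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        Map<String, Hit> hits = new LinkedHashMap<>();
        for (Doc doc : index) {
            if (!doc.content.toLowerCase(Locale.ROOT).contains(needle)) {
                continue; // this doc does not match the query
            }
            Hit hit = hits.computeIfAbsent(doc.messageId, Hit::new);
            if ("message".equals(doc.type)) {
                hit.messageMatched = true;
            } else {
                hit.attachmentIds.add(doc.id);
            }
        }
        return hits;
    }

    /** Made-up sample data: two messages with three attachments in total. */
    static List<Doc> sampleIndex() {
        List<Doc> index = new ArrayList<>();
        index.add(new Doc("message", "m1", "m1", "Quarterly report attached"));
        index.add(new Doc("attachment", "a1", "m1", "Invoice for March"));
        index.add(new Doc("attachment", "a2", "m1", "Invoice for April"));
        index.add(new Doc("message", "m2", "m2", "Please pay the invoice"));
        index.add(new Doc("attachment", "a3", "m2", "Holiday photo"));
        return index;
    }

    public static void main(String[] args) {
        for (Hit hit : search(sampleIndex(), "invoice").values()) {
            System.out.println("message " + hit.messageId
                    + " bodyMatched=" + hit.messageMatched
                    + " attachments=" + hit.attachmentIds);
        }
    }
}
```

In this sketch a message appears in the result whenever its body or any of its attachments matches, and the attachment links come from the same pass. Whether the equivalent grouping is cheap enough in a real Lucene index depends on the result size, so treat this as a starting point rather than an answer to the thread.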