Return-Path:
Delivered-To: apmail-lucene-dev-archive@www.apache.org
Received: (qmail 41257 invoked from network); 25 Nov 2010 22:37:39 -0000
Received: from unknown (HELO mail.apache.org) (140.211.11.3)
  by 140.211.11.9 with SMTP; 25 Nov 2010 22:37:39 -0000
Received: (qmail 69450 invoked by uid 500); 25 Nov 2010 22:37:37 -0000
Delivered-To: apmail-lucene-dev-archive@lucene.apache.org
Received: (qmail 69400 invoked by uid 500); 25 Nov 2010 22:37:37 -0000
Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@lucene.apache.org
Delivered-To: mailing list dev@lucene.apache.org
Received: (qmail 69393 invoked by uid 99); 25 Nov 2010 22:37:37 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
  by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 25 Nov 2010 22:37:37 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED
X-Spam-Check-By: apache.org
Received: from [140.211.11.22] (HELO thor.apache.org) (140.211.11.22)
  by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 25 Nov 2010 22:37:35 +0000
Received: from thor (localhost [127.0.0.1])
  by thor.apache.org (8.13.8+Sun/8.13.8) with ESMTP id oAPMbDxi006736
  for ; Thu, 25 Nov 2010 22:37:13 GMT
Message-ID: <13449903.318151290724633808.JavaMail.jira@thor>
Date: Thu, 25 Nov 2010 17:37:13 -0500 (EST)
From: "Peter Sturge (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] Updated: (SOLR-2245) MailEntityProcessor Update
In-Reply-To: <27357147.184521290123793481.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
X-Virus-Checked: Checked by ClamAV on apache.org

     [ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-2245:
-------------------------------

    Attachment: SOLR-2245.zip

This patch update does a more proper
delta-import implementation, rather than the kludge used in the previous version.

MailEntityProcessor with this patch is useful for importing emails en masse the first time round, then only new mails after that.

Behaviour:
* If you send a full-import command, the 'fetchMailsSince' property specified in data-config.xml is always used.
* If you send a delta-import command, the 'fetchMailsSince' property specified in data-config.xml is used for the first call only. Subsequent delta-import commands fetch mail since the time of the last index update.

There are significant code changes in this version, so much so that I've included the complete MailEntityProcessor source as well as a PATCH file.

This version doesn't use the persistent last_index_time functionality of dataimport.properties (i.e. it's delta only for the life of the Solr process). If I get some free cycles, I'll try to put this in.

> MailEntityProcessor Update
> --------------------------
>
>                 Key: SOLR-2245
>                 URL: https://issues.apache.org/jira/browse/SOLR-2245
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4, 1.4.1
>            Reporter: Peter Sturge
>            Priority: Minor
>             Fix For: 1.4.2
>
>         Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip
>
>
> This patch addresses a number of issues in the MailEntityProcessor contrib-extras module.
> The changes are outlined here:
> * Added an 'includeContent' entity attribute to allow specifying content to be included independently of processing attachments
>      e.g.  would include message content, but not attachment content
> * Added 'processAttachments' as a synonym for the misspelled (and singular) 'processAttachement' property; it functions the same as processAttachement. Default = 'true'; if either is false, attachments are not processed. Note that only one of these should really be specified in a given tag.
> * Added a FLAGS.NONE value, so that if an email has no flags (i.e. it is unread, not deleted etc.), there is still a property value stored in the 'flags' field (the value is the string "none")
> Note: there is a potential backward compat issue with FLAGS.NONE for clients that expect the absence of the 'flags' field to mean 'Not read'. I expect this to be extremely rare, and relying on it is inadvisable in any case, as user flags can be arbitrarily set, so fixing it up now will ensure future client access is consistent.
> * The folder name of an email is now included as a field called 'folder' (e.g. folder=INBOX.Sent). This is quite handy in search/post-indexing processing
> * The addPartToDocument() method that processes attachments is significantly rewritten, as there looked to be no real way the existing code would ever actually process attachment content and add it to the row data
> Tested on the 3.x trunk with a number of popular IMAP servers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
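
The full-import / delta-import behaviour described in the comment above could be sketched roughly as follows. This is only an illustration, not code from the attached patch: the class FetchSinceSelector and its methods are hypothetical names, and the comment does not say exactly when the patch records the last index time, so the sketch leaves that call to the caller.

```java
import java.util.Date;

/**
 * Hypothetical helper (not taken from SOLR-2245): chooses the cut-off date
 * for fetching mail, depending on which import command was sent.
 */
class FetchSinceSelector {
    // Value of 'fetchMailsSince' from data-config.xml.
    private final Date configuredSince;
    // Held in memory only, so it is lost when the process restarts
    // (mirrors "delta only for the life of the Solr process").
    private Date lastIndexTime;

    FetchSinceSelector(Date configuredSince) {
        this.configuredSince = configuredSince;
    }

    /** Returns the date to fetch mail from for the given command name. */
    Date sinceFor(String command) {
        // A delta-import uses the last index time once one has been recorded;
        // a full-import, or the first delta-import, uses the configured value.
        if ("delta-import".equals(command) && lastIndexTime != null) {
            return lastIndexTime;
        }
        return configuredSince;
    }

    /** Assumption: the caller invokes this after each index update. */
    void recordIndexUpdate(Date updatedAt) {
        lastIndexTime = updatedAt;
    }

    public static void main(String[] args) {
        FetchSinceSelector selector = new FetchSinceSelector(new Date(1000L));
        System.out.println(selector.sinceFor("delta-import").getTime());
        selector.recordIndexUpdate(new Date(5000L));
        System.out.println(selector.sinceFor("delta-import").getTime());
        System.out.println(selector.sinceFor("full-import").getTime());
    }
}
```

Because the last index time lives only in a field, a restart falls back to the configured 'fetchMailsSince' value, which matches the limitation noted in the comment about not using dataimport.properties.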