lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: svn commit: r1570955 [1/3] - in /lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files: ./ test-documents/ test-morphlines/
Date Sun, 23 Feb 2014 08:13:19 GMT
Thanks!

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: markrmiller@apache.org [mailto:markrmiller@apache.org]
> Sent: Sunday, February 23, 2014 3:22 AM
> To: commits@lucene.apache.org
> Subject: svn commit: r1570955 [1/3] - in
> /lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files: ./ test-
> documents/ test-morphlines/
> 
> Author: markrmiller
> Date: Sun Feb 23 02:22:02 2014
> New Revision: 1570955
> 
> URL: http://svn.apache.org/r1570955
> Log:
> SOLR-5764: Set eol-style on test resources
> 
> Modified:
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/morphlines-
> core.marker   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/cars.csv   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/complex.mbox   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/email.eml   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/rsstest.rss   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/sample-statuses-20120906-141433   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/testEMLX.emlx   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/testRFC822   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/testRTFVarious.rtf   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/testSVG.svg   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> morphlines/loadSolrBasic.conf   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> morphlines/solrCellDocumentTypes.conf   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> morphlines/solrCellJPGCompressed.conf   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> morphlines/solrCellXML.conf   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> morphlines/tokenizeText.conf   (contents, props changed)
>     lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> morphlines/tutorialReadAvroContainer.conf   (contents, props changed)
> 
> Modified: lucene/dev/trunk/solr/contrib/morphlines-core/src/test-
> files/morphlines-core.marker
> URL:
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/morphlines-core.marker?rev=1570955&r1=1570954&r2=1570955&view=diff
> ==========================================================
> ====================
>     (empty)
> 
> Modified: lucene/dev/trunk/solr/contrib/morphlines-core/src/test-
> files/test-documents/cars.csv
> URL:
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-documents/cars.csv?rev=1570955&r1=1570954&r2=1570955&view=diff
> ==========================================================
> ====================
> --- lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/cars.csv (original)
> +++ lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/cars.csv Sun Feb 23 02:22:02 2014
> @@ -1,6 +1,6 @@
> -Age,Color,Extras,Type,Used
> -2,blue,GPS,"Gas, with electric",""
> -10,green,"Labeled ""Vintage, 1913""",,yes
> -100,red,"Labeled ""Vintage 1913""",yes
> -5,orange,none,"This is a
> +Age,Color,Extras,Type,Used
> +2,blue,GPS,"Gas, with electric",""
> +10,green,"Labeled ""Vintage, 1913""",,yes
> +100,red,"Labeled ""Vintage 1913""",yes
> +5,orange,none,"This is a
>  multi, line text",no
> \ No newline at end of file
> 
> Modified: lucene/dev/trunk/solr/contrib/morphlines-core/src/test-
> files/test-documents/complex.mbox
> URL:
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-documents/complex.mbox?rev=1570955&r1=1570954&r2=1570955&view=diff
> ==========================================================
> ====================
> --- lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/complex.mbox (original)
> +++ lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/complex.mbox Sun Feb 23 02:22:02 2014
> @@ -1,291 +1,291 @@
> -From core-user-return-14700-apmail-hadoop-core-user-
> archive=hadoop.apache.org@hadoop.apache.org Mon Jun 01 04:28:28 2009
> -Return-Path: <core-user-return-14700-apmail-hadoop-core-user-
> archive=hadoop.apache.org@hadoop.apache.org>
> -Delivered-To: apmail-hadoop-core-user-archive@www.apache.org
> -Received: (qmail 19921 invoked from network); 1 Jun 2009 04:28:28 -0000
> -Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
> -  by minotaur.apache.org with SMTP; 1 Jun 2009 04:28:28 -0000
> -Received: (qmail 84995 invoked by uid 500); 1 Jun 2009 04:28:38 -0000
> -Delivered-To: apmail-hadoop-core-user-archive@hadoop.apache.org
> -Received: (qmail 84895 invoked by uid 500); 1 Jun 2009 04:28:38 -0000
> -Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
> -Precedence: bulk
> -List-Help: <mailto:core-user-help@hadoop.apache.org>
> -List-Unsubscribe: <mailto:core-user-unsubscribe@hadoop.apache.org>
> -List-Post: <mailto:core-user@hadoop.apache.org>
> -List-Id: <core-user.hadoop.apache.org>
> -Reply-To: core-user@hadoop.apache.org
> -Delivered-To: mailing list core-user@hadoop.apache.org
> -Received: (qmail 84885 invoked by uid 99); 1 Jun 2009 04:28:38 -0000
> -Received: from athena.apache.org (HELO athena.apache.org)
> (140.211.11.136)
> -    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Jun 2009 04:28:38
> +0000
> -X-ASF-Spam-Status: No, hits=1.2 required=10.0
> -	tests=SPF_NEUTRAL
> -X-Spam-Check-By: apache.org
> -Received-SPF: neutral (athena.apache.org: local policy)
> -Received: from [69.147.107.21] (HELO mrout2-b.corp.re1.wahoo.com)
> (69.147.107.21)
> -    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Jun 2009 04:28:26
> +0000
> -Received: from SNV-EXPF01.ds.corp.wahoo.com (snv-
> expf01.ds.corp.wahoo.com [207.126.227.250])
> -	by mrout2-b.corp.re1.wahoo.com (8.13.8/8.13.8/y.out) with ESMTP
> id n514QYA6099963
> -	for <core-user@hadoop.apache.org>; Sun, 31 May 2009 21:26:35 -
> 0700 (PDT)
> -DomainKey-Signature: a=rsa-sha1; s=serpent; d=wahoo-inc.com; c=nofws;
> q=dns;
> -	h=received:user-agent:date:subject:from:to:message-id:
> -	thread-topic:thread-index:in-reply-to:mime-version:content-type:
> -	content-transfer-encoding:x-originalarrivaltime;
> -
> 	b=YVtSNdgjeeSBS1yY3XDolul49i+HrgNG7QszMo9LzGnrwejjgsl5+iUM
> 6EiQgEpV
> -Received: from SNV-EXVS08.ds.corp.wahoo.com ([207.126.227.9]) by SNV-
> EXPF01.ds.corp.wahoo.com with Microsoft SMTPSVC(6.0.3790.3959);
> -	 Sun, 31 May 2009 21:26:34 -0700
> -Received: from 10.66.92.213 ([10.66.92.213]) by SNV-
> EXVS08.ds.corp.wahoo.com ([207.126.227.58]) with Microsoft Exchange
> Server HTTP-DAV ;
> - Mon,  1 Jun 2009 04:26:33 +0000
> -User-Agent: Microsoft-Entourage/12.17.0.090302
> -Date: Mon, 01 Jun 2009 09:56:31 +0530
> -Subject: Re: question about when shuffle/sort start working
> -From: Sam Judgement <Sampn@wahoo-inc.com>
> -To: <core-user@hadoop.apache.org>
> -Message-ID: <C649564F.1435F%Sampn@wahoo-inc.com>
> -Thread-Topic: question about when shuffle/sort start working
> -Thread-Index: AcnicSNoBw19cMU8UEaXwAdZ1YYhuw==
> -In-Reply-To: <440622.41041.qm@web111005.mail.gq1.wahoo.com>
> -Mime-version: 1.0
> -Content-type: text/plain;
> -	charset="US-ASCII"
> -Content-transfer-encoding: 7bit
> -X-OriginalArrivalTime: 01 Jun 2009 04:26:34.0501 (UTC)
> FILETIME=[257EAB50:01C9E271]
> -X-Virus-Checked: Checked by ClamAV on apache.org
> -
> -When a Mapper completes, MapCompletionEvents are generated.
> Reducers try to
> -fetch map outputs for a given map only on the receipt of such events.
> -
> -Sam
> -
> -
> -On 5/30/09 10:00 AM, "Jianmin Foo" <jianmin_Foo@wahoo.com> wrote:
> -
> -> Hi,
> -> I am being confused by the protocol between mapper and reducer. When
> mapper
> -> emitting the (key,value) pair done, is there any signal the mapper send
> out to
> -> hadoop framework in protocol to indicate that map is done and the
> shuffle/sort
> -> can begin for reducer? If there is no this signal in protocol, when the
> -> framework begin the shuffle/sort?
> ->
> -> Thanks,
> -> Jianmin
> ->
> ->
> ->
> ->
> -
> -
> -From core-user-return-14701-apmail-hadoop-core-user-
> archive=hadoop.apache.org@hadoop.apache.org Mon Jun 01 05:31:14 2009
> -Return-Path: <core-user-return-14701-apmail-hadoop-core-user-
> archive=hadoop.apache.org@hadoop.apache.org>
> -Delivered-To: apmail-hadoop-core-user-archive@www.apache.org
> -Received: (qmail 38243 invoked from network); 1 Jun 2009 05:31:14 -0000
> -Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
> -  by minotaur.apache.org with SMTP; 1 Jun 2009 05:31:14 -0000
> -Received: (qmail 15621 invoked by uid 500); 1 Jun 2009 05:31:24 -0000
> -Delivered-To: apmail-hadoop-core-user-archive@hadoop.apache.org
> -Received: (qmail 15557 invoked by uid 500); 1 Jun 2009 05:31:24 -0000
> -Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
> -Precedence: bulk
> -List-Help: <mailto:core-user-help@hadoop.apache.org>
> -List-Unsubscribe: <mailto:core-user-unsubscribe@hadoop.apache.org>
> -List-Post: <mailto:core-user@hadoop.apache.org>
> -List-Id: <core-user.hadoop.apache.org>
> -Reply-To: core-user@hadoop.apache.org
> -Delivered-To: mailing list core-user@hadoop.apache.org
> -Received: (qmail 15547 invoked by uid 99); 1 Jun 2009 05:31:24 -0000
> -Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
> -    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Jun 2009 05:31:24
> +0000
> -X-ASF-Spam-Status: No, hits=2.2 required=10.0
> -	tests=HTML_MESSAGE,SPF_PASS
> -X-Spam-Check-By: apache.org
> -Received-SPF: pass (nike.apache.org: local policy)
> -Received: from [68.142.237.94] (HELO n9.bullet.re3.wahoo.com)
> (68.142.237.94)
> -    by apache.org (qpsmtpd/0.29) with SMTP; Mon, 01 Jun 2009 05:31:11
> +0000
> -Received: from [68.142.237.88] by n9.bullet.re3.wahoo.com with NNFMP; 01
> Jun 2009 05:30:50 -0000
> -Received: from [67.195.9.82] by t4.bullet.re3.wahoo.com with NNFMP; 01
> Jun 2009 05:30:49 -0000
> -Received: from [67.195.9.99] by t2.bullet.mail.gq1.wahoo.com with NNFMP;
> 01 Jun 2009 05:30:49 -0000
> -Received: from [127.0.0.1] by omp103.mail.gq1.wahoo.com with NNFMP; 01
> Jun 2009 05:28:01 -0000
> -X-wahoo-Newman-Property: ymail-3
> -X-wahoo-Newman-Id: 796121.97519.bm@omp103.mail.gq1.wahoo.com
> -Received: (qmail 35264 invoked by uid 60001); 1 Jun 2009 05:30:49 -0000
> -DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=wahoo.com;
> s=s1024; t=1243834249;
> bh=R8qzdi/IbLyO8UwpnaujDpT9E+6bJ7nkmZN2803EmRk=; h=Message-ID:X-
> YMail-OSG:Received:X-Mailer:References:Date:From:Subject:To:In-Reply-
> To:MIME-Version:Content-Type;
> b=vq4c6RIDbkuLPYd8mirusIXf6DqTb/IeT55In7W00Y5Sxx1ZiXBb78yE9+TDfXJ0
> elsEZvqv4ocyvolGE0eGtyYeJA0mZikpRNu6pidxPNpCplOcLHBRz7YQ7iERwv3T
> agRlWy2Xd3oD9ZeV0A05P7WUOiNNX1PUUJD1IVdrEZo=
> -DomainKey-Signature:a=rsa-sha1; q=dns; c=nofws;
> -  s=s1024; d=wahoo.com;
> -  h=Message-ID:X-YMail-OSG:Received:X-
> Mailer:References:Date:From:Subject:To:In-Reply-To:MIME-
> Version:Content-Type;
> -
> b=6HXZV98ON5vBwmE/xS8stVD0D2F4dkMY7a0suX5KVTb736JdR8G59mqBq/
> dWcpbFTLiCLtxi18LMb/dU1RKRGOEdn3l3j/jKXhBrhIgfg3qtNskPedXDKBvn7JG
> XiSkqpA/tUtPjvc0Uuk8/LaA01SQTz40Engg7nD8/EJdIAhA=;
> -Message-ID: <592088.35091.qm@web111010.mail.gq1.wahoo.com>
> -X-YMail-OSG:
> KzhhrJYVM1m.MCS6vRpRP2ZZO2PrfnbngosELDCIa91ZqvhJph4RdmzfUW0jw
> 9W04RCSch1K730bPohwNpNBIk2QR_zt4_mfbhfq7YEPkSoz9LSXG90P9vIo5Fc
> 8qyZN0U6vA9gtdyGQTpN5ahvillUH9nAF0TMWv2SvZJLjPlQ0Z0p8oK8ltBwGTg
> LrM8Jtdn9D29yoRyi3_EpVOfdD9OP.EK50Vr1XwSUYMbnpZ0WGHMwd.Yig7A
> 6Elwadm3YVbfOdx2mfrG.jQsUAxQjRBNvbrOM57.FaE11kHTe9aoBWSeihNg--
> -Received: from [216.145.54.7] by web111010.mail.gq1.wahoo.com via HTTP;
> Sun, 31 May 2009 22:30:49 PDT
> -X-Mailer: wahooMailRC/1277.43 wahooMailWebService/0.7.289.10
> -References: <C649564F.1435F%Sampn@wahoo-inc.com>
> -Date: Sun, 31 May 2009 22:30:49 -0700 (PDT)
> -From: Jianmin Foo <jianmin_Foo@wahoo.com>
> -Subject: Re: question about when shuffle/sort start working
> -To: core-user@hadoop.apache.org
> -In-Reply-To: <C649564F.1435F%Sampn@wahoo-inc.com>
> -MIME-Version: 1.0
> -Content-Type: multipart/alternative; boundary="0-1193839393-
> 1243834249=:35091"
> -X-Virus-Checked: Checked by ClamAV on apache.org
> -
> ---0-1193839393-1243834249=:35091
> -Content-Type: text/plain; charset=us-ascii
> -
> -Thanks a lot for your explanation, Sam.
> -
> -So is this event generated by hadoop framework? Is there any API in
> mapper to fire this event? Actually, I am thinking to implement a mapper that
> will emit some <key, value> pairs, then fire this event to let the reducer
> works, the same mapper task then emit some other <key, value> pairs and
> repeat. Do you think is this logic feasible by current API?
> -
> -Thanks,
> -Jianmin
> -
> -
> -
> -
> -
> -________________________________
> -From: Sam Judgement <Sampn@wahoo-inc.com>
> -To: core-user@hadoop.apache.org
> -Sent: Monday, June 1, 2009 12:26:31 PM
> -Subject: Re: question about when shuffle/sort start working
> -
> -When a Mapper completes, MapCompletionEvents are generated.
> Reducers try to
> -fetch map outputs for a given map only on the receipt of such events.
> -
> -Sam
> -
> -
> -On 5/30/09 10:00 AM, "Jianmin Foo" <jianmin_woo@wahoo.com> wrote:
> -
> -> Hi,
> -> I am being confused by the protocol between mapper and reducer. When
> mapper
> -> emitting the (key,value) pair done, is there any signal the mapper send
> out to
> -> hadoop framework in protocol to indicate that map is done and the
> shuffle/sort
> -> can begin for reducer? If there is no this signal in protocol, when the
> -> framework begin the shuffle/sort?
> ->
> -> Thanks,
> -> Jianmin
> ->
> ->
> ->
> ->
> -
> -
> -
> ---0-1193839393-1243834249=:35091--
> -
> -
> -From core-user-return-14702-apmail-hadoop-core-user-
> archive=hadoop.apache.org@hadoop.apache.org Mon Jun 01 06:04:30 2009
> -Return-Path: <core-user-return-14702-apmail-hadoop-core-user-
> archive=hadoop.apache.org@hadoop.apache.org>
> -Delivered-To: apmail-hadoop-core-user-archive@www.apache.org
> -Received: (qmail 53387 invoked from network); 1 Jun 2009 06:04:29 -0000
> -Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
> -  by minotaur.apache.org with SMTP; 1 Jun 2009 06:04:29 -0000
> -Received: (qmail 39066 invoked by uid 500); 1 Jun 2009 06:04:39 -0000
> -Delivered-To: apmail-hadoop-core-user-archive@hadoop.apache.org
> -Received: (qmail 38970 invoked by uid 500); 1 Jun 2009 06:04:39 -0000
> -Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
> -Precedence: bulk
> -List-Help: <mailto:core-user-help@hadoop.apache.org>
> -List-Unsubscribe: <mailto:core-user-unsubscribe@hadoop.apache.org>
> -List-Post: <mailto:core-user@hadoop.apache.org>
> -List-Id: <core-user.hadoop.apache.org>
> -Reply-To: core-user@hadoop.apache.org
> -Delivered-To: mailing list core-user@hadoop.apache.org
> -Received: (qmail 38955 invoked by uid 99); 1 Jun 2009 06:04:39 -0000
> -Received: from athena.apache.org (HELO athena.apache.org)
> (140.211.11.136)
> -    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Jun 2009 06:04:39
> +0000
> -X-ASF-Spam-Status: No, hits=1.2 required=10.0
> -	tests=SPF_NEUTRAL
> -X-Spam-Check-By: apache.org
> -Received-SPF: neutral (athena.apache.org: local policy)
> -Received: from [216.145.54.172] (HELO mrout2.wahoo.com)
> (216.145.54.172)
> -    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Jun 2009 06:04:28
> +0000
> -Received: from SNV-EXBH01.ds.corp.wahoo.com (snv-
> exbh01.ds.corp.wahoo.com [207.126.227.249])
> -	by mrout2.wahoo.com (8.13.6/8.13.6/y.out) with ESMTP id
> n5163FGq038852
> -	for <core-user@hadoop.apache.org>; Sun, 31 May 2009 23:03:15 -
> 0700 (PDT)
> -DomainKey-Signature: a=rsa-sha1; s=serpent; d=wahoo-inc.com; c=nofws;
> q=dns;
> -	h=received:user-agent:date:subject:from:to:message-id:
> -	thread-topic:thread-index:in-reply-to:mime-version:content-type:
> -	content-transfer-encoding:x-originalarrivaltime;
> -
> 	b=rChE4SCnwtWaZpjhovkiXDKfDiVNdRRvsadSGG9S9bgvOexn/9/5JjE
> Qx1pOR7Nb
> -Received: from SNV-EXVS08.ds.corp.wahoo.com ([207.126.227.9]) by SNV-
> EXBH01.ds.corp.wahoo.com with Microsoft SMTPSVC(6.0.3790.3959);
> -	 Sun, 31 May 2009 23:03:15 -0700
> -Received: from 10.66.92.213 ([10.66.92.213]) by SNV-
> EXVS08.ds.corp.wahoo.com ([207.126.227.58]) with Microsoft Exchange
> Server HTTP-DAV ;
> - Mon,  1 Jun 2009 06:03:15 +0000
> -User-Agent: Microsoft-Entourage/12.17.0.090302
> -Date: Mon, 01 Jun 2009 11:33:13 +0530
> -Subject: Re: question about when shuffle/sort start working
> -From: Sam Judgement <Sampn@wahoo-inc.com>
> -To: <core-user@hadoop.apache.org>
> -Message-ID: <C6496CF9.1437C%Sampn@wahoo-inc.com>
> -Thread-Topic: question about when shuffle/sort start working
> -Thread-Index: AcnifqWrLG6N7GAk7kqy9QalVWfegQ==
> -In-Reply-To: <592088.35091.qm@web111010.mail.gq1.wahoo.com>
> -Mime-version: 1.0
> -Content-type: text/plain;
> -	charset="US-ASCII"
> -Content-transfer-encoding: 7bit
> -X-OriginalArrivalTime: 01 Jun 2009 06:03:15.0462 (UTC)
> FILETIME=[A7231260:01C9E27E]
> -X-Virus-Checked: Checked by ClamAV on apache.org
> -
> -
> -No you cannot raise this event yourself, this event is generated internally
> -by the framework.
> -
> -I am guessing that what you probably want is to have a chain of MapReduce
> -Jobs where the output of one is automatically fed as input to another.  You
> -can look at these classes: JobControl and ChainMapper/ChainReducer.
> -
> -Sam
> -
> -On 6/1/09 11:00 AM, "Jianmin Foo" <jianmin_Foo@wahoo.com> wrote:
> -
> -> Thanks a lot for your explanation, Sam.
> ->
> -> So is this event generated by hadoop framework? Is there any API in
> mapper to
> -> fire this event? Actually, I am thinking to implement a mapper that will
> emit
> -> some <key, value> pairs, then fire this event to let the reducer works, the
> -> same mapper task then emit some other <key, value> pairs and repeat.
> Do you
> -> think is this logic feasible by current API?
> ->
> -> Thanks,
> -> Jianmin
> ->
> ->
> ->
> ->
> ->
> -> ________________________________
> -> From: Sam Judgement <Sampn@wahoo-inc.com>
> -> To: core-user@hadoop.apache.org
> -> Sent: Monday, June 1, 2009 12:26:31 PM
> -> Subject: Re: question about when shuffle/sort start working
> ->
> -> When a Mapper completes, MapCompletionEvents are generated.
> Reducers try to
> -> fetch map outputs for a given map only on the receipt of such events.
> ->
> -> Sam
> ->
> ->
> -> On 5/30/09 10:00 AM, "Jianmin Foo" <jianmin_foo@wahoo.com> wrote:
> ->
> ->> Hi,
> ->> I am being confused by the protocol between mapper and reducer.
> When mapper
> ->> emitting the (key,value) pair done, is there any signal the mapper send
> out
> ->> to
> ->> hadoop framework in protocol to indicate that map is done and the
> ->> shuffle/sort
> ->> can begin for reducer? If there is no this signal in protocol, when the
> ->> framework begin the shuffle/sort?
> ->>
> ->> Thanks,
> ->> Jianmin
> ->>
> ->>
> ->>
> ->>
> ->
> ->
> ->
> -
> -
> +From core-user-return-14700-apmail-hadoop-core-user-
> archive=hadoop.apache.org@hadoop.apache.org Mon Jun 01 04:28:28 2009
> +Return-Path: <core-user-return-14700-apmail-hadoop-core-user-
> archive=hadoop.apache.org@hadoop.apache.org>
> +Delivered-To: apmail-hadoop-core-user-archive@www.apache.org
> +Received: (qmail 19921 invoked from network); 1 Jun 2009 04:28:28 -0000
> +Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
> +  by minotaur.apache.org with SMTP; 1 Jun 2009 04:28:28 -0000
> +Received: (qmail 84995 invoked by uid 500); 1 Jun 2009 04:28:38 -0000
> +Delivered-To: apmail-hadoop-core-user-archive@hadoop.apache.org
> +Received: (qmail 84895 invoked by uid 500); 1 Jun 2009 04:28:38 -0000
> +Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
> +Precedence: bulk
> +List-Help: <mailto:core-user-help@hadoop.apache.org>
> +List-Unsubscribe: <mailto:core-user-unsubscribe@hadoop.apache.org>
> +List-Post: <mailto:core-user@hadoop.apache.org>
> +List-Id: <core-user.hadoop.apache.org>
> +Reply-To: core-user@hadoop.apache.org
> +Delivered-To: mailing list core-user@hadoop.apache.org
> +Received: (qmail 84885 invoked by uid 99); 1 Jun 2009 04:28:38 -0000
> +Received: from athena.apache.org (HELO athena.apache.org)
> (140.211.11.136)
> +    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Jun 2009 04:28:38
> +0000
> +X-ASF-Spam-Status: No, hits=1.2 required=10.0
> +	tests=SPF_NEUTRAL
> +X-Spam-Check-By: apache.org
> +Received-SPF: neutral (athena.apache.org: local policy)
> +Received: from [69.147.107.21] (HELO mrout2-b.corp.re1.wahoo.com)
> (69.147.107.21)
> +    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Jun 2009 04:28:26
> +0000
> +Received: from SNV-EXPF01.ds.corp.wahoo.com (snv-
> expf01.ds.corp.wahoo.com [207.126.227.250])
> +	by mrout2-b.corp.re1.wahoo.com (8.13.8/8.13.8/y.out) with ESMTP
> id n514QYA6099963
> +	for <core-user@hadoop.apache.org>; Sun, 31 May 2009 21:26:35 -
> 0700 (PDT)
> +DomainKey-Signature: a=rsa-sha1; s=serpent; d=wahoo-inc.com; c=nofws;
> q=dns;
> +	h=received:user-agent:date:subject:from:to:message-id:
> +	thread-topic:thread-index:in-reply-to:mime-version:content-type:
> +	content-transfer-encoding:x-originalarrivaltime;
> +
> 	b=YVtSNdgjeeSBS1yY3XDolul49i+HrgNG7QszMo9LzGnrwejjgsl5+iUM
> 6EiQgEpV
> +Received: from SNV-EXVS08.ds.corp.wahoo.com ([207.126.227.9]) by SNV-
> EXPF01.ds.corp.wahoo.com with Microsoft SMTPSVC(6.0.3790.3959);
> +	 Sun, 31 May 2009 21:26:34 -0700
> +Received: from 10.66.92.213 ([10.66.92.213]) by SNV-
> EXVS08.ds.corp.wahoo.com ([207.126.227.58]) with Microsoft Exchange
> Server HTTP-DAV ;
> + Mon,  1 Jun 2009 04:26:33 +0000
> +User-Agent: Microsoft-Entourage/12.17.0.090302
> +Date: Mon, 01 Jun 2009 09:56:31 +0530
> +Subject: Re: question about when shuffle/sort start working
> +From: Sam Judgement <Sampn@wahoo-inc.com>
> +To: <core-user@hadoop.apache.org>
> +Message-ID: <C649564F.1435F%Sampn@wahoo-inc.com>
> +Thread-Topic: question about when shuffle/sort start working
> +Thread-Index: AcnicSNoBw19cMU8UEaXwAdZ1YYhuw==
> +In-Reply-To: <440622.41041.qm@web111005.mail.gq1.wahoo.com>
> +Mime-version: 1.0
> +Content-type: text/plain;
> +	charset="US-ASCII"
> +Content-transfer-encoding: 7bit
> +X-OriginalArrivalTime: 01 Jun 2009 04:26:34.0501 (UTC)
> FILETIME=[257EAB50:01C9E271]
> +X-Virus-Checked: Checked by ClamAV on apache.org
> +
> +When a Mapper completes, MapCompletionEvents are generated.
> Reducers try to
> +fetch map outputs for a given map only on the receipt of such events.
> +
> +Sam
> +
> +
> +On 5/30/09 10:00 AM, "Jianmin Foo" <jianmin_Foo@wahoo.com> wrote:
> +
> +> Hi,
> +> I am being confused by the protocol between mapper and reducer. When
> mapper
> +> emitting the (key,value) pair done, is there any signal the mapper send
> out to
> +> hadoop framework in protocol to indicate that map is done and the
> shuffle/sort
> +> can begin for reducer? If there is no this signal in protocol, when the
> +> framework begin the shuffle/sort?
> +>
> +> Thanks,
> +> Jianmin
> +>
> +>
> +>
> +>
> +
> +
> +From core-user-return-14701-apmail-hadoop-core-user-
> archive=hadoop.apache.org@hadoop.apache.org Mon Jun 01 05:31:14 2009
> +Return-Path: <core-user-return-14701-apmail-hadoop-core-user-
> archive=hadoop.apache.org@hadoop.apache.org>
> +Delivered-To: apmail-hadoop-core-user-archive@www.apache.org
> +Received: (qmail 38243 invoked from network); 1 Jun 2009 05:31:14 -0000
> +Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
> +  by minotaur.apache.org with SMTP; 1 Jun 2009 05:31:14 -0000
> +Received: (qmail 15621 invoked by uid 500); 1 Jun 2009 05:31:24 -0000
> +Delivered-To: apmail-hadoop-core-user-archive@hadoop.apache.org
> +Received: (qmail 15557 invoked by uid 500); 1 Jun 2009 05:31:24 -0000
> +Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
> +Precedence: bulk
> +List-Help: <mailto:core-user-help@hadoop.apache.org>
> +List-Unsubscribe: <mailto:core-user-unsubscribe@hadoop.apache.org>
> +List-Post: <mailto:core-user@hadoop.apache.org>
> +List-Id: <core-user.hadoop.apache.org>
> +Reply-To: core-user@hadoop.apache.org
> +Delivered-To: mailing list core-user@hadoop.apache.org
> +Received: (qmail 15547 invoked by uid 99); 1 Jun 2009 05:31:24 -0000
> +Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
> +    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Jun 2009 05:31:24
> +0000
> +X-ASF-Spam-Status: No, hits=2.2 required=10.0
> +	tests=HTML_MESSAGE,SPF_PASS
> +X-Spam-Check-By: apache.org
> +Received-SPF: pass (nike.apache.org: local policy)
> +Received: from [68.142.237.94] (HELO n9.bullet.re3.wahoo.com)
> (68.142.237.94)
> +    by apache.org (qpsmtpd/0.29) with SMTP; Mon, 01 Jun 2009 05:31:11
> +0000
> +Received: from [68.142.237.88] by n9.bullet.re3.wahoo.com with NNFMP;
> 01 Jun 2009 05:30:50 -0000
> +Received: from [67.195.9.82] by t4.bullet.re3.wahoo.com with NNFMP; 01
> Jun 2009 05:30:49 -0000
> +Received: from [67.195.9.99] by t2.bullet.mail.gq1.wahoo.com with NNFMP;
> 01 Jun 2009 05:30:49 -0000
> +Received: from [127.0.0.1] by omp103.mail.gq1.wahoo.com with NNFMP;
> 01 Jun 2009 05:28:01 -0000
> +X-wahoo-Newman-Property: ymail-3
> +X-wahoo-Newman-Id: 796121.97519.bm@omp103.mail.gq1.wahoo.com
> +Received: (qmail 35264 invoked by uid 60001); 1 Jun 2009 05:30:49 -0000
> +DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=wahoo.com;
> s=s1024; t=1243834249;
> bh=R8qzdi/IbLyO8UwpnaujDpT9E+6bJ7nkmZN2803EmRk=; h=Message-ID:X-
> YMail-OSG:Received:X-Mailer:References:Date:From:Subject:To:In-Reply-
> To:MIME-Version:Content-Type;
> b=vq4c6RIDbkuLPYd8mirusIXf6DqTb/IeT55In7W00Y5Sxx1ZiXBb78yE9+TDfXJ0
> elsEZvqv4ocyvolGE0eGtyYeJA0mZikpRNu6pidxPNpCplOcLHBRz7YQ7iERwv3T
> agRlWy2Xd3oD9ZeV0A05P7WUOiNNX1PUUJD1IVdrEZo=
> +DomainKey-Signature:a=rsa-sha1; q=dns; c=nofws;
> +  s=s1024; d=wahoo.com;
> +  h=Message-ID:X-YMail-OSG:Received:X-
> Mailer:References:Date:From:Subject:To:In-Reply-To:MIME-
> Version:Content-Type;
> +
> b=6HXZV98ON5vBwmE/xS8stVD0D2F4dkMY7a0suX5KVTb736JdR8G59mqBq/
> dWcpbFTLiCLtxi18LMb/dU1RKRGOEdn3l3j/jKXhBrhIgfg3qtNskPedXDKBvn7JG
> XiSkqpA/tUtPjvc0Uuk8/LaA01SQTz40Engg7nD8/EJdIAhA=;
> +Message-ID: <592088.35091.qm@web111010.mail.gq1.wahoo.com>
> +X-YMail-OSG:
> KzhhrJYVM1m.MCS6vRpRP2ZZO2PrfnbngosELDCIa91ZqvhJph4RdmzfUW0jw
> 9W04RCSch1K730bPohwNpNBIk2QR_zt4_mfbhfq7YEPkSoz9LSXG90P9vIo5Fc
> 8qyZN0U6vA9gtdyGQTpN5ahvillUH9nAF0TMWv2SvZJLjPlQ0Z0p8oK8ltBwGTg
> LrM8Jtdn9D29yoRyi3_EpVOfdD9OP.EK50Vr1XwSUYMbnpZ0WGHMwd.Yig7A
> 6Elwadm3YVbfOdx2mfrG.jQsUAxQjRBNvbrOM57.FaE11kHTe9aoBWSeihNg--
> +Received: from [216.145.54.7] by web111010.mail.gq1.wahoo.com via HTTP;
> Sun, 31 May 2009 22:30:49 PDT
> +X-Mailer: wahooMailRC/1277.43 wahooMailWebService/0.7.289.10
> +References: <C649564F.1435F%Sampn@wahoo-inc.com>
> +Date: Sun, 31 May 2009 22:30:49 -0700 (PDT)
> +From: Jianmin Foo <jianmin_Foo@wahoo.com>
> +Subject: Re: question about when shuffle/sort start working
> +To: core-user@hadoop.apache.org
> +In-Reply-To: <C649564F.1435F%Sampn@wahoo-inc.com>
> +MIME-Version: 1.0
> +Content-Type: multipart/alternative; boundary="0-1193839393-
> 1243834249=:35091"
> +X-Virus-Checked: Checked by ClamAV on apache.org
> +
> +--0-1193839393-1243834249=:35091
> +Content-Type: text/plain; charset=us-ascii
> +
> +Thanks a lot for your explanation, Sam.
> +
> +So is this event generated by hadoop framework? Is there any API in
> mapper to fire this event? Actually, I am thinking to implement a mapper that
> will emit some <key, value> pairs, then fire this event to let the reducer
> works, the same mapper task then emit some other <key, value> pairs and
> repeat. Do you think is this logic feasible by current API?
> +
> +Thanks,
> +Jianmin
> +
> +
> +
> +
> +
> +________________________________
> +From: Sam Judgement <Sampn@wahoo-inc.com>
> +To: core-user@hadoop.apache.org
> +Sent: Monday, June 1, 2009 12:26:31 PM
> +Subject: Re: question about when shuffle/sort start working
> +
> +When a Mapper completes, MapCompletionEvents are generated.
> Reducers try to
> +fetch map outputs for a given map only on the receipt of such events.
> +
> +Sam
> +
> +
> +On 5/30/09 10:00 AM, "Jianmin Foo" <jianmin_woo@wahoo.com> wrote:
> +
> +> Hi,
> +> I am being confused by the protocol between mapper and reducer. When
> mapper
> +> emitting the (key,value) pair done, is there any signal the mapper send
> out to
> +> hadoop framework in protocol to indicate that map is done and the
> shuffle/sort
> +> can begin for reducer? If there is no this signal in protocol, when the
> +> framework begin the shuffle/sort?
> +>
> +> Thanks,
> +> Jianmin
> +>
> +>
> +>
> +>
> +
> +
> +
> +--0-1193839393-1243834249=:35091--
> +
> +
> +From core-user-return-14702-apmail-hadoop-core-user-
> archive=hadoop.apache.org@hadoop.apache.org Mon Jun 01 06:04:30 2009
> +Return-Path: <core-user-return-14702-apmail-hadoop-core-user-
> archive=hadoop.apache.org@hadoop.apache.org>
> +Delivered-To: apmail-hadoop-core-user-archive@www.apache.org
> +Received: (qmail 53387 invoked from network); 1 Jun 2009 06:04:29 -0000
> +Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
> +  by minotaur.apache.org with SMTP; 1 Jun 2009 06:04:29 -0000
> +Received: (qmail 39066 invoked by uid 500); 1 Jun 2009 06:04:39 -0000
> +Delivered-To: apmail-hadoop-core-user-archive@hadoop.apache.org
> +Received: (qmail 38970 invoked by uid 500); 1 Jun 2009 06:04:39 -0000
> +Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
> +Precedence: bulk
> +List-Help: <mailto:core-user-help@hadoop.apache.org>
> +List-Unsubscribe: <mailto:core-user-unsubscribe@hadoop.apache.org>
> +List-Post: <mailto:core-user@hadoop.apache.org>
> +List-Id: <core-user.hadoop.apache.org>
> +Reply-To: core-user@hadoop.apache.org
> +Delivered-To: mailing list core-user@hadoop.apache.org
> +Received: (qmail 38955 invoked by uid 99); 1 Jun 2009 06:04:39 -0000
> +Received: from athena.apache.org (HELO athena.apache.org)
> (140.211.11.136)
> +    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Jun 2009 06:04:39
> +0000
> +X-ASF-Spam-Status: No, hits=1.2 required=10.0
> +	tests=SPF_NEUTRAL
> +X-Spam-Check-By: apache.org
> +Received-SPF: neutral (athena.apache.org: local policy)
> +Received: from [216.145.54.172] (HELO mrout2.wahoo.com)
> (216.145.54.172)
> +    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Jun 2009 06:04:28
> +0000
> +Received: from SNV-EXBH01.ds.corp.wahoo.com (snv-
> exbh01.ds.corp.wahoo.com [207.126.227.249])
> +	by mrout2.wahoo.com (8.13.6/8.13.6/y.out) with ESMTP id
> n5163FGq038852
> +	for <core-user@hadoop.apache.org>; Sun, 31 May 2009 23:03:15 -
> 0700 (PDT)
> +DomainKey-Signature: a=rsa-sha1; s=serpent; d=wahoo-inc.com; c=nofws;
> q=dns;
> +	h=received:user-agent:date:subject:from:to:message-id:
> +	thread-topic:thread-index:in-reply-to:mime-version:content-type:
> +	content-transfer-encoding:x-originalarrivaltime;
> +
> 	b=rChE4SCnwtWaZpjhovkiXDKfDiVNdRRvsadSGG9S9bgvOexn/9/5JjE
> Qx1pOR7Nb
> +Received: from SNV-EXVS08.ds.corp.wahoo.com ([207.126.227.9]) by SNV-
> EXBH01.ds.corp.wahoo.com with Microsoft SMTPSVC(6.0.3790.3959);
> +	 Sun, 31 May 2009 23:03:15 -0700
> +Received: from 10.66.92.213 ([10.66.92.213]) by SNV-
> EXVS08.ds.corp.wahoo.com ([207.126.227.58]) with Microsoft Exchange
> Server HTTP-DAV ;
> + Mon,  1 Jun 2009 06:03:15 +0000
> +User-Agent: Microsoft-Entourage/12.17.0.090302
> +Date: Mon, 01 Jun 2009 11:33:13 +0530
> +Subject: Re: question about when shuffle/sort start working
> +From: Sam Judgement <Sampn@wahoo-inc.com>
> +To: <core-user@hadoop.apache.org>
> +Message-ID: <C6496CF9.1437C%Sampn@wahoo-inc.com>
> +Thread-Topic: question about when shuffle/sort start working
> +Thread-Index: AcnifqWrLG6N7GAk7kqy9QalVWfegQ==
> +In-Reply-To: <592088.35091.qm@web111010.mail.gq1.wahoo.com>
> +Mime-version: 1.0
> +Content-type: text/plain;
> +	charset="US-ASCII"
> +Content-transfer-encoding: 7bit
> +X-OriginalArrivalTime: 01 Jun 2009 06:03:15.0462 (UTC)
> FILETIME=[A7231260:01C9E27E]
> +X-Virus-Checked: Checked by ClamAV on apache.org
> +
> +
> +No you cannot raise this event yourself, this event is generated internally
> +by the framework.
> +
> +I am guessing that what you probably want is to have a chain of MapReduce
> +Jobs where the output of one is automatically fed as input to another.  You
> +can look at these classes: JobControl and ChainMapper/ChainReducer.
> +
> +Sam
> +
> +On 6/1/09 11:00 AM, "Jianmin Foo" <jianmin_Foo@wahoo.com> wrote:
> +
> +> Thanks a lot for your explanation, Sam.
> +>
> +> So is this event generated by hadoop framework? Is there any API in
> mapper to
> +> fire this event? Actually, I am thinking to implement a mapper that will
> emit
> +> some <key, value> pairs, then fire this event to let the reducer works, the
> +> same mapper task then emit some other <key, value> pairs and repeat.
> Do you
> +> think is this logic feasible by current API?
> +>
> +> Thanks,
> +> Jianmin
> +>
> +>
> +>
> +>
> +>
> +> ________________________________
> +> From: Sam Judgement <Sampn@wahoo-inc.com>
> +> To: core-user@hadoop.apache.org
> +> Sent: Monday, June 1, 2009 12:26:31 PM
> +> Subject: Re: question about when shuffle/sort start working
> +>
> +> When a Mapper completes, MapCompletionEvents are generated.
> Reducers try to
> +> fetch map outputs for a given map only on the receipt of such events.
> +>
> +> Sam
> +>
> +>
> +> On 5/30/09 10:00 AM, "Jianmin Foo" <jianmin_foo@wahoo.com> wrote:
> +>
> +>> Hi,
> +>> I am being confused by the protocol between mapper and reducer.
> When mapper
> +>> emitting the (key,value) pair done, is there any signal the mapper send
> out
> +>> to
> +>> hadoop framework in protocol to indicate that map is done and the
> +>> shuffle/sort
> +>> can begin for reducer? If there is no this signal in protocol, when the
> +>> framework begin the shuffle/sort?
> +>>
> +>> Thanks,
> +>> Jianmin
> +>>
> +>>
> +>>
> +>>
> +>
> +>
> +>
> +
> +
> 
> Modified: lucene/dev/trunk/solr/contrib/morphlines-core/src/test-
> files/test-documents/email.eml
> URL:
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/morphlines-
> core/src/test-files/test-
> documents/email.eml?rev=1570955&r1=1570954&r2=1570955&view=diff
> ==========================================================
> ====================
> --- lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/email.eml (original)
> +++ lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/email.eml Sun Feb 23 02:22:02 2014
> @@ -1,40 +1,40 @@
> -MIME-Version: 1.0
> -Received: by 10.216.199.5 with HTTP; Wed, 27 Nov 2013 12:01:23 -0800
> -(PST)
> -Date: Wed, 27 Nov 2013 13:01:23 -0700
> -Delivered-To: foo@cloudera.com
> -Message-ID:
> -<CAOi5V169EW4GCfde_aNKSBgqAD=KSPVO6Batw_Oko-
> 8cmAgK6w@mail.gmail.com>
> -Subject: Test EML
> -From: Patrick Foo <foo@cloudera.com>
> -To: Patrick Foo <foo@cloudera.com>
> -Content-Type: multipart/alternative;
> -boundary=001a11c3815cb55dda04ec2e0f3b
> -
> ---001a11c3815cb55dda04ec2e0f3b
> -Content-Type: text/plain; charset=ISO-8859-1
> -
> -This is a test
> -
> ---
> -Patrick Foo
> -Customer Operations Engineer
> -
> -<http://www.cloudera.com>
> -
> ---001a11c3815cb55dda04ec2e0f3b
> -Content-Type: text/html; charset=ISO-8859-1
> -Content-Transfer-Encoding: quoted-printable
> -
> -<div dir=3D"ltr">This is a test<br clear=3D"all"><div><br></div>--
> -<br><div=
> - dir=3D"ltr">Patrick Foo<div>Customer Operations
> -Engineer</div><div><br>=
> -</div><div><a href=3D"http://www.cloudera.com"
> target=3D"_blank"><img
> -src=
> -
> =3D"http://files.cloudera.com.s3.amazonaws.com/New%20Branding/cloude
> ra-smal=
> -l.png"></a><br>
> -</div></div>
> -</div>
> -
> ---001a11c3815cb55dda04ec2e0f3b--
> +MIME-Version: 1.0
> +Received: by 10.216.199.5 with HTTP; Wed, 27 Nov 2013 12:01:23 -0800
> +(PST)
> +Date: Wed, 27 Nov 2013 13:01:23 -0700
> +Delivered-To: foo@cloudera.com
> +Message-ID:
> +<CAOi5V169EW4GCfde_aNKSBgqAD=KSPVO6Batw_Oko-
> 8cmAgK6w@mail.gmail.com>
> +Subject: Test EML
> +From: Patrick Foo <foo@cloudera.com>
> +To: Patrick Foo <foo@cloudera.com>
> +Content-Type: multipart/alternative;
> +boundary=001a11c3815cb55dda04ec2e0f3b
> +
> +--001a11c3815cb55dda04ec2e0f3b
> +Content-Type: text/plain; charset=ISO-8859-1
> +
> +This is a test
> +
> +--
> +Patrick Foo
> +Customer Operations Engineer
> +
> +<http://www.cloudera.com>
> +
> +--001a11c3815cb55dda04ec2e0f3b
> +Content-Type: text/html; charset=ISO-8859-1
> +Content-Transfer-Encoding: quoted-printable
> +
> +<div dir=3D"ltr">This is a test<br clear=3D"all"><div><br></div>--
> +<br><div=
> + dir=3D"ltr">Patrick Foo<div>Customer Operations
> +Engineer</div><div><br>=
> +</div><div><a href=3D"http://www.cloudera.com"
> target=3D"_blank"><img
> +src=
> +=3D"http://files.cloudera.com.s3.amazonaws.com/New%20Branding/cloud
> era-smal=
> +l.png"></a><br>
> +</div></div>
> +</div>
> +
> +--001a11c3815cb55dda04ec2e0f3b--
> 
> Modified: lucene/dev/trunk/solr/contrib/morphlines-core/src/test-
> files/test-documents/rsstest.rss
> URL:
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/morphlines-
> core/src/test-files/test-
> documents/rsstest.rss?rev=1570955&r1=1570954&r2=1570955&view=diff
> ==========================================================
> ====================
> --- lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/rsstest.rss (original)
> +++ lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/rsstest.rss Sun Feb 23 02:22:02 2014
> @@ -1,36 +1,36 @@
> -<?xml version="1.0" encoding="ISO-8859-1" ?>
> -<!--
> -	Licensed to the Apache Software Foundation (ASF) under one or
> more
> -	contributor license agreements.  See the NOTICE file distributed with
> -	this work for additional information regarding copyright ownership.
> -	The ASF licenses this file to You under the Apache License, Version
> 2.0
> -	(the "License"); you may not use this file except in compliance with
> -	the License.  You may obtain a copy of the License at
> -
> -	http://www.apache.org/licenses/LICENSE-2.0
> -
> -	Unless required by applicable law or agreed to in writing, software
> -	distributed under the License is distributed on an "AS IS" BASIS,
> -	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
> express or implied.
> -	See the License for the specific language governing permissions and
> -	limitations under the License.
> --->
> -<rss version="0.91">
> -    <channel>
> -      <title>TestChannel</title>
> -      <link>http://test.channel.com/</link>
> -      <description>Sample RSS File for Junit test</description>
> -      <language>en-us</language>
> -
> -      <item>
> -        <title>Home Page of Chris Mattmann</title>
> -        <link>http://www-scf.usc.edu/~mattmann/</link>
> -        <description>Chris Mattmann's home page</description>
> -      </item>
> -      <item>
> -        <title>Awesome Open Source Search Engine</title>
> -        <link>http://www.nutch.org/</link>
> -        <description>Yup, that's what it is</description>
> -      </item>
> -   </channel>
> -</rss>
> +<?xml version="1.0" encoding="ISO-8859-1" ?>
> +<!--
> +	Licensed to the Apache Software Foundation (ASF) under one or
> more
> +	contributor license agreements.  See the NOTICE file distributed with
> +	this work for additional information regarding copyright ownership.
> +	The ASF licenses this file to You under the Apache License, Version
> 2.0
> +	(the "License"); you may not use this file except in compliance with
> +	the License.  You may obtain a copy of the License at
> +
> +	http://www.apache.org/licenses/LICENSE-2.0
> +
> +	Unless required by applicable law or agreed to in writing, software
> +	distributed under the License is distributed on an "AS IS" BASIS,
> +	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
> express or implied.
> +	See the License for the specific language governing permissions and
> +	limitations under the License.
> +-->
> +<rss version="0.91">
> +    <channel>
> +      <title>TestChannel</title>
> +      <link>http://test.channel.com/</link>
> +      <description>Sample RSS File for Junit test</description>
> +      <language>en-us</language>
> +
> +      <item>
> +        <title>Home Page of Chris Mattmann</title>
> +        <link>http://www-scf.usc.edu/~mattmann/</link>
> +        <description>Chris Mattmann's home page</description>
> +      </item>
> +      <item>
> +        <title>Awesome Open Source Search Engine</title>
> +        <link>http://www.nutch.org/</link>
> +        <description>Yup, that's what it is</description>
> +      </item>
> +   </channel>
> +</rss>
> 
> Modified: lucene/dev/trunk/solr/contrib/morphlines-core/src/test-
> files/test-documents/sample-statuses-20120906-141433
> URL:
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/morphlines-
> core/src/test-files/test-documents/sample-statuses-20120906-
> 141433?rev=1570955&r1=1570954&r2=1570955&view=diff
> ==========================================================
> ====================
> --- lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/sample-statuses-20120906-141433 (original)
> +++ lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/sample-statuses-20120906-141433 Sun Feb 23 02:22:02 2014
> @@ -1,4 +1,4 @@
> -1000
> -{"text":"sample tweet
> one","retweet_count":0,"in_reply_to_user_id":null,"retweeted":false,"trun
> cated":false,"source":"href=\"http:\/\/sample.com\"","id_str":"1234567891"
> ,"entities":{"user_mentions":[],"hashtags":[],"urls":[]},"in_reply_to_status_i
> d":null,"place":null,"in_reply_to_status_id_str":null,"coordinates":null,"creat
> ed_at":"Wed Sep 05 01:01:01 +0000
> 1985","in_reply_to_screen_name":null,"favorited":false,"in_reply_to_user_
> id_str":null,"user":{"default_profile_image":false,"friends_count":111,"profil
> e_background_color":"3C0C29","location":"Palo
> Alto","is_translator":false,"profile_background_tile":true,"favourites_count"
> :11,"verified":false,"profile_sidebar_fill_color":"efefef","follow_request_se
> nt":null,"contributors_enabled":false,"description":"desc1","profile_sidebar
> _border_color":"eeeeee","profile_image_url_https":"https:\/\/si0.twimg.co
> m\/profile_images\/1\/normal.jpg","id_str":"1111111","listed_count":1,"lan
> g":"en","screen_name":"fake_user1","show_all_inline_media":fals
> 
> e,"profile_use_background_image":true,"profile_image_url":"http:\/\/a0.t
> wimg.com\/profile_images\/1111111\/normal.jpg","default_profile":false,"s
> tatuses_count":11111,"created_at":"Thu Apr 07 11:04:54 +0000
> 1985","profile_text_color":"333333","followers_count":111,"protected":false
> ,"following":null,"notifications":null,"profile_background_image_url":"http:\
> /\/a0.twimg.com\/images\/themes\/theme1\/bg.gif","time_zone":null,"url"
> :null,"name":"name1","geo_enabled":false,"profile_link_color":"009999","id
> ":1111112,"profile_background_image_url_https":"https:\/\/si0.twimg.com\
> /images\/themes\/theme1\/bg.gif","utc_offset":null},"id":11111112,"contri
> butors":null,"geo":null}
> -2000
> +1000
> +{"text":"sample tweet
> one","retweet_count":0,"in_reply_to_user_id":null,"retweeted":false,"trun
> cated":false,"source":"href=\"http:\/\/sample.com\"","id_str":"1234567891"
> ,"entities":{"user_mentions":[],"hashtags":[],"urls":[]},"in_reply_to_status_i
> d":null,"place":null,"in_reply_to_status_id_str":null,"coordinates":null,"creat
> ed_at":"Wed Sep 05 01:01:01 +0000
> 1985","in_reply_to_screen_name":null,"favorited":false,"in_reply_to_user_
> id_str":null,"user":{"default_profile_image":false,"friends_count":111,"profil
> e_background_color":"3C0C29","location":"Palo
> Alto","is_translator":false,"profile_background_tile":true,"favourites_count"
> :11,"verified":false,"profile_sidebar_fill_color":"efefef","follow_request_se
> nt":null,"contributors_enabled":false,"description":"desc1","profile_sidebar
> _border_color":"eeeeee","profile_image_url_https":"https:\/\/si0.twimg.co
> m\/profile_images\/1\/normal.jpg","id_str":"1111111","listed_count":1,"lan
> g":"en","screen_name":"fake_user1","show_all_inline_media":fals
> 
> e,"profile_use_background_image":true,"profile_image_url":"http:\/\/a0.t
> wimg.com\/profile_images\/1111111\/normal.jpg","default_profile":false,"s
> tatuses_count":11111,"created_at":"Thu Apr 07 11:04:54 +0000
> 1985","profile_text_color":"333333","followers_count":111,"protected":false
> ,"following":null,"notifications":null,"profile_background_image_url":"http:\
> /\/a0.twimg.com\/images\/themes\/theme1\/bg.gif","time_zone":null,"url"
> :null,"name":"name1","geo_enabled":false,"profile_link_color":"009999","id
> ":1111112,"profile_background_image_url_https":"https:\/\/si0.twimg.com\
> /images\/themes\/theme1\/bg.gif","utc_offset":null},"id":11111112,"contri
> butors":null,"geo":null}
> +2000
>  {"text":"sample tweet
> two","retweet_count":0,"in_reply_to_user_id":null,"retweeted":false,"trun
> cated":false,"source":"href=\"http:\/\/sample.com\"","id_str":"2345678902"
> ,"entities":{"user_mentions":[],"hashtags":[],"urls":[]},"in_reply_to_status_i
> d":null,"place":null,"in_reply_to_status_id_str":null,"coordinates":null,"creat
> ed_at":"Wed Sep 05 02:14:34 +0000
> 1985","in_reply_to_screen_name":null,"favorited":false,"in_reply_to_user_
> id_str":null,"user":{"default_profile_image":false,"friends_count":222,"profil
> e_background_color":"3C0C29","location":"San
> Francisco","is_translator":false,"profile_background_tile":false,"favourites_c
> ount":22,"verified":false,"profile_sidebar_fill_color":"B2D948","follow_requ
> est_sent":null,"contributors_enabled":false,"description":"desc2","profile_si
> debar_border_color":"8EC63D","profile_image_url_https":"https:\/\/si0.twi
> mg.com\/profile_images\/22222222\/image_normal.jpg","id_str":"2222222",
> "listed_count":0,"lang":"en","screen_name":"fake_user2","show_all_
> 
> inline_media":false,"profile_use_background_image":true,"profile_image_u
> rl":"http:\/\/a0.twimg.com\/profile_images\/2222222\/image_normal.jpg","
> default_profile":false,"statuses_count":222222,"created_at":"Thu Aug 04
> 11:33:28 +0000
> 1985","profile_text_color":"444444","followers_count":222,"protected":false
> ,"following":null,"notifications":null,"profile_background_image_url":"http:\
> /\/a0.twimg.com\/profile_background_images\/222222\/222222.jpg","time_
> zone":"Central Time (US &
> Canada)","url":null,"name":"name2","geo_enabled":false,"profile_link_color
> ":"9A0057","id":2222223,"profile_background_image_url_https":"https:\/\/si
> 0.twimg.com\/profile_background_images\/2222222\/22222.jpg","utc_offse
> t":-21600},"id":222223,"contributors":null,"geo":null}
> \ No newline at end of file
> 
> Modified: lucene/dev/trunk/solr/contrib/morphlines-core/src/test-
> files/test-documents/testEMLX.emlx
> URL:
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/morphlines-
> core/src/test-files/test-
> documents/testEMLX.emlx?rev=1570955&r1=1570954&r2=1570955&view=di
> ff
> ==========================================================
> ====================
> --- lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/testEMLX.emlx (original)
> +++ lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/testEMLX.emlx Sun Feb 23 02:22:02 2014
> @@ -1,72 +1,72 @@
> -<!--
> - Licensed to the Apache Software Foundation (ASF) under one or more
> - contributor license agreements.  See the NOTICE file distributed with
> - this work for additional information regarding copyright ownership.
> - The ASF licenses this file to You under the Apache License, Version 2.0
> - (the "License"); you may not use this file except in compliance with
> - the License.  You may obtain a copy of the License at
> -
> -     http://www.apache.org/licenses/LICENSE-2.0
> -
> - Unless required by applicable law or agreed to in writing, software
> - distributed under the License is distributed on an "AS IS" BASIS,
> - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> - See the License for the specific language governing permissions and
> - limitations under the License.
> --->
> -
> -1795
> -From: "Julien Nioche (JIRA)" <jira@apache.org>
> -To: dev@tika.apache.org
> -Subject: [jira] Commented: (TIKA-461) RFC822 messages not parsed
> -Reply-To: dev@tika.apache.org
> -Delivered-To: mailing list dev@tika.apache.org
> -Date: Mon, 6 Sep 2010 05:25:34 -0400 (EDT)
> -In-Reply-To: <6089099.260231278600349994.JavaMail.jira@thor>
> -MIME-Version: 1.0
> -Content-Type: text/plain; charset=utf-8
> -Content-Transfer-Encoding: 7bit
> -X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
> -X-Virus-Checked: Checked by ClamAV on apache.org
> -
> -
> -    [ https://issues.apache.org/jira/browse/TIKA-
> 461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=12906468#action_12906468 ]
> -
> -Julien Nioche commented on TIKA-461:
> -------------------------------------
> -
> -I'll have a look at mime4j and try to use it in Tika
> -
> -> RFC822 messages not parsed
> -> --------------------------
> ->
> ->                 Key: TIKA-461
> ->                 URL: https://issues.apache.org/jira/browse/TIKA-461
> ->             Project: Tika
> ->          Issue Type: Bug
> ->          Components: parser
> ->    Affects Versions: 0.7
> ->            Reporter: Joshua Turner
> ->            Assignee: Julien Nioche
> ->
> -> Presented with an RFC822 message exported from Thunderbird,
> AutodetectParser produces an empty body, and a Metadata containing only
> one key-value pair: "Content-Type=message/rfc822". Directly calling
> MboxParser likewise gives an empty body, but with two metadata pairs:
> "Content-Encoding=us-ascii Content-Type=application/mbox".
> -> A quick peek at the source of MboxParser shows that the implementation
> is pretty naive. If the wiring can be sorted out, something like Apache James'
> mime4j might be a better bet.
> -
> ---
> -This message is automatically generated by JIRA.
> --
> -You can reply to this email to add a comment to the issue online.
> -
> -<?xml version="1.0" encoding="UTF-8"?>
> -<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
> "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
> -<plist version="1.0">
> -<dict>
> -	<key>flags</key>
> -	<integer>0</integer>
> -	<key>sender</key>
> -	<string>"Julien Nioche (JIRA)" &lt;jira@apache.org&gt;</string>
> -	<key>subject</key>
> -	<string>[jira] Commented: (TIKA-461) RFC822 messages not
> parsed</string>
> -	<key>to</key>
> -	<string>dev@tika.apache.org</string></dict>
> -</plist>
> +<!--
> + Licensed to the Apache Software Foundation (ASF) under one or more
> + contributor license agreements.  See the NOTICE file distributed with
> + this work for additional information regarding copyright ownership.
> + The ASF licenses this file to You under the Apache License, Version 2.0
> + (the "License"); you may not use this file except in compliance with
> + the License.  You may obtain a copy of the License at
> +
> +     http://www.apache.org/licenses/LICENSE-2.0
> +
> + Unless required by applicable law or agreed to in writing, software
> + distributed under the License is distributed on an "AS IS" BASIS,
> + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> + See the License for the specific language governing permissions and
> + limitations under the License.
> +-->
> +
> +1795
> +From: "Julien Nioche (JIRA)" <jira@apache.org>
> +To: dev@tika.apache.org
> +Subject: [jira] Commented: (TIKA-461) RFC822 messages not parsed
> +Reply-To: dev@tika.apache.org
> +Delivered-To: mailing list dev@tika.apache.org
> +Date: Mon, 6 Sep 2010 05:25:34 -0400 (EDT)
> +In-Reply-To: <6089099.260231278600349994.JavaMail.jira@thor>
> +MIME-Version: 1.0
> +Content-Type: text/plain; charset=utf-8
> +Content-Transfer-Encoding: 7bit
> +X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
> +X-Virus-Checked: Checked by ClamAV on apache.org
> +
> +
> +    [ https://issues.apache.org/jira/browse/TIKA-
> 461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=12906468#action_12906468 ]
> +
> +Julien Nioche commented on TIKA-461:
> +------------------------------------
> +
> +I'll have a look at mime4j and try to use it in Tika
> +
> +> RFC822 messages not parsed
> +> --------------------------
> +>
> +>                 Key: TIKA-461
> +>                 URL: https://issues.apache.org/jira/browse/TIKA-461
> +>             Project: Tika
> +>          Issue Type: Bug
> +>          Components: parser
> +>    Affects Versions: 0.7
> +>            Reporter: Joshua Turner
> +>            Assignee: Julien Nioche
> +>
> +> Presented with an RFC822 message exported from Thunderbird,
> AutodetectParser produces an empty body, and a Metadata containing only
> one key-value pair: "Content-Type=message/rfc822". Directly calling
> MboxParser likewise gives an empty body, but with two metadata pairs:
> "Content-Encoding=us-ascii Content-Type=application/mbox".
> +> A quick peek at the source of MboxParser shows that the implementation
> is pretty naive. If the wiring can be sorted out, something like Apache James'
> mime4j might be a better bet.
> +
> +--
> +This message is automatically generated by JIRA.
> +-
> +You can reply to this email to add a comment to the issue online.
> +
> +<?xml version="1.0" encoding="UTF-8"?>
> +<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
> "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
> +<plist version="1.0">
> +<dict>
> +	<key>flags</key>
> +	<integer>0</integer>
> +	<key>sender</key>
> +	<string>"Julien Nioche (JIRA)" &lt;jira@apache.org&gt;</string>
> +	<key>subject</key>
> +	<string>[jira] Commented: (TIKA-461) RFC822 messages not
> parsed</string>
> +	<key>to</key>
> +	<string>dev@tika.apache.org</string></dict>
> +</plist>
> 
> Modified: lucene/dev/trunk/solr/contrib/morphlines-core/src/test-
> files/test-documents/testRFC822
> URL:
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/morphlines-
> core/src/test-files/test-
> documents/testRFC822?rev=1570955&r1=1570954&r2=1570955&view=diff
> ==========================================================
> ====================
> --- lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/testRFC822 (original)
> +++ lucene/dev/trunk/solr/contrib/morphlines-core/src/test-files/test-
> documents/testRFC822 Sun Feb 23 02:22:02 2014
> @@ -1,41 +1,41 @@
> -From: "Julien Nioche (JIRA)" <jira@apache.org>
> -To: dev@tika.apache.org
> -Subject: [jira] Commented: (TIKA-461) RFC822 messages not parsed
> -Reply-To: dev@tika.apache.org
> -Delivered-To: mailing list dev@tika.apache.org
> -Date: Mon, 6 Sep 2010 05:25:34 -0400 (EDT)
> -In-Reply-To: <6089099.260231278600349994.JavaMail.jira@thor>
> -MIME-Version: 1.0
> -Content-Type: text/plain; charset=utf-8
> -Content-Transfer-Encoding: 7bit
> -X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
> -X-Virus-Checked: Checked by ClamAV on apache.org
> -
> -
> -    [ https://issues.apache.org/jira/browse/TIKA-
> 461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=12906468#action_12906468 ]
> -
> -Julien Nioche commented on TIKA-461:
> -------------------------------------
> -
> -I'll have a look at mime4j and try to use it in Tika
> -
> -> RFC822 messages not parsed
> -> --------------------------
> ->
> ->                 Key: TIKA-461
> ->                 URL: https://issues.apache.org/jira/browse/TIKA-461
> ->             Project: Tika
> ->          Issue Type: Bug
> ->          Components: parser
> ->    Affects Versions: 0.7
> ->            Reporter: Joshua Turner
> ->            Assignee: Julien Nioche
> ->
> -> Presented with an RFC822 message exported from Thunderbird,
> AutodetectParser produces an empty body, and a Metadata containing only
> one key-value pair: "Content-Type=message/rfc822". Directly calling
> MboxParser likewise gives an empty body, but with two metadata pairs:
> "Content-Encoding=us-ascii Content-Type=application/mbox".
> -> A quick peek at the source of MboxParser shows that the implementation
> is pretty naive. If the wiring can be sorted out, something like Apache James'
> mime4j might be a better bet.
> -
> ---
> -This message is automatically generated by JIRA.
> --
> -You can reply to this email to add a comment to the issue online.
> -
> +From: "Julien Nioche (JIRA)" <jira@apache.org>
> +To: dev@tika.apache.org
> +Subject: [jira] Commented: (TIKA-461) RFC822 messages not parsed
> +Reply-To: dev@tika.apache.org
> +Delivered-To: mailing list dev@tika.apache.org
> +Date: Mon, 6 Sep 2010 05:25:34 -0400 (EDT)
> +In-Reply-To: <6089099.260231278600349994.JavaMail.jira@thor>
> +MIME-Version: 1.0
> +Content-Type: text/plain; charset=utf-8
> +Content-Transfer-Encoding: 7bit
> +X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
> +X-Virus-Checked: Checked by ClamAV on apache.org
> +
> +
> +    [ https://issues.apache.org/jira/browse/TIKA-
> 461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=12906468#action_12906468 ]
> +
> +Julien Nioche commented on TIKA-461:
> +------------------------------------
> +
> +I'll have a look at mime4j and try to use it in Tika
> +
> +> RFC822 messages not parsed
> +> --------------------------
> +>
> +>                 Key: TIKA-461
> +>                 URL: https://issues.apache.org/jira/browse/TIKA-461
> +>             Project: Tika
> +>          Issue Type: Bug
> +>          Components: parser
> +>    Affects Versions: 0.7
> +>            Reporter: Joshua Turner
> +>            Assignee: Julien Nioche
> +>
> +> Presented with an RFC822 message exported from Thunderbird,
> AutodetectParser produces an empty body, and a Metadata containing only
> one key-value pair: "Content-Type=message/rfc822". Directly calling
> MboxParser likewise gives an empty body, but with two metadata pairs:
> "Content-Encoding=us-ascii Content-Type=application/mbox".
> +> A quick peek at the source of MboxParser shows that the implementation
> is pretty naive. If the wiring can be sorted out, something like Apache James'
> mime4j might be a better bet.
> +
> +--
> +This message is automatically generated by JIRA.
> +-
> +You can reply to this email to add a comment to the issue online.
> +



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

