From: "Mattmann, Chris A (388J)"
To: "general@lucene.apache.org"
Date: Fri, 2 Apr 2010 10:25:45 -0700
Subject: Re: [VOTE] Apache Tika 0.7 Release Candidate #1

Thanks, Uwe. We'll certainly address it going forward. I'll also post the
.sha1 files as Jukka and you suggested. One question: all I have to do with
the *.md5 files is add a * before the filename, right? Then it will work on
Windows without forcing --binary?

Cheers,
Chris

P.S. Welcome to having a PMC vote ^_^!

On 4/2/10 10:20 AM, "Uwe Schindler" wrote:

I opened https://issues.apache.org/jira/browse/TIKA-398 for the test problem.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Friday, April 02, 2010 6:31 PM
> To: general@lucene.apache.org
> Subject: RE: [VOTE] Apache Tika 0.7 Release Candidate #1
>
> Hi,
>
> I checked:
>
> * signature of the src zip - OK.
>
> * md5 sum is OK, but on Windows I had a problem because the md5
> signature file has no "*" before the file name, which means that the
> signature is non-binary:
> "The sums are computed as described in RFC 1321. When checking, the
> input should be a former output of this program. The default mode is to
> print a line with checksum, a character indicating type (`*' for binary,
> ` ' for text), and name for each FILE."
> So I had to force md5sum to binary mode with --binary.
>
> * The mvn install call was unsuccessful because one test class failed
> (Java 1.5.0_22, 64bit, Win7):
>
> Running org.apache.tika.TestParsers
> Tests run: 12, Failures: 0, Errors: 12, Skipped: 0, Time elapsed: 0.265
> sec <<< FAILURE!
>
> Tests in error:
>   testPDFExtraction(org.apache.tika.TestParsers)
>   testTXTExtraction(org.apache.tika.TestParsers)
>   testRTFExtraction(org.apache.tika.TestParsers)
>   testXMLExtraction(org.apache.tika.TestParsers)
>   testPPTExtraction(org.apache.tika.TestParsers)
>   testWORDxtraction(org.apache.tika.TestParsers)
>   testEXCELExtraction(org.apache.tika.TestParsers)
>   testOOExtraction(org.apache.tika.TestParsers)
>   testOutlookExtraction(org.apache.tika.TestParsers)
>   testHTMLExtraction(org.apache.tika.TestParsers)
>   testZipFileExtraction(org.apache.tika.TestParsers)
>   testMP3Extraction(org.apache.tika.TestParsers)
>
> Tests run: 127, Failures: 0, Errors: 12, Skipped: 0
>
> All errors look like this:
> name="testOutlookExtraction">
> message="C:\Users\Uwe%20Schindler\Desktop\tika-0.7\tika-parsers\target\test-classes\test-documents\test-outlook.msg
> (Das System kann den angegebenen Pfad nicht finden)">
> java.io.FileNotFoundException:
> C:\Users\Uwe%20Schindler\Desktop\tika-0.7\tika-parsers\target\test-classes\test-documents\test-outlook.msg
> (Das System kann den angegebenen Pfad nicht finden)
>   at java.io.FileInputStream.open(Native Method)
>   at java.io.FileInputStream.<init>(FileInputStream.java:106)
>   at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:167)
>   at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:188)
>   at org.apache.tika.TestParsers.testOutlookExtraction(TestParsers.java:137)
> (The German message reads: "The system cannot find the path specified.")
>
> If this is caused by the whitespace in my Windows user directory, it
> should maybe be fixed as in Lucene's tests (we had a similar problem
> there, too). If you look up test files on the test classpath with the
> Class.getResource() method and convert the URL to a path, you should not
> simply use the URL's getPath() method, as this is exactly what creates
> those wrong filenames. The fix is in LuceneTestCase(J4).java in Lucene's
> classes (method getDataFile()).
> You should convert the URL to a URI and create the File instance using
> "new File(url.toURI())". This is the "correct" way to convert a URL to
> a file system path.
>
> Should I open a test bug report?
>
> This is not release critical, so I think you can release with this bug,
> as it only affects tests.
>
> * I downloaded the repository folder and checked all signatures using
> 'find . -name "*.asc" | xargs -L1 gpg --verify' - OK.
>
> I am +1 as a new Lucene PMC member (although there is the test bug), but
> in my opinion you should fix the md5 signatures and possibly add sha1
> signatures before release. Just check that the sums inside the files
> are identical.
>
> Uwe
>
> > -----Original Message-----
> > From: Mattmann, Chris A (388J) [mailto:chris.a.mattmann@jpl.nasa.gov]
> > Sent: Wednesday, March 31, 2010 10:02 PM
> > To: Lucene mailing list
> > Cc: tika-dev@lucene.apache.org
> > Subject: [VOTE] Apache Tika 0.7 Release Candidate #1
> >
> > Hi Folks,
> >
> > I have posted a candidate for the Apache Tika 0.7 release. The source
> > code is at:
> >
> > http://people.apache.org/~mattmann/apache-tika-0.7/rc1/
> >
> > See the included CHANGES.txt file for details on release contents and
> > latest changes. The release was made using the Maven2 release plugin,
> > according to Jukka Zitting's notes:
> >
> > http://tinyurl.com/yz2cqls
> >
> > This plugin creates a Tika 0.7 tag at:
> >
> > http://svn.apache.org/repos/asf/lucene/tika/tags/0.7/
> >
> > And a staged M2 repository at repository.apache.org, here:
> >
> > https://repository.apache.org/content/repositories/orgapachetika-001/
> >
> > Please vote on releasing these packages as Apache Tika 0.7. The vote
> > is open for the next 72 hours.
> > Only votes from Lucene PMC are binding, but everyone is welcome to
> > check the release candidate and voice their approval or disapproval.
> > The vote passes if at least three binding +1 votes are cast.
> >
> > [ ] +1 Release the packages as Apache Tika 0.7.
> >
> > [ ] -1 Do not release the packages because...
> >
> > Thanks!
> >
> > Cheers,
> > Chris
> >
> > P.S. Note, this will likely be the *last* Tika release under the
> > Lucene umbrella since we've VOTE'd to turn Tika into a TLP. Thanks for
> > participation over the years from the Lucene PMC and others in the
> > Lucene community!
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Senior Computer Scientist
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 171-266B, Mailstop: 171-246
> > Email: Chris.Mattmann@jpl.nasa.gov
> > WWW: http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Assistant Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
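
The URL-to-file advice in Uwe's review can be illustrated with a minimal Java sketch. This is not the Tika or Lucene code: the class name UrlToFileSketch, its helper methods, and the temporary directory "tika url demo" are made up for illustration. The sketch creates a file below a directory whose name contains spaces, then resolves it once through URL.getPath() and once through new File(url.toURI()).

```java
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical demo class; not part of Tika or Lucene.
class UrlToFileSketch {

    /** Naive conversion: getPath() keeps percent-escapes such as "%20". */
    static File viaGetPath(URL url) {
        return new File(url.getPath());
    }

    /** Conversion suggested in the thread: go through a URI, which decodes escapes. */
    static File viaUri(URL url) {
        try {
            return new File(url.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }

    /**
     * Creates a file below a directory with spaces in its name and reports
     * whether each conversion finds it: { naive result, URI result }.
     */
    static boolean[] demo() {
        // Assumed scratch location; any directory name containing a space would do.
        File dir = new File(System.getProperty("java.io.tmpdir"), "tika url demo");
        File sample = new File(dir, "test-outlook.msg");
        try {
            dir.mkdirs();
            sample.createNewFile();
            // File.toURI() escapes each space as "%20", like the paths in the failing tests.
            URL url = sample.toURI().toURL();
            return new boolean[] { viaGetPath(url).exists(), viaUri(url).exists() };
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } finally {
            // Clean up the scratch file and directory.
            sample.delete();
            dir.delete();
        }
    }

    public static void main(String[] args) {
        boolean[] found = demo();
        System.out.println("new File(url.getPath()) exists: " + found[0]);
        System.out.println("new File(url.toURI())   exists: " + found[1]);
    }
}
```

The first lookup should miss the file because the "%20" escapes are taken literally as part of the directory name, while the second should find it. Whether Tika's tests fail for exactly this reason is what TIKA-398 is meant to establish.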
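
On the checksum question, the line format that the quoted md5sum documentation describes can be sketched in Java as follows. This is only an illustration of the two line types (a space or a "*" between the checksum and the file name); the class name Md5LineSketch and the file name example.zip are hypothetical, and it is not confirmed in the thread that adding the "*" alone resolves the issue for every Windows port of md5sum.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; it only formats checksum lines, it does not read files.
class Md5LineSketch {

    /** Returns the MD5 digest of the data as 32 lowercase hex digits. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Text-mode line: checksum, a space, the type character ' ', then the name. */
    static String textLine(byte[] data, String fileName) {
        return md5Hex(data) + "  " + fileName;
    }

    /** Binary-mode line: checksum, a space, the type character '*', then the name. */
    static String binaryLine(byte[] data, String fileName) {
        return md5Hex(data) + " *" + fileName;
    }

    public static void main(String[] args) {
        byte[] content = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(textLine(content, "example.zip"));
        System.out.println(binaryLine(content, "example.zip"));
    }
}
```

For the input "abc" (one of the RFC 1321 test strings) the checksum is 900150983cd24fb0d6963f7d28e17f72, so the two printed lines differ only in the character before the file name.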