lucene-general mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: [VOTE] Apache Tika 0.7 Release Candidate #1
Date Fri, 02 Apr 2010 17:30:42 GMT
Yes, a * before the filename is the correct way. Or use md5sum --binary when creating them - the
same applies to sha1 files. Ant does it correctly; I am not sure how you create the md5s with
Maven?
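
For illustration, here is a minimal Java sketch of what a binary-mode checksum line looks like: the hex digest, a space, the `*` type marker, then the file name. The class and method names (Md5SumLine, md5Hex, binaryModeLine) and the file name example.zip are made up for this example; it is not how Ant or the Maven release plugin produce their checksum files.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5SumLine {

    // Hex-encode the MD5 digest of the given bytes (lowercase, 32 characters).
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Build one md5sum-style line: checksum, space, '*' (binary marker), file name.
    static String binaryModeLine(byte[] data, String fileName) {
        return md5Hex(data) + " *" + fileName;
    }

    public static void main(String[] args) {
        // Empty input is used so the digest is the well-known MD5 of zero bytes.
        System.out.println(binaryModeLine(new byte[0], "example.zip"));
        // prints: d41d8cd98f00b204e9800998ecf8427e *example.zip
    }
}
```

A line written this way should be accepted by `md5sum -c` without forcing --binary, since the type character is already part of the line.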

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Mattmann, Chris A (388J) [mailto:chris.a.mattmann@jpl.nasa.gov]
> Sent: Friday, April 02, 2010 7:26 PM
> To: general@lucene.apache.org
> Subject: Re: [VOTE] Apache Tika 0.7 Release Candidate #1
> 
> Thanks, Uwe.
> 
> We'll certainly address it going forward. I'll also throw up the .sha1
> files as Jukka and you suggested. One question - all I have to do with
> the *.md5 is add a * in the .md5 file before the filename, right? Then
> it will work on Windows without forcing --binary?
> 
> Cheers,
> Chris
> 
> P.S. Welcome to having a PMC vote ^_^!
> 
> On 4/2/10 10:20 AM, "Uwe Schindler" <uwe@thetaphi.de> wrote:
> 
> I opened https://issues.apache.org/jira/browse/TIKA-398 for the test
> problem.
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
> 
> > -----Original Message-----
> > From: Uwe Schindler [mailto:uwe@thetaphi.de]
> > Sent: Friday, April 02, 2010 6:31 PM
> > To: general@lucene.apache.org
> > Subject: RE: [VOTE] Apache Tika 0.7 Release Candidate #1
> >
> > Hi,
> >
> > I checked:
> >
> > * signature of the src zip - OK.
> >
> > * md5 sum is OK, but on Windows I had a problem because the md5
> > signature file has no "*" before the file name, which means that the
> > sum is marked as text mode rather than binary:
> > "The sums are computed as described in RFC 1321.  When checking, the
> > input should be a former output of this program.  The default mode is
> > to print a line with checksum, a character indicating type (`*' for
> > binary, ` ' for text), and name for each FILE."
> > So I had to force md5sum to binary mode with --binary.
> >
> > * mvn install call was unsuccessful because one test class failed (Java
> > 1.5.0_22, 64bit, Win7):
> >
> > Running org.apache.tika.TestParsers
> > Tests run: 12, Failures: 0, Errors: 12, Skipped: 0, Time elapsed: 0.265 sec <<< FAILURE!
> >
> > Tests in error:
> >   testPDFExtraction(org.apache.tika.TestParsers)
> >   testTXTExtraction(org.apache.tika.TestParsers)
> >   testRTFExtraction(org.apache.tika.TestParsers)
> >   testXMLExtraction(org.apache.tika.TestParsers)
> >   testPPTExtraction(org.apache.tika.TestParsers)
> >   testWORDxtraction(org.apache.tika.TestParsers)
> >   testEXCELExtraction(org.apache.tika.TestParsers)
> >   testOOExtraction(org.apache.tika.TestParsers)
> >   testOutlookExtraction(org.apache.tika.TestParsers)
> >   testHTMLExtraction(org.apache.tika.TestParsers)
> >   testZipFileExtraction(org.apache.tika.TestParsers)
> >   testMP3Extraction(org.apache.tika.TestParsers)
> >
> > Tests run: 127, Failures: 0, Errors: 12, Skipped: 0
> >
> > All errors look like this:
> > <testcase classname="org.apache.tika.TestParsers" time="0.015" name="testOutlookExtraction">
> >   <error type="java.io.FileNotFoundException" message="C:\Users\Uwe%20Schindler\Desktop\tika-0.7\tika-parsers\target\test-classes\test-documents\test-outlook.msg (Das System kann den angegebenen Pfad nicht finden)">java.io.FileNotFoundException: C:\Users\Uwe%20Schindler\Desktop\tika-0.7\tika-parsers\target\test-classes\test-documents\test-outlook.msg (Das System kann den angegebenen Pfad nicht finden)
> >     at java.io.FileInputStream.open(Native Method)
> >     at java.io.FileInputStream.<init>(FileInputStream.java:106)
> >     at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:167)
> >     at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:188)
> >     at org.apache.tika.TestParsers.testOutlookExtraction(TestParsers.java:137)
> >   </error>
> > </testcase>
> >
> > If this is caused by the whitespace in my Windows user's directory, it
> > should maybe be fixed like in Lucene's tests (we had a similar problem
> > there, too). If you search for test files in the test's classpath, open
> > them using the Class.getResource() method, and convert the URL to a
> > path, you should not simply use the getPath() method from the URL, as
> > this is exactly what creates those wrong filenames. The fix is in
> > LuceneTestCase(J4).java in Lucene's classes (method getDataFile()). You
> > should convert the URL to a URI and create the File instance using
> > "new File(url.toURI())". This is the "correct" way to convert a URL to
> > a file system path.
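
A minimal sketch of the conversion described above, assuming a file: URL whose path contains a space. UrlToFile, toFile and fileUrl are hypothetical names chosen for this example; this is not the Tika test code or Lucene's getDataFile().

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlToFile {

    // Convert a file: URL to a File via its URI, so escapes such as %20 are decoded.
    static File toFile(URL url) {
        try {
            return new File(url.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }

    // Helper for the demo only: build a file: URL for a (possibly non-existent) path.
    static URL fileUrl(String path) {
        try {
            return new File(path).toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL url = fileUrl("dir with space/test-outlook.msg");

        // getPath() keeps the URL escaping, so the space shows up as %20.
        System.out.println("url.getPath():       " + url.getPath());

        // Going through the URI yields a real file system path with a literal space.
        System.out.println("new File(toURI()):   " + toFile(url).getPath());
    }
}
```

With a resource looked up through Class.getResource(), the same toFile() call would apply as long as the resource lives on disk rather than inside a jar; a jar: URL cannot be turned into a File this way.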
> >
> > Should I open a test bug report?
> >
> > This is not release critical, so I think you can release with this bug,
> > as it only affects tests.
> >
> > * I downloaded the repository folder and checked all signatures using
> > 'find . -name "*.asc" | xargs -L1 gpg --verify' - OK.
> >
> > I am +1 as a new Lucene PMC member (although there is the test bug), but
> > in my opinion, you should fix the md5 signatures and possibly add sha1
> > signatures before release. Just check that the sums inside the files
> > are identical.
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> > > -----Original Message-----
> > > From: Mattmann, Chris A (388J) [mailto:chris.a.mattmann@jpl.nasa.gov]
> > > Sent: Wednesday, March 31, 2010 10:02 PM
> > > To: Lucene mailing list
> > > Cc: tika-dev@lucene.apache.org
> > > Subject: [VOTE] Apache Tika 0.7 Release Candidate #1
> > >
> > > Hi Folks,
> > >
> > > I have posted a candidate for the Apache Tika 0.7 release. The source
> > > code is at:
> > >
> > > http://people.apache.org/~mattmann/apache-tika-0.7/rc1/
> > >
> > > See the included CHANGES.txt file for details on release contents and
> > > latest changes. The release was made using the Maven2 release plugin,
> > > according to Jukka Zitting's notes:
> > >
> > > http://tinyurl.com/yz2cqls
> > >
> > > This plugin creates a Tika 0.7 tag at:
> > >
> > > http://svn.apache.org/repos/asf/lucene/tika/tags/0.7/
> > >
> > > And a staged M2 repository at repository.apache.org, here:
> > >
> > > https://repository.apache.org/content/repositories/orgapachetika-001/
> > >
> > > Please vote on releasing these packages as Apache Tika 0.7. The vote is
> > > open for the next 72 hours. Only votes from Lucene PMC are binding, but
> > > everyone is welcome to check the release candidate and voice their
> > > approval or disapproval. The vote passes if at least three binding +1
> > > votes are cast.
> > >
> > > [ ] +1 Release the packages as Apache Tika 0.7.
> > >
> > > [ ] -1 Do not release the packages because...
> > >
> > > Thanks!
> > >
> > > Cheers,
> > > Chris
> > >
> > > P.S. Note, this will likely be the *last* Tika release under the Lucene
> > > umbrella since we've VOTE'd to turn Tika into a TLP. Thanks for
> > > participation over the years from the Lucene PMC and others in the
> > > Lucene community!
> > >
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Chris Mattmann, Ph.D.
> > > Senior Computer Scientist
> > > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > > Office: 171-266B, Mailstop: 171-246
> > > Email: Chris.Mattmann@jpl.nasa.gov
> > > WWW:   http://sunset.usc.edu/~mattmann/
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Adjunct Assistant Professor, Computer Science Department
> > > University of Southern California, Los Angeles, CA 90089 USA
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >
> >
> 
> 
> 
> 
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: Chris.Mattmann@jpl.nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


