lucene-general mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: [VOTE] Apache Tika 0.7 Release Candidate #1
Date Fri, 02 Apr 2010 17:35:29 GMT
By the way, here is an issue from another project facing the same problem:

https://fedorahosted.org/rel-eng/ticket/719

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Mattmann, Chris A (388J) [mailto:chris.a.mattmann@jpl.nasa.gov]
> Sent: Friday, April 02, 2010 7:33 PM
> To: general@lucene.apache.org
> Subject: Re: [VOTE] Apache Tika 0.7 Release Candidate #1
> 
> Thanks, Uwe. I had to compute the md5 sums myself on my Mac OS X 10.5.6
> box, which has md5sum installed. I just run md5sum <file> > <file>.md5.
> 
> Cheers,
> Chris
> 
> 
> 
> On 4/2/10 10:30 AM, "Uwe Schindler" <uwe@thetaphi.de> wrote:
> 
> Yes, a * before the filename is the correct way. Or use md5sum --binary
> when creating them; the same applies to the sha1 files. Ant does it
> correctly. How do you create the md5 files with Maven?
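Uwe's point can be sketched in Java. This is a minimal sketch, assuming you want to write the checksum files yourself rather than rely on Ant or Maven: it emits the md5sum binary-mode layout (hex digest, a space, `*`, then the file name). The class `ChecksumLine` and method `binaryModeLine` are hypothetical names; `MessageDigest` is the standard JDK API, and passing "SHA-1" instead of "MD5" covers the .sha1 files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of Tika, Ant or Maven
public class ChecksumLine {
    // Builds one line in md5sum/sha1sum binary-mode format: "<hex digest> *<file name>".
    // The '*' marker tells the checker to read the file in binary mode (relevant on Windows).
    static String binaryModeLine(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm); // "MD5" or "SHA-1"
        byte[] hash = digest.digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b)); // lowercase hex, as md5sum prints it
        }
        return hex + " *" + file.getFileName();
    }

    public static void main(String[] args) throws Exception {
        // Writes <file>.md5 and <file>.sha1 next to each file named on the command line
        for (String arg : args) {
            Path file = Path.of(arg);
            Files.writeString(Path.of(arg + ".md5"), binaryModeLine(file, "MD5") + "\n");
            Files.writeString(Path.of(arg + ".sha1"), binaryModeLine(file, "SHA-1") + "\n");
        }
    }
}
```

For example, `java ChecksumLine.java <file>` (Java 11 or later) writes `<file>.md5` and `<file>.sha1`. Reading the whole file with `Files.readAllBytes` keeps the sketch short; for large release archives a `DigestInputStream` would avoid loading everything into memory.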
> 
> 
> > -----Original Message-----
> > From: Mattmann, Chris A (388J) [mailto:chris.a.mattmann@jpl.nasa.gov]
> > Sent: Friday, April 02, 2010 7:26 PM
> > To: general@lucene.apache.org
> > Subject: Re: [VOTE] Apache Tika 0.7 Release Candidate #1
> >
> > Thanks, Uwe.
> >
> > We'll certainly address it going forward. I'll also put up the .sha1
> > files as Jukka and you suggested. One question: all I have to do with
> > the *.md5 files is add a * in the .md5 file before the filename, right?
> > Then it will work on Windows without forcing --binary?
> >
> > Cheers,
> > Chris
> >
> > P.S. Welcome to having a PMC vote ^_^!
> >
> > On 4/2/10 10:20 AM, "Uwe Schindler" <uwe@thetaphi.de> wrote:
> >
> > I opened https://issues.apache.org/jira/browse/TIKA-398 for the test
> > problem.
> >
> >
> > > -----Original Message-----
> > > From: Uwe Schindler [mailto:uwe@thetaphi.de]
> > > Sent: Friday, April 02, 2010 6:31 PM
> > > To: general@lucene.apache.org
> > > Subject: RE: [VOTE] Apache Tika 0.7 Release Candidate #1
> > >
> > > Hi,
> > >
> > > I checked:
> > >
> > > * signature of the src zip - OK.
> > >
> > > * md5 sum: OK, but on Windows I had a problem because the .md5 file
> > > has no "*" before the file name, which marks the checksum as computed
> > > in text (non-binary) mode. From the md5sum documentation:
> > > "The sums are computed as described in RFC 1321.  When checking, the
> > > input should be a former output of this program.  The default mode is
> > > to print a line with checksum, a character indicating type (`*' for
> > > binary, ` ' for text), and name for each FILE."
> > > So I had to force md5sum into binary mode with --binary.
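To make the line format quoted above concrete, here is a hedged Java sketch that splits such a line into checksum, mode character, and file name, then recomputes the MD5 over the raw bytes of the file (which is what binary mode means). The class `Md5Line`, the record `Entry`, and both methods are hypothetical names; the parser only handles the simple "32 hex chars, space, marker, name" layout and ignores escaped file names. It needs Java 16 or later for `record`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class illustrating the md5sum line layout
public class Md5Line {
    // One parsed line: checksum, whether the '*' (binary) marker was present, and the file name
    record Entry(String checksum, boolean binary, String name) {}

    // Parses "<32 hex chars><space><'*' or ' '><name>"; returns null if the layout does not match
    static Entry parse(String line) {
        if (line.length() < 35
                || !line.substring(0, 32).matches("[0-9a-fA-F]{32}")
                || line.charAt(32) != ' ') {
            return null;
        }
        char mode = line.charAt(33);
        if (mode != '*' && mode != ' ') {
            return null;
        }
        return new Entry(line.substring(0, 32).toLowerCase(), mode == '*', line.substring(34));
    }

    // Recomputes the MD5 over the file's raw bytes (binary mode) and compares it with the entry
    static boolean verify(Path dir, Entry entry) throws IOException, NoSuchAlgorithmException {
        byte[] hash = MessageDigest.getInstance("MD5")
                .digest(Files.readAllBytes(dir.resolve(entry.name())));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString().equals(entry.checksum());
    }
}
```

Note that a text-mode line ("hash", two spaces, name) parses with `binary() == false`; on GNU systems both modes read the same bytes, so the difference only shows up on platforms such as Windows.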
> > >
> > > * mvn install was unsuccessful because one test class failed (Java
> > > 1.5.0_22, 64-bit, Win7):
> > >
> > > Running org.apache.tika.TestParsers
> > > Tests run: 12, Failures: 0, Errors: 12, Skipped: 0, Time elapsed: 0.265 sec <<< FAILURE!
> > >
> > > Tests in error:
> > >   testPDFExtraction(org.apache.tika.TestParsers)
> > >   testTXTExtraction(org.apache.tika.TestParsers)
> > >   testRTFExtraction(org.apache.tika.TestParsers)
> > >   testXMLExtraction(org.apache.tika.TestParsers)
> > >   testPPTExtraction(org.apache.tika.TestParsers)
> > >   testWORDxtraction(org.apache.tika.TestParsers)
> > >   testEXCELExtraction(org.apache.tika.TestParsers)
> > >   testOOExtraction(org.apache.tika.TestParsers)
> > >   testOutlookExtraction(org.apache.tika.TestParsers)
> > >   testHTMLExtraction(org.apache.tika.TestParsers)
> > >   testZipFileExtraction(org.apache.tika.TestParsers)
> > >   testMP3Extraction(org.apache.tika.TestParsers)
> > >
> > > Tests run: 127, Failures: 0, Errors: 12, Skipped: 0
> > >
> > > All errors look like this:
> > > <testcase classname="org.apache.tika.TestParsers" time="0.015" name="testOutlookExtraction">
> > >   <error type="java.io.FileNotFoundException" message="C:\Users\Uwe%20Schindler\Desktop\tika-0.7\tika-parsers\target\test-classes\test-documents\test-outlook.msg (Das System kann den angegebenen Pfad nicht finden)">java.io.FileNotFoundException: C:\Users\Uwe%20Schindler\Desktop\tika-0.7\tika-parsers\target\test-classes\test-documents\test-outlook.msg (Das System kann den angegebenen Pfad nicht finden)
> > >     at java.io.FileInputStream.open(Native Method)
> > >     at java.io.FileInputStream.<init>(FileInputStream.java:106)
> > >     at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:167)
> > >     at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:188)
> > >     at org.apache.tika.TestParsers.testOutlookExtraction(TestParsers.java:137)
> > >   </error>
> > > </testcase>
> > >
> > > (The German message means "The system cannot find the path specified.")
> > >
> > > If this is caused by the space in my Windows user directory, it should
> > > perhaps be fixed as in Lucene's tests (we had a similar problem there,
> > > too). If you look up test files on the test classpath with
> > > Class.getResource() and convert the resulting URL to a path, you should
> > > not simply use the URL's getPath() method, as it produces exactly these
> > > wrong file names (the space stays encoded as %20). The fix is in
> > > LuceneTestCase(J4).java in Lucene's test classes (method getDataFile()):
> > > convert the URL to a URI and create the File instance using
> > > "new File(url.toURI())". This is the "correct" way to convert a URL to
> > > a file system path.
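The difference Uwe describes can be reproduced outside Tika with a small Java sketch. It assumes a temporary directory whose name contains a space stands in for "Uwe Schindler"; the class `UrlToFile` and its two methods are hypothetical, while `URL.getPath()`, `URL.toURI()` and `new File(URI)` are standard JDK calls.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not taken from Lucene or Tika
public class UrlToFile {
    // The pattern to avoid: URL.getPath() keeps percent-escapes, so a space stays "%20"
    static File naiveFile(URL url) {
        return new File(url.getPath());
    }

    // The suggested fix: File(URI) decodes the path, turning "%20" back into a space
    static File uriFile(URL url) throws URISyntaxException {
        return new File(url.toURI());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a test document under a directory whose name contains a space
        Path dir = Files.createTempDirectory("uwe schindler");
        Path doc = Files.createFile(dir.resolve("test-outlook.msg"));
        // In a real test the URL would come from something like getClass().getResource(...)
        URL url = doc.toFile().toURI().toURL();

        System.out.println("getPath(): " + naiveFile(url) + " exists=" + naiveFile(url).exists());
        System.out.println("toURI():   " + uriFile(url) + " exists=" + uriFile(url).exists());
    }
}
```

This only works while the test documents are plain files on disk (for example under target/test-classes); `new File(URI)` rejects `jar:` URLs, so for resources packed in a JAR, `getResourceAsStream()` is the safer route.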
> > >
> > > Should I open a test bug report?
> > >
> > > This is not release critical, so I think you can release with this bug,
> > > as it only affects tests.
> > >
> > > * I downloaded the repository folder and checked all signatures using
> > > 'find . -name "*.asc" | xargs -L1 gpg --verify' - OK.
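The find/xargs check above can also be sketched in Java, for example to run it from a build or test harness. This is a sketch under assumptions: the class `VerifyAll` is hypothetical, it simply runs the same `gpg --verify <file>.asc` command once per signature file, and it needs gpg on the PATH with the release manager's public key already imported.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class mirroring: find . -name "*.asc" | xargs -L1 gpg --verify
public class VerifyAll {
    // Returns one "gpg --verify <file>.asc" command per detached signature below root
    static List<List<String>> gpgCommands(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".asc"))
                    .sorted()
                    .map(p -> List.of("gpg", "--verify", p.toString()))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws Exception {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        int failures = 0;
        for (List<String> cmd : gpgCommands(root)) {
            // Given only the .asc file, gpg assumes the signed data is the same name without ".asc"
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            if (p.waitFor() != 0) {
                failures++;
            }
        }
        System.out.println(failures == 0 ? "all signatures OK" : failures + " signature(s) failed");
    }
}
```

Like `xargs -L1`, this runs one gpg process per signature, one after another; the exit code of each run decides whether it counts as a failure.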
> > >
> > > I am +1 as a new Lucene PMC member (despite the test bug), but in my
> > > opinion you should fix the md5 checksum files and possibly add sha1
> > > checksum files before release. Just check that the sums inside the
> > > files are identical.
> > >
> > > Uwe
> > >
> > >
> > > > -----Original Message-----
> > > > From: Mattmann, Chris A (388J) [mailto:chris.a.mattmann@jpl.nasa.gov]
> > > > Sent: Wednesday, March 31, 2010 10:02 PM
> > > > To: Lucene mailing list
> > > > Cc: tika-dev@lucene.apache.org
> > > > Subject: [VOTE] Apache Tika 0.7 Release Candidate #1
> > > >
> > > > Hi Folks,
> > > >
> > > > I have posted a candidate for the Apache Tika 0.7 release. The source
> > > > code is at:
> > > >
> > > > http://people.apache.org/~mattmann/apache-tika-0.7/rc1/
> > > >
> > > > See the included CHANGES.txt file for details on release contents and
> > > > latest changes. The release was made using the Maven2 release plugin,
> > > > according to Jukka Zitting's notes:
> > > >
> > > > http://tinyurl.com/yz2cqls
> > > >
> > > > This plugin creates a Tika 0.7 tag at:
> > > >
> > > > http://svn.apache.org/repos/asf/lucene/tika/tags/0.7/
> > > >
> > > > And a staged M2 repository at repository.apache.org, here:
> > > >
> > > > https://repository.apache.org/content/repositories/orgapachetika-001/
> > > >
> > > > Please vote on releasing these packages as Apache Tika 0.7. The vote is
> > > > open for the next 72 hours. Only votes from the Lucene PMC are binding,
> > > > but everyone is welcome to check the release candidate and voice their
> > > > approval or disapproval. The vote passes if at least three binding +1
> > > > votes are cast.
> > > >
> > > > [ ] +1 Release the packages as Apache Tika 0.7.
> > > >
> > > > [ ] -1 Do not release the packages because...
> > > >
> > > > Thanks!
> > > >
> > > > Cheers,
> > > > Chris
> > > >
> > > > P.S. Note that this will likely be the *last* Tika release under the
> > > > Lucene umbrella, since we've VOTE'd to turn Tika into a TLP. Thanks to
> > > > the Lucene PMC and others in the Lucene community for participating
> > > > over the years!
> > > >
> > > >
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > Chris Mattmann, Ph.D.
> > > > Senior Computer Scientist
> > > > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > > > Office: 171-266B, Mailstop: 171-246
> > > > Email: Chris.Mattmann@jpl.nasa.gov
> > > > WWW:   http://sunset.usc.edu/~mattmann/
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > Adjunct Assistant Professor, Computer Science Department
> > > > University of Southern California, Los Angeles, CA 90089 USA
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >
> > >
> >
> 


