lucene-general mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: [VOTE] Apache Tika 0.7 Release Candidate #1
Date Fri, 02 Apr 2010 17:25:45 GMT
Thanks, Uwe.

We'll certainly address it going forward. I'll also put up the .sha1 files as Jukka and
you suggested. One question: all I have to do with the *.md5 files is add a * before the
filename in each one, right? Then it will work on Windows without forcing --binary?

Cheers,
Chris

P.S. Welcome to having a PMC vote ^_^!

On 4/2/10 10:20 AM, "Uwe Schindler" <uwe@thetaphi.de> wrote:

I opened https://issues.apache.org/jira/browse/TIKA-398 for the test problem.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Friday, April 02, 2010 6:31 PM
> To: general@lucene.apache.org
> Subject: RE: [VOTE] Apache Tika 0.7 Release Candidate #1
>
> Hi,
>
> I checked:
>
> * signature of the src zip - OK.
>
> * md5 sum is OK, but on Windows I had a problem because the md5
> checksum file has no "*" before the file name, which means that the
> sum is treated as non-binary (text mode). From the md5sum help text:
> "The sums are computed as described in RFC 1321.  When checking, the
> input should be a former output of this program.  The default mode is
> to print a line with checksum, a character indicating type (`*' for
> binary, ` ' for text), and name for each FILE."
> So I had to force md5sum to binary mode with --binary.
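
For illustration, a minimal Java sketch of the line format described in the md5sum help text quoted above: the hex digest, a space, a type character (`*` for binary, a space for text), then the file name. The names `Md5Line`, `hex` and `format` are hypothetical and not part of Tika or of the release tooling; the sample input is made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only.
final class Md5Line {

    /** Returns the MD5 digest of the given bytes as lowercase hex. */
    static String hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Builds one checksum line: digest, space, type character, file name. */
    static String format(byte[] data, String fileName, boolean binary) {
        return hex(data) + " " + (binary ? "*" : " ") + fileName;
    }

    public static void main(String[] args) {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(format(data, "example.zip", true));  // binary marker
        System.out.println(format(data, "example.zip", false)); // text marker
    }
}
```

The two lines printed by `main` differ only in the type character; the digest itself is the same in both.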
>
> * mvn install call was unsuccessful because the tests in one class
> failed (Java 1.5.0_22, 64-bit, Win7):
>
> Running org.apache.tika.TestParsers
> Tests run: 12, Failures: 0, Errors: 12, Skipped: 0, Time elapsed: 0.265
> sec <<< FAILURE!
>
> Tests in error:
>   testPDFExtraction(org.apache.tika.TestParsers)
>   testTXTExtraction(org.apache.tika.TestParsers)
>   testRTFExtraction(org.apache.tika.TestParsers)
>   testXMLExtraction(org.apache.tika.TestParsers)
>   testPPTExtraction(org.apache.tika.TestParsers)
>   testWORDxtraction(org.apache.tika.TestParsers)
>   testEXCELExtraction(org.apache.tika.TestParsers)
>   testOOExtraction(org.apache.tika.TestParsers)
>   testOutlookExtraction(org.apache.tika.TestParsers)
>   testHTMLExtraction(org.apache.tika.TestParsers)
>   testZipFileExtraction(org.apache.tika.TestParsers)
>   testMP3Extraction(org.apache.tika.TestParsers)
>
> Tests run: 127, Failures: 0, Errors: 12, Skipped: 0
>
> All errors look like this:
> <testcase classname="org.apache.tika.TestParsers" time="0.015"
> name="testOutlookExtraction">
>   <error type="java.io.FileNotFoundException"
> message="C:\Users\Uwe%20Schindler\Desktop\tika-0.7\tika-
> parsers\target\test-classes\test-documents\test-outlook.msg (Das System
> kann den angegebenen Pfad nicht finden)">java.io.FileNotFoundException:
> C:\Users\Uwe%20Schindler\Desktop\tika-0.7\tika-parsers\target\test-
> classes\test-documents\test-outlook.msg (Das System kann den
> angegebenen Pfad nicht finden) at java.io.FileInputStream.open(Native
> Method) at java.io.FileInputStream.<init>(FileInputStream.java:106) at
> org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:167)
> at
> org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:188)
> at
> org.apache.tika.TestParsers.testOutlookExtraction(TestParsers.java:137)
> </error>
>   </testcase>
>
> If this is caused by the whitespace in my Windows user directory, it
> should probably be fixed as in Lucene's tests (we had a similar problem
> there, too). If you look up test files on the test classpath using
> Class.getResource() and convert the URL to a path, you should not
> simply use the URL's getPath() method, as this is exactly what creates
> those wrong filenames (the space stays encoded as %20). The fix is in
> LuceneTestCase(J4).java in Lucene's classes (method getDataFile()). You
> should convert the URL to a URI and create the File instance using
> "new File(url.toURI())". This is the "correct" way to convert a URL to
> a file system path.
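
A minimal Java sketch of the conversion described above, assuming a file: URL such as the one Class.getResource() returns for a test document. The class `UrlToFile`, its methods, and the directory name "dir with space" are hypothetical and chosen only to reproduce the %20 problem; this is not the actual Lucene or Tika code.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper, for illustration only.
final class UrlToFile {

    /** Converts a file: URL to a File, decoding escapes such as %20. */
    static File toFile(URL url) {
        try {
            return new File(url.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }

    /** Builds a file: URL for a File (stands in for Class.getResource() here). */
    static URL toUrl(File file) {
        try {
            return file.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot convert: " + file, e);
        }
    }

    public static void main(String[] args) {
        File original = new File("dir with space", "test-outlook.msg").getAbsoluteFile();
        URL url = toUrl(original);
        // getPath() keeps the URL encoding, so the space shows up as %20.
        System.out.println("url.getPath():     " + url.getPath());
        // Going through the URI decodes the path back to the real file name.
        System.out.println("new File(toURI()): " + toFile(url).getPath());
    }
}
```

In a test, the URL would come from something like getClass().getResource(...) instead of `toUrl`; the conversion step stays the same.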
>
> Should I open a test bug report?
>
> This is not release critical, so I think you can release with this bug,
> as it only affects tests.
>
> * I downloaded the repository folder and checked all signatures using
> 'find . -name "*.asc" | xargs -L1 gpg --verify' - OK.
>
> I am +1 as a new Lucene PMC member (although there is the test bug),
> but in my opinion, you should fix the md5 checksum files and possibly
> add sha1 checksums before release. Just check that the sums inside the
> files are identical.
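
One way to sketch that last check in Java: extract the digest from the old and the new checksum line and compare them, ignoring the type character and the file name. `ChecksumLine`, `digestOf` and `sameDigest` are hypothetical names; the sketch assumes each file holds a single md5sum/sha1sum-style line that starts with a 32- or 40-character hex digest.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class ChecksumLine {

    /** Returns the leading hex digest of a checksum line in lowercase. */
    static String digestOf(String line) {
        String trimmed = line.trim();
        int end = 0;
        // Consume leading hex digits; the digest ends at the first other character.
        while (end < trimmed.length() && Character.digit(trimmed.charAt(end), 16) >= 0) {
            end++;
        }
        if (end != 32 && end != 40) { // 32 hex chars = MD5, 40 = SHA-1
            throw new IllegalArgumentException("No MD5/SHA-1 digest in: " + line);
        }
        return trimmed.substring(0, end).toLowerCase(Locale.ROOT);
    }

    /** True if both lines carry the same digest, whatever marker they use. */
    static boolean sameDigest(String oldLine, String newLine) {
        return digestOf(oldLine).equals(digestOf(newLine));
    }

    public static void main(String[] args) {
        String before = "d41d8cd98f00b204e9800998ecf8427e  example.zip"; // text marker
        String after = "d41d8cd98f00b204e9800998ecf8427e *example.zip";  // binary marker
        System.out.println(sameDigest(before, after));
    }
}
```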
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Mattmann, Chris A (388J) [mailto:chris.a.mattmann@jpl.nasa.gov]
> > Sent: Wednesday, March 31, 2010 10:02 PM
> > To: Lucene mailing list
> > Cc: tika-dev@lucene.apache.org
> > Subject: [VOTE] Apache Tika 0.7 Release Candidate #1
> >
> > Hi Folks,
> >
> > I have posted a candidate for the Apache Tika 0.7 release. The source
> > code is at:
> >
> > http://people.apache.org/~mattmann/apache-tika-0.7/rc1/
> >
> > See the included CHANGES.txt file for details on release contents and
> > latest changes. The release was made using the Maven2 release plugin,
> > according to Jukka Zitting's notes:
> >
> > http://tinyurl.com/yz2cqls
> >
> > This plugin creates a Tika 0.7 tag at:
> >
> > http://svn.apache.org/repos/asf/lucene/tika/tags/0.7/
> >
> > And a staged M2 repository at repository.apache.org, here:
> >
> > https://repository.apache.org/content/repositories/orgapachetika-001/
> >
> > Please vote on releasing these packages as Apache Tika 0.7. The vote
> > is open for the next 72 hours. Only votes from Lucene PMC are binding,
> > but everyone is welcome to check the release candidate and voice their
> > approval or disapproval. The vote passes if at least three binding +1
> > votes are cast.
> >
> > [ ] +1 Release the packages as Apache Tika 0.7.
> >
> > [ ] -1 Do not release the packages because...
> >
> > Thanks!
> >
> > Cheers,
> > Chris
> >
> > P.S. Note, this will likely be the *last* Tika release under the
> > Lucene umbrella since we've VOTE'd to turn Tika into a TLP. Thanks for
> > participation over the years from the Lucene PMC and others in the
> > Lucene community!
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Senior Computer Scientist
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 171-266B, Mailstop: 171-246
> > Email: Chris.Mattmann@jpl.nasa.gov
> > WWW:   http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Assistant Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
>





++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

