Return-Path: Delivered-To: apmail-incubator-tika-commits-archive@locus.apache.org Received: (qmail 10956 invoked from network); 6 Jun 2008 18:35:23 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 6 Jun 2008 18:35:23 -0000 Received: (qmail 82304 invoked by uid 500); 6 Jun 2008 18:35:26 -0000 Delivered-To: apmail-incubator-tika-commits-archive@incubator.apache.org Received: (qmail 82295 invoked by uid 500); 6 Jun 2008 18:35:26 -0000 Mailing-List: contact tika-commits-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: tika-dev@incubator.apache.org Delivered-To: mailing list tika-commits@incubator.apache.org Received: (qmail 82286 invoked by uid 99); 6 Jun 2008 18:35:25 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 06 Jun 2008 11:35:25 -0700 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO eris.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 06 Jun 2008 18:34:23 +0000 Received: by eris.apache.org (Postfix, from userid 65534) id 6CD9A2388A31; Fri, 6 Jun 2008 11:34:44 -0700 (PDT) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: svn commit: r664072 [2/3] - /incubator/tika/site/ Date: Fri, 06 Jun 2008 18:34:43 -0000 To: tika-commits@incubator.apache.org From: jukka@apache.org X-Mailer: svnmailer-1.0.8 Message-Id: <20080606183444.6CD9A2388A31@eris.apache.org> X-Virus-Checked: Checked by ClamAV on apache.org Modified: incubator/tika/site/findbugs.html URL: http://svn.apache.org/viewvc/incubator/tika/site/findbugs.html?rev=664072&r1=664071&r2=664072&view=diff ============================================================================== --- incubator/tika/site/findbugs.html (original) +++ incubator/tika/site/findbugs.html Fri Jun 6 11:34:43 2008 @@ -78,7 +78,7 @@
  • - Download + Download
  • Project Documentation
    @@ -170,7 +170,7 @@
    -

    FindBugs Bug Detector Report

    The following document contains the results of FindBugs Report

    FindBugs Version is 1.1.1

    Threshold is Normal

    Effort is Default

    Summary

    ClassesBugsErrorsMissing Classes
    456181730

    Files

    ClassBugs
    org.apache.tika.config.TikaConfig1
    org.apache.tika.metadata.Metadata1
    org.apache.tika.mime.MimeType$RootXML1
    org.apache.tika.parser.microsoft.ExcelEventParser$TikaHSSFListener1
    org.apache.tika.parser.microsoft.OfficeParser1
    org.apache.tika.parser.microsoft.PowerPointExtractor6
    org.apache.tika.parser.microsoft.Slide2
    org.apache.tika.parser.microsoft.TextBox1
    org.apache.tika.parser.microsoft.WordParser2
    org.apache.tika.parser.opendocument.OpenOfficeParse r1
    org.apache.tika.utils.StringUtil1

    org.apache.tika.config.TikaConfig

    BugCategoryDetailsLine
    Write to static field org.apache.tika.config.TikaConfig.mimeTypes from instance method org.apache.tika.config.TikaConfig.TikaConfig(org.jdom.Element)STYLEST_WRITE_TO_STATIC_FROM_INSTANCE_METHOD73

    org.apache.tika.metadata.Metadata

    BugCategoryDetails Line
    org.apache.tika.metadata.Metadata defines equals and uses Object.hashCode()BAD_PRACTICEHE_EQUALS_USE_HASHCODE173-201

    org.apache.tika.mime.MimeType$RootXML

    BugCategoryDetailsLine
    Should org.apache.tika.mime.MimeType$RootXML be a _static_ inner class?PERFORMANCESIC_INNER_SHOULD_BE_STATICNot available

    org.apache.tika.parser.microsoft.ExcelEventParser$TikaHSSFListener

    BugCategoryDetailsLine
    Class org.apache.tika.parser.microsoft.ExcelEventParser$TikaHSSFListener defines non-transient non-serializable instance field appendableBAD_PRACTICESE_BAD_FIELDNot available

    org.apache.tika.parser.microsoft.OfficeParser

    BugCategoryDetailsLine
    Method org.apache.tika.parser.microsoft.OfficeParser.getMetadata(org.apache.poi.poifs.filesystem.POIFSFileSy stem,String,org.apache.tika.metadata.Metadata) catches Exception, but Exception is not thrown in the try block and RuntimeException is not explicitly caughtSTYLEREC_CATCH_EXCEPTION88

    org.apache.tika.parser.microsoft.PowerPointExtractor

    BugCategoryDetailsLine
    Dead store to outStream in method org.apache.tika.parser.microsoft.PowerPointExtractor.extractSlides(long,byte[],long)STYLEDLS_DEAD_LOCAL_STORE410
    Dead store to outStream in method org.apache.tika.parser.microsoft.PowerPointExtractor.extractTextBoxes(java.util.Hashtable,int,byte[],long)STYLEDLS_DEAD_LOCAL_STORE169
    Method org.apache.tika.parser.microsoft.PowerPointExtractor.extractTextBoxes(java.util.Hashtable,int,byte[],long) invokes inefficient Long(long) constructor; use Long.valueOf(long) insteadPERFORMANCEDM_NUMBER_CTOR206
    Method org.apache.tika.parser.microsoft.PowerPointExtractor.extractTextBoxes(java.util.Hashtable,int,byte[],long) invokes ineffi cient Long(long) constructor; use Long.valueOf(long) insteadPERFORMANCEDM_NUMBER_CTOR208
    Method org.apache.tika.parser.microsoft.PowerPointExtractor.extractTextBoxes(java.util.Hashtable,int,byte[],long) invokes inefficient Long(long) constructor; use Long.valueOf(long) insteadPERFORMANCEDM_NUMBER_CTOR214
    Useless control flow in org.apache.tika.parser.microsoft.PowerPointExtractor.extract(java.io.InputStream)STYLEUCF_USELESS_CONTROL_FLOW 94

    org.apache.tika.parser.microsoft.Slide

    BugCategoryDetailsLine
    org.apache.tika.parser.microsoft.Slide.contents is transient but org.apache.tika.parser.microsoft.Slide isn't SerializableSTYLESE_TRANSIENT_FIELD_OF_NONSERIALIZABLE_CLASSNot available
    org.apache.tika.parser.microsoft.Slide.slideNumber is transient but org.apache.tika.parser.microsoft.Slide isn't SerializableSTYLESE_TRANSIENT_FIELD_OF_NONSERIALIZABLE_CLASSNot available

    org.apache.tika.parser.microsoft.TextBox

    BugCategoryDetailsLine
    org.apache.tika.parser.microsoft.TextBox.currentID is transient but org.apache.tika.parser.microsoft.TextBox isn't SerializableSTYLESE_TRANSIENT_FIELD_OF_NONSERIALIZABLE_CLASSNot available

    org.apache.tika.parser.microsoft.WordParser

    BugCategoryDetailsLine
    org.apache.tika.parser.microsoft.WordParser.extractText(org.apache.poi.poifs.filesystem.POIFSFileSystem,Appendable) ignores result of org.apache.poi.poifs.filesystem.DocumentInputStream.read(byte[])BAD_PRACTICERR_NOT_CHECKED58
    org.apache.tika.parser.microsoft.WordParser.extractText(org.apache.poi.poifs.filesystem.POIFSFileSystem,Appendable) ignores result of org.apache.poi.poifs.filesystem.DocumentInputStream.read(byte[])BAD_PRACTICERR_NOT_CHECKED99

    org.apache.tika.parser.opendocument.OpenOfficeParser

    BugCategoryDetailsLine
    Dead store to xmlMeta in method org.apache.tika.parser.opendocument.OpenOfficeParser.parse(java.io.InputStream)STYLEDLS_DEAD_LOCAL_STORE57

    org.apache.tika.utils.StringUtil

    < td>DM_STRING_CTOR
    BugCategoryDetailsLine
    org.apache.tika.utils.StringUtil.resolveEncodingAlias(String) invokes inefficient new String(String) constructor; just use the argumentPERFORMANCE199
    +

    FindBugs Bug Detector Report

    The following document contains the results of FindBugs Report

    FindBugs Version is 1.1.1

    Threshold is Normal

    Effort is Default

    Summary

    ClassesBugsErrorsMissing Classes
    55482238

    Files

    ClassBugs
    org.apache.tika.config.TikaConfig1
    org.apache.tika.gui.TikaGUI1
    org.apache.tika.metadata.Metadata1
    org.apache.tika.mime.MimeType$RootXML1
    org.apache.tika.parser.ParsingReader1
    org.apache.tika.parser.microsoft.ExcelExtractor$PointComparator1
    org.apache.tika.sax.TeeContentHandler1
    org.apache.tika.utils.StringUtil1

    org.apache.tika.config.TikaConfig

    BugCategoryDetailsLine
    Write to static field org.apache.tika.config.TikaConfig.mimeTypes from instance method org.apache.tika.config.TikaConfig.TikaCo nfig(org.w3c.dom.Element)STYLEST_WRITE_TO_STATIC_FROM_INSTANCE_METHOD79

    org.apache.tika.gui.TikaGUI

    BugCategoryDetailsLine
    Class org.apache.tika.gui.TikaGUI defines non-transient non-serializable instance field parserBAD_PRACTICESE_BAD_FIELDNot available

    org.apache.tika.metadata.Metadata

    BugC ategoryDetailsLine
    org.apache.tika.metadata.Metadata defines equals and uses Object.hashCode()BAD_PRACTICEHE_EQUALS_USE_HASHCODE173-201

    org.apache.tika.mime.MimeType$RootXML

    BugCategoryDetailsLine
    Should org.apache.tika.mime.MimeType$RootXML be a _static_ inner class?PERFORMANCESIC_INNER_SHOULD_BE_STATICNot available

    org.apache.tika.parser.ParsingReader

    BugCategoryDetailsLine
    org.apache.tika.parser.ParsingReader.ParsingReader(Parser,java.io.InputStream,org.apache.tika.metadata.Metadata) invokes java.lang.Thread.start()MT_CORRECTNESSSC_START_IN_CTOR144

    org.apache.tika.parser.microsoft.ExcelExtractor$PointComparator

    BugCategoryDetailsLine
    org.apache.tika.parser.microsoft.ExcelExtractor$PointComparator implements Comparator but not SerializableBAD_PRACTI CESE_COMPARATOR_SHOULD_BE_SERIALIZABLENot available

    org.apache.tika.sax.TeeContentHandler

    BugCategoryDetailsLine
    org.apache.tika.sax.TeeContentHandler.TeeContentHandler(org.xml.sax.ContentHandler[]) may expose internal representation by storing an externally mutable object into org.apache.tika.sax.TeeContentHandler.handlersMALICIOUS_CODEEI_EXPOSE_REP234

    org.apache.tika.utils.StringUtil

    BugCategoryDetailsLine
    org.apache.tika.utils.StringUtil.resolveEncodingAlias(String) invokes inefficient new String(String) constructor; just use the argumentPERFORMANCEDM_STRING_CTOR199
    Modified: incubator/tika/site/index.html URL: http://svn.apache.org/viewvc/incubator/tika/site/index.html?rev=664072&r1=664071&r2=664072&view=diff ============================================================================== --- incubator/tika/site/index.html (original) +++ incubator/tika/site/index.html Fri Jun 6 11:34:43 2008 @@ -78,7 +78,7 @@
  • - Download + Download
  • Project Documentation
    Modified: incubator/tika/site/integration.html URL: http://svn.apache.org/viewvc/incubator/tika/site/integration.html?rev=664072&r1=664071&r2=664072&view=diff ============================================================================== --- incubator/tika/site/integration.html (original) +++ incubator/tika/site/integration.html Fri Jun 6 11:34:43 2008 @@ -78,7 +78,7 @@
  • - Download + Download
  • Project Documentation
    Modified: incubator/tika/site/issue-tracking.html URL: http://svn.apache.org/viewvc/incubator/tika/site/issue-tracking.html?rev=664072&r1=664071&r2=664072&view=diff ============================================================================== --- incubator/tika/site/issue-tracking.html (original) +++ incubator/tika/site/issue-tracking.html Fri Jun 6 11:34:43 2008 @@ -78,7 +78,7 @@
  • - Download + Download
  • Project Documentation
    Modified: incubator/tika/site/license.html URL: http://svn.apache.org/viewvc/incubator/tika/site/license.html?rev=664072&r1=664071&r2=664072&view=diff ============================================================================== --- incubator/tika/site/license.html (original) +++ incubator/tika/site/license.html Fri Jun 6 11:34:43 2008 @@ -78,7 +78,7 @@
  • - Download + Download
  • Project Documentation
    Modified: incubator/tika/site/mail-lists.html URL: http://svn.apache.org/viewvc/incubator/tika/site/mail-lists.html?rev=664072&r1=664071&r2=664072&view=diff ============================================================================== --- incubator/tika/site/mail-lists.html (original) +++ incubator/tika/site/mail-lists.html Fri Jun 6 11:34:43 2008 @@ -78,7 +78,7 @@
  • - Download + Download
  • Project Documentation
    Modified: incubator/tika/site/project-info.html URL: http://svn.apache.org/viewvc/incubator/tika/site/project-info.html?rev=664072&r1=664071&r2=664072&view=diff ============================================================================== --- incubator/tika/site/project-info.html (original) +++ incubator/tika/site/project-info.html Fri Jun 6 11:34:43 2008 @@ -78,7 +78,7 @@
  • - Download + Download
  • Project Documentation
    Modified: incubator/tika/site/project-reports.html URL: http://svn.apache.org/viewvc/incubator/tika/site/project-reports.html?rev=664072&r1=664071&r2=664072&view=diff ============================================================================== --- incubator/tika/site/project-reports.html (original) +++ incubator/tika/site/project-reports.html Fri Jun 6 11:34:43 2008 @@ -78,7 +78,7 @@
  • - Download + Download
  • Project Documentation
    Modified: incubator/tika/site/project-summary.html URL: http://svn.apache.org/viewvc/incubator/tika/site/project-summary.html?rev=664072&r1=664071&r2=664072&view=diff ============================================================================== --- incubator/tika/site/project-summary.html (original) +++ incubator/tika/site/project-summary.html Fri Jun 6 11:34:43 2008 @@ -78,7 +78,7 @@
  • - Download + Download
  • Project Documentation
    Modified: incubator/tika/site/rat-report.html URL: http://svn.apache.org/viewvc/incubator/tika/site/rat-report.html?rev=664072&r1=664071&r2=664072&view=diff ============================================================================== --- incubator/tika/site/rat-report.html (original) +++ incubator/tika/site/rat-report.html Fri Jun 6 11:34:43 2008 @@ -78,7 +78,7 @@
  • - Download + Download
  • Project Documentation
    @@ -170,336 +170,420 @@
    -

    RAT (Release Audit Tool) results

    The following document contains the results of RAT (Release Audit Tool).

    -*****************************************************
    -Summary
    --------
    -Notes: 4
    -Binaries: 7
    -Archives: 1
    -Standards: 96
    -
    -Apache Licensed: 88
    -Generated Documents: 0
    -
    -JavaDocs are generated and so license header is optional
    -Generated files do not required license headers
    -
    -8 Unknown Licenses
    -
    -*******************************
    -
    -Archives (+ indicates readable, $ unreadable): 
    -
    - + src/test/resources/test-documents/test-documents.zip
    - 
    -*****************************************************
    -  Files with AL headers will be marked L
    -  Binary files (which do not require AL headers) will be marked B
    -  Compressed archives will be marked A
    -  Notices, licenses etc will be marked N
    - !????? CHANGES.txt
    -  AL    HEADER.txt
    -  N     KEYS
    -  N     LICENSE.txt
    -  N     NOTICE.txt
    -  AL    pom.xml
    -  N     README.txt
    -  AL    src/main/assembly/bin.xml
    -  AL    src/main/assembly/src.xml
    -  AL    src/main/java/org/apache/tika/config/TikaConfig.java
    -  AL    src/main/java/org/apache/tika/exception/CauseIOException.java
    -  AL    src/main/java/org/apache/tika/exception/TikaException.java
    -  AL    src/main/java/org/apache/tika/metadata/CreativeCommons.java
    -  AL    src/main/java/org/apache/tika/metadata/DublinCore.java
    -  AL    src/main/java/org/apache/tika/metadata/HttpHeaders.java
    -  AL    src/main/java/org/apache/tika/metadata/Metadata.java
    -  AL    src/main/java/org/apache/tika/metadata/MSOffice.java
    -  AL    src/main/java/org/apache/tika/metadata/package.html
    -  AL    src/main/java/org/apache/tika/metadata/SpellCheckedMetadata.java
    -  AL    src/main/java/org/apache/tika/metadata/TikaMetadataKeys.java
    -  AL    src/main/java/org/apache/tika/metadata/TikaMimeKeys.java
    -  AL    src/main/java/org/apache/tika/mime/Clause.java
    -  AL    src/main/java/org/apache/tika/mime/HexCoDec.java
    -  AL    src/main/java/org/apache/tika/mime/Magic.java
    -  AL    src/main/java/org/apache/tika/mime/MagicClause.java
    -  AL    src/main/java/org/apache/tika/mime/MagicMatch.java
    -  AL    src/main/java/org/apache/tika/mime/MimeType.java
    -  AL    src/main/java/org/apache/tika/mime/MimeTypeException.java
    -  AL    src/main/java/org/apache/tika/mime/MimeTypes.java
    -  AL    src/main/java/org/apache/tika/mime/MimeTypesFactory.java
    -  AL    src/main/java/org/apache/tika/mime/MimeTypesReader.java
    -  AL    src/main/java/org/apache/tika/mime/Operator.java
    -  AL    src/main/java/org/apache/tika/mime/Patterns.java
    -  AL    src/main/java/org/apache/tika/parser/AutoDetectParser.java
    -  AL    src/main/java/org/apache/tika/parser/EmptyParser.java
    -  AL    src/main/java/org/apache/tika/parser/ErrorParser.java
    -  AL    src/main/java/org/apache/tika/parser/html/HtmlParser.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/ExcelEventParser.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/ExcelParser.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/FilteredStringWriter.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/PowerPointExtractor.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/PowerPointParser.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/PPTConstants.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/Slide.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/TextBox.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/Word6CHPBinTable.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/Word6Extractor.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/WordParser.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/WordTextBuffer.java
    -  AL    src/main/java/org/apache/tika/parser/microsoft/WordTextPiece.java
    -  AL    src/main/java/org/apache/tika/parser/opendocument/OpenOfficeEntityResolver.java
    -  AL    src/main/java/org/apache/tika/parser/opendocument/OpenOfficeParser.java
    -  AL    src/main/java/org/apache/tika/parser/Parser.java
    -  AL    src/main/java/org/apache/tika/parser/ParserDecorator.java
    -  AL    src/main/java/org/apache/tika/parser/ParserPostProcessor.java
    -  AL    src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
    -  AL    src/main/java/org/apache/tika/parser/pdf/PDFParser.java
    -  AL    src/main/java/org/apache/tika/parser/rtf/RTFParser.java
    -  AL    src/main/java/org/apache/tika/parser/txt/TXTParser.java
    -  AL    src/main/java/org/apache/tika/parser/xml/XMLParser.java
    -  AL    src/main/java/org/apache/tika/sax/AppendableAdaptor.java
    -  AL    src/main/java/org/apache/tika/sax/ContentHandlerDecorator.java
    -  AL    src/main/java/org/apache/tika/sax/TeeContentHandler.java
    -  AL    src/main/java/org/apache/tika/sax/WriteOutContentHandler.java
    -  AL    src/main/java/org/apache/tika/sax/XHTMLContentHandler.java
    -  AL    src/main/java/org/apache/tika/utils/ParseUtils.java
    -  AL    src/main/java/org/apache/tika/utils/RegexUtils.java
    -  AL    src/main/java/org/apache/tika/utils/RereadableInputStream.java
    -  AL    src/main/java/org/apache/tika/utils/StringUtil.java
    -  AL    src/main/java/org/apache/tika/utils/Utils.java
    -  AL    src/main/resources/mime/tika-mimetypes.xml
    -  AL    src/main/resources/tika-config.xml
    -  AL    src/site/apt/index.apt
    -  B     src/site/resources/tika.png
    -  B     src/site/resources/tika.xcf
    -  AL    src/site/site.xml
    -  AL    src/test/java/org/apache/tika/exception/CauseIOExceptionTest.java
    -  AL    src/test/java/org/apache/tika/metadata/TestMetadata.java
    -  AL    src/test/java/org/apache/tika/metadata/TestSpellCheckedMetadata.java
    -  AL    src/test/java/org/apache/tika/mime/MimeTypesTest.java
    -  AL    src/test/java/org/apache/tika/mime/MimeTypeTest.java
    -  AL    src/test/java/org/apache/tika/mime/PatternsTest.java
    -  AL    src/test/java/org/apache/tika/mime/TestMimeTypes.java
    -  AL    src/test/java/org/apache/tika/parser/AutoDetectParserTest.java
    -  AL    src/test/java/org/apache/tika/parser/html/HtmlParserTest.java
    -  AL    src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
    -  AL    src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
    -  AL    src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
    -  AL    src/test/java/org/apache/tika/parser/txt/TXTParserTest.java
    -  AL    src/test/java/org/apache/tika/sax/AppendableAdaptorTest.java
    -  AL    src/test/java/org/apache/tika/TestParsers.java
    -  AL    src/test/java/org/apache/tika/TestRereadableInputStream.java
    -  AL    src/test/java/org/apache/tika/utils/RegexUtilsTest.java
    -  AL    src/test/resources/log4j/log4j.properties
    -  A     src/test/resources/test-documents/test-documents.zip
    -  B     src/test/resources/test-documents/testEXCEL.xls
    - !????? src/test/resources/test-documents/testHTML.html
    - !????? src/test/resources/test-documents/testHTML_utf8.html
    -  B     src/test/resources/test-documents/testOpenOffice2.odt
    -  B     src/test/resources/test-documents/testPDF.pdf
    -  B     src/test/resources/test-documents/testPPT.ppt
    - !????? src/test/resources/test-documents/testRTF.rtf
    - !????? src/test/resources/test-documents/testTXT.txt
    -  B     src/test/resources/test-documents/testWORD.doc
    - !????? src/test/resources/test-documents/testXML.xml
    - !????? tika.log
    - !????? velocity.log
    - 
    - *****************************************************
    - Printing headers for files without AL header...
    - 
    - 
    - =======================================================================
    - ==CHANGES.txt
    - =======================================================================
    - Tika Change Log
    -
    -Release 0.1-incubating - 12/27/2007
    -
    -1. TIKA-5 - Port Metadata Framework from Nutch (mattmann)
    -
    -2. TIKA-11 - Consolidate test classes into a src/test/java directory tree (mattmann)
    -
    -3. TIKA-15 - Utils.print does not print a Content having no value (jukka)
    -
    -4. TIKA-19 - org.apache.tika.TestParsers fails (bdelacretaz)
    -
    -5. TIKA-16 - Issues with data files used for testing by TestParsers (bdelacretaz)
    -
    -6. TIKA-14 - MimeTypeUtils.getMimeType() returns the default mime type for 
    -             .odt (Open Office) file (bdelacretaz)
    -
    -7. TIKA-12 - Add URL capability to MimeTypesUtils (jukka)
    -
    -8. TIKA-13 - Fix obsolete package names in config.xml (siren)
    -
    -9. TIKA-10 - Remove MimeInfoException catch clauses and import from TestParsers (siren)
    -
    -10. TIKA-8 - Replaced the jmimeinfo dependency with a trivial mime type detector (jukka)
    -
    -11. TIKA-7 - Added the Lius Lite code. Added missing dependencies to POM (jukka)
    -
    -12. TIKA-18 - "Office" interface should be renamed "MSOffice" (mattmann)
    -
    -13. TIKA-23 - Decouple Parser from ParserConfig (jukka)
    -
    -14. TIKA-6 - Port Nutch (or better) MimeType detection system into Tika (J. Charron & mattmann)
    -
    -15. TIKA-25 - Removed hardcoded reference to C:\oo.xml in OpenOfficeParser (K. Bennett & jukka)
    -
    -16. TIKA-17 - Need to support URL's for input resources. (K. Bennett & mattmann)
    -
    -17. TIKA-22 - Remove @author tags from the java source (mattmann)
    -
    -18. TIKA-21 - Simplified configuration code (jukka)
    -
    -19. TIKA-17 - Rename all "Lius" classes to be "Tika" classes (jukka)
    -
    -20. TIKA-30 - Added utility constructors to TikaConfig (K. Bennett & jukka)
    -
    -21. TIKA-28 - Rename config.xml to tika-config.xml or similar (mattmann)
    -
    -22. TIKA-26 - Use Map<String, Content> instead of List<Content> (jukka)
    -
    -23. TIKA-31 - protected Parser.parse(InputStream stream,
    -
    - =======================================================================
    - ==src/test/resources/test-documents/testHTML.html
    - =======================================================================
    - <html>
    -	<head>
    -		<title>Title : Test Indexation Html</title>	
    -	</head>
    -	<body>
    -		<h1>Test Indexation Html</h1>
    -		<p>Indexation du fichier</p>
    -	</body>	
    -</html>
    -
    - =======================================================================
    - ==src/test/resources/test-documents/testHTML_utf8.html
    - =======================================================================
    - <html>
    -	<head>
    -		<title>Title : Tilte with UTF-8 chars ???§??</title>	
    -	</head>
    -	<body>
    -		<h1>Content with UTF-8 chars</h1>
    -		<p>???§??</p>
    -	</body>	
    -</html>
    -
    - =======================================================================
    - ==src/test/resources/test-documents/testRTF.rtf
    - =======================================================================
    - {\rtf1\ansi\ansicpg1252\uc1\deff0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deflang1036\deflangfe1036{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f37\froman\fcharset238\fprq2 Times New Roman CE;}
    -{\f38\froman\fcharset204\fprq2 Times New Roman Cyr;}{\f40\froman\fcharset161\fprq2 Times New Roman Greek;}{\f41\froman\fcharset162\fprq2 Times New Roman Tur;}{\f42\froman\fcharset177\fprq2 Times New Roman (Hebrew);}
    -{\f43\froman\fcharset178\fprq2 Times New Roman (Arabic);}{\f44\froman\fcharset186\fprq2 Times New Roman Baltic;}{\f45\froman\fcharset163\fprq2 Times New Roman (Vietnamese);}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;
    -\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;
    -\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1036\langfe1036\cgrid\langnp1036\langfenp1036 \snext0 Normal;}{\*\cs10 \additive \ssemihidden 
    -Default Paragraph Font;}{\*\ts11\tsrowd\trftsWidthB3\trpaddl108\trpaddr108\trpaddfl3\trpaddft3\trpaddfb3\trpaddfr3\trcbpat1\trcfpat1\tscellwidthfts0\tsvertalt\tsbrdrt\tsbrdrl\tsbrdrb\tsbrdrr\tsbrdrdgl\tsbrdrdgr\tsbrdrh\tsbrdrv 
    -\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs20\lang1024\langfe1024\cgrid\langnp1024\langfenp1024 \snext11 \ssemihidden Normal Table;}}{\*\latentstyles\lsdstimax156\lsdlockeddef0}{\*\rsidtbl \rsid2954171\rsid10375891}
    -{\*\generator Microsoft Word 11.0.6568;}{\info{\title Test d\'92indexation Word}{\author Bibliotheque}{\operator Bibliotheque}{\creatim\yr2006\mo5\dy18\hr12\min19}{\revtim\yr2006\mo5\dy18\hr12\min19}{\version2}{\edmins0}{\nofpages1}{\nofwords3}
    -{\nofchars21}{\*\company Universite Laval}{\nofcharsws23}{\vern24579}}\paperw11906\paperh16838\margl1417\margr1417\margt1417\margb1417 
    -\deftab708\widowctrl\ftnbj\aenddoc\hyphhotz425\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\formshade\horzdoc\dgmargin\dghspace180\dgvspace180\dghorigin1417\dgvorigin1417\dghshow1\dgvshow1
    -\jexpand\viewkind1\viewscale100\pgbrdrhead\pgbrdrfoot\splytwnine\ftnlytwnine\htmautsp\nolnhtadjtbl\useltbaln\alntblind\lytcalctblwd\lyttblrtgr\lnbrkrule\nobrkwrptbl\snaptogridincell\allowfieldendsel\wrppunct\asianbrkrule\nojkernpunct\rsidroot2954171 \fet0
    -\sectd \linex0\headery708\footery708\colsx708\endnhere\sectlinegrid360\sectdefaultcl\sftnbj {\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl3
    -\pndec\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang {\pntxta )}}{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}
    -{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}\pard\plain 
    -\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1036\langfe1036\cgrid\langnp1036\langfenp1036 {\insrsid2954171 Test d\rquote indexation Word
    -\par 
    -\par }}
    -
    - =======================================================================
    - ==src/test/resources/test-documents/testTXT.txt
    - =======================================================================
    - Test d'indexation de Txt
    -http://www.apache.org
    -
    - =======================================================================
    - ==src/test/resources/test-documents/testXML.xml
    - =======================================================================
    - <?xml version="1.0" encoding="UTF-8"?>
    -<oaidc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oaidc="http://www.openarchives.org/OAI/2.0/oai_dc/">
    -
    -	<dc:title>Archim?®de et Lius</dc:title>
    -
    -	<dc:creator>Rida Benjelloun</dc:creator>
    -
    -	<dc:subject>Java</dc:subject>
    -
    -	<dc:subject>XML</dc:subject>
    -
    -	<dc:subject>XSLT</dc:subject>
    -
    -	<dc:subject>JDOM</dc:subject>
    - 
    -	<dc:subject>Indexation</dc:subject>
    -
    -	<dc:description>Framework d'indexation des documents XML, HTML, PDF etc.. </dc:description>
    -
    -	<dc:identifier>http://www.apache.org</dc:identifier>
    -
    -	<dc:date>2000-12</dc:date>
    -
    -	<dc:type>test</dc:type>
    -
    -	<dc:format>application/msword</dc:format>
    -
    -	<dc:language>Fr</dc:language>
    -
    -	<dc:rights>Non restreint</dc:rights>	
    -
    -</oaidc:dc>
    -
    - =======================================================================
    - ==tika.log
    - =======================================================================
    - 
    - =======================================================================
    - ==velocity.log
    - =======================================================================
    - Sun Jan 06 19:01:24 PST 2008  [debug] AvalonLogSystem initialized using logfile 'velocity.log'
    -Sun Jan 06 19:01:24 PST 2008   [info] ************************************************************** 
    -Sun Jan 06 19:01:24 PST 2008   [info] Starting Jakarta Velocity v1.4
    -Sun Jan 06 19:01:24 PST 2008   [info] RuntimeInstance initializing.
    -Sun Jan 06 19:01:24 PST 2008   [info] Default Properties File: org/apache/velocity/runtime/defaults/velocity.properties
    -Sun Jan 06 19:01:24 PST 2008   [info] Trying to use logger class org.apache.velocity.runtime.log.AvalonLogSystem
    -Sun Jan 06 19:01:24 PST 2008   [info] Using logger class org.apache.velocity.runtime.log.AvalonLogSystem
    -Sun Jan 06 19:01:24 PST 2008   [info] Default ResourceManager initializing. (class org.apache.velocity.runtime.resource.ResourceManagerImpl)
    -Sun Jan 06 19:01:24 PST 2008   [info] Resource Loader Instantiated: org.apache.velocity.runtime.resource.loader.FileResourceLoader
    -Sun Jan 06 19:01:24 PST 2008   [info] FileResourceLoader : initialization starting.
    -Sun Jan 06 19:01:24 PST 2008   [info] FileResourceLoader : adding path '/Users/mattmann/.maven/cache/maven-xdoc-plugin-1.8/plugin-resources/templates'
    -Sun Jan 06 19:01:24 PST 2008   [info] FileResourceLoader : initialization complete.
    -Sun Jan 06 19:01:24 PST 2008   [info] ResourceCache : initialized. (class org.apache.velocity.runtime.resource.ResourceCacheImpl)
    -Sun Jan 06 19:01:24 PST 2008   [info] Default ResourceManager initialization complete.
    -Sun Jan 06 19:01:24 PST 2008   [info] Loaded System Directive: org.apache.velocity.runtime.directive.Literal
    -Sun Jan 06 19:01:24 PST 2008   [info] Loaded System Directive: org.apache.velocity.runtime.directive.Macro
    -Sun Jan 06 19:01:24 PST 2008   [info] Loaded System Directive: org.apache.velocity.runtime.directive.Parse
    -Sun Jan 06 19:01:24 PST 2008   [info] Loaded System Directive: org.apache.velocity.runtime.directive.Include
    -Sun Jan 06 19:01:24 PST 2008   [info] Loaded System Directive: org.apache.velocity.runtime.directive.Foreach
    -Sun Jan 06 19:01:24 PST 2008   [info] Created: 20 parsers.
    -Sun Jan 06 19:01:24 PST 2008   [info] Velocimacro : initialization starting.
    -Sun Jan 06 19:01:24 PST 2008   [info] Velocimacro : adding VMs from VM library template : VM_global_library.vm
    -Sun Jan 06 19:01:24 PST 2008  [error] ResourceManager : unable to find resource 'VM_global_library.vm' in any resource loader.
    -Sun Jan 06 19:01:24 PST 2008   [info] Velocimacro : error using  VM library template VM_global_library.vm : org.apache.velocity.exception.ResourceNotFoundException: Unable to find resource 'VM_global_library.vm'
    -Sun Jan 06 19:01:24 PST 2008   [info] Velocimacro :  VM library template macro registration complete.
    -Sun Jan 06 19:01:24 PST 2008   [info] Velocimacro : allowInline = true : VMs can be defined inline in templates
    -Sun Jan 06 19:01:24 PST 2008   [info] Velocimacro : allowInlineToOverride = false : VMs defined inline may NOT replace previous VM definitions
    -Sun Jan 06 19:01:24 PST 2008   [info] Velocimacro : allowInlineLocal = false : VMs defined inline will be  global in scope if allowed.
    -Sun Jan 06 19:01:24 PST 2008   [info] Velocimacro : messages on  : VM system will output logging messages
    -Sun Jan 06 19:01:24 PST 2008   [info] Velocimacro : autoload off  : VM system will not automatically reload global library macros
    -Sun Jan 06 19:01:24 PST 2008   [info] Velocimacro : initialization complete.
    -Sun Jan 06 19:01:24 PST 2008   [info] Velocity successfully started.
    -Sun Jan 06 19:01:24 PST 2008   [info] ResourceManager : found cvs-usage.xml with loader org.apache.velocity.runtime.resource.loader.FileResourceLoader
    -Sun Jan 06 19:01:24 PST 2008  [error] RHS of #set statement is null. Context will not be modified. cvs-usage.xml [line 28, column 5]
    -Sun Jan 06 19:01:24 PST 2008   [info] ResourceManager : found index.xml with loader org.apache.velocity.runtime.resource.loader.FileResourceLoader
    -Sun Jan 06 19:01:24 PST 2008   [info] ResourceManager : found maven-reports.xml with loader org.apache.velocity.runtime.resource.loader.FileResourceLoader
    -Sun Jan 06 19:01:24 PST 2008   [info] ResourceManager : found dependencies.xml with loader org.apache.velocity.runtime.resource.loader.FileResourceLoader
    -Sun Jan 06 19:01:24 PST 2008   [info] ResourceManager : found issue-tracking.xml with loader org.apache.velocity.runtime.resource.loader.FileResourceLoader
    -Sun Jan 06 19:01:24 PST 2008  [error] Method getText threw exception for reference $escape in template issue-tracking.xml at  [29,22]
    +        

    RAT (Release Audit Tool) results

    The following document contains the results of RAT (Release Audit Tool).

    +*****************************************************
    +Summary
    +-------
    +Notes: 4
    +Binaries: 13
    +Archives: 1
    +Standards: 118
    +
    +Apache Licensed: 108
    +Generated Documents: 0
    +
    +JavaDocs are generated and so license header is optional
    +Generated files do not required license headers
    +
    +10 Unknown Licenses
    +
    +*******************************
    +
    +Archives (+ indicates readable, $ unreadable): 
    +
    + + src/test/resources/test-documents/test-documents.zip
    + 
    +*****************************************************
    +  Files with AL headers will be marked L
    +  Binary files (which do not require AL headers) will be marked B
    +  Compressed archives will be marked A
    +  Notices, licenses etc will be marked N
    + !????? .checkstyle
    + !????? .externalToolBuilders/Maven_Ant_Builder.launch
    + !????? CHANGES.txt
    +  AL    HEADER.txt
    +  N     KEYS
    +  N     LICENSE.txt
    + !????? maven-eclipse.xml
    +  N     NOTICE.txt
    +  AL    pom.xml
    +  N     README.txt
    +  AL    src/main/assembly/standalone.xml
    +  AL    src/main/java/org/apache/tika/cli/TikaCLI.java
    +  AL    src/main/java/org/apache/tika/config/TikaConfig.java
    +  AL    src/main/java/org/apache/tika/exception/TikaException.java
    +  AL    src/main/java/org/apache/tika/gui/ParsingTransferHandler.java
    +  AL    src/main/java/org/apache/tika/gui/TikaGUI.java
    +  AL    src/main/java/org/apache/tika/metadata/CreativeCommons.java
    +  AL    src/main/java/org/apache/tika/metadata/DublinCore.java
    +  AL    src/main/java/org/apache/tika/metadata/HttpHeaders.java
    +  AL    src/main/java/org/apache/tika/metadata/Metadata.java
    +  AL    src/main/java/org/apache/tika/metadata/MSOffice.java
    +  AL    src/main/java/org/apache/tika/metadata/package.html
    +  AL    src/main/java/org/apache/tika/metadata/SpellCheckedMetadata.java
    +  AL    src/main/java/org/apache/tika/metadata/TikaMetadataKeys.java
    +  AL    src/main/java/org/apache/tika/metadata/TikaMimeKeys.java
    +  AL    src/main/java/org/apache/tika/mime/Clause.java
    +  AL    src/main/java/org/apache/tika/mime/HexCoDec.java
    +  AL    src/main/java/org/apache/tika/mime/Magic.java
    +  AL    src/main/java/org/apache/tika/mime/MagicClause.java
    +  AL    src/main/java/org/apache/tika/mime/MagicMatch.java
    +  AL    src/main/java/org/apache/tika/mime/MediaType.java
    +  AL    src/main/java/org/apache/tika/mime/MediaTypeRegistry.java
    +  AL    src/main/java/org/apache/tika/mime/MimeType.java
    +  AL    src/main/java/org/apache/tika/mime/MimeTypeException.java
    +  AL    src/main/java/org/apache/tika/mime/MimeTypes.java
    +  AL    src/main/java/org/apache/tika/mime/MimeTypesFactory.java
    +  AL    src/main/java/org/apache/tika/mime/MimeTypesReader.java
    +  AL    src/main/java/org/apache/tika/mime/Operator.java
    +  AL    src/main/java/org/apache/tika/mime/Patterns.java
    +  AL    src/main/java/org/apache/tika/parser/AbstractParser.java
    +  AL    src/main/java/org/apache/tika/parser/AutoDetectParser.java
    +  AL    src/main/java/org/apache/tika/parser/CompositeParser.java
    +  AL    src/main/java/org/apache/tika/parser/EmptyParser.java
    +  AL    src/main/java/org/apache/tika/parser/ErrorParser.java
    +  AL    src/main/java/org/apache/tika/parser/html/HtmlParser.java
    + !????? src/main/java/org/apache/tika/parser/image/ImageParser.java
    +  AL    src/main/java/org/apache/tika/parser/microsoft/Cell.java
    +  AL    src/main/java/org/apache/tika/parser/microsoft/CellDecorator.java
    +  AL    src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.java
    +  AL    src/main/java/org/apache/tika/parser/microsoft/LinkedCell.java
    +  AL    src/main/java/org/apache/tika/parser/microsoft/NumberCell.java
    +  AL    src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
    +  AL    src/main/java/org/apache/tika/parser/microsoft/TextCell.java
    +  AL    src/main/java/org/apache/tika/parser/opendocument/OpenOfficeContentParser.java
    +  AL    src/main/java/org/apache/tika/parser/opendocument/OpenOfficeMetaParser.java
    +  AL    src/main/java/org/apache/tika/parser/opendocument/OpenOfficeParser.java
    +  AL    src/main/java/org/apache/tika/parser/Parser.java
    +  AL    src/main/java/org/apache/tika/parser/ParserDecorator.java
    +  AL    src/main/java/org/apache/tika/parser/ParserPostProcessor.java
    +  AL    src/main/java/org/apache/tika/parser/ParsingReader.java
    +  AL    src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
    +  AL    src/main/java/org/apache/tika/parser/pdf/PDFParser.java
    +  AL    src/main/java/org/apache/tika/parser/rtf/RTFParser.java
    +  AL    src/main/java/org/apache/tika/parser/txt/TXTParser.java
    +  AL    src/main/java/org/apache/tika/parser/xml/DcXMLParser.java
    +  AL    src/main/java/org/apache/tika/parser/xml/MetadataHandler.java
    +  AL    src/main/java/org/apache/tika/parser/xml/XMLParser.java
    +  AL    src/main/java/org/apache/tika/sax/BodyContentHandler.java
    +  AL    src/main/java/org/apache/tika/sax/ContentHandlerDecorator.java
    +  AL    src/main/java/org/apache/tika/sax/TeeContentHandler.java
    +  AL    src/main/java/org/apache/tika/sax/TextContentHandler.java
    +  AL    src/main/java/org/apache/tika/sax/WriteOutContentHandler.java
    +  AL    src/main/java/org/apache/tika/sax/XHTMLContentHandler.java
    +  AL    src/main/java/org/apache/tika/sax/xpath/AttributeMatcher.java
    +  AL    src/main/java/org/apache/tika/sax/xpath/ChildMatcher.java
    +  AL    src/main/java/org/apache/tika/sax/xpath/CompositeMatcher.java
    +  AL    src/main/java/org/apache/tika/sax/xpath/ElementMatcher.java
    +  AL    src/main/java/org/apache/tika/sax/xpath/Matcher.java
    +  AL    src/main/java/org/apache/tika/sax/xpath/MatchingContentHandler.java
    +  AL    src/main/java/org/apache/tika/sax/xpath/NamedAttributeMatcher.java
    +  AL    src/main/java/org/apache/tika/sax/xpath/NamedElementMatcher.java
    +  AL    src/main/java/org/apache/tika/sax/xpath/NodeMatcher.java
    +  AL    src/main/java/org/apache/tika/sax/xpath/SubtreeMatcher.java
    +  AL    src/main/java/org/apache/tika/sax/xpath/TextMatcher.java
    +  AL    src/main/java/org/apache/tika/sax/xpath/XPathParser.java
    +  AL    src/main/java/org/apache/tika/utils/ParseUtils.java
    +  AL    src/main/java/org/apache/tika/utils/RegexUtils.java
    +  AL    src/main/java/org/apache/tika/utils/RereadableInputStream.java
    +  AL    src/main/java/org/apache/tika/utils/StringUtil.java
    +  AL    src/main/java/org/apache/tika/utils/Utils.java
    +  AL    src/main/resources/mime/tika-mimetypes.xml
    +  AL    src/main/resources/tika-config.xml
    +  AL    src/site/apt/download.apt
    +  AL    src/site/apt/index.apt
    +  B     src/site/resources/tika.png
    +  B     src/site/resources/tika.xcf
    +  AL    src/site/site.xml
    +  AL    src/test/java/org/apache/tika/metadata/TestMetadata.java
    +  AL    src/test/java/org/apache/tika/metadata/TestSpellCheckedMetadata.java
    +  AL    src/test/java/org/apache/tika/mime/MediaTypeTest.java
    +  AL    src/test/java/org/apache/tika/mime/MimeTypesTest.java
    +  AL    src/test/java/org/apache/tika/mime/MimeTypeTest.java
    +  AL    src/test/java/org/apache/tika/mime/PatternsTest.java
    +  AL    src/test/java/org/apache/tika/mime/TestMimeTypes.java
    +  AL    src/test/java/org/apache/tika/parser/AutoDetectParserTest.java
    +  AL    src/test/java/org/apache/tika/parser/html/HtmlParserTest.java
    +  AL    src/test/java/org/apache/tika/parser/image/ImageParserTest.java
    +  AL    src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
    +  AL    src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
    +  AL    src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
    +  AL    src/test/java/org/apache/tika/parser/opendocument/OpenOfficeParserTest.java
    +  AL    src/test/java/org/apache/tika/parser/ParsingReaderTest.java
    +  AL    src/test/java/org/apache/tika/parser/txt/TXTParserTest.java
    +  AL    src/test/java/org/apache/tika/parser/xml/DcXMLParserTest.java
    +  AL    src/test/java/org/apache/tika/sax/xpath/XPathParserTest.java
    +  AL    src/test/java/org/apache/tika/TestParsers.java
    +  AL    src/test/java/org/apache/tika/TestRereadableInputStream.java
    +  AL    src/test/java/org/apache/tika/utils/RegexUtilsTest.java
    +  AL    src/test/resources/log4j.properties
    +  A     src/test/resources/test-documents/test-documents.zip
    +  B     src/test/resources/test-documents/testBMP.bmp
    +  B     src/test/resources/test-documents/testEXCEL-formats.xls
    +  B     src/test/resources/test-documents/testEXCEL.xls
    +  B     src/test/resources/test-documents/testGIF.gif
    + !????? src/test/resources/test-documents/testHTML.html
    + !????? src/test/resources/test-documents/testHTML_utf8.html
    +  B     src/test/resources/test-documents/testJPEG.jpg
    +  B     src/test/resources/test-documents/testOpenOffice2.odt
    +  B     src/test/resources/test-documents/testPDF.pdf
    +  B     src/test/resources/test-documents/testPNG.png
    +  B     src/test/resources/test-documents/testPPT.ppt
    + !????? src/test/resources/test-documents/testRTF.rtf
    +  B     src/test/resources/test-documents/testTIFF.tif
    + !????? src/test/resources/test-documents/testTXT.txt
    +  B     src/test/resources/test-documents/testWORD.doc
    + !????? src/test/resources/test-documents/testXML.xml
    + 
    + *****************************************************
    + Printing headers for files without AL header...
    + 
    + 
    + =======================================================================
    + ==.checkstyle
    + =======================================================================
    + <?xml version="1.0" encoding="UTF-8"?>
    +<fileset-config file-format-version="1.2.0" simple-config="true">
    +    <fileset name="all" enabled="true" check-config-name="Sun Checks" local="false">
    +        <file-match-pattern match-pattern="." include-pattern="true"/>
    +    </fileset>
    +</fileset-config>
    +
    + =======================================================================
    + ==.externalToolBuilders/Maven_Ant_Builder.launch
    + =======================================================================
    + <launchConfiguration type="org.eclipse.ant.AntBuilderLaunchConfigurationType">
    +  <booleanAttribute key="org.eclipse.debug.ui.ATTR_LAUNCH_IN_BACKGROUND" value="false"/>
    +  <stringAttribute key="org.eclipse.ui.externaltools.ATTR_RUN_BUILD_KINDS" value="full,incremental,auto,clean"/>
    +  <booleanAttribute key="org.eclipse.ui.externaltools.ATTR_TRIGGERS_CONFIGURED" value="true"/>
    +  <booleanAttribute key="org.eclipse.debug.core.appendEnvironmentVariables" value="true"/>
    +  <stringAttribute key="org.eclipse.jdt.launching.PROJECT_ATTR" value="tika"/>
    +  <booleanAttribute key="org.eclipse.jdt.launching.DEFAULT_CLASSPATH" value="true"/>
    +  <stringAttribute key="org.eclipse.ui.externaltools.ATTR_LOCATION" value="${build_project}/maven-eclipse.xml"/>
    +  <stringAttribute key="org.eclipse.ui.externaltools.ATTR_WORKING_DIRECTORY" value="${build_project}"/>
    +  <stringAttribute key="org.eclipse.debug.core.ATTR_REFRESH_SCOPE" value="${project}"/>
    +  <booleanAttribute key="org.eclipse.debug.core.capture_output" value="false"/>
    +  <stringAttribute key="org.eclipse.ui.externaltools.ATTR_BUILD_SCOPE" value="${working_set:&lt;?xml version=&apos;1.0&apos;?&gt;&lt;launchConfigurationWorkingSet editPageId=&apos;org.eclipse.ui.resourceWorkingSetPage&apos; factoryID=&apos;org.eclipse.ui.internal.WorkingSetFactory&apos; label=&apos;workingSet&apos; name=&apos;workingSet&apos;&gt;&lt;item factoryID=&apos;org.eclipse.ui.internal.model.ResourceFactory&apos; path=&apos;tika&apos; type=&apos;4&apos;/&gt;&lt;/launchConfigurationWorkingSet&gt;}"/>
    +  <stringAttribute key="process_factory_id" value="org.eclipse.ant.ui.remoteAntProcessFactory"/>
    +  <booleanAttribute key="org.eclipse.ant.ui.DEFAULT_VM_INSTALL" value="false"/>
    +  <booleanAttribute key="org.eclipse.debug.ui.ATTR_CONSOLE_OUTPUT_ON" value="false"/>
    +  <booleanAttribute key="org.eclipse.ant.ui.ATTR_TARGETS_UPDATED" value="true"/>
    +  <stringAttribute key="org.eclipse.jdt.launching.CLASSPATH_PROVIDER" value="org.eclipse.ant.ui.AntClasspathProvider"/>
    +  <listAttribute key="org.eclipse.debug.core.MAPPED_RESOURCE_TYPES">
    +    <listEntry value="1"/>
    +  </listAttribute>
    +  <listAttribute key="org.eclipse.debug.core.MAPPED_RESOURCE_PATHS">
    +    <listEntry value="/tika/maven-eclipse.xml"/>
    +  </listAttribute>
    +</launchConfiguration>
    +
    + =======================================================================
    + ==CHANGES.txt
    + =======================================================================
    + Tika Change Log
    +
    +Unreleased changes (0.2-incubating)
    +
    +1.  TIKA-109 - WordParser fails on some Word files (Dave Meikle)
    +
    +2.  TIKA-105 - Excel parser implementation based on POI's Event API
    +               (Niall Pemberton)
    +
    +3.  TIKA-116 - Streaming parser for OpenDocument files (Jukka Zitting)
    +
    +4.  TIKA-117 - Drop JDOM and Jaxen dependencies (Jukka Zitting)
    +
    +5.  TIKA-115 - Tika package with all the dependencies (Jukka Zitting)
    +
    +6.  TIKA-97  - Tika GUI (Jukka Zitting)
    +
    +7.  TIKA-96  - Tika CLI (Jukka Zitting)
    +
    +8.  TIKA-112 - Use Commons IO 1.4 (Jukka Zitting)
    +
    +9.  TIKA-126 - Add Parser.parse(InputStream, Metadata) for metadata extraction
    +              (Jukka Zitting)
    +
    +10. TIKA-127 - Add support for Visio files (Jukka Zitting)
    +
    +11. TIKA-129 - node() support for the streaming XPath utility (Jukka Zitting)
    +
    +12. TIKA-130 - self-or-descendant axis does not match self in streaming XPath
    +               (Jukka Zitting)
    +
    +13. TIKA-131 - Lazy XHTML prefix generation (Jukka Zitting)
    +
    +14. TIKA-128 - HTML parser should produce XHTML SAX events (Jukka Zitting)
    +
    +15. TIKA-133 - TeeContentHandler constructor should use varargs (Jukka Zitting)
    +
    +16. TIKA-132 - Refactor Excel extractor to parse per sheet and add
    +               hyperlink support (Niall Pemberton)
    +
    +17. TIKA-134 - mvn package does not produce packages for bin/src
    +               (Karl Heinz Marbaise)
    +
    +18. TIKA-138 - Ignore HTML style and script content (Jukka Zitting)
    +
    +19. TIKA-113 - Metadata (such as title) should not be part of content
    +               (Jukka Zitting)
    +
    +20. TIKA-139 - Add a composite parser (Jukka Zitting)
    +
    +
    + =======================================================================
    + ==maven-eclipse.xml
    + =======================================================================
    + <project default="copy-resources">
    +  <target name="init"/>
    +  <target name="copy-resources" depends="init">
    +    <copy todir="target/classes/META-INF" filtering="false">
    +      <fileset dir="." includes="README.txt|NOTICE.txt|LICENSE.txt"/>
    +    </copy>
    +    <copy todir="target/classes/org/apache/tika" filtering="false">
    +      <fileset dir="src/main/resources"/>
    +    </copy>
    +  </target>
    +</project>
    +
    + =======================================================================
    + ==src/main/java/org/apache/tika/parser/image/ImageParser.java
    + =======================================================================
    + package org.apache.tika.parser.image;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.util.Iterator;
    +
    +import javax.imageio.ImageIO;
    +import javax.imageio.ImageReader;
    +
    +import org.apache.commons.io.input.CloseShieldInputStream;
    +import org.apache.tika.exception.TikaException;
    +import org.apache.tika.metadata.Metadata;
    +import org.apache.tika.parser.Parser;
    +import org.apache.tika.sax.XHTMLContentHandler;
    +import org.xml.sax.ContentHandler;
    +import org.xml.sax.SAXException;
    +
    +public class ImageParser implements Parser {
    +
    +    public void parse(InputStream stream, Metadata metadata)
    +            throws IOException, TikaException {
    +        String type = metadata.get(Metadata.CONTENT_TYPE);
    +        if (type != null) {
    +            Iterator<ImageReader> iterator =
    +                ImageIO.getImageReadersByMIMEType(type);
    +            if (iterator.hasNext()) {
    +                ImageReader reader = iterator.next();
    +                reader.setInput(ImageIO.createImageInputStream(
    +                        new CloseShieldInputStream(stream)));
    +                metadata.set("height", Integer.toString(reader.getHeight(0)));
    +                metadata.set("width", Integer.toString(reader.getWidth(0)));
    +                reader.dispose();
    +            }
    +        }
    +    }
    +
    +    public void parse(
    +            InputStream stream, ContentHandler handler, Metadata metadata)
    +            throws IOException, SAXException, TikaException {
    +        parse(stream, metadata);
    +        XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    +        xhtml.startDocument();
    +        xhtml.endDocument();
    +    }
    +
    +}
    +
    + =======================================================================
    + ==src/test/resources/test-documents/testHTML.html
    + =======================================================================
    + <html>
    +	<head>
    +		<title>Title : Test Indexation Html</title>	
    +	</head>
    +	<body>
    +		<h1>Test Indexation Html</h1>
    +		<p><a href="http://www.apache.org/">Indexation</a> du fichier</p>
    +	</body>	
    +</html>
    +
    + =======================================================================
    + ==src/test/resources/test-documents/testHTML_utf8.html
    + =======================================================================
    + <html>
    +	<head>
    +		<title>Title : Tilte with UTF-8 chars öäå</title>	
    +	</head>
    +	<body>
    +		<h1>Content with UTF-8 chars</h1>
    +		<p>åäö</p>
    +	</body>	
    +</html>
    +
    + =======================================================================
    + ==src/test/resources/test-documents/testRTF.rtf
    + =======================================================================
    + {\rtf1\ansi\ansicpg1252\uc1\deff0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deflang1036\deflangfe1036{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f37\froman\fcharset238\fprq2 Times New Roman CE;}
    +{\f38\froman\fcharset204\fprq2 Times New Roman Cyr;}{\f40\froman\fcharset161\fprq2 Times New Roman Greek;}{\f41\froman\fcharset162\fprq2 Times New Roman Tur;}{\f42\froman\fcharset177\fprq2 Times New Roman (Hebrew);}
    +{\f43\froman\fcharset178\fprq2 Times New Roman (Arabic);}{\f44\froman\fcharset186\fprq2 Times New Roman Baltic;}{\f45\froman\fcharset163\fprq2 Times New Roman (Vietnamese);}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;
    +\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;
    +\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1036\langfe1036\cgrid\langnp1036\langfenp1036 \snext0 Normal;}{\*\cs10 \additive \ssemihidden 
    +Default Paragraph Font;}{\*\ts11\tsrowd\trftsWidthB3\trpaddl108\trpaddr108\trpaddfl3\trpaddft3\trpaddfb3\trpaddfr3\trcbpat1\trcfpat1\tscellwidthfts0\tsvertalt\tsbrdrt\tsbrdrl\tsbrdrb\tsbrdrr\tsbrdrdgl\tsbrdrdgr\tsbrdrh\tsbrdrv 
    +\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs20\lang1024\langfe1024\cgrid\langnp1024\langfenp1024 \snext11 \ssemihidden Normal Table;}}{\*\latentstyles\lsdstimax156\lsdlockeddef0}{\*\rsidtbl \rsid2954171\rsid10375891}
    +{\*\generator Microsoft Word 11.0.6568;}{\info{\title Test d\'92indexation Word}{\author Bibliotheque}{\operator Bibliotheque}{\creatim\yr2006\mo5\dy18\hr12\min19}{\revtim\yr2006\mo5\dy18\hr12\min19}{\version2}{\edmins0}{\nofpages1}{\nofwords3}
    +{\nofchars21}{\*\company Universite Laval}{\nofcharsws23}{\vern24579}}\paperw11906\paperh16838\margl1417\margr1417\margt1417\margb1417 
    +\deftab708\widowctrl\ftnbj\aenddoc\hyphhotz425\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\formshade\horzdoc\dgmargin\dghspace180\dgvspace180\dghorigin1417\dgvorigin1417\dghshow1\dgvshow1
    +\jexpand\viewkind1\viewscale100\pgbrdrhead\pgbrdrfoot\splytwnine\ftnlytwnine\htmautsp\nolnhtadjtbl\useltbaln\alntblind\lytcalctblwd\lyttblrtgr\lnbrkrule\nobrkwrptbl\snaptogridincell\allowfieldendsel\wrppunct\asianbrkrule\nojkernpunct\rsidroot2954171 \fet0
    +\sectd \linex0\headery708\footery708\colsx708\endnhere\sectlinegrid360\sectdefaultcl\sftnbj {\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl3
    +\pndec\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang {\pntxta )}}{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}
    +{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}\pard\plain 
    +\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1036\langfe1036\cgrid\langnp1036\langfenp1036 {\insrsid2954171 Test d\rquote indexation Word
    +\par 
    +\par }}
    +
    + =======================================================================
    + ==src/test/resources/test-documents/testTXT.txt
    + =======================================================================
    + Test d'indexation de Txt
    +http://www.apache.org
    +
    + =======================================================================
    + ==src/test/resources/test-documents/testXML.xml
    + =======================================================================
    + <?xml version="1.0" encoding="UTF-8"?>
    +<oaidc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oaidc="http://www.openarchives.org/OAI/2.0/oai_dc/">
    +
    +	<dc:title>Tika test document</dc:title>
    +
    +	<dc:creator>Rida Benjelloun</dc:creator>
    +
    +	<dc:subject>Java</dc:subject>
    +
    +	<dc:subject>XML</dc:subject>
    +
    +	<dc:subject>XSLT</dc:subject>
    +
    +	<dc:subject>JDOM</dc:subject>
    + 
    +	<dc:subject>Indexation</dc:subject>
    +
    +	<dc:description>Framework d'indexation des documents XML, HTML, PDF etc.. </dc:description>
    +
    +	<dc:identifier>http://www.apache.org</dc:identifier>
    +
    +	<dc:date>2000-12</dc:date>
    +
    +	<dc:type>test</dc:type>
    +
    +	<dc:format>application/msword</dc:format>
    +
    +	<dc:language>Fr</dc:language>
    +
    +	<dc:rights>Archimède et Lius à Châteauneuf testing chars en été</dc:rights>	
    +
    +</oaidc:dc>
     

    Modified: incubator/tika/site/source-repository.html URL: http://svn.apache.org/viewvc/incubator/tika/site/source-repository.html?rev=664072&r1=664071&r2=664072&view=diff ============================================================================== --- incubator/tika/site/source-repository.html (original) +++ incubator/tika/site/source-repository.html Fri Jun 6 11:34:43 2008 @@ -78,7 +78,7 @@
  • - Download + Download
  • Project Documentation