From: "Igor Vaynberg"
Sender: igor.vaynberg@gmail.com
To: users@jackrabbit.apache.org
Date: Mon, 21 Apr 2008 22:44:35 -0700
Subject: Possible to silence namespace checking in HTMLTextExtractor?
Message-ID: <23eb48360804212244o63338348pbe9c0086b0f768d7@mail.gmail.com>

hi,

i am storing html fragments in nt:file nodes whose names end with the .html extension. these fragments contain some custom tags ( eg ) that are replaced later during output.

apparently HTMLTextExtractor sets up a parser with namespace checking, so i am constantly seeing a warning in my logs, and even if i disable the warning via log config i still see this on my stderr:

ERROR: 'Namespace for prefix 'BRIX' has not been declared.'

it is rather annoying. is there any way to disable the check? stacktrace at the bottom...

thanks,

-igor

2008-04-21 22:37:57,953 [tid:main] [rid: ] [uid: 1] WARN o.a.j.extractor.HTMLTextExtractor - Failed to extract HTML text content
javax.xml.transform.TransformerException: java.lang.RuntimeException: Namespace for prefix 'BRIX' has not been declared.
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:717)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:313)
	at org.apache.jackrabbit.extractor.HTMLTextExtractor.extractText(HTMLTextExtractor.java:68)
	at org.apache.jackrabbit.extractor.CompositeTextExtractor.extractText(CompositeTextExtractor.java:90)
	at org.apache.jackrabbit.core.query.lucene.JackrabbitTextExtractor.extractText(JackrabbitTextExtractor.java:195)
	at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addBinaryValue(NodeIndexer.java:393)
	at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addValue(NodeIndexer.java:282)
	at org.apache.jackrabbit.core.query.lucene.NodeIndexer.createDoc(NodeIndexer.java:221)
	at org.apache.jackrabbit.core.query.lucene.SearchIndex.createDocument(SearchIndex.java:861)
	at org.apache.jackrabbit.core.query.lucene.SearchIndex$2.next(SearchIndex.java:512)
	at org.apache.jackrabbit.core.query.lucene.MultiIndex.update(MultiIndex.java:420)
	at org.apache.jackrabbit.core.query.lucene.SearchIndex.updateNodes(SearchIndex.java:496)
	at org.apache.jackrabbit.core.SearchManager.onEvent(SearchManager.java:495)
	at org.apache.jackrabbit.core.observation.EventConsumer.consumeEvents(EventConsumer.java:231)
	at org.apache.jackrabbit.core.observation.ObservationDispatcher.dispatchEvents(ObservationDispatcher.java:201)
	at org.apache.jackrabbit.core.observation.EventStateCollection.dispatch(EventStateCollection.java:425)
	at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.end(SharedItemStateManager.java:737)
	at org.apache.jackrabbit.core.state.SharedItemStateManager.update(SharedItemStateManager.java:873)
	at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:324)
	at org.apache.jackrabbit.core.state.XAItemStateManager.update(XAItemStateManager.java:313)
	at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:300)
	at org.apache.jackrabbit.core.BatchedItemOperations.update(BatchedItemOperations.java:183)
	at org.apache.jackrabbit.core.WorkspaceImpl.internalCopy(WorkspaceImpl.java:397)
	at org.apache.jackrabbit.core.WorkspaceImpl.clone283(WorkspaceImpl.java:1045)
	at org.apache.jackrabbit.core.WorkspaceImpl.clone(WorkspaceImpl.java:469)
	at brix.jcr.api.wrapper.WorkspaceWrapper$1.execute(WorkspaceWrapper.java:54)
	at brix.jcr.api.wrapper.AbstractWrapper.executeCallback(AbstractWrapper.java:74)
	at brix.jcr.api.wrapper.WorkspaceWrapper.clone(WorkspaceWrapper.java:50)
	at brix.Brix.cloneWorkspace(Brix.java:121)
	at brix.Brix.clone(Brix.java:101)
	at biggie.webapp.TestDataLoader.bootstrapCms(TestDataLoader.java:1289)
	at biggie.webapp.TestDataLoader.onApplicationEvent(TestDataLoader.java:280)
	at org.springframework.context.event.SimpleApplicationEventMulticaster$1.run(SimpleApplicationEventMulticaster.java:78)
	at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:49)
	at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:76)
	at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:275)
	at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:737)
	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:384)
	at org.springframework.web.context.ContextLoader.createWebApplicationContext(ContextLoader.java:254)
	at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:198)
	at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:45)
	at org.mortbay.jetty.handler.ContextHandler.startContext(ContextHandler.java:540)
	at org.mortbay.jetty.servlet.Context.startContext(Context.java:135)
	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1220)
	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:510)
	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
	at org.mortbay.jetty.Server.doStart(Server.java:222)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at biggie.webapp.StartBiggie.main(StartBiggie.java:100)
	at biggie.webapp.StartBiggieFresh.main(StartBiggieFresh.java:33)
Caused by: java.lang.RuntimeException: Namespace for prefix 'BRIX' has not been declared.
	at com.sun.org.apache.xml.internal.serializer.SerializerBase.getNamespaceURI(SerializerBase.java:895)
	at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.closeStartTag(ToXMLSAXHandler.java:197)
	at com.sun.org.apache.xml.internal.serializer.ToSAXHandler.flushPending(ToSAXHandler.java:277)
	at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.endElement(ToXMLSAXHandler.java:243)
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
	at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:249)
	at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:361)
	at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1015)
	at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:888)
	at org.cyberneko.html.HTMLTagBalancer.emptyElement(HTMLTagBalancer.java:655)
	at org.cyberneko.html.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2340)
	at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1820)
	at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
	at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
	at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:637)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:708)
	... 51 common frames omitted
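
for illustration only: going by the trace, the complaint is raised in the identity transform's serializer (SerializerBase.getNamespaceURI) while NekoHTML feeds it SAX events. the sketch below does not touch that code path and is not jackrabbit's code. it only shows, with plain JAXP, how an undeclared prefix is treated depending on whether namespace processing is switched on. the class name, the helper method and the "brix:tile" tag are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class UndeclaredPrefixDemo {

    // Hypothetical fragment: "brix:tile" is an invented tag whose prefix is never declared.
    static final String FRAGMENT = "<p>hello <brix:tile id=\"x\"/> world</p>";

    /** Returns the character content, or null if the parser rejects the input. */
    static String extractText(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // With namespace processing on, an unbound prefix is a fatal parse error.
        factory.setNamespaceAware(namespaceAware);
        SAXParser parser = factory.newSAXParser();

        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };

        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return text.toString();
        } catch (SAXException e) {
            // DefaultHandler.fatalError rethrows the parser's exception.
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespace-aware:     " + extractText(FRAGMENT, true));
        System.out.println("not namespace-aware: " + extractText(FRAGMENT, false));
    }
}
```

whether HTMLTextExtractor exposes a comparable switch is exactly the open question above; the sketch makes no claim about that.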