lucene-solr-user mailing list archives

From "Ahmed Hammad" <ahm...@gmail.com>
Subject Re: Regex Transformer Error
Date Thu, 06 Nov 2008 12:14:30 GMT
It worked after replacing < with &lt; and > with &gt; in the regex attribute.

Thank you for your support,
ahmd
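
A minimal sketch of why the escaped form works, for anyone who finds this thread later. The XML parser decodes &lt; and &gt; before the attribute value reaches the transformer, so the regex that actually runs is <(.|\n)*?>. Python's xml.etree and re modules stand in here for the Java XML parser and regex engine that Solr uses, so treat this as an illustration rather than Solr's own code. FIELD_XML combines the field element and the regex quoted in this thread; replaceWith=" " and the helper names strip_tags and parses are assumptions. The sketch also shows that &#3C; is not a valid character reference: the hexadecimal form needs an x (&#x3C;) and the decimal form is &#60;.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical field definition, built from the <field> element and the
# escaped regex quoted in this thread. replaceWith=" " is an assumption.
FIELD_XML = (
    r'<field column="content" sourceColName="content" '
    r'regex="&lt;(.|\n)*?&gt;" replaceWith=" "/>'
)

field = ET.fromstring(FIELD_XML)

# The parser has already decoded the entities, so the regex engine
# receives literal '<' and '>' characters.
pattern = field.get("regex")


def strip_tags(text):
    """Replace every <...> run (tags may span lines) with the replacement."""
    return re.sub(pattern, field.get("replaceWith"), text)


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(repr(pattern))
    print(strip_tags("<p>Hello <b>world</b></p>"))
    print(parses('<field regex="<b>"/>'))    # raw '<' in an attribute value
    print(parses('<field regex="&#3C;"/>'))  # malformed character reference
    print(parses('<field regex="&#x3C;"/>')) # valid hexadecimal reference
```

Replacing each tag with a space instead of an empty string keeps words on either side of a tag from running together. Whether that suits the indexed field is a design choice.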

On Thu, Nov 6, 2008 at 2:39 AM, Norskog, Lance <lance@divvio.com> wrote:

> There is a nice HTML stripper inside Solr.
> "solr.HTMLStripStandardTokenizerFactory"
>
> -----Original Message-----
> From: Ahmed Hammad [mailto:ahm507@gmail.com]
> Sent: Wednesday, November 05, 2008 10:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Regex Transformer Error
>
> Hi,
>
> It works with the attribute regex="&lt;(.|\n)*?&gt;"
>
> Sorry for the disturbance.
>
> Regards,
>
> ahmd
>
>
> On Wed, Nov 5, 2008 at 8:18 PM, Ahmed Hammad <ahm507@gmail.com> wrote:
>
> > Hi,
> >
> > I am using the Solr 1.3 data import handler. One of my table fields
> > contains HTML tags, which I want to strip from the field text. So
> > obviously I need the Regex Transformer.
> >
> > I added transformer="RegexTransformer" attribute to my entity and a
> > new field with:
> >
> > <field sourceColName="content" column="content" regex="English"
> > replaceWith="XXXXX"/>
> >
> > Everything works fine; the text is replaced without any problem. The
> > problem happens with my regular expression to strip HTML tags, for
> > which I use regex="<(.|\n)*?>". Of course, a literal '<' is not allowed
> > inside an XML attribute value (and '>' is usually escaped as well). I
> > tried regex="&lt;(.|\n)*?&gt;" and regex="&#3C;(.|\n)*?&#3E;", but I get
> > the following error:
> >
> > The value of attribute "regex" associated with an element type "field"
> > must not contain the '<' character. at
> > com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown
> > Source) ...
> >
> > The full stack trace follows:
> >
> > FATAL: Could not create importer. DataImporter config invalid
> > org.apache.solr.common.SolrException: FATAL: Could not create importer. DataImporter config invalid
> >   at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114)
> >   at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:206)
> >   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
> >   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
> >   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> >   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> >   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> >   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> >   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> >   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> >   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
> >   at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:857)
> >   at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:565)
> >   at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1509)
> >   at java.lang.Thread.run(Unknown Source)
> > Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context Processing Document #
> >   at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:176)
> >   at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:93)
> >   at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
> >   ... 17 more
> > Caused by: org.xml.sax.SAXParseException: The value of attribute "regex" associated with an element type "field" must not contain the '<' character.
> >   at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
> >   at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
> >   at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:166)
> >   ... 19 more
> >
> > description: The server encountered an internal error (FATAL: Could
> > not create importer. DataImporter config invalid, with the same
> > exception and stack trace as above) that prevented it from fulfilling
> > this request.
> >
> > I appreciate your help.
> >
> > Regards,
> > ahmd
> >
> >
>
