lucene-solr-user mailing list archives

From Jay Hill <jayallenh...@gmail.com>
Subject DIH: Limited xpath syntax unable to parse all xml elements
Date Thu, 02 Jul 2009 00:01:49 GMT
I'm using the XPathEntityProcessor to parse an xml structure that looks like
this:

<book>
    <author>Joe Smith</author>
    <title>World Atlas</title>
    <body>
        <chapter>
            <p>Content I want is here</p>
            <p>More content I want is here.</p>
            <p>Still more content here.</p>
        </chapter>
    </body>
</book>

The author and title parse out fine:

<field column="title" xpath="/book/title"/>
<field column="author" xpath="/book/author"/>

But I can't get at the data inside the <p> tags. I want to get all
non-markup text inside the body tag with something like this:

<field column="body" xpath="/book/body/chapter//p"/>

but the "//" in that expression is not supported by the processor's limited
xpath syntax.
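
To make the target concrete, here is a rough sketch of the result I'm after, written against the JDK's standard javax.xml.xpath API rather than DIH. It only illustrates what the full expression /book/body/chapter//p would select in a complete XPath engine; it is not a DIH configuration, and the class and method names (BodyTextSketch, extractBodyText) are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BodyTextSketch {

    // Hypothetical helper (not part of Solr or DIH): returns the text of every
    // <p> found anywhere under /book/body/chapter, joined by single spaces.
    static String extractBodyText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Standard XPath: "//" matches <p> at any depth below <chapter>.
            NodeList paragraphs = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("/book/body/chapter//p", doc, XPathConstants.NODESET);

            StringBuilder text = new StringBuilder();
            for (int i = 0; i < paragraphs.getLength(); i++) {
                if (i > 0) {
                    text.append(' ');
                }
                // getTextContent() drops the markup and keeps only the text.
                text.append(paragraphs.item(i).getTextContent().trim());
            }
            return text.toString();
        } catch (Exception e) {
            // Parsing or XPath failure: surface it as an unchecked exception.
            throw new IllegalArgumentException("Could not extract body text", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book><author>Joe Smith</author><title>World Atlas</title>"
                + "<body><chapter>"
                + "<p>Content I want is here</p>"
                + "<p>More content I want is here.</p>"
                + "<p>Still more content here.</p>"
                + "</chapter></body></book>";
        System.out.println(extractBodyText(xml));
    }
}
```

In other words, the "body" column should end up holding the three sentences as plain text, with none of the <p> tags.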

Does anyone know of a way that I can get the content within the <p> tags
without the markup?

Thanks,
-Jay
