hadoop-hive-dev mailing list archives

From "Patrick Angeles (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1027) Create UDFs for XPath expression evaluation
Date Thu, 07 Jan 2010 05:23:54 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797520#action_12797520 ]

Patrick Angeles commented on HIVE-1027:
---------------------------------------


1) In general, XPath queries return a list of nodes. What does xpath_double (for example)
return if the XPath expression evaluates to multiple nodes?

Only xpath() returns multiple nodes (list).

xpath_string() returns the text of the first matching node (and its subnodes, if any).
- xpath_string('<a>aa<b>b1</b><b>b2</b></a>','a') returns 'aab1b2'
- xpath_string('<a>aa<b>b1</b><b>b2</b></a>','b') returns 'b1'

xpath_double() and xpath_float() return the numeric value of the text of the first matching
node, or NaN if the text is not numeric.
xpath_int(), xpath_long(), and xpath_short() return the numeric value of the text of the first
matching node, or 0 if the text is not numeric, or MAX_INT, MAX_LONG, or MAX_SHORT respectively
if the value overflows.
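
The first-match semantics described above can be reproduced with the standard javax.xml.xpath
API. This is only a sketch, not the code from the patch: the class name FirstMatchDemo and its
helper methods are hypothetical, and the plain Java casts are used here only to illustrate the
"0 if not numeric" and "MAX_INT on overflow" behavior for int. The examples use the path 'a/b'
because the expression is evaluated against the document node in this sketch.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the HIVE-1027 patch.
class FirstMatchDemo {

    // String result: the XPath string-value of the first matching node,
    // which includes the text of its descendants.
    static String evalString(String xml, String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(path, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Numeric result: the first match converted to a number; NaN if not numeric.
    static double evalDouble(String xml, String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(
                path, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a>aa<b>b1</b><b>b2</b></a>";
        System.out.println(evalString(xml, "a"));    // aab1b2
        System.out.println(evalString(xml, "a/b"));  // b1

        System.out.println(evalDouble("<a><b>3</b><b>4</b></a>", "a/b"));  // 3.0
        System.out.println(evalDouble("<a><b>x</b></a>", "a/b"));          // NaN

        // A Java narrowing cast maps NaN to 0 and saturates at Integer.MAX_VALUE.
        System.out.println((int) evalDouble("<a><b>x</b></a>", "a/b"));            // 0
        System.out.println((int) evalDouble("<a><b>99999999999</b></a>", "a/b"));  // 2147483647
    }
}
```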

2) Is the XPath query parsed for every input row, or only parsed once?

The XPath expression is compiled and cached. It is reused if the next expression matches the
previous one; otherwise, it is recompiled. So the XML is always parsed for every input row, but
the XPath expression is precompiled and reused in the vast majority of use cases.
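
A minimal sketch of that caching strategy, assuming a single-entry cache keyed on the expression
string. The class CachedXPath and its compileCount field are hypothetical names used for
illustration; the actual patch may be structured differently.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper; remembers only the most recently compiled expression.
final class CachedXPath {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private String lastPath;               // expression text of the cached entry
    private XPathExpression lastCompiled;  // compiled form of lastPath
    int compileCount = 0;                  // exposed only to make the reuse visible

    String evalString(String xml, String path) {
        try {
            // Recompile only when the expression differs from the previous call.
            if (!path.equals(lastPath)) {
                lastCompiled = xpath.compile(path);
                lastPath = path;
                compileCount++;
            }
            // The XML itself is parsed again on every call (every input row).
            return lastCompiled.evaluate(new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

In a typical query the expression is a constant, so with this design every row after the first
reuses the compiled expression and only the per-row XML parse remains.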

3a) Do you support DTD and XMLSchema?

Not sure how these would apply, as the Java XPath API is schema-agnostic (no validation is
performed). However, malformed XML (e.g., '<a><b>1</b></aa>') will result in a runtime
exception being thrown.
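
For reference, this is roughly how the failure surfaces in the plain Java XPath API: the parse
error is reported as an XPathExpressionException. The class and method names below are
hypothetical, and how the UDFs themselves propagate the error is not shown here.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the HIVE-1027 patch.
class MalformedXmlDemo {

    // Returns true if the XML parsed and the expression evaluated,
    // false if either step raised an XPathExpressionException.
    static boolean evaluates(String xml, String path) {
        try {
            XPathFactory.newInstance().newXPath()
                .evaluate(path, new InputSource(new StringReader(xml)));
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluates("<a><b>1</b></a>", "a/b"));   // true
        System.out.println(evaluates("<a><b>1</b></aa>", "a/b"));  // false
    }
}
```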

3b) What about namespace and backward axes in XPath?

Namespaces are not currently supported, but support could easily be added later.

Backward axes are supported:

> select xpath ('<a><b id="1"><c/></b><b id="2"><c/></b></a>','/descendant::c/ancestor::b/@id') from t1 limit 1 ;
["1","2"]
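
The same query can be checked directly against the Java XPath API. A small sketch, with
hypothetical class and method names, that collects the value of every matching node into a list:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the HIVE-1027 patch.
class BackwardAxisDemo {

    // Evaluates the expression as a node set and returns each node's value.
    static List<String> evalList(String xml, String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b id=\"1\"><c/></b><b id=\"2\"><c/></b></a>";
        // ancestor:: is a backward axis: from each c, walk up to its enclosing b.
        System.out.println(evalList(xml, "/descendant::c/ancestor::b/@id"));
    }
}
```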

4) If the XPath expression evaluates to an empty list, do you return NULL or an empty string
(in the case of xpath())?

When no match is found:
- xpath() returns an empty list.
- xpath_string() returns an empty string.
- xpath_int(), xpath_float(), etc. return 0.
- xpath_boolean() returns false.
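
Most of these defaults mirror what the Java XPath API yields for an empty node set. The sketch
below uses a hypothetical NoMatchDemo class, shows only the integer conversion for the numeric
case, and assumes a plain Java cast from the XPath number result.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the HIVE-1027 patch.
class NoMatchDemo {

    // Evaluates the expression against the XML with the requested result type.
    static Object eval(String xml, String path, QName type) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(path, new InputSource(new StringReader(xml)), type);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b>1</b></a>";
        String miss = "a/zzz";  // matches nothing in the document above

        NodeList list = (NodeList) eval(xml, miss, XPathConstants.NODESET);
        System.out.println(list.getLength());                                  // 0
        System.out.println("[" + eval(xml, miss, XPathConstants.STRING) + "]"); // []
        System.out.println(eval(xml, miss, XPathConstants.BOOLEAN));            // false

        // The number result for an empty node set is NaN; casting NaN to int gives 0.
        double number = (Double) eval(xml, miss, XPathConstants.NUMBER);
        System.out.println((int) number);                                       // 0
    }
}
```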

> Create UDFs for XPath expression evaluation
> -------------------------------------------
>
>                 Key: HIVE-1027
>                 URL: https://issues.apache.org/jira/browse/HIVE-1027
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Patrick Angeles
>            Assignee: Patrick Angeles
>            Priority: Minor
>         Attachments: hive-1027.patch, udf_xpath.patch
>
>
> Create UDFs for evaluating XPath expressions against XML documents.
> Examples:
> > SELECT xpath_double ('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/b[@class="odd"])') FROM src LIMIT 1 ;
> 5.0
> > SELECT xpath_string ('<a><b>b1</b><b>b2</b></a>', 'a/b[2]') FROM src LIMIT 1 ;
> b2
> > SELECT xpath ('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>', 'a/c/text()') FROM src LIMIT 1 ;
> ["c1","c2"]
> Included functions are: xpath_short, xpath_int, xpath_long, xpath_float, xpath_double/xpath_number, xpath_string, xpath

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

