lucene-solr-dev mailing list archives

From "Fergus McMenemie (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1437) DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.
Date Wed, 23 Sep 2009 10:47:16 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758652#action_12758652 ]

Fergus McMenemie commented on SOLR-1437:
----------------------------------------

Noble,

Playing with the code... some observations I would like confirmed.

1) Inside parse(), the valuesAddedinThisFrame HashSet and the Stack<Set<String>>
stack variables are only used to aid the cleanup after outputting a record.

2) The code seems unable to collect text for a forEach xpath. For example, given the
following fragment of code

{code}
    String xml="<root>\n"
             + "  <status>live</status>\n"
             + "  <contenido id=\"10097\" idioma=\"cat\">\n"
             + "    Cats can be cute\n"
             + "    <antetitulo></antetitulo>\n"
             + "    <titulo>\n           This is my title\n    </titulo>\n"
             + "    <resumen>\n          This is my summary\n   </resumen>\n"
             + "    <texto>\n     This is the body of my text\n   </texto>\n"
             + "    </contenido>\n"
             + "</root>";
    XPathRecordReader rr = new XPathRecordReader("/root/contenido");
    rr.addField("cat"   ,"/root/contenido", false); //  ***** FAILS *****
    rr.addField("id",    "/root/contenido/@id", false);
{code}

we can get the string associated with the id attribute of <contenido>, but not its child
text! Is this a design goal, or just the way the code ended up behaving? Do we want it to
continue to work this way?
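
For comparison, here is a minimal sketch of what a standard XPath engine returns for the
same element. It uses the JDK's javax.xml.xpath API, not XPathRecordReader, so it says
nothing about how DIH is meant to behave. The class name ContenidoTextDemo, the eval()
helper and the shortened XML are made up for illustration only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates standard XPath against a shortened
// copy of the XML from the fragment above. Not part of Solr or DIH.
public class ContenidoTextDemo {

    static final String XML = "<root>\n"
        + "  <status>live</status>\n"
        + "  <contenido id=\"10097\" idioma=\"cat\">\n"
        + "    Cats can be cute\n"
        + "    <antetitulo></antetitulo>\n"
        + "    <titulo>\n           This is my title\n    </titulo>\n"
        + "  </contenido>\n"
        + "</root>";

    // Evaluate an XPath expression against XML and return its string value.
    static String eval(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad xpath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // The attribute value, which XPathRecordReader also returns.
        System.out.println(eval("/root/contenido/@id"));
        // prints "10097"

        // The first text node that is a direct child of <contenido>,
        // with surrounding whitespace collapsed.
        System.out.println(eval("normalize-space(/root/contenido/text()[1])"));
        // prints "Cats can be cute"
    }
}
```

In standard XPath the direct child text of <contenido> is addressable as text(), separately
from the text of its child elements such as <titulo>.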

> DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-1437
>                 URL: https://issues.apache.org/jira/browse/SOLR-1437
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Fergus McMenemie
>            Assignee: Noble Paul
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1437.patch, SOLR-1437.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> As per http://www.nabble.com/Re%3A-Extract-info-from-parent-node-during-data-import-%28redirect%3A%29-td25471162.html
it would be nice to be able to use expressions such as //tagname when parsing XML documents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

