lucene-dev mailing list archives

From "mosh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-12298) Index Full nested document Hierarchy For Queries (umbrella issue)
Date Tue, 08 May 2018 05:44:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466905#comment-16466905 ]

mosh commented on SOLR-12298:
-----------------------------

David, you have some really strong points.

Firstly,
while it is true that "__childDocuments__" is not required, sometimes the nested JSON you get
is not an array but a single child document, e.g.
{code:java}
{
  "id": "X998_Y998",
  "from": { "name": "Peyton Manning", "id": "X18" },
  "message": "Where's my contract?",
  "actions": [
    { "name": "Comment", "link": "http://www.facebook.com/X998/posts/Y998" },
    { "name": "Like", "link": "http://www.facebook.com/X998/posts/Y998" }
  ],
  "type": "status",
  "created_time": "2010-08-02T21:27:44+0000",
  "updated_time": "2010-08-02T21:27:44+0000"
}
{code}
This is a sample Facebook API response. The array syntax will index the "actions" array as
child documents, but it will not index the single child document under the key "from":
{code:java}
{ "from": { "name": "Peyton Manning", "id": "X18" } }
{code}
It would be nice if you could just index JSON as is, as you can in Elasticsearch, moving
the responsibility from the user to Solr itself.
This feature could also be added to the XML loader if needed, for feature parity.
Once this change is introduced to the data loaders, the rest can be done using a URP,
as long as the loaders add the metadata the URP needs to add the required fields.
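
As a rough illustration of the loader-side metadata described above, the following Python sketch
flattens a nested JSON object into a list of flat documents and tags each child with the key path
it came from and its parent's id. The field names "nest_path" and "nest_parent", the "/key" and
"/key#index" path syntax, and the function flatten are hypothetical, chosen only for this example;
they are not existing Solr fields or APIs.

```python
def flatten(doc, path="", parent_id=None, out=None):
    """Flatten a nested dict into a list of flat docs with path metadata.

    `nest_path` and `nest_parent` are hypothetical field names used only
    for illustration; they are not existing Solr fields.
    """
    if out is None:
        out = []
    flat = {}
    if path:
        flat["nest_path"] = path
        flat["nest_parent"] = parent_id
    # Append the parent before recursing so parents precede their children.
    out.append(flat)
    for key, value in doc.items():
        if isinstance(value, dict):
            # Single child object, e.g. the "from" key in the sample.
            flatten(value, f"{path}/{key}", doc.get("id"), out)
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # Non-empty array of child objects, e.g. "actions".
            for i, child in enumerate(value):
                flatten(child, f"{path}/{key}#{i}", doc.get("id"), out)
        else:
            # Scalars (and arrays of scalars) stay on the current document.
            flat[key] = value
    return out


sample = {
    "id": "X998_Y998",
    "from": {"name": "Peyton Manning", "id": "X18"},
    "actions": [{"name": "Comment"}, {"name": "Like"}],
    "type": "status",
}
docs = flatten(sample)
```

Here both the "from" object and the "actions" array end up as child documents, each carrying
enough information for a later step to tell where it sat in the original JSON.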

Afterwards, a new transformer could be introduced that rebuilds the whole JSON structure,
including the full original hierarchy.
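
A transformer of this kind could work roughly as follows. This Python sketch takes flat
documents that carry a path field and reassembles the nested structure. The "nest_path" field,
its "/key" and "/key#index" syntax, and the rebuild function are assumptions made for this
example, not part of Solr.

```python
def rebuild(flat_docs):
    """Reassemble a nested dict from flat docs tagged with a path.

    Assumes a hypothetical `nest_path` field: absent on the root,
    "/key" for a single child object, "/key#N" for the N-th array element.
    Docs must be ordered so that parents precede their children and
    array elements arrive in index order.
    """
    root = None
    by_path = {}
    for doc in flat_docs:
        path = doc.get("nest_path", "")
        # Strip the bookkeeping field so the output matches the original JSON.
        node = {k: v for k, v in doc.items() if k != "nest_path"}
        by_path[path] = node
        if not path:
            root = node
            continue
        # Split "/a#0/b" into parent path "/a#0" and last segment "b".
        parent_path, _, segment = path.rpartition("/")
        parent = by_path[parent_path]
        key, is_array, _index = segment.partition("#")
        if is_array:
            parent.setdefault(key, []).append(node)
        else:
            parent[key] = node
    return root


flat = [
    {"id": "X998_Y998", "type": "status"},
    {"nest_path": "/from", "name": "Peyton Manning", "id": "X18"},
    {"nest_path": "/actions#0", "name": "Comment"},
    {"nest_path": "/actions#1", "name": "Like"},
]
nested = rebuild(flat)
```

The sketch does not handle keys that themselves contain "/" or "#"; a real implementation would
need an escaping scheme or a different encoding of the path.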

On the other hand, supporting a SolrInputDocument as a field value could be the better way
to go, making most of the "hack" logic redundant. Perhaps you are right, and
this is the better choice in the long run.

> Index Full nested document Hierarchy For Queries (umbrella issue)
> -----------------------------------------------------------------
>
>                 Key: SOLR-12298
>                 URL: https://issues.apache.org/jira/browse/SOLR-12298
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: mosh
>            Priority: Major
>
> Solr ought to have the ability to index deeply nested objects, while storing the original
> document hierarchy.
>  Currently the client has to index the child document's full path and level to manually
> reconstruct the original document structure, since the children are flattened and returned
> in the reserved "__childDocuments__" key.
> Ideally you could index a nested document, having Solr transparently add the required
> fields while providing a document transformer to rebuild the original document's hierarchy.
>  
> This issue is an umbrella issue for the particular tasks that will make it all happen
> – either subtasks or issue linking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

