lucene-dev mailing list archives

From "mosh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-12298) Index Full nested document Hierarchy For Queries (umbrella issue)
Date Tue, 01 May 2018 12:06:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459631#comment-16459631 ]

mosh edited comment on SOLR-12298 at 5/1/18 12:05 PM:
------------------------------------------------------

Approach: I see [~janhoy]'s [proposal|http://lucene.472066.n3.nabble.com/nesting-Any-way-to-return-the-whole-hierarchical-structure-when-doing-Block-Join-queries-td4265933.html#a4380320] and
[this|https://www.youtube.com/watch?v=qV0fIg-LGBE] talk from Solr Revolution 2016, "Working with
Deeply Nested Documents in Apache Solr", as the starting points for this issue. The proposal
addresses most of the problems.

Firstly, the way a nested document is indexed has to be changed.
 I propose we add the following fields:
 # __parent__
 # __level__
 # __path__

__parent__: This field will store the document's parent docId, to be used for building
the whole hierarchy, using a new document transformer, as suggested by Jan on the mailing
list.

__level__: This field will store the level of the specified field in the document, as an
int value. It can serve as the parentFilter, so users no longer need to provide one
explicitly; the default will be "__level__:queriedFieldLevel".
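
A minimal sketch of how that default could be derived, assuming the existing [child parentFilter=...] transformer syntax stays as it is. The helper names below are hypothetical, and __level__ is the field proposed in this comment, not something Solr provides today:

```python
def default_parent_filter(level, field="__level__"):
    """Build the level-based default filter, e.g. '__level__:1'."""
    return f"{field}:{level}"


def child_transformer(level, parent_filter=None):
    """Build a [child] transformer string.

    Falls back to the level-based default when the caller gives no
    explicit parentFilter (hypothetical behaviour, per the proposal).
    """
    if parent_filter is None:
        parent_filter = default_parent_filter(level)
    return f"[child parentFilter={parent_filter}]"
```

An explicit parentFilter still wins, so existing requests would keep working unchanged.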

__path__: This field will contain the full path, separated by a specific reserved char, e.g.
'.',
 for example: "first.second.third".
 This will enable users to search for a specific path, or to provide a regular expression to
search for fields sharing the same name at different levels of the document, filtering by
the __level__ field if needed.
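
To illustrate the kind of lookup this enables, here is a rough in-memory sketch in plain Python, not Solr query syntax. The sample documents, the find_by_name helper, and the level numbering (root at 0, its direct children at 1) are all assumptions made for the example:

```python
import re

# Hypothetical flattened documents carrying the proposed fields.
docs = [
    {"id": "2", "__path__": "first", "__level__": 1},
    {"id": "3", "__path__": "first.second", "__level__": 2},
    {"id": "4", "__path__": "first.second.third", "__level__": 3},
    {"id": "5", "__path__": "first.other.second", "__level__": 3},
]


def find_by_name(docs, name, level=None, sep="."):
    """Return docs whose last path segment is `name`, optionally at one level."""
    # Matches either the bare name or any prefix ending in the separator.
    pattern = re.compile(r"(^|.*" + re.escape(sep) + r")" + re.escape(name) + r"$")
    return [
        d for d in docs
        if pattern.match(d["__path__"])
        and (level is None or d["__level__"] == level)
    ]
```

Here find_by_name(docs, "second") matches the documents at both "first.second" and "first.other.second", while passing level=2 narrows the result to the first of them.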

To make this happen at index time, changes have to be made to the JSON loader, which will
add the above fields, as well as the __root__ field, which holds the document's topmost-level
docId. This will only happen when a specific parameter is added to the update request, e.g.
"nested=true".
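
As a sketch of what the loader change might do, the following plain-Python function flattens a nested document and annotates each piece with the proposed fields. It is not the JSON loader itself. It assumes every nested object carries its own "id", that the root sits at level 0 with no __parent__ or __path__, and that a keyword argument stands in for the "nested=true" request parameter. It also ignores the order in which a real block would be written:

```python
def _children(value):
    """Return the nested child dicts held by a field value, or [] for plain values."""
    if isinstance(value, dict):
        return [value]
    if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
        return value
    return []


def flatten(doc, nested=False, sep="."):
    """Flatten `doc` into flat docs annotated with the proposed fields.

    With nested=False the document is returned untouched, standing in for
    "existing behaviour is left alone".
    """
    if not nested:
        return [doc]
    out = []

    def visit(node, parent_id, level, path):
        # Keep only the plain (non-child) fields on the flat document.
        flat = {k: v for k, v in node.items() if not _children(v)}
        flat["__root__"] = doc["id"]
        flat["__level__"] = level
        if parent_id is not None:
            flat["__parent__"] = parent_id
            flat["__path__"] = path
        out.append(flat)
        for key, value in node.items():
            child_path = f"{path}{sep}{key}" if path else key
            for child in _children(value):
                visit(child, node["id"], level + 1, child_path)

    visit(doc, None, 0, "")
    return out
```

For a document whose "first" field holds a child with a "second" child of its own, the innermost piece would come out with __level__ 2, __path__ "first.second" and __root__ set to the top-level id.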

The new child doc transformer will be able to reassemble either the whole document structure
or the structure from a specific level, if one is specified.
 Full hierarchy reconstruction can be done relatively cheaply, using the __root__ field to
get to the highest-level document, then querying the block for its children, ordered
by the __level__ field.
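
The reconstruction step could look roughly like the sketch below, which works on an in-memory list standing in for the documents of one block (all sharing the same __root__). The rebuild function is hypothetical. It assumes the last __path__ segment is the key under which a child hangs, and that several children under the same key form a list. Starting from a specific level is not covered:

```python
META_FIELDS = ("__root__", "__parent__", "__level__", "__path__")


def rebuild(flat_docs, sep="."):
    """Reassemble one block of flat docs into the original nested structure."""
    by_id = {}
    root = None
    # Ordering by level guarantees a parent is seen before its children.
    for flat in sorted(flat_docs, key=lambda d: d["__level__"]):
        node = {k: v for k, v in flat.items() if k not in META_FIELDS}
        by_id[flat["id"]] = node
        if "__parent__" not in flat:
            root = node
            continue
        parent = by_id[flat["__parent__"]]
        key = flat["__path__"].split(sep)[-1]
        if key not in parent:
            parent[key] = node
        elif isinstance(parent[key], list):
            parent[key].append(node)
        else:
            # Second child under the same key: promote to a list.
            parent[key] = [parent[key], node]
    return root
```

One limitation of this sketch is that a list holding a single child comes back as a plain object, so a real transformer would need some extra marker to round-trip that case.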


> Index Full nested document Hierarchy For Queries (umbrella issue)
> -----------------------------------------------------------------
>
>                 Key: SOLR-12298
>                 URL: https://issues.apache.org/jira/browse/SOLR-12298
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: mosh
>            Priority: Major
>
> Solr ought to have the ability to index deeply nested objects, while storing the original document hierarchy.
> Currently the client has to index the child document's full path and level to manually reconstruct the original document structure, since the children are flattened and returned in the reserved "_childDocuments_" key.
> Ideally you could index a nested document, having Solr transparently add the required fields while providing a document transformer to rebuild the original document's hierarchy.
>  
> This issue is an umbrella issue for the particular tasks that will make it all happen – either subtasks or issue linking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

