lucene-solr-user mailing list archives

From Mikhail Khludnev <m...@apache.org>
Subject Re: some parent documents
Date Wed, 04 Apr 2018 10:36:48 GMT
q=+{!parent which=ntype:p v='+msg:Hello +person:Arturas'} +{!parent which=ntype:p v='+msg:ciao +person:Vai'}
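
To make the two-clause form concrete, here is a minimal Python sketch. It is not Solr itself and not part of the original reply. The helper names (build_query, has_child, matching_parents) are hypothetical, the data is a trimmed copy of the sample from this thread, and exact string equality stands in for Solr's analyzed term matching, which is a simplification. It builds the query string, URL-encodes it (a literal "+" has to become %2B in a URL, otherwise it is decoded as a space), and models the intended meaning: a parent qualifies when each clause is satisfied by some child, not necessarily the same one.

```python
from urllib.parse import urlencode, parse_qs


def build_query(parent_filter, child_queries):
    """Join one mandatory {!parent} clause per child condition."""
    clauses = [
        "+{!parent which=%s v='%s'}" % (parent_filter, child_query)
        for child_query in child_queries
    ]
    return " ".join(clauses)


q = build_query(
    "ntype:p",
    ["+msg:Hello +person:Arturas", "+msg:ciao +person:Vai"],
)

# urlencode percent-encodes '+' as %2B and turns spaces into '+'.
params = urlencode({"q": q})

# Trimmed copy of the sample data from this thread (assumption: only the
# children relevant to the question are kept).
docs = [
    {"id": 1, "ntype": "p", "_childDocuments_": [
        {"id": "1_1", "ntype": "c", "person": "Vai", "msg": "Hello"},
        {"id": "1_2", "ntype": "c", "person": "Arturas", "msg": "Hello"},
        {"id": "1_6", "ntype": "c", "person": "Vai", "msg": "ciao"},
        {"id": "1_7", "ntype": "c", "person": "Arturas", "msg": "ciao"},
    ]},
    {"id": 2, "ntype": "p", "_childDocuments_": [
        {"id": "2_1", "ntype": "c", "person": "Vai", "msg": "Hi"},
        {"id": "2_2", "ntype": "c", "person": "Arturas", "msg": "IBM Watson"},
        {"id": "2_7", "ntype": "c", "person": "Arturas", "msg": "ciao"},
    ]},
]


def has_child(parent, criteria):
    """True if at least one child matches every field in criteria."""
    return any(
        all(child.get(field) == value for field, value in criteria.items())
        for child in parent["_childDocuments_"]
    )


def matching_parents(parents, child_criteria):
    """Ids of parents that satisfy every clause, each via some child."""
    return [
        parent["id"]
        for parent in parents
        if all(has_child(parent, criteria) for criteria in child_criteria)
    ]


result = matching_parents(docs, [
    {"msg": "Hello", "person": "Arturas"},
    {"msg": "ciao", "person": "Vai"},
])

print(q)
print(params)
print(result)
```

Putting both conditions into a single {!parent} clause would instead require one child to satisfy both of them, which is why the query uses two clauses.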

On Wed, Apr 4, 2018 at 12:19 PM, Arturas Mazeika <mazeika@gmail.com> wrote:

> Hi Mikhail et al,
>
> It seems to me that the nested documents must include nodes that encode the
> level of nodes (within the document). Therefore, the minimal example must
> include the node type. Is the following structure sufficient?
>
> {
>     "id":1,
>     "ntype":"p",
>     "_childDocuments_":
>     [
>         {"id":"1_1", "ntype":"c", "person":"Vai",     "time":"3:14",
> "msg":"Hello"},
>         {"id":"1_2", "ntype":"c", "person":"Arturas", "time":"3:14",
> "msg":"Hello"},
>         {"id":"1_3", "ntype":"c", "person":"Vai",     "time":"3:15",
> "msg":"Coz Mathias is working on another system- different screen."},
>         {"id":"1_4", "ntype":"c", "person":"Vai",     "time":"3:15",
> "msg":"It can get annoying"},
>         {"id":"1_5", "ntype":"c", "person":"Arturas", "time":"3:15",
> "msg":"Thank you. this is very nice of you"},
>         {"id":"1_6", "ntype":"c", "person":"Vai",     "time":"3:16",
> "msg":"ciao"},
>         {"id":"1_7", "ntype":"c", "person":"Arturas", "time":"3:16",
> "msg":"ciao"}
>     ]
> },
> {
>     "id":2,
>     "ntype":"p",
>     "_childDocuments_":
>     [
>         {"id":"2_1", "ntype":"c", "person":"Vai",     "time":"4:14",
> "msg":"Hi"},
>         {"id":"2_2", "ntype":"c", "person":"Arturas", "time":"4:14",
> "msg":"IBM Watson"},
>         {"id":"2_3", "ntype":"c", "person":"Vai",     "time":"4:15",
> "msg":"need to retain content"},
>         {"id":"2_4", "ntype":"c", "person":"Vai",     "time":"4:15",
> "msg":"It can get annoying"},
>         {"id":"2_5", "ntype":"c", "person":"Arturas", "time":"4:15",
> "msg":"You can make all your meetings more access"},
>         {"id":"2_6", "ntype":"c", "person":"Vai",     "time":"4:16",
> "msg":"Make every meeting a Skype meeting"},
>         {"id":"2_7", "ntype":"c", "person":"Arturas", "time":"4:16",
> "msg":"ciao"}
>     ]
> }
>
> What would a query look like that returns parents with a Hello from person
> Arturas and a ciao from person Vai?
>
> Cheers,
> Arturas
>
>
> On Tue, Apr 3, 2018 at 5:21 PM, Arturas Mazeika <mazeika@gmail.com> wrote:
>
> > Hi Mikhail,
> >
> > Thanks a lot for the reply.
> >
> > You mentioned that
> >
> > q=+{!parent which.. v='+text:hello +person:A'} +{!parent
> > which..v='+text:ciao +person:B'}
> >
> > is the way to go. What would it look like precisely for the following
> > collection?
> >
> > {
> >     "id":1,
> >     "_childDocuments_":
> >     [
> >         {"id":"1_1", "person":"Vai",     "time":"3:14", "msg":"Hello"},
> >         {"id":"1_2", "person":"Arturas", "time":"3:14", "msg":"Hello"},
> >         {"id":"1_3", "person":"Vai",     "time":"3:15", "msg":"Coz Mathias is working on another system- different screen."},
> >         {"id":"1_4", "person":"Vai",     "time":"3:15", "msg":"It can get annoying"},
> >         {"id":"1_5", "person":"Arturas", "time":"3:15", "msg":"Thank you. this is very nice of you"},
> >         {"id":"1_6", "person":"Vai",     "time":"3:16", "msg":"ciao"},
> >         {"id":"1_7", "person":"Arturas", "time":"3:16", "msg":"ciao"}
> >     ]
> > },
> > {
> >     "id":2,
> >     "_childDocuments_":
> >     [
> >         {"id":"2_1", "person":"Vai",     "time":"4:14", "msg":"Hello"},
> >         {"id":"2_2", "person":"Arturas", "time":"4:14", "msg":"IBM Watson"},
> >         {"id":"2_3", "person":"Vai",     "time":"4:15", "msg":"need to retain content"},
> >         {"id":"2_4", "person":"Vai",     "time":"4:15", "msg":"It can get annoying"},
> >         {"id":"2_5", "person":"Arturas", "time":"4:15", "msg":"You can make all your meetings more access"},
> >         {"id":"2_6", "person":"Vai",     "time":"4:16", "msg":"Make every meeting a Skype meeting"},
> >         {"id":"2_7", "person":"Arturas", "time":"4:16", "msg":"ciao"}
> >     ]
> > }
> >
> > Cheers,
> > Arturas
> >
> >
> > On Tue, Apr 3, 2018 at 4:33 PM, Mikhail Khludnev <mkhl@apache.org> wrote:
> >
> >> Hello, Arturas.
> >>
> >> TL;DR: please find my answers inline below.
> >>
> >> On Tue, Apr 3, 2018 at 5:14 PM, Arturas Mazeika <mazeika@gmail.com>
> >> wrote:
> >>
> >> > Hi Solr Fans,
> >> >
> >> > I am trying to make sense of information retrieval using expressions like
> >> > "some parent", "*only parent*", "*all parent*". I am also trying to
> >> > understand the syntax "!parent which" and "!child of". On the technical
> >> > level, I am reading the following documents:
> >> >
> >> > [1] https://lucene.apache.org/solr/guide/7_2/other-parsers.html#block-join-query-parsers
> >> > [2] https://lucene.apache.org/solr/guide/7_2/uploading-data-with-index-handlers.html#nested-child-documents
> >> > [3] http://yonik.com/solr-nested-objects/
> >> >
> >> > and I am confused to read:
> >> >
> >> > This parser takes a query that matches some parent documents and returns
> >> > their children. The syntax for this parser is:
> >> > q={!child of=<allParents>}<someParents>. The parameter allParents is a
> >> > filter that matches *only parent documents*; here you would define the
> >> > field and value that you used to identify *all parent documents*. The
> >> > parameter someParents identifies a query that will match some of the
> >> > parent documents. The output is the children.
> >> >
> >> > The first sentence talks about "matching" but does not define what that
> >> > means (and why do only some parents match?). The second sentence
> >> > introduces the syntax of the parser, but blurs the understanding because
> >> > "some" and "all" parents are combined in one sentence. My understanding is
> >> > that all documents that satisfy a query are retrieved. The query must
> >> > express some constraints on the parent node and some on the child node. I
> >> > have a feeling that "only parent documents" reads as "the criteria are
> >> > formulated over the parent part of the {parent document}->{child document}
> >> > entity".
> >> > My simplified conceptual world of Solr looks as follows:
> >> >
> >> > 1. Every document has an ID.
> >> > 2. Every document may have additional attributes.
> >> > 3. Text attributes are what is at stake in Solr. Sure, we can search for
> >> > products that cost at most X, but this is added functionality. For
> >> > simplicity I am neglecting it here.
> >> > 4. The user has an information need. She expresses it with (key)words and
> >> > hopes to find matching documents. For simplicity, I am skipping all issues
> >> > related to the presentation of the documents.
> >> > 5. The analysis chain (and the inverted index) are the key technologies Solr
> >> > is based upon. Once the chain processing is applied, mathematical logic
> >> > kicks in, retrieving the documents (each a set of processed, normalized,
> >> > enriched tokens) matching the query (processed, normalized and enriched
> >> > tokens). Clearly, the logic function can be a fancy one (at least one of
> >> > the query tokens is in the document's set of tokens, etc.), and ranking is
> >> > used to sort the results.
> >> > 6. A nested document concept is introduced in Solr. Nested documents need
> >> > to be uploaded into the index structure using specific handlers [2]. A
> >> > nested document is a tree. A root may contain child documents, which may
> >> > be parents of grandchild documents.
> >> > 7. Querying nested documents is supported in the following manner:
> >> >     7.1 Child documents are returned that satisfy {parent
> >> > document}->{document}
> >> >     7.2 Parent documents are returned that satisfy {document}->{child
> >> > document}
> >> > Would I be very wrong to have this conceptual picture?
> >> >
> >> > From this point, the situation is a bit blurry in my head. At the core, I
> >> > do not really understand what "a document" is anymore (the complete JSON
> >> > or XML is a document, and so is each sub-JSON and sub-XML; every document
> >> > must have an ID, so does that mean that the subdocuments must have an ID
> >> > too, or are sub-IDs also fine?), how to formulate mathematical expressions
> >> > over documents, and what it means that a document satisfies my (key)word
> >> > query. Can we define a document to be the largest entity of information
> >> > that does not contain any other nested documents [4]? If this is already
> >> > defined and communicated like this, where can I find it? A clarification
> >> > would be useful, as the concept of a document means different things in
> >> > different contexts (e.g., you can update only the "complete document" in
> >> > the index vs. parent document, etc.).
> >> >
> >> > Is it possible to formulate what's going on using mathematical logic?
> >> > Can one express something like
> >> >
> >> > { give documents d : d is a document, d is parent of document c, d
> >> > satisfies logical criteria C1,....,CN, c satisfies logical criteria
> >> > C1',...,CM'}
> >> > { give documents c : c is a document, d is parent of document c, d
> >> > satisfies logical criteria C1,....,CN, c satisfies logical criteria
> >> > C1',...,CM'}
> >> >
> >> > here the meaning of document is as in definition [4] above.
> >> >
> >> > 1. Is it possible to retrieve all parent documents that have two children
> >> > c1 and c2? Consider a document that is a Skype chat whose children are the
> >> > individual lines of communication in the chat. I would be looking for the
> >> > (parent) documents that have "hello" said by person A and "ciao" said by
> >> > person B (as two different sub-documents).
> >> >
> >>
> >> q=+{!parent which.. v='+text:hello +person:A'} +{!parent which.. v='+text:ciao +person:B'}
> >> The query syntax is really tricky and cumbersome.
> >>
> >>
> >> >
> >> > 2. Is it possible to search for documents such that they have a grandchild
> >> > and the grandchild has the word "hello"?
> >> >
> >>
> >> http://blog-archive.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html
> >>
> >>
> >> >
> >> > 3. Is it possible to search for documents that do not have children?
> >> >
> >> q=-{!parent which..}type:child
> >> Beware that mixing parents and childfree documents is not supported and
> >> causes pain. As a workaround, you need to add an empty placeholder child
> >> doc. Sorry.
> >>
> >>
> >> > Is this the right venue to discuss documentation of solr?
> >> >
> >> > Thanks!
> >> > Arturas
> >> >
> >>
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >>
> >
> >
>



-- 
Sincerely yours
Mikhail Khludnev
