lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: Using fetch function with streaming expression
Date Wed, 15 Mar 2017 14:21:02 GMT
I haven't created the JIRA ticket for this yet. It's fairly quick to
implement, but the Solr 6.5 release is just around the corner, so most
likely it will be in Solr 6.6. It will be committed fairly soon,
though, so if you want to use master or branch_6x, you can experiment with
it earlier.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 14, 2017 at 7:53 PM, Pratik Patel <pratik@semandex.net> wrote:

> Wow, this is interesting! Is it going to be a new addition to Solr, or is it
> already available? I cannot find it in the documentation. I am using Solr
> version 6.4.1.
>
> On Tue, Mar 14, 2017 at 7:41 PM, Joel Bernstein <joelsolr@gmail.com> wrote:
>
> > I'm going to add a "cartesian" function that creates a cartesian product
> > from a multi-valued field. This will turn a single tuple with a
> > multi-valued field into multiple tuples, each with a single-valued field.
> > This will allow the fetch operation to work on ancestors. It also has many
> > other use cases. Sample syntax:
> >
> > fetch(collection1,
> >       cartesian(field=ancestors,
> >                 having(gatherNodes(collection1,
> >                                    search(collection1,
> >                                           q="*:*",
> >                                           fl="conceptid",
> >                                           sort="conceptid asc",
> >                                           fq=storeid:"524efcfd505637004b1f6f24",
> >                                           fq=tags:"Company",
> >                                           fq=tags:"Prospects2",
> >                                           qt="/export"),
> >                                    walk=conceptid->eventParticipantID,
> >                                    gather="eventID",
> >                                    trackTraversal="true",
> >                                    scatter="leaves",
> >                                    count(*)),
> >                        gt(count(*),1))),
> >       fl="concept_name",
> >       on="ancestors=conceptid")
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
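A minimal Python sketch of the expansion described above, assuming tuples are modeled as plain dicts. This is not Solr's implementation. The function name `cartesian` only mirrors the proposed syntax, and the handling of missing fields and empty lists is an assumption. Each value of the multi-valued field produces its own copy of the tuple, which gives fetch a single-valued join key to work with.

```python
from typing import Any, Dict, Iterable, Iterator

Doc = Dict[str, Any]  # a stream tuple modeled as a dict


def cartesian(tuples: Iterable[Doc], field: str) -> Iterator[Doc]:
    """Yield one copy of each tuple per value of a multi-valued field (model only)."""
    for t in tuples:
        value = t.get(field)
        if isinstance(value, list):
            # One output tuple per value; an empty list yields nothing.
            for v in value:
                out = dict(t)   # shallow copy so the input tuple is not mutated
                out[field] = v  # the list is replaced by a single value
                yield out
        else:
            # Missing or already single-valued field: pass through (assumption).
            yield t


if __name__ == "__main__":
    # Tuple shaped like the gatherNodes response quoted in this thread.
    doc = {
        "node": "524f03355056c8b53b4ed199",
        "field": "eventID",
        "count(*)": 2,
        "ancestors": ["524f02845056c8b53b4e9871", "524f02755056c8b53b4e9269"],
    }
    for t in cartesian([doc], "ancestors"):
        print(t["node"], t["ancestors"])
```

Running it prints the same node twice, once for each ancestor id, so each output tuple can be joined on `ancestors=conceptid` individually.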
> >
> > On Tue, Mar 14, 2017 at 11:51 AM, Pratik Patel <pratik@semandex.net> wrote:
> >
> > > Hi, Joel. Thanks for the reply.
> > >
> > > So, I need to do some graph traversal queries for my use case. In my data
> > > set, I have concepts and events.
> > >
> > > concept: {name, address, bio, ...},
> > > event: {name, date, participantIds: [concept1, concept2, ...], ...}
> > >
> > >
> > > Events connect two or more concepts, so this is graph data in which
> > > concepts are connected to each other via events. Each event stores links
> > > to the concepts it connects, so the field that stores those links is
> > > multivalued. This is a natural structure for my data, on which I wanted
> > > to run some advanced graph traversal queries with streaming expressions.
> > > However, the gatherNodes() function does not support multivalued fields
> > > yet, so I changed my index structure to something like this:
> > >
> > > concept: {conceptId, name, address, bio, ...},
> > > event: {eventId, name, date, participantIds: [concept1, concept2, ...], ...}
> > > (create an eventLink document for each participantId in each event)
> > > eventLink: {eventid, conceptid, id}
> > >
> > >
> > >
> > > I created eventLink documents from each event so that I can traverse the
> > > data using the gatherNodes() function. With this change, I was able to run
> > > the graph query and get the ids of the concepts I wanted. However, I only
> > > have the ids of the concepts. Now, using these ids, I want additional data
> > > from the concept documents, such as concept_name, address, or bio. This is
> > > what I was trying to achieve with the fetch() function, but it seems I hit
> > > the multivalued limitation again :) I store only the ids in the eventLink
> > > documents because I don't want to duplicate data unnecessarily; that would
> > > complicate keeping the index consistent when deletes and updates happen.
> > > Is there any way I can achieve this?
> > >
> > > Thanks!
> > > Pratik
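A minimal Python sketch of the restructuring described above, producing one eventLink document per (event, participant) pair. Documents are modeled as dicts. The field names follow the listing (`eventId`, `participantIds`, `eventid`, `conceptid`, `id`), while the composite `id` format is a hypothetical choice. The query quoted further down walks `eventParticipantID` and gathers `eventID`, so the names would need to match whatever the real schema uses.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def event_links(events: List[Doc]) -> List[Doc]:
    """Flatten each event's multivalued participantIds into single-valued link docs."""
    links: List[Doc] = []
    for event in events:
        for concept_id in event.get("participantIds", []):
            links.append({
                # Hypothetical id scheme: unique per (event, concept) pair.
                "id": f"{event['eventId']}_{concept_id}",
                "eventid": event["eventId"],
                # Single-valued, so gatherNodes can walk it.
                "conceptid": concept_id,
            })
    return links


if __name__ == "__main__":
    events = [
        {"eventId": "e1", "name": "Kickoff", "participantIds": ["c1", "c2", "c3"]},
        {"eventId": "e2", "name": "Review", "participantIds": ["c2"]},
    ]
    for link in event_links(events):
        print(link)
```

Because each link stores only ids, concept details such as concept_name stay in the concept documents. They have to be joined back in at query time, which is the job being asked of fetch here.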
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Mar 14, 2017 at 11:24 AM, Joel Bernstein <joelsolr@gmail.com> wrote:
> > >
> > > > Wow that's an interesting expression!
> > > >
> > > > The problem is that you are trying to fetch using the ancestors field,
> > > > which is multi-valued. fetch doesn't support multi-valued join keys. I
> > > > never thought someone might try to do that.
> > > >
> > > > So, you're attempting to get the concept names for the ancestors?
> > > >
> > > > Can you explain a little more about the use case?
> > > >
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
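The limitation can be made concrete with a minimal Python model of a fetch-style join. It assumes the right-hand documents are already indexed in a dict keyed by conceptid. This is not Solr's implementation. The name `fetch_model`, the batch size of 50, and the rule that non-string keys never match are assumptions, chosen to reproduce the behavior reported in this thread (tuples returned with no fields added). Keys are collected per batch, so one lookup covers many tuples instead of one query per tuple.

```python
from typing import Any, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]


def fetch_model(tuples: Iterable[Doc], index: Dict[str, Doc], fl: List[str],
                left_key: str, batch_size: int = 50) -> Iterator[Doc]:
    """Add the fl fields of index[tuple[left_key]] to each tuple, batch by batch."""

    def flush(batch: List[Doc]) -> Iterator[Doc]:
        # One lookup per batch (stands in for a single query ORing the keys).
        keys = {t[left_key] for t in batch if isinstance(t.get(left_key), str)}
        found = {k: index[k] for k in keys if k in index}
        for t in batch:
            out = dict(t)
            key = t.get(left_key)
            # A list-valued key is not a str, so it never matches and the
            # tuple passes through with no fields added.
            if isinstance(key, str) and key in found:
                for f in fl:
                    if f in found[key]:
                        out[f] = found[key][f]
            yield out

    batch: List[Doc] = []
    for t in tuples:
        batch.append(t)
        if len(batch) >= batch_size:
            yield from flush(batch)
            batch = []
    if batch:
        yield from flush(batch)


if __name__ == "__main__":
    index = {"c1": {"conceptid": "c1", "concept_name": "Acme"}}  # hypothetical data
    single = {"node": "e1", "ancestors": "c1"}
    multi = {"node": "e1", "ancestors": ["c1", "c2"]}
    print(list(fetch_model([single, multi], index, ["concept_name"], "ancestors")))
```

Only the tuple with a single-valued `ancestors` gains `concept_name`. Splitting the list into separate tuples first is what makes the join key usable.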
> > > >
> > > > On Tue, Mar 14, 2017 at 11:08 AM, Pratik Patel <pratik@semandex.net> wrote:
> > > >
> > > > > I have two types of documents in my index: eventLink and conceptData.
> > > > >
> > > > > eventLink ---- { ancestors:[<id1>,<id2>] }
> > > > > conceptData-----{ id:id1, conceptid, concept_name .....<some more data> }
> > > > >
> > > > > Both are in same collection.
> > > > > In my query, I am running a gatherNodes query wrapped in some other
> > > > > function, and ultimately I get back a bunch of eventLink documents. Now
> > > > > I am trying to get the conceptData document for each id listed in the
> > > > > eventLink's ancestors field, using the fetch() function. Here is a
> > > > > simplified form of my query:
> > > > >
> > > > > fetch(collection1,
> > > > >       <function to get eventLinks>,
> > > > >       fl="concept_name",
> > > > >       on="ancestors=conceptid")
> > > > >
> > > > >
> > > > > When I execute this query, I get back the same set of documents that my
> > > > > streaming expression containing gatherNodes() returns. No fields are
> > > > > added to the tuples. From the documentation, it seems that fetch should
> > > > > fetch additional data and add it to the tuples. However, that is not
> > > > > happening: the resulting tuples do not have a concept_name field. What
> > > > > am I missing here? I really need to get this additional data from one
> > > > > Solr query so that I don't have to iterate over the eventLinks and get
> > > > > the additional data with individual queries, which would badly hurt
> > > > > performance. Any suggestions?
> > > > >
> > > > > Here is my actual query and the response.
> > > > >
> > > > >
> > > > > fetch(collection1,
> > > > >       having(gatherNodes(collection1,
> > > > >                          search(collection1,
> > > > >                                 q="*:*",
> > > > >                                 fl="conceptid",
> > > > >                                 sort="conceptid asc",
> > > > >                                 fq=storeid:"524efcfd505637004b1f6f24",
> > > > >                                 fq=tags:"Company",
> > > > >                                 fq=tags:"Prospects2",
> > > > >                                 qt="/export"),
> > > > >                          walk=conceptid->eventParticipantID,
> > > > >                          gather="eventID",
> > > > >                          trackTraversal="true",
> > > > >                          scatter="leaves",
> > > > >                          count(*)),
> > > > >              gt(count(*),1)),
> > > > >       fl="concept_name",
> > > > >       on="ancestors=conceptid")
> > > > >
> > > > >
> > > > >
> > > > > Response:
> > > > >
> > > > > {
> > > > >   "result-set": {
> > > > >     "docs": [
> > > > >       {
> > > > >         "node": "524f03355056c8b53b4ed199",
> > > > >         "field": "eventID",
> > > > >         "level": 1,
> > > > >         "count(*)": 2,
> > > > >         "collection": "collection1",
> > > > >         "ancestors": [
> > > > >           "524f02845056c8b53b4e9871",
> > > > >           "524f02755056c8b53b4e9269"
> > > > >         ]
> > > > >       },
> > > > >       .........
> > > > > }
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Pratik
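A rough Python model of what the inner part of the query above computes. It assumes eventLink documents are dicts with `eventParticipantID` and `eventID` fields, matching the walk and gather parameters. This is a simplification, not Solr's gatherNodes. It treats `count(*)` as the number of matching eventLink documents per gathered node, and it records as `ancestors` the root conceptids that reached each node (the role of trackTraversal="true"). The having step then keeps only events linked to more than one root concept. The function names are hypothetical, and sorting the ancestors is an assumption made for deterministic output.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set

Doc = Dict[str, Any]


def gather_nodes_model(roots: Iterable[str], links: Iterable[Doc],
                       walk_to: str, gather: str) -> List[Doc]:
    """One hop: match root ids against link[walk_to], emit distinct link[gather]."""
    root_set = set(roots)
    counts: Dict[str, int] = defaultdict(int)
    ancestors: Dict[str, Set[str]] = defaultdict(set)
    for link in links:
        src = link.get(walk_to)
        if src in root_set:
            node = link[gather]
            counts[node] += 1         # count(*): matching link docs per node
            ancestors[node].add(src)  # trackTraversal: which roots led here
    return [
        {"node": n, "field": gather, "count(*)": counts[n],
         "ancestors": sorted(ancestors[n])}
        for n in counts
    ]


def having_gt(tuples: Iterable[Doc], field: str, threshold: int) -> List[Doc]:
    """Keep tuples whose field exceeds threshold, like having(..., gt(count(*),1))."""
    return [t for t in tuples if t[field] > threshold]


if __name__ == "__main__":
    roots = ["c1", "c2", "c3"]  # conceptids from the inner search (hypothetical)
    links = [
        {"eventParticipantID": "c1", "eventID": "e1"},
        {"eventParticipantID": "c2", "eventID": "e1"},
        {"eventParticipantID": "c1", "eventID": "e2"},
        {"eventParticipantID": "c9", "eventID": "e3"},
    ]
    nodes = gather_nodes_model(roots, links, "eventParticipantID", "eventID")
    print(having_gt(nodes, "count(*)", 1))
```

The surviving tuple carries `ancestors` as a list, which is exactly the multi-valued join key that the outer fetch could not use.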
> > > > >
> > > >
> > >
> >
>
