lucene-solr-user mailing list archives

From István <lecc...@gmail.com>
Subject Re: Querying nested datastructures
Date Wed, 25 Nov 2015 14:14:37 GMT
Hi Jack,

Thank you very much, I am going to go for this as the primary solution.

Regards,
Istvan
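
For illustration, the flattening recommended in the quoted reply below (one Solr document per CPC entry instead of multivalued fields) could be sketched roughly as follows. This is a minimal sketch, not the approach actually implemented in this thread: the function name `flatten_patent`, the `patent_id` value, the `id` scheme, and the `cpc_*` field names are all assumptions made for the example.

```python
import json


def flatten_patent(patent_id, patent):
    """Yield one flat document per CPC entry of a nested patent record."""
    for entry in patent.get("cpc", []):
        doc = {
            # Hypothetical unique key: parent id plus the CPC sequence number.
            "id": f"{patent_id}-cpc-{entry['sequence']}",
            # Keep a reference back to the parent patent record.
            "patent_id": patent_id,
        }
        # Copy each CPC component into its own single-valued field
        # (field names are assumptions, e.g. "main-group" -> "cpc_main_group").
        for key, value in entry.items():
            doc["cpc_" + key.replace("-", "_")] = value
        yield doc


# Sample record taken from the original question (other patent data omitted).
patent = {
    "cpc": [
        {"class": "61", "section": "A", "sequence": "1", "subclass": "K",
         "subgroup": "06", "main-group": "45", "classification-value": "I"},
        {"class": "61", "section": "A", "sequence": "2", "subclass": "K",
         "subgroup": "506", "main-group": "31", "classification-value": "I"},
    ]
}

docs = list(flatten_patent("patent-1", patent))
print(json.dumps(docs, indent=2))
```

Each resulting document carries exactly one CPC code, so a query that combines several CPC components can only match when a single code satisfies all of them. The parent patent can then be looked up in the key-value store through `patent_id`.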

On Tue, Nov 24, 2015 at 1:56 PM, Jack Krupansky <jack.krupansky@gmail.com>
wrote:

> The primary recommendation is that you flatten nested documents.
>
> That means one Solr document per cpc, not multivalued.
>
> As always, queries should drive your data model, so please specify what a
> typical query might be like, in plain English.
>
> -- Jack Krupansky
>
> On Tue, Nov 24, 2015 at 4:39 AM, István <leccine@gmail.com> wrote:
>
> > Hi all,
> >
> > I would like to find documents in a key-value store (Riak) with Solr and I
> > am running into a challenge. I have nested JSON documents with patent
> > information. Patents have one or more CPC
> > (http://www.cooperativepatentclassification.org/index.html) codes,
> > something like these:
> >
> > {
> >
> > // more data
> >
> > "cpc": [
> >     {
> >       "class": "61",
> >       "section": "A",
> >       "sequence": "1",
> >       "subclass": "K",
> >       "subgroup": "06",
> >       "main-group": "45",
> >       "classification-value": "I"
> >     },
> >     {
> >       "class": "61",
> >       "section": "A",
> >       "sequence": "2",
> >       "subclass": "K",
> >       "subgroup": "506",
> >       "main-group": "31",
> >       "classification-value": "I"
> >     }
> > ]
> >
> > }
> >
> > I would like to find the documents that match to a certain CPC code,
> > sometimes with partial code sometimes with the full code. I used the
> > following schema to index the documents:
> >
> > <field name="cpc.class"                 type="int"    indexed="true" stored="true" multiValued="true" />
> > <field name="cpc.section"               type="string" indexed="true" stored="true" multiValued="true" />
> > <field name="cpc.sequence"              type="int"    indexed="true" stored="true" multiValued="true" />
> > <field name="cpc.subclass"              type="string" indexed="true" stored="true" multiValued="true" />
> > <field name="cpc.subgroup"              type="int"    indexed="true" stored="true" multiValued="true" />
> > <field name="cpc.main-group"            type="int"    indexed="true" stored="true" multiValued="true" />
> > <field name="cpc.classification-value"  type="string" indexed="true" stored="true" multiValued="true" />
> >
> >
> > The problem with this approach is that when we query a certain combination
> > of partial CPC codes, it returns documents that don't actually match that
> > combination.
> >
> > This behavior is described in this blog post:
> >
> > http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
> >
> > My understanding is that I need to apply termPositions="true" to the field
> > definition, and then Solr maintains the position information and will
> > return only the documents that actually match the combination of the
> > partial CPC codes. Am I on the right track with this, or is there a better
> > solution for querying nested documents with partial codes?
> >
> > Thank you in advance,
> > Istvan
> >
> > PS: I also posted this on Stack Overflow:
> >
> > http://stackoverflow.com/questions/33724556/how-to-index-an-array-of-hashes-with-solr
> >
> > --
> > the sun shines for all
> >
>



-- 
the sun shines for all
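
The false matches described in the original question can be reproduced without Solr. The following is only a toy model, under the assumption that each multivalued field is matched independently of the others; the helper names `multivalued_match` and `per_entry_match` are hypothetical and are not part of any Solr API.

```python
# The two CPC entries from the sample patent in the original question
# (only the components needed for the demonstration).
cpc = [
    {"section": "A", "class": "61", "subclass": "K",
     "main-group": "45", "subgroup": "06"},
    {"section": "A", "class": "61", "subclass": "K",
     "main-group": "31", "subgroup": "506"},
]


def multivalued_match(entries, query):
    """Model independent multivalued fields: every clause may be
    satisfied by a different CPC entry of the same patent."""
    return all(
        any(entry.get(field) == value for entry in entries)
        for field, value in query.items()
    )


def per_entry_match(entries, query):
    """Model one document per CPC entry: a single entry must
    satisfy every clause of the query."""
    return any(
        all(entry.get(field) == value for field, value in query.items())
        for entry in entries
    )


# A combination that no single CPC code of this patent has:
# main-group 45 comes from the first entry, subgroup 506 from the second.
query = {"main-group": "45", "subgroup": "506"}

print(multivalued_match(cpc, query))  # True: a false positive
print(per_entry_match(cpc, query))    # False: no single code matches
```

The first check succeeds because the two values come from different CPC entries of the same patent, which is the unwanted behavior; matching per entry rejects the combination.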
