lucene-solr-user mailing list archives

From István <lecc...@gmail.com>
Subject Querying nested datastructures
Date Tue, 24 Nov 2015 09:39:10 GMT
Hi all,

I would like to find documents in a key-value store (Riak) with Solr and I
am running into a challenge. I have nested JSON documents with patent
information. Each patent has one or more CPC (
http://www.cooperativepatentclassification.org/index.html) codes, something
like these:

{

// more data

"cpc": [
    {
      "class": "61",
      "section": "A",
      "sequence": "1",
      "subclass": "K",
      "subgroup": "06",
      "main-group": "45",
      "classification-value": "I"
    },
    {
      "class": "61",
      "section": "A",
      "sequence": "2",
      "subclass": "K",
      "subgroup": "506",
      "main-group": "31",
      "classification-value": "I"
    }
]

}

I would like to find the documents that match a certain CPC code,
sometimes with a partial code and sometimes with the full code. I used the
following schema to index the documents:

<field name="cpc.class"                 type="int"    indexed="true" stored="true" multiValued="true" />
<field name="cpc.section"               type="string" indexed="true" stored="true" multiValued="true" />
<field name="cpc.sequence"              type="int"    indexed="true" stored="true" multiValued="true" />
<field name="cpc.subclass"              type="string" indexed="true" stored="true" multiValued="true" />
<field name="cpc.subgroup"              type="int"    indexed="true" stored="true" multiValued="true" />
<field name="cpc.main-group"            type="int"    indexed="true" stored="true" multiValued="true" />
<field name="cpc.classification-value"  type="string" indexed="true" stored="true" multiValued="true" />


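The problem with this approach is that when we query a certain combination
of partial CPC codes, it returns documents that don't actually match that
combination: the values of each field are indexed independently, so a query
can be satisfied by values that come from different CPC entries.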
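
To illustrate the mismatch, here is a minimal Python sketch. It is not Solr
code; it only mimics, under my assumptions, how the array gets flattened into
independent multivalued fields. The function names (flatten, flat_match,
nested_match) are made up for this example, and the values are kept as
strings for simplicity.

```python
# Illustration only: a plain-Python model of the flattening, not Solr itself.
doc = {
    "cpc": [
        {"section": "A", "class": "61", "subclass": "K",
         "main-group": "45", "subgroup": "06"},
        {"section": "A", "class": "61", "subclass": "K",
         "main-group": "31", "subgroup": "506"},
    ]
}


def flatten(document):
    """Collapse the cpc array into one value list per field (cpc.<key>)."""
    fields = {}
    for entry in document["cpc"]:
        for key, value in entry.items():
            fields.setdefault("cpc." + key, []).append(value)
    return fields


def flat_match(fields, criteria):
    """AND over flattened fields: each value may come from any CPC entry."""
    return all(value in fields.get("cpc." + key, [])
               for key, value in criteria.items())


def nested_match(document, criteria):
    """What I actually want: all criteria satisfied by one and the same entry."""
    return any(all(entry.get(key) == value for key, value in criteria.items())
               for entry in document["cpc"])


# main-group 45 belongs to the first entry, subgroup 506 to the second.
query = {"main-group": "45", "subgroup": "506"}
print(flat_match(flatten(doc), query))   # True  (false positive)
print(nested_match(doc, query))          # False (no single entry has both)
```

The flattened check matches because "45" and "506" each appear somewhere in
their field, even though no single CPC entry contains both.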

This behavior is described in this blog post:

http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html

My understanding is that I need to add termPositions="true" to the field
definitions, and then Solr maintains the position information and will
return only the documents that actually match the combination of the
partial CPC codes. Am I on the right track with this, or is there a better
solution for querying nested documents with partial codes?

Thank you in advance,
Istvan

PS: I also posted this on Stackoverflow:
http://stackoverflow.com/questions/33724556/how-to-index-an-array-of-hashes-with-solr

-- 
the sun shines for all
