lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: Querying nested datastructures
Date Tue, 24 Nov 2015 18:48:25 GMT
Hello Istvan,

- When flattening subdocs, you can concatenate the fields that are
necessary for retrieval into a single value, e.g. "K-06-45". That solves
retrieval, but it isn't really flexible.
- Term positions are not any easier to implement. If you really prefer this
way, I'd suggest taking a look at http://siren.solutions/siren/overview/
(I haven't tried it, but it sounds like they implemented this approach).
- If you follow the more recent blog post, you'll see our favorite approach:
http://blog.griddynamics.com/2013/09/solr-block-join-support.html
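
To make the first option concrete, here is a minimal Python sketch (plain
Python, not Solr code) of why flattened multi-valued fields cross-match and
how a concatenated key avoids it. The sample entries come from the question
quoted below; the helper names (flatten, matches_flat, concat_keys) and the
subclass-subgroup-main-group key order, inferred from the "K-06-45" example,
are assumptions for illustration only.

```python
# Sample CPC entries taken from the quoted question (trimmed to a few fields).
patent = {
    "cpc": [
        {"subclass": "K", "subgroup": "06", "main-group": "45"},
        {"subclass": "K", "subgroup": "506", "main-group": "31"},
    ]
}


def flatten(doc):
    """Hypothetical helper: one independent value list per attribute,
    the way multiValued fields such as cpc.subgroup hold their values."""
    fields = {}
    for entry in doc["cpc"]:
        for name, value in entry.items():
            fields.setdefault("cpc." + name, []).append(value)
    return fields


def matches_flat(fields, wanted):
    """Hypothetical helper: mimics an AND of per-field term queries.
    Each field is checked on its own, so values may come from different entries."""
    return all(value in fields.get("cpc." + name, []) for name, value in wanted.items())


def concat_keys(doc):
    """Hypothetical helper: one concatenated key per CPC entry (assumed order)."""
    return ["-".join([e["subclass"], e["subgroup"], e["main-group"]]) for e in doc["cpc"]]


flat = flatten(patent)

# subgroup 06 belongs to the first entry, main-group 31 to the second one,
# yet the flattened fields still report a match.
false_positive = matches_flat(flat, {"subgroup": "06", "main-group": "31"})

# With concatenated keys the same mixed combination is not found.
keys = concat_keys(patent)
mixed_key_found = "K-06-31" in keys
```

The trade-off is the inflexibility mentioned above: a concatenated key only
supports the combinations and the field order that were chosen at index time.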

Query-time join ({!join}) and field collapsing are also alternatives to
consider.
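
For the block join approach from the blog post linked above, a rough Python
sketch of the indexing and query shape could look like this. It only builds
the JSON structure and the query string; it does not talk to Solr.
"_childDocuments_" and {!parent which=...} are Solr's nested-document JSON key
and block join parent query parser, while the field names (doc_type, cpc_*),
the id scheme, and the helper names are hypothetical and would have to match
your own schema.

```python
def to_block(patent_id, cpc_entries):
    """Hypothetical helper: turn one patent into a parent doc with one child per CPC entry."""
    children = []
    for i, entry in enumerate(cpc_entries):
        # Every child needs its own unique id; this id scheme is an assumption.
        child = {"id": f"{patent_id}-cpc-{i}", "doc_type": "cpc"}
        for name, value in entry.items():
            child["cpc_" + name.replace("-", "_")] = value
        children.append(child)
    # doc_type marks parents so the "which" filter can select them (and only them).
    return {"id": patent_id, "doc_type": "patent", "_childDocuments_": children}


def parent_query(criteria):
    """Hypothetical helper: all criteria must hold on the same child document.
    Values are not escaped here; real input would need query escaping."""
    clauses = " ".join(
        f"+cpc_{name.replace('-', '_')}:{value}" for name, value in criteria.items()
    )
    return '{!parent which="doc_type:patent"}' + clauses


block = to_block(
    "patent-1",
    [
        {"subclass": "K", "subgroup": "06", "main-group": "45"},
        {"subclass": "K", "subgroup": "506", "main-group": "31"},
    ],
)
query = parent_query({"subgroup": "06", "main-group": "45"})
```

Because each CPC entry is its own child document, a combination such as
subgroup 06 with main-group 31 no longer matches this patent, and partial
codes are just queries with fewer clauses.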


On Tue, Nov 24, 2015 at 12:39 PM, István <leccine@gmail.com> wrote:

> Hi all,
>
> I would like to find documents in a key-value store (Riak) with Solr and I
> am running into a challenge. I have nested JSON documents with patent
> information. Patents have one or more CPC (
> http://www.cooperativepatentclassification.org/index.html) codes, something
> like these:
>
> {
>
> // more data
>
> "cpc": [
>     {
>       "class": "61",
>       "section": "A",
>       "sequence": "1",
>       "subclass": "K",
>       "subgroup": "06",
>       "main-group": "45",
>       "classification-value": "I"
>     },
>     {
>       "class": "61",
>       "section": "A",
>       "sequence": "2",
>       "subclass": "K",
>       "subgroup": "506",
>       "main-group": "31",
>       "classification-value": "I"
>     }
> ]
>
> }
>
> I would like to find the documents that match a certain CPC code,
> sometimes with a partial code and sometimes with the full code. I used the
> following schema to index the documents:
>
> <field name="cpc.class" type="int" indexed="true" stored="true" multiValued="true" />
> <field name="cpc.section" type="string" indexed="true" stored="true" multiValued="true" />
> <field name="cpc.sequence" type="int" indexed="true" stored="true" multiValued="true" />
> <field name="cpc.subclass" type="string" indexed="true" stored="true" multiValued="true" />
> <field name="cpc.subgroup" type="int" indexed="true" stored="true" multiValued="true" />
> <field name="cpc.main-group" type="int" indexed="true" stored="true" multiValued="true" />
> <field name="cpc.classification-value" type="string" indexed="true" stored="true" multiValued="true" />
>
>
> The problem with this approach is that when we query a certain combination
> of partial CPC codes, it returns documents that don't actually match that
> combination.
>
> This behavior is described in this blog post:
>
>
> http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
>
> My understanding is that I need to apply termPositions="true" to the field
> definition, and then Solr maintains the position information and will
> return only the documents that actually match the combination of the
> partial CPC codes. Am I on the right track with this, or is there a better
> solution to query nested documents with partial codes?
>
> Thank you in advance,
> Istvan
>
> PS: I also posted this on Stackoverflow:
>
> http://stackoverflow.com/questions/33724556/how-to-index-an-array-of-hashes-with-solr
>
> --
> the sun shines for all
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhludnev@griddynamics.com>
