lucene-solr-user mailing list archives

From Jeff Schmidt <...@535consulting.com>
Subject Re: Possible to facet across two indices, or document types in single index?
Date Sun, 04 Dec 2011 23:12:55 GMT
Hello again:

I'm looking at the newer join functionality (http://wiki.apache.org/solr/Join) to see if that
will help me out.  While there are signs it can go cross index/core (https://issues.apache.org/jira/browse/SOLR-2272),
I doubt I can specify facet.field params for fields in a couple of different indexes.  But
perhaps with a single combined index it might work.

Anyway, the above Jira item indicates status: resolved, resolution: fixed, and Fix version/s:
4.0.  I've been working with 3.5.0, so I checked out 4.0 from svn today:

[imac:svn/dev/trunk] jas% svn info
Path: .
URL: http://svn.apache.org/repos/asf/lucene/dev/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 1210126
...
Last Changed Rev: 1210116
Last Changed Date: 2011-12-04 07:35:46 -0700 (Sun, 04 Dec 2011)

When I issue a join query, it looks like the local params syntax is being ignored and treated
as part of the search terms.  I get zero results, whereas without the join I get 979.

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
            <str name="fl">id,n_type,n_name</str>
            <str name="q">{!join from=conceptId to=id fromIndex=partner-tmo}brca1</str>
            <str name="qt">partner-tmo</str>
            <str name="fq">type:node</str>
            <str name="rows">5</str>
        </lst>
    </lst>
    <result name="response" numFound="0" start="0"/>
</response>
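
For reference, a request like the one above can be assembled as in the sketch below. It assumes a
Solr instance at http://localhost:8983/solr/select (a hypothetical host and path); the parameter
values are copied from the response header above. It only shows that the {!join ...} local params
prefix is URL-encoded as part of q so that it reaches Solr intact. Whether the handler named by qt
then honors the prefix depends on the query parser that handler is configured with, which this
sketch does not check.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical base URL; adjust host, port, and core path for a real install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_join_url(term, from_field, to_field, from_index, handler):
    """Return a select URL whose q parameter carries a {!join} local params prefix."""
    q = "{!join from=%s to=%s fromIndex=%s}%s" % (from_field, to_field, from_index, term)
    params = [
        ("q", q),
        ("qt", handler),
        ("fq", "type:node"),
        ("fl", "id,n_type,n_name"),
        ("rows", "5"),
    ]
    # urlencode percent-encodes the braces, '!' and '=' inside q.
    return SOLR_SELECT + "?" + urlencode(params)


url = build_join_url("brca1", "conceptId", "id", "partner-tmo", "partner-tmo")
print(url)
```

Decoding the query string again with parse_qs should give back the exact q value shown in the
params list above.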

I've not fully explored this yet, and I'm not all that familiar with the Solr codebase,
but is this functionality in 4.x trunk or not? I can see there is the package org.apache.lucene.search.join.
Is this the implementation of SOLR-2272?

I can see the commit was made earlier this year, and then it was reverted and things went
off the rails. I don't want to open any old wounds, but does the join exist?  If not, I'll
know not to pursue it any further. If so, is there some solrconfig.xml configuration needed
to enable it?  I don't see it in the examples.

Thanks,

Jeff

On Dec 1, 2011, at 9:47 PM, Jeff Schmidt wrote:

> Hello:
> 
> I'm trying to relate together two different types of documents.  Currently I have 'node'
documents that reside in one index (core), and 'product mapping' documents that are in another
index.  The product mapping index is used to map tenant products to nodes. The nodes are canonical
content that gets updated every quarter, whereas the product mappings can change at any time.
> 
> I put them in two indexes because (1) canonical content changes rarely, and I don't want
product mapping changes to affect it (commit, re-open searchers etc.), and (2) I would like to
support multiple tenants mapping products to the same canonical content to avoid duplication
(a few GB).
> 
> This arrangement has worked well thus far, but only in the sense that for each node result
returned, I can query the product mapping index to determine the products mapped to the node.
 I combine this information within my application and return it to the client.  This works
okay in that there are only 5-20 results returned per page (start, rows).  But now I'm being
asked to facet the product categories (multi-valued field within a product mapping document)
along with other facets defined in the canonical content.
> 
> Can this be done with Solr 3.5.0?  I've been looking into sub-queries, function queries
etc.  Also, I've seen various postings indicating that one needs to denormalize more.  I don't
want to add product information as fields to the canonical content. Not only does that defeat
my objective (1) above, but Solr does not support incremental updates of document fields.
> 
> So, one approach is to issue my query to the canonical index and get all of the document
IDs (could be 1000s), and then issue a filter query to the product mapping index with all
of these IDs and have Solr facet the product categories.  Is that efficient?  I suppose I
could use HTTP POST (via SolrJ) to convey that payload of IDs?  I could then take the facet
results of that query and combine them with the canonical index results and return them to
the client.
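
The second step of that approach could look like the sketch below, which builds the form-encoded
POST body for the facet request against the product mapping index. The field names are
assumptions: conceptId is taken from the join query in the reply above as the mapping-side
reference to a node ID, and product_category is a hypothetical name for the multi-valued category
field. Note that with thousands of IDs the OR list may exceed Solr's maxBooleanClauses setting
(1024 by default in solrconfig.xml), so the IDs may have to be sent in batches or the limit raised.

```python
from urllib.parse import urlencode, parse_qs


def build_facet_body(node_ids, id_field="conceptId", facet_field="product_category"):
    """Build a form-encoded POST body that facets mappings restricted to node_ids.

    id_field and facet_field are assumed names, not confirmed schema fields.
    """
    if not node_ids:
        raise ValueError("need at least one node ID")
    # Quote each ID so characters such as ':' are not parsed as query syntax.
    quoted = ['"%s"' % i.replace("\\", "\\\\").replace('"', '\\"') for i in node_ids]
    fq = "%s:(%s)" % (id_field, " OR ".join(quoted))
    params = [
        ("q", "*:*"),
        ("fq", fq),
        ("rows", "0"),  # only the facet counts are wanted, not the documents
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", "1"),
    ]
    return urlencode(params).encode("ascii")


body = build_facet_body(["n1", "n2", "n3"])
print(body.decode("ascii"))
```

The resulting bytes can be sent with Content-Type application/x-www-form-urlencoded, which avoids
the URL length limits a GET with that many IDs would run into.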
> 
> That may be do-able, but then let's say the user clicks on a product category facet value
to narrow the node results to only those mapped to category XYZ. This will not affect the
query issued against the canonical content index.  Instead, I think I'd have to go through
the canonical results and eliminate the nodes that are not associated with product category
XYZ.  Then, if the current page of results is inadequate (rows=10, but 3 nodes were eliminated),
I'd have to go back to the canonical index to get more rows, eliminate some again perhaps,
get more etc.  That sounds unappealing and low performing.
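
To make the cost of that client-side filtering concrete, here is a minimal sketch of the refill
loop. Everything in it is hypothetical: fetch_rows stands in for a paged query against the
canonical index, allowed_ids for the set of node IDs mapped to the selected category, and the
in-memory lists replace real Solr calls.

```python
def fill_page(fetch_rows, allowed_ids, rows, start=0, batch=10, max_fetches=50):
    """Collect up to `rows` node IDs that are in allowed_ids.

    fetch_rows(start, count) is a hypothetical callback returning node IDs
    from the canonical index in rank order (an empty list when exhausted).
    Returns (kept_ids, next_start), where next_start is the offset from
    which the following page should resume.
    """
    kept = []
    for _ in range(max_fetches):
        chunk = fetch_rows(start, batch)
        if not chunk:
            break
        for offset, node_id in enumerate(chunk):
            if node_id in allowed_ids:
                kept.append(node_id)
                if len(kept) == rows:
                    # Resume right after the last node that was kept.
                    return kept, start + offset + 1
        start += len(chunk)
    return kept, start


# Stand-ins for the canonical results and for the nodes mapped to category XYZ.
ranked = ["n%d" % i for i in range(1, 31)]
mapped = {"n2", "n3", "n11", "n12", "n25"}


def fetch(start, count):
    return ranked[start:start + count]


page, next_start = fill_page(fetch, mapped, rows=3)
print(page, next_start)
```

In this toy run, filling a page of three rows already takes two round trips to the canonical
index, and a sparsely mapped category would need many more, which is the performance concern
described above.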
> 
> Is there a Solr way to do this?  My Packt "Apache Solr 3 Enterprise Search Server" book
(page 34) states regarding separate indices:
> 
> 	"If you do develop separate schemas and if you need to search across your indices in
one search then you must perform a distributed search, described in the last chapter. A distributed
search is usually a feature employed for a large corpus but it applies here too."
> 
> But in the chapter it goes on to talk about dealing with sharding, replication etc. to
support a large corpus, not necessarily tying together two different indexes.
> 
> Is it possible to accomplish my goal in a less ugly way than I outlined above?  Since
we only have a single tenant to worry about, I could use a combined index at least for a few
months (separate fields per document type, IDs are unique among them all) if that makes a
difference.
> 
> Thanks!
> 
> Jeff
> --
> Jeff Schmidt
> 535 Consulting
> jas@535consulting.com
> http://www.535consulting.com
> (650) 423-1068



--
Jeff Schmidt
535 Consulting
jas@535consulting.com
http://www.535consulting.com
(650) 423-1068









