lucene-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Merging solr indexes with duplicate keys - merging duplicate documents
Date Sun, 31 Mar 2013 17:51:00 GMT
Just scan one index and do read-modify-write on the other index.
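Below is a minimal Python sketch of that scan-and-update loop. It assumes each index can be read as a mapping from the `url` unique key to a dict of stored fields. The in-memory dicts stand in for real Solr/Lucene indexes, and `read_modify_write` and `SIGNAL_FIELDS` are hypothetical names. Against a real Solr index, the read step would be a lookup by unique key. The write step would re-add the full document, or use Solr's atomic updates where all fields are stored.

```python
from typing import Any, Dict, Iterable

Doc = Dict[str, Any]

# Hypothetical: the fields owned by the fast-moving social-signal index.
SIGNAL_FIELDS = ("out_of_stock", "num_likes", "num_add_2_cart")


def read_modify_write(base: Dict[str, Doc], signals: Iterable[Doc],
                      key: str = "url") -> Dict[str, Doc]:
    """Scan the signal documents and fold each one into the base index in place.

    Signals whose key has no base document are skipped, because there is
    no product text to search on for them.
    """
    for sig in signals:
        url = sig[key]
        current = base.get(url)               # read the existing product doc
        if current is None:
            continue
        updated = dict(current)               # modify a copy ...
        for field in SIGNAL_FIELDS:
            if field in sig:
                updated[field] = sig[field]   # ... overlaying only signal fields
        base[url] = updated                   # write it back under the same key
    return base
```

Scanning the signal side rather than the product side keeps the loop proportional to the number of signal documents. That is presumably the smaller set, and it is the one rebuilt every 2 hours.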

There are probably better ways to do this, such as storing your fast-moving
social signals in a non-Lucene storage system. Even something as simple as
a sequential buffer in a file might be sufficient for your needs. Unless
you have symmetric needs for querying both sides, it may pay to retain
design flexibility on this point.
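One way to read the "sequential buffer in a file" idea is sketched below in Python. Signal updates are appended to a file as one JSON object per line, which is a hypothetical format. A reader replays the lines in order so that later updates override earlier ones, yielding the current signals per `url` for a ranking step to consult. `encode_signal`, `append_signal` and `load_latest` are illustrative names, not part of Lucene or Solr.

```python
import json
from typing import Any, Dict, Iterable


def encode_signal(record: Dict[str, Any]) -> str:
    """Serialize one signal update as a single line for the buffer file."""
    return json.dumps(record, sort_keys=True) + "\n"


def append_signal(path: str, record: Dict[str, Any]) -> None:
    """Append one update; the file is only ever written at its end."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(encode_signal(record))


def load_latest(lines: Iterable[str]) -> Dict[str, Dict[str, Any]]:
    """Replay buffered updates in order; later values win field by field."""
    latest: Dict[str, Dict[str, Any]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        latest.setdefault(record["url"], {}).update(record)
    return latest
```

Because `load_latest` takes any iterable of lines, an open file works directly: `with open(path, encoding="utf-8") as f: signals = load_latest(f)`.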


On Sun, Mar 31, 2013 at 7:44 PM, Gagandeep singh <gagan.goku@gmail.com> wrote:

> I'm not sure whether my mail was unclear, but I want to merge them so that
> I can use the social signals when searching. A signal like num_likes can be
> used as a multiplicative boost to surface documents that are hot.
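The Python sketch below makes the multiplicative boost concrete by re-ranking text-match hits by num_likes. The damping function 1 + ln(1 + num_likes) is an assumption, not something proposed in this thread. It was chosen so that documents with no likes keep their original score and a few very popular items do not swamp relevance. Inside Solr, the edismax `boost` parameter, which multiplies the score by a function query, is the usual place for such a formula.

```python
import math
from typing import Dict, List, Tuple


def boosted_score(text_score: float, num_likes: int) -> float:
    # Hypothetical boost: 1 + ln(1 + likes); zero likes leaves the score unchanged.
    return text_score * (1.0 + math.log1p(max(num_likes, 0)))


def rerank(hits: List[Tuple[str, float]], likes: Dict[str, int]) -> List[str]:
    """Reorder (url, text_score) hits by boosted score, highest first."""
    ranked = sorted(hits,
                    key=lambda hit: boosted_score(hit[1], likes.get(hit[0], 0)),
                    reverse=True)
    return [url for url, _ in ranked]
```

For example, `rerank([("a", 1.0), ("b", 0.8)], {"b": 100})` puts "b" ahead of "a", because 0.8 * (1 + ln 101) is well above 1.0.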
>
> The reason we are building 2 separate indexes is that our base data
> changes slowly, whereas the social signals change in near real time.
> So the question is: is there a way of merging 2 indexes that handles
> duplicate documents the way I want?
>
>
> Thanks
> Gagan
>
>
> On Sun, Mar 31, 2013 at 8:53 PM, Upayavira <uv@odoko.co.uk> wrote:
>
> >
> >
> > On Sun, Mar 31, 2013, at 05:53 AM, Gagandeep singh wrote:
> > > Hi folks
> > >
> > > We have a use case where we have 2 Solr indexes with the same schema
> > > but different fields populated, for example:
> > >
> > > Common schema:
> > > <field name="url" type="text" />      <!-- unique key -->
> > > <field name="product_name" type="text" />
> > > <field name="image" type="text" />
> > > <field name="brand" type="text" />
> > > <field name="description" type="text" />
> > >
> > > <field name="out_of_stock" type="boolean" />
> > > <field name="num_likes" type="int" />
> > > <field name="num_add_2_cart" type="int" />
> > >
> > > Now I have one index that stores the product information (the first 5
> > > fields). This index is built every 2 days.
> > > I have a 2nd index that stores social signals (url + out_of_stock +
> > > num_likes + num_add_2_cart). This index is built every 2 hours and is
> > > used for near-realtime boosting of products.
> > > The processes for building these indexes are independent, and for
> > > operational management and for the sake of reuse I would like to
> > > build these indexes separately.
> > >
> > > My question is, is there a convenient way of merging these 2 indexes
> > > (other than applying document updates in a loop)? The IndexMergeTool
> > > from Lucene is not capable of applying document updates and would end
> > > up keeping either the first 5 fields or the last 3.
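The question turns on the difference between two ways of handling a shared unique key. One is replacing the whole document; the other is merging the two documents field by field. The Python sketch below uses two hypothetical helpers to show that the first policy leaves only one side's fields, the outcome described above, while the second keeps both sets. Neither function reflects how IndexMergeTool is implemented.

```python
from typing import Any, Dict

Doc = Dict[str, Any]


def replace_by_key(existing: Doc, incoming: Doc) -> Doc:
    """Document-level 'last write wins': the incoming doc replaces the old one."""
    return dict(incoming)


def merge_by_field(existing: Doc, incoming: Doc) -> Doc:
    """Field-level merge: keep existing fields, overlay those the incoming doc carries."""
    merged = dict(existing)
    merged.update(incoming)
    return merged


product = {"url": "u1", "product_name": "Shoe", "brand": "Acme"}
signal = {"url": "u1", "num_likes": 42, "out_of_stock": False}

print(sorted(replace_by_key(product, signal)))
# ['num_likes', 'out_of_stock', 'url']
print(sorted(merge_by_field(product, signal)))
# ['brand', 'num_likes', 'out_of_stock', 'product_name', 'url']
```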
> >
> > Why do you want to merge them? What sort of queries do you want to do?
> > What sort of responses do you need?
> >
> > Upayavira
> >
>
