lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: 2.1billion+ document
Date Sat, 06 Jul 2013 12:27:31 GMT
uniqueKey is used to enforce that there
is only a single copy of a doc in the
index. Say a doc changes and you re-index
it. If there is already a doc in the index
_with the same uniqueKey_, it'll be deleted
and the new one will be the only one
visible.
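
As a rough illustration of that overwrite behaviour, here is a toy in-memory model. It is not how Solr implements it internally; the ToyIndex class and its method names are made up for this sketch.

```python
class ToyIndex:
    """Toy model of uniqueKey semantics: at most one visible doc per key."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self._docs = {}  # maps uniqueKey value -> latest version of the doc

    def add(self, doc):
        # Adding a doc whose key already exists replaces the old copy,
        # which mirrors "delete the old one, keep only the new one".
        key = doc[self.unique_key]
        self._docs[key] = dict(doc)

    def visible_docs(self):
        return list(self._docs.values())


index = ToyIndex()
index.add({"id": 1, "title": "first version"})
index.add({"id": 1, "title": "re-indexed version"})  # same uniqueKey
index.add({"id": 2, "title": "another doc"})

for doc in index.visible_docs():
    print(doc)
```

Only two docs remain visible here, and the doc with id 1 carries the re-indexed title.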

Which implies that if you do implement
the suggestions, be sure you send any
docs you update to the _same_ shard
you sent them to originally. Otherwise the
old copy stays on the original shard and
you end up with two docs with the same
uniqueKey. If you have no occasion to
update docs that already exist in your
index, you don't care about this much.
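
One way to keep updates on the same shard is to derive the shard from the uniqueKey itself. The following is a minimal sketch under stated assumptions: the shard_for helper and the SHARD_URLS list are hypothetical names and placeholder URLs, not part of Solr, and the number of shards is assumed to stay fixed (changing it would move docs to different shards).

```python
import hashlib

# Hypothetical shard endpoints, for illustration only.
SHARD_URLS = [
    "http://shard0.example.com:8983/solr",
    "http://shard1.example.com:8983/solr",
    "http://shard2.example.com:8983/solr",
    "http://shard3.example.com:8983/solr",
]


def shard_for(unique_key, num_shards):
    """Map a uniqueKey to a shard number in a repeatable way.

    A stable digest is used instead of Python's built-in hash(), because
    hash() of a string can differ from one process to the next.
    """
    digest = hashlib.md5(str(unique_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_url_for(unique_key):
    return SHARD_URLS[shard_for(unique_key, len(SHARD_URLS))]


# The original add and any later update of the same doc resolve to the same shard.
first_send = shard_url_for(123456789)
later_update = shard_url_for(123456789)
print(first_send == later_update)
```

Because the shard depends only on the key and the shard count, a re-indexed doc lands where its earlier copy lives, so the old copy is replaced rather than duplicated.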

Best
Erick


On Sat, Jul 6, 2013 at 12:53 AM, Gora Mohanty <gora@mimirtech.com> wrote:

> On 6 July 2013 09:45, Ali, Saqib <docbook.xml@gmail.com> wrote:
> > Thanks Jason! That was very helpful.
> >
> > I read on the solr wiki that:
> > "Documents must have a unique key and the unique key must be stored
> > (stored="true" in schema.xml)"
> >
> > What is this unique key? Is this just an id that we define in the
> > schema.xml that is unique for each document? We have something as follows:
> >         <field name="id" type="long" indexed="true" stored="true"/>
> >
> > Will this suffice?
>
> By default, schema.xml should also have
> <uniqueKey>id</uniqueKey>
> and with these, you should be all set as
> far as the configuration goes.
>
> At index time, you also have to provide
> this unique key to Solr, and for distributed
> search, ensure that it is unique across all
> shards, as the Wiki notes. How you form
> this unique key depends on your use case,
> but for example, you could use the system
> filepath, or an MD5 sum of the file contents.
>
> Regards,
> Gora
>
