lucene-solr-user mailing list archives

From Garth Grimm <GarthGr...@averyranchconsulting.com>
Subject Re: Different ids for the same document in different replicas.
Date Thu, 13 Nov 2014 00:26:48 GMT
You mention you already have a unique Key identified for the data you’re storing in Solr:

> <uniqueKey>doctorId</uniqueKey>

If that’s the field you’re using to uniquely identify each thing you’re storing in the
Solr index, why do you want to have an id field that is populated with some random value?
You’ll be using the doctorId field as the key, and the id field will have no real meaning
in your data model.

If doctorId actually isn’t unique to each item you plan on storing in Solr, is there any
other field that is?  If so, use that field as your unique key.
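
As a rough sketch of what that could look like in schema.xml (the "string" type for
doctorId is my assumption, since your actual field definition wasn't posted):

```xml
<!-- schema.xml sketch; the "string" type for doctorId is an assumption -->
<field name="doctorId" type="string" indexed="true" stored="true"
       required="true" multiValued="false" />

<!-- doctorId is the unique key, so no separate generated "id" field is declared -->
<uniqueKey>doctorId</uniqueKey>
```

With this arrangement every replica stores the same key for a given document, because
the key arrives with the document rather than being generated on each node.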

Remember, uniqueKeys are usually used for routing documents to shards in SolrCloud, and
are used to ensure that later updates of the same “thing” overwrite the old one, rather
than generating multiple copies.  So the keys really should be something derived from the
data you’re storing.  I’m not sure I understand why you would want to have the key randomly
generated.
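
To make the overwrite behavior concrete, here is a small sketch of two update messages
in Solr's XML update format, sent as two separate requests. It assumes doctorId is the
uniqueKey; the value D-1001 and the name field are made up for illustration.

```xml
<!-- First request: creates the document keyed by doctorId (hypothetical values) -->
<add>
  <doc>
    <field name="doctorId">D-1001</field>
    <field name="name">Dr. A. Example</field>
  </doc>
</add>

<!-- Later request with the same doctorId: replaces the document above
     instead of creating a second copy -->
<add>
  <doc>
    <field name="doctorId">D-1001</field>
    <field name="name">Dr. A. Example, MD</field>
  </doc>
</add>
```

If the key were randomly generated instead, the second request would get a fresh key
and you would end up with two documents for the same doctor.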

> On Nov 12, 2014, at 6:39 PM, S.L <simpleliving016@gmail.com> wrote:
> 
> Just tried adding <uniqueKey>id</uniqueKey> while keeping the id type as
> "string", and only blank ids are being generated. It looks like the id is
> auto-generated only if the id is set to type uuid, but in the case of
> SolrCloud this id will be unique per replica.
> 
> Is there a way to generate a unique id in SolrCloud without using the
> uuid type, or at least without ending up with a per-replica unique id?
> 
> The uuid type in question is defined as:
> 
> <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
> 
> 
> On Wed, Nov 12, 2014 at 6:20 PM, S.L <simpleliving016@gmail.com> wrote:
> 
>> Thanks.
>> 
>> So the issue here is I already have a <uniqueKey>doctorId</uniqueKey>
>> defined in my schema.xml.
>> 
>> If along with that I also want the <id></id> field to be automatically
>> generated for each document, do I have to declare it as a <uniqueKey> as
>> well? I just tried the following settings without the uniqueKey for
>> id, and it's only generating blank ids for me.
>> 
>> *schema.xml*
>> 
>>        <field name="id" type="string" indexed="true" stored="true"
>>            required="true" multiValued="false" />
>> 
>> *solrconfig.xml*
>> 
>>      <updateRequestProcessorChain name="uuid">
>> 
>>        <processor class="solr.UUIDUpdateProcessorFactory">
>>            <str name="fieldName">id</str>
>>        </processor>
>>        <processor class="solr.RunUpdateProcessorFactory" />
>>    </updateRequestProcessorChain>
>> 
>> 
>> On Tue, Nov 11, 2014 at 7:47 PM, Garth Grimm <
>> GarthGrimm@averyranchconsulting.com> wrote:
>> 
>>> Looking a little deeper, I did find this about UUIDField
>>> 
>>> 
>>> http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html
>>> 
>>> "NOTE: Configuring a UUIDField instance with a default value of "NEW" is
>>> not advisable for most users when using SolrCloud (and not possible if the
>>> UUID value is configured as the unique key field) since the result will be
>>> that each replica of each document will get a unique UUID value. Using
>>> UUIDUpdateProcessorFactory<
>>> http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html>
>>> to generate UUID values when documents are added is recommended instead."
>>> 
>>> That might describe the behavior you saw.  And the use of
>>> UUIDUpdateProcessorFactory to auto-generate IDs seems to be covered well
>>> here:
>>> 
>>> 
>>> http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
>>> 
>>> Though I’ve not actually tried that process before.
>>> 
>>> On Nov 11, 2014, at 7:39 PM, Garth Grimm <
>>> GarthGrimm@averyranchconsulting.com> wrote:
>>> 
>>> “uuid” isn’t an out of the box field type that I’m familiar with.
>>> 
>>> Generally, I’d stick with the out of the box advice of the schema.xml
>>> file, which includes things like….
>>> 
>>>  <!-- Only remove the "id" field if you have a very good reason to.
>>> While not strictly
>>>    required, it is highly recommended. A <uniqueKey> is present in
>>> almost all Solr
>>>    installations. See the <uniqueKey> declaration below where
>>> <uniqueKey> is set to "id".
>>>  -->
>>>  <field name="id" type="string" indexed="true" stored="true"
>>> required="true" multiValued="false" />
>>> 
>>> and…
>>> 
>>> <!-- Field to use to determine and enforce document uniqueness.
>>>     Unless this field is marked with required="false", it will be a
>>> required field
>>>  -->
>>> <uniqueKey>id</uniqueKey>
>>> 
>>> If you’re creating some key/value pair with uuid as the key as you feed
>>> documents in, and you know that the uuid values you’re creating are unique,
>>> just change the field name and unique key name from ‘id’ to ‘uuid’. Or
>>> change the key name you send in from ‘uuid’ to ‘id’.
>>> 
>>> On Nov 11, 2014, at 7:18 PM, S.L <simpleliving016@gmail.com> wrote:
>>> 
>>> Hi All,
>>> 
>>> I am seeing interesting behavior on the replicas. I have a single
>>> shard and 6 replicas on SolrCloud 4.10.1. I only have a small
>>> number of documents (~375) that are replicated across the six replicas.
>>> 
>>> The interesting thing is that the same document has a different id in
>>> each one of those replicas.
>>> 
>>> This is causing fq(id:xyz) type queries to fail, depending on
>>> which replica the query goes to.
>>> 
>>> I have specified the id field in the following manner in schema.xml.
>>> Is this the right way to specify an auto-generated id in SolrCloud?
>>> 
>>>      <field name="id" type="uuid" indexed="true" stored="true"
>>>          required="true" multiValued="false" />
>>> 
>>> 
>>> Thanks.
>>> 
>>> 
>>> 
>> 
