lucene-solr-user mailing list archives

From Garth Grimm <GarthGr...@averyranchconsulting.com>
Subject Re: Can we query on _version_field ?
Date Thu, 13 Nov 2014 18:31:31 GMT
So it sounds like you’re OK with using the docURL as the unique key for routing in SolrCloud,
but you don’t want to use it as a lookup mechanism.

If you don’t want to hash it and store that value in a second unique field at feed time,
and you can’t find any other field that might be unique,
and you don’t want to write your own UpdateRequestProcessorChain that would generate a unique
field from your unique key (such as by doing an MD5 hash),
you might look at the UpdateRequestProcessorChain named “dedupe” in the out-of-the-box (OOB)
solrconfig.xml. It’s primarily designed to help dedupe results, but its technique is to
concatenate multiple fields together to create a signature that will be unique in some way.
So instead of having to find one field in your data that’s unique, you could look for a couple
of fields that, if combined, would create a unique value, and configure the “dedupe” processor
to handle that.
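
To make the two hashing options concrete, here is a minimal client-side sketch in Python, run at
feed time before the document is sent to Solr. It is not Solr's own signature code; the field
names "lookupId" and "title" and the helper function names are assumptions made up for this
illustration.

```python
import hashlib


def url_lookup_id(doc_url):
    """Return the MD5 hex digest of a URL: 32 characters, all in [0-9a-f]."""
    return hashlib.md5(doc_url.encode("utf-8")).hexdigest()


def combined_signature(doc, fields):
    """Hash several fields together, in the order given.

    A separator that is unlikely to occur in the data keeps
    ("ab", "c") and ("a", "bc") from producing the same input string.
    """
    joined = "\x1f".join(str(doc.get(name, "")) for name in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Hypothetical document; only docURL comes from the thread.
doc = {"docURL": "http://example.com/item?id=1&x=2", "title": "Example"}

# Option 1: hash the existing unique key into a second, query-friendly field.
doc["lookupId"] = url_lookup_id(doc["docURL"])

# Option 2: build a signature from a combination of fields.
doc["signature"] = combined_signature(doc, ["docURL", "title"])
```

Either value contains only hex characters, so it can be used in a query without escaping.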


> On Nov 13, 2014, at 12:02 PM, S.L <simpleliving016@gmail.com> wrote:
> 
> I am not sure if this is a case of the XY problem.
> 
> I have no control over the URLs, so I cannot deduce an id from them; they
> come from the web. I made the URL the uniqueKey so that the document gets
> replaced when a new document with that URL comes in.
> 
> To do the detail lookup I can either use the same <docURL> as it is, or
> try to generate a unique id field for each document.
> 
> For the latter option, UUID is not behaving as expected in SolrCloud, and
> the _version_ field seems to be serving the need.
> 
> On Thu, Nov 13, 2014 at 11:35 AM, Shawn Heisey <apache@elyograg.org> wrote:
> 
>> On 11/12/2014 10:45 PM, S.L wrote:
>>> We know that the _version_ field is a mandatory field in the SolrCloud
>>> schema.xml; it is expected to be of type long, and it also seems to have
>>> a unique value in a collection.
>>> 
>>> However, a query of the form
>>> 
>>> http://server1.mydomain.com:7344/solr/collection1/select/?q=*:*&fq=%28_version_:1484632548944380000%29&wt=json
>>> 
>>> does not seem to return any record. Can we query on the _version_ field
>>> in the schema.xml?
>> 
>> I've been watching your journey unfold on the mailing list.  The whole
>> thing seems like an XY problem.
>> 
>> If I'm reading everything correctly, you want to have a unique ID value
>> that can serve as the uniqueKey, as well as a way to quickly look up a
>> single document in Solr.
>> 
>> Is there one part of the URL that serves as a unique identifier that
>> doesn't contain special characters?  It seems insane that you would not
>> have a unique ID value for every entity in your system that is composed
>> of only "regular" characters.
>> 
>> Assuming that such an ID exists (and is likely used as one piece of that
>> docURL that you mentioned) ... if you can extract that ID value into
>> its own field (either in your indexing code or a custom update
>> processor), you could use that for both uniqueKey and single-document
>> lookups.  Having that kind of information in your index seems like a
>> generally good idea.
>> 
>> Thanks,
>> Shawn
>> 
>> 
