lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Solr Cell and Deduplication - Get ID of doc
Date Sat, 27 Feb 2010 04:28:54 GMT
You could create your own unique ID and pass it in with the
literal.<fieldname>=<value> parameter.

http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters
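
A minimal sketch of that suggestion, in Python: generate the ID on the client before indexing, so it is already known and nothing has to be read back from the response. The field name doc_uuid and the helper build_extract_url are hypothetical. Because the quoted configuration uses id as the signature field, a separate field is assumed here, and it would have to exist in the schema. The other parameters are copied from the curl command quoted below.

```python
import uuid
from urllib.parse import urlencode


def build_extract_url(base_url, client_id, id_field="doc_uuid"):
    """Build a Solr Cell extract URL that carries a client-generated ID.

    id_field is a hypothetical schema field; the value is sent through
    the literal.<fieldname>=<value> request parameter.
    """
    params = {
        "uprefix": "attr_",
        "fmap.content": "attr_content",
        "commit": "true",
        f"literal.{id_field}": client_id,
    }
    return f"{base_url}/update/extract?{urlencode(params)}"


# The ID is created before the request, so the client can rename the
# file to it without a follow-up search.
client_id = str(uuid.uuid4())
url = build_extract_url("http://localhost:8080/solr", client_id)
print(url)
```

The resulting URL can be used in place of the one in the curl command, with the same -F "myfile=@myfile.pdf" upload.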

On Fri, Feb 26, 2010 at 7:56 AM, Bill Engle <billengledev@gmail.com> wrote:
> Any thoughts on this? I would like to get the id back in the same request
> after indexing. My initial thought was to do a search after indexing to get
> the doc id based on the attr_stream_name, but rereading my message, I
> mentioned that the attr_stream_name (file name) may differ, so that is
> unreliable. My only option is to somehow return the id in the XML
> response. Any guidance is greatly appreciated.
>
> -Bill
>
> On Wed, Feb 24, 2010 at 12:06 PM, Bill Engle <billengledev@gmail.com> wrote:
>
>> Hi -
>>
>> New Solr user here.  I am using Solr Cell to index files (PDF, doc, docx,
>> txt, htm, etc.) and there is a good chance that a new file will have
>> duplicate content but not necessarily the same file name.  To avoid this I
>> am using the deduplication feature of Solr.
>>
>>   <updateRequestProcessorChain name="dedupe">
>>     <processor
>> class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
>>       <bool name="enabled">true</bool>
>>       <str name="signatureField">id</str>
>>       <bool name="overwriteDupes">true</bool>
>>       <str name="fields">attr_content</str>
>>       <str name="signatureClass">org.apache.solr.update.processor.</str>
>>     </processor>
>>     <processor class="solr.LogUpdateProcessorFactory" />
>>     <processor class="solr.RunUpdateProcessorFactory" />
>>   </updateRequestProcessorChain>
>>
>> How do I get the "id" value after Solr processing? Is there some way to
>> modify the curl response so that the id is returned? I need this id because
>> I would like to rename the file to the id value. I could probably do a Solr
>> search after the fact to get the id field based on the attr_stream_name, but
>> I would like to do only one request.
>>
>> curl 'http://localhost:8080/solr/update/extract?uprefix=attr_&fmap.content=attr_content&commit=true' -F "myfile=@myfile.pdf"
>>
>> Thanks,
>> Bill
>>
>



-- 
Lance Norskog
goksron@gmail.com
