lucene-java-user mailing list archives

From Vjeran Marcinko <vjeran.marci...@email.t-com.hr>
Subject Re: Duplicate filtering
Date Sun, 25 Sep 2016 13:46:42 GMT
Thanks, but I'm not looking for de-duplication while adding documents; I need 
de-duplication at query time.

There is a DuplicateFilter in the contrib library, but filters are no longer 
used in newer Lucene versions, so no luck there... :(

I assume I would need to implement my own Collector, but that seems 
like a fairly advanced thing to do, so if anyone has a suggestion...
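
To make the idea concrete, here is a minimal sketch of the keep-first logic such a collector would need, in plain Java with no Lucene dependency. `DedupSketch`, `Hit` and `keepFirst` are hypothetical names used only for illustration; in a real Collector the key would have to be read from the index for each collected document, which this sketch does not show.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {

    /** Hypothetical stand-in for a collected hit: a doc id plus the value of the uniqueness field. */
    static final class Hit {
        final int docId;
        final String key;

        Hit(int docId, String key) {
            this.docId = docId;
            this.key = key;
        }
    }

    /** Keeps the first hit seen for each key and drops later hits with the same key. */
    static List<Hit> keepFirst(List<Hit> hitsInCollectionOrder) {
        Set<String> seen = new HashSet<>();
        List<Hit> unique = new ArrayList<>();
        for (Hit hit : hitsInCollectionOrder) {
            // Set.add returns false when the key is already present, i.e. the hit is a duplicate.
            if (seen.add(hit.key)) {
                unique.add(hit);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(0, "a"));
        hits.add(new Hit(1, "b"));
        hits.add(new Hit(2, "a")); // same key as doc 0, so it is filtered out
        for (Hit h : keepFirst(hits)) {
            System.out.println(h.docId + " " + h.key);
        }
    }
}
```

Note that "first" here means first in collection order. A collector is handed matching documents in doc id order rather than by score, so this would keep the earliest-indexed duplicate, not necessarily the best-scoring one.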


On 09/21/2016 05:40 AM, Đạt Cao Mạnh wrote:
> Solr already supports de-duplication when adding new documents. You can
> refer to the doc at
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>
> On Tue, Sep 20, 2016 at 12:18 PM Vjeran Marcinko <
> vjeran.marcinko@email.t-com.hr> wrote:
>
>> Hello,
>>
>> I'm pretty much a Lucene newbie, so I'm looking for some short guidelines on
>> how to implement duplicate document filtering based on a field
>> that defines uniqueness, where the first document stays and the other
>> duplicates are filtered out.
>>
>> I know a 3rd-party contrib library for this existed before, but
>> it has been abandoned/deprecated in these newer versions of Lucene.
>>
>> Regards,
>> Vjeran
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>



