lucene-solr-user mailing list archives

From William Bell <billnb...@gmail.com>
Subject Re: Engage custom hit collector for special search processing
Date Thu, 15 Jan 2015 06:29:39 GMT
We need example data and a sample query to help you.

You can use "group" to group by a field and remove dupes.
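
As a rough sketch of what a grouped request could look like, the helper below builds the query string for a select request. The parameter names (group, group.field, group.ngroups, group.limit) are Solr's result grouping parameters; the function name groupingParams and the field name are only illustrative assumptions, not something from your schema.

```javascript
// Hypothetical helper: builds the query-string part of a Solr select request
// that groups results by a single field. Only the parameter names come from
// Solr's result grouping feature; everything else here is illustrative.
function groupingParams(query, field) {
  const params = new URLSearchParams();
  params.set("q", query);
  params.set("group", "true");          // turn result grouping on
  params.set("group.field", field);     // field whose values define the groups
  params.set("group.ngroups", "true");  // ask for the number of groups
  params.set("group.limit", "1");       // return one document per group
  return params.toString();
}

// Example: group all DOG matches by field1.
console.log(groupingParams("field1:DOG", "field1"));
```

Each group in the response carries its own count of matching documents, which is the closest built-in thing to "rolled up with a count".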

If you want to remove dupes you can do something like:

q=field1:DOG AND NOT field2:DOG AND NOT field3:DOG

That matches documents with DOG in field1 and excludes any document that
also has DOG in field2 or field3.

If you don't care which field it is in, you can use dismax/edismax with qf,
or you can just use OR.

q=field1:DOG OR field2:DOG OR field3:DOG
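
If the field list changes per request, both query shapes in this message can be generated rather than typed by hand. This is only a minimal sketch: the function names are made up for illustration, and the term is inserted as-is, so anything containing Solr query syntax characters would need escaping first.

```javascript
// Hypothetical helpers that build the two query strings for an arbitrary
// list of fields. No escaping is done on the term.

// Matches the term in any of the fields: f1:T OR f2:T OR ...
function anyFieldQuery(fields, term) {
  return fields.map((f) => `${f}:${term}`).join(" OR ");
}

// Matches the term in the first field and excludes documents that also
// have it in any of the remaining fields: f1:T AND NOT f2:T AND NOT ...
function firstFieldOnlyQuery(fields, term) {
  const [first, ...rest] = fields;
  const clauses = [`${first}:${term}`, ...rest.map((f) => `NOT ${f}:${term}`)];
  return clauses.join(" AND ");
}

const fields = ["field1", "field2", "field3"];
console.log("q=" + anyFieldQuery(fields, "DOG"));
console.log("q=" + firstFieldOnlyQuery(fields, "DOG"));
```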

If you have a set of values that you want to de-duplicate at INDEX time,
you can do that in the SQL (if the data comes from SQL) or with a script in
the DIH, for example:

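// DIH script transformer function (the name removeDupes is just an example):
// blank out field2/field3 when they repeat the value of field1.
function removeDupes(row) {
    var x = row.get("field1");
    var x1 = row.get("field2");
    var x2 = row.get("field3");

    // Guard against a missing field1 before calling equals().
    if (x != null && x.equals(x1)) {
        row.put("field2", "");
    }

    if (x != null && x.equals(x2)) {
        row.put("field3", "");
    }

    return row;
}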

That way you eliminate the dupes at index time...
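
The same index-time cleanup can be written for any list of fields. The sketch below is an assumption-laden variant, not the exact logic above: it blanks every value that repeats an earlier field, not only repeats of field1. makeRow is a hypothetical stand-in for the Map-like row DIH hands to a script, so the example runs on its own in Node.js; the script engine embedded in DIH may need older var/function syntax.

```javascript
// Hypothetical stand-in for the java.util.Map row that DIH passes to a
// script function; it only provides get() and put().
function makeRow(obj) {
  const m = new Map(Object.entries(obj));
  return {
    get: (k) => (m.has(k) ? m.get(k) : null),
    put: (k, v) => { m.set(k, v); },
  };
}

// Blank every field whose value already appeared in an earlier field of
// the list. Missing and empty values are skipped.
function blankRepeatedValues(row, fields) {
  const seen = new Set();
  for (const field of fields) {
    const value = row.get(field);
    if (value === null || value === "") continue;
    const key = String(value);
    if (seen.has(key)) {
      row.put(field, "");
    } else {
      seen.add(key);
    }
  }
  return row;
}

const row = makeRow({ field1: "DOG", field2: "DOG", field3: "CAT" });
blankRepeatedValues(row, ["field1", "field2", "field3"]);
console.log(row.get("field1"), JSON.stringify(row.get("field2")), row.get("field3"));
```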

Bill

On Tue, Jan 13, 2015 at 2:29 PM, tedsolr <tsmith@sciquest.com> wrote:

> I have a complicated problem to solve, and I don't know enough about
> lucene/solr to phrase the question properly. This is kind of a shot in the
> dark. My requirement is to return search results always in completely
> "collapsed" form, rolling up duplicates with a count. Duplicates are
> defined by whatever fields are requested. If the search requests fields
> A, B, C, then all matched documents that have identical values for those
> 3 fields are "dupes". The field list may change with every new search
> request. What I do know at index time is the superset of all fields that
> may be part of the field list.
>
> I know this can't be done with configuration alone. It doesn't seem
> performant to retrieve all 1M+ docs and post process in Java. A very smart
> person told me that a custom hit collector should be able to do the
> filtering for me. So, maybe I create a custom search handler that somehow
> exposes this custom hit collector that can use FieldCache or DocValues to
> examine all the matches and filter the results in the way I've described
> above.
>
> So assuming this is a viable solution path, can anyone suggest some helpful
> posts, code fragments, or books for me to review? I admit to being out of my
> depth, but this requirement isn't going away. I'm grasping at straws right
> now.
>
> thanks
> (using Solr 4.9)
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Engage-custom-hit-collector-for-special-search-processing-tp4179348.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Bill Bell
billnbell@gmail.com
cell 720-256-8076
