lucene-solr-user mailing list archives

From Pavel Minchenkov <char...@gmail.com>
Subject Re: Duplicates
Date Fri, 23 Jul 2010 09:57:35 GMT
Thanks, Peter!

I'll try collapsing today.

Example (sorry if the table is unformatted):

 id | type   | prop_1 | .... | prop_N | folderId
----+--------+--------+------+--------+---------
  0 | folder |        |      |        |
  1 | file   | val1   |      | valN1  | 0
  2 | file   | val3   |      | valN2  | 0
  3 | file   | val1   |      | valN3  | 0
  4 | folder |        |      |        |
  5 | folder |        |      |        |
  6 | file   | val3   |      | valN7  | 6
  7 | file   | val4   |      | valN8  | 6
  8 | folder |        |      |        |
  9 | file   | val2   |      | valN3  | 8
 10 | file   | val1   |      | valN2  | 8
 11 | file   | val2   |      | valN5  | 8
 12 | folder |        |      |        |


I need to always select *one* file per folder, or to
select *only* the folders that contain matching files (without the files themselves).

Query:
prop_1:val1 OR prop_2:val2

I need results (document ids):
1, 9
or
0, 8
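
To make the two desired result sets concrete, here is a minimal Python sketch of the grouping over the example rows. It is not Solr code and does not use the SOLR-236 patch; it only illustrates the expected output. The names DOCS, matches, one_file_per_folder and folders_with_matches are made up for illustration. Since prop_2 is not shown in the table, matches() assumes, purely as a stand-in for the real query, that a file matches when prop_1 is val1 or val2; the folderId values are copied from the table as-is.

```python
# Rows from the example table: (id, type, prop_1, folderId).
# prop_2 .. prop_N are omitted because the table does not show them.
DOCS = [
    (0, "folder", None, None),
    (1, "file", "val1", 0),
    (2, "file", "val3", 0),
    (3, "file", "val1", 0),
    (4, "folder", None, None),
    (5, "folder", None, None),
    (6, "file", "val3", 6),
    (7, "file", "val4", 6),
    (8, "folder", None, None),
    (9, "file", "val2", 8),
    (10, "file", "val1", 8),
    (11, "file", "val2", 8),
    (12, "folder", None, None),
]


def matches(doc):
    """Stand-in for the real query (assumption: match on prop_1 only)."""
    _doc_id, doc_type, prop_1, _folder_id = doc
    return doc_type == "file" and prop_1 in ("val1", "val2")


def one_file_per_folder(docs):
    """Return the id of the first matching file in each folder."""
    first_hit = {}
    for doc in docs:
        if matches(doc):
            # setdefault keeps the first file seen for this folderId.
            first_hit.setdefault(doc[3], doc[0])
    return list(first_hit.values())


def folders_with_matches(docs):
    """Return the ids of folders that contain at least one matching file."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(doc[3] for doc in docs if matches(doc)))


if __name__ == "__main__":
    print(one_file_per_folder(DOCS))
    print(folders_with_matches(DOCS))
```

Both functions collapse the matching files on folderId; the first keeps one representative file per group, the second keeps only the group key.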

2010/7/23 Peter Karich <peathal@yahoo.de>

> Hi Pavel!
>
> The patch can be applied to 1.4.
> The performance is ok, but for some situations it could be worse than
> without the patch.
> For us it works good, but others reported some exceptions
> (see the patch site: https://issues.apache.org/jira/browse/SOLR-236)
>
> > I need only to delete duplicates
>
> Could you give us an example what you exactly need?
> (Maybe you could index each master document of the 'unique' documents
> with an extra field and query for that field?)
>
> Regards,
> Peter.
>
> --
Pavel Minchenkov
