lucene-solr-user mailing list archives

From Christos Constantinou <ch...@simpleweb.co.uk>
Subject Faceting by fields that contain special characters
Date Thu, 19 Aug 2010 13:06:50 GMT
Hi all,

I am running a faceted search on a Solr field that contains URLs, solely to locate
duplicate URLs in my documents.

However, the Solr response I get looks like this:
          public 'com' => int 492198
          public 'flickr' => int 492198
          public 'http' => int 492198
          public 'www' => int 253881
          public 'photo' => int 253843
          public 'n' => int 253318
          public 'httpwwwflickrcomphoto' => int 253316
          public 'farm' => int 238317
          public 'httpfarm' => int 238317
          public 'jpg' => int 238317
          public 'static' => int 238317
          public 'staticflickrcom' => int 238317
          public '5' => int 237939
          public '00' => int 61009
          public 'b' => int 59463
          public 'c' => int 59094
          public 'f' => int 59004
          public 'd' => int 58995
          public 'e' => int 58818
          public 'a' => int 58327
          public '08' => int 33797
          public '06' => int 33341
          public '04' => int 29902
          public '02' => int 29224
          public '2' => int 26671
          public '4' => int 26613
          public '6' => int 26606
          public '03' => int 26506
          public '1' => int 26389
          public '8' => int 26384
I expected each facet key to be an entire URL, but each key is only a fragment of one.
Is this because characters such as :// in http:// cannot be used in variable names?
If so, is there a workaround, or an alternative way to detect duplicates?
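
To make the difference concrete, here is a small Python sketch of what I expected versus what the counts above resemble. It is not Solr code: the sample URLs are made up, and splitting on non-alphanumeric characters is only my guess at how the values end up broken apart.

```python
import re
from collections import Counter

# Hypothetical sample values for the URL field (not real data from my index).
urls = [
    "http://www.flickr.com/photo/1",
    "http://www.flickr.com/photo/1",
    "http://www.flickr.com/photo/2",
]


def facet_whole(values):
    """Count each value as one term: the behaviour I expected."""
    return Counter(values)


def facet_tokenized(values):
    """Count alphanumeric fragments once per value: roughly what I am seeing."""
    counts = Counter()
    for value in values:
        # set() so a fragment is counted once per document, like a facet count.
        counts.update(set(re.findall(r"[A-Za-z0-9]+", value)))
    return counts


whole = facet_whole(urls)
# Any URL that occurs more than once is a duplicate.
duplicates = {url: n for url, n in whole.items() if n > 1}
tokens = facet_tokenized(urls)

print(duplicates)
print(tokens.most_common(5))
```

With whole values as keys, the duplicates fall out directly from any count above 1. With fragments as keys, 'http', 'flickr' and 'com' simply match every document, which looks like the top of my result.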

Thanks

Christos
