lucene-solr-user mailing list archives

From Gabriele Kahlout <gabri...@mysimpatico.com>
Subject Re: How to make the url id case insensitive?
Date Mon, 05 Sep 2011 10:26:47 GMT
On Mon, Sep 5, 2011 at 1:22 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:

> Hi,
>
> URI paths are case-sensitive. If you really want to treat all URLs as
> case-insensitive, I would suggest modifying the basic URL normalizer to
> lowercase all URLs, so that they also end up lowercased in the CrawlDB.
>
> What is your problem? I would strongly suggest another solution if you're
> doing wide web crawls.
>

I don't want duplicate results where the only real difference is the case of
some letters in the URL.
What other solution?
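
To make the lowercasing suggestion quoted above concrete, here is a minimal sketch in standalone Python. It is not Nutch's URL normalizer API; the helper names `lowercase_url` and `dedupe_by_lowercased_url` are hypothetical and exist only for illustration. It assumes that lowercasing the scheme, host, and path is acceptable for the sites being crawled, which, as noted above, is not true in general because URI paths are case-sensitive.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_url(url: str) -> str:
    """Return the URL with scheme, host, and path lowercased.

    Hypothetical helper for illustration only; not part of Nutch.
    Query string and fragment are left untouched.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # note: also lowercases any userinfo/port text
        parts.path.lower(),    # assumption: the server ignores path case
        parts.query,
        parts.fragment,
    ))


def dedupe_by_lowercased_url(urls):
    """Keep the first URL seen for each lowercased form, in input order."""
    seen = set()
    unique = []
    for url in urls:
        key = lowercase_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


if __name__ == "__main__":
    results = [
        "http://www.atory.com/dupe_checker_pro/",
        "http://www.atory.com/dupe_checker_PRO/",
    ]
    print(dedupe_by_lowercased_url(results))
```

With the two URLs from this thread, both map to the same lowercased key, so only the first is kept. On a server that does distinguish path case, the same logic would wrongly merge two different pages, which is the risk raised for wide web crawls.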


>
> Cheers,
>
> > Hi,
> > I've just noticed that two search results of indexed data have the same
> > url:
> >
> > http://www.atory.com/dupe_checker_pro/
> > http://www.atory.com/dupe_checker_PRO/
> >
> > I thought the url/id was case-insensitively unique. Is there a way I can
> > set it up to be so?
> >
> > For Solr, with its disparate uses, it makes sense not to make this the
> > default, but not for Nutch.
>



-- 
Regards,
K. Gabriele

