nutch-user mailing list archives

From Harris Rappaport <hprappap...@gmail.com>
Subject Re: Searching for special characters
Date Mon, 05 Sep 2011 21:06:49 GMT
Are you sure this is the case? Do I need to change any configurations
somewhere (either to make Solr search for special characters, or to make
sure Nutch indexes them)? The wiki says that Nutch 1.3 treats special
characters as whitespace. Is this wrong? I indexed some pages and am
testing out the search privately using http://127.0.0.1:8983/solr/admin/ as
shown on the Nutch tutorial page
https://wiki.apache.org/nutch/NutchTutorial. Special characters seem
to be ignored. For example, I can search for
"tree" and get certain results, then try "tree\+\+\+" and get exactly the
same results, even though the string "tree+++" does not appear anywhere, so
shouldn't I get no results, just as if I had searched for treennn?
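
One possible explanation for the identical results, sketched below in Python
under a loud assumption: the field being searched is analyzed by a tokenizer
that drops punctuation, so the "+" characters never reach the index or the
parsed query. The functions `analyze` and `unescape_query` are hypothetical
toy helpers written for illustration; they are not Solr or Nutch code, and the
real behavior depends on the field type configured in the Solr schema.

```python
import re


def analyze(text):
    """Toy analyzer (assumption, not Solr's implementation): lowercase the
    text and split on any run of characters that are not ASCII letters or
    digits, so punctuation behaves like whitespace."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def unescape_query(query):
    """Remove the backslashes that escape query-syntax characters such as '+'."""
    return re.sub(r"\\(.)", r"\1", query)


if __name__ == "__main__":
    # Compare the terms each query would be reduced to under this assumption.
    for raw in ["tree", r"tree\+\+\+", "treennn"]:
        print(raw, "->", analyze(unescape_query(raw)))
```

Under that assumption, "tree" and "tree\+\+\+" both reduce to the single term
"tree", while "treennn" stays a different term, which would match the behavior
described above.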

On Mon, Aug 22, 2011 at 4:27 AM, Markus Jelsma
<markus.jelsma@openindex.io> wrote:

> In 1.3 search is delegated to Solr. It can happily search (or ignore)
> `special` chars.
>
> > I downloaded and played around a bit with 1.3 but I don't really have
> > anything invested in it (so if this is easier using another version, I
> > would gladly use that instead).
> >
> > On Sun, Aug 21, 2011 at 7:23 AM, Markus Jelsma
> > <markus.jelsma@openindex.io> wrote:
> > > What version of Nutch are you using?
> > >
> > > > Hi,
> > > > On the wiki (here: https://wiki.apache.org/nutch/Features ) it
> > > > says that special characters and punctuation are treated as spaces,
> > > > but it does not say where in the code this is or how to configure it.
> > > > How can I configure Nutch not to ignore special characters?
>
