lucene-solr-user mailing list archives

From "Thierry Collogne" <thierry.collo...@gmail.com>
Subject Re: How does HTMLStripWhitespaceTokenizerFactory work?
Date Mon, 11 Jun 2007 10:54:19 GMT
OK. Is it possible to get the content back without the HTML tags?

On 08/06/07, Yonik Seeley <yonik@apache.org> wrote:
>
> On 6/8/07, Thierry Collogne <thierry.collogne@gmail.com> wrote:
> > I am trying to use the solr.HTMLStripWhitespaceTokenizerFactory analyzer
> > with no luck.
> [...]
> > Is this normal? Shouldn't the html code and the white spaces be removed
> > from the field?
>
> For indexing purposes, yes.  The stored field you get back will be
> unchanged though.
> If you want to see what will be indexed, try the analysis debugger in
> the admin pages.
>
> -Yonik
>
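A minimal sketch of the stored-vs-indexed distinction Yonik describes. This is NOT Solr's HTMLStripWhitespaceTokenizerFactory; the class HtmlStripSketch and its methods stripTags and indexedTokens are hypothetical, and the regex is a crude stand-in for real HTML stripping (it ignores entities, comments, and script blocks). It only shows that the analyzer's output (tag-free, whitespace-split tokens) is separate from the stored value, which stays exactly as sent.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical illustration only: approximates "strip HTML, then split on
 * whitespace" with a regex. Not the actual Solr/Lucene implementation.
 */
public class HtmlStripSketch {

    // Crude tag pattern (assumption); real stripping also handles entities, comments, scripts.
    private static final String TAG = "<[^>]*>";

    /** Replace each tag with a space, collapse whitespace runs, and trim. */
    public static String stripTags(String html) {
        return html.replaceAll(TAG, " ").replaceAll("\\s+", " ").trim();
    }

    /** Roughly what ends up in the index: the tag-free text split on whitespace. */
    public static List<String> indexedTokens(String html) {
        String text = stripTags(html);
        if (text.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(text.split(" "));
    }

    public static void main(String[] args) {
        String raw = "<p>Hello <b>world</b></p>";
        // The stored value is whatever was sent; analysis never modifies it.
        System.out.println("stored:  " + raw);
        // Indexed terms come from the analyzer chain, simulated here.
        System.out.println("indexed: " + indexedTokens(raw));
        // For tag-free stored content, strip the HTML before sending the document.
        System.out.println("clean:   " + stripTags(raw));
    }
}
```

Since analysis only affects the indexed terms, one way to get content back without tags is to strip the HTML on the client before posting the document, so the stored field holds the clean text.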
