lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: HTMLStripCharFilterFactory configuration problem
Date Sat, 17 Apr 2010 14:56:07 GMT


> Actually I am using SolrJ client..
> Is there anyway to do same using solrj.
> 
> thanks

If you are using Java, life is easier: you can strip the markup on the client side.
Call this static function on the value before adding the field to the SolrInputDocument.

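// Package names as of Solr 1.4 / Lucene 2.9.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.CharReader;
import org.apache.solr.analysis.HTMLStripCharFilter;

static String stripHTMLX(String value) {
    StringBuilder out = new StringBuilder();
    StringReader strReader = new StringReader(value);
    try {
        // Fall back to a BufferedReader if the reader does not support
        // mark/reset (a StringReader does, so it is normally used directly).
        HTMLStripCharFilter html = new HTMLStripCharFilter(
                CharReader.get(strReader.markSupported() ? strReader : new BufferedReader(strReader)));
        char[] cbuf = new char[1024 * 10];
        while (true) {
            int count = html.read(cbuf);
            if (count == -1)
                break; // read() returns -1 at end of stream
            if (count > 0)
                out.append(cbuf, 0, count);
        }
        html.close();
    } catch (IOException e) {
        // Stripping failed: print the error and signal it to the caller with null.
        e.printStackTrace();
        return null;
    }
    return out.toString();
}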
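
The read loop in the function above is not specific to HTML stripping: any java.io.Reader can be drained the same way. Below is a minimal, standard-library-only sketch of that loop. The class name ReaderDrain and the method drain are hypothetical and not part of Solr or SolrJ. Unlike the function above, this sketch rethrows a failure as an UncheckedIOException instead of returning null, which is only one possible design choice.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
final class ReaderDrain {
    private ReaderDrain() {}

    // Reads everything the reader produces into a String, then closes it.
    static String drain(Reader reader) {
        StringBuilder out = new StringBuilder();
        char[] cbuf = new char[8192];
        try (Reader r = reader) {
            int count;
            // read() returns -1 once the end of the stream is reached
            while ((count = r.read(cbuf)) != -1) {
                out.append(cbuf, 0, count);
            }
        } catch (IOException e) {
            // Surface the failure instead of hiding it behind a null result
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(drain(new StringReader("plain text passes through unchanged")));
    }
}
```

With Solr on the classpath, the filter built in the function above could presumably be passed straight to drain, since it is read like any other Reader there, and the result handed to SolrInputDocument.addField. That combination is an assumption and is not tested in this sketch.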


      
