Subject: RE: Profanity
From: Markus Jelsma
To: solr-user@lucene.apache.org
Date: Mon, 8 Jan 2018 21:41:51 +0000

Yes, an UpdateRequestProcessor (URP) is the API to implement for these sorts of requirements. In the URP you have access to a SolrInputDocument object that carries the input data. You can inspect the fields, and add, remove or modify fields if you want, or discard the input altogether.

So, check your text input field for 'profanity' and set another boolean field according to whether it matches. Whether you use a list of words, an SVM or another machine learning algorithm to detect profanity is up to you.

Cheers,
Markus

-----Original message-----
> From: Sadiki Latty
> Sent: Monday 8th January 2018 22:12
> To: solr-user@lucene.apache.org
> Subject: Profanity
>
> Hey
>
> I would like to find a solution to flag profanity at index time. Ideally, it would function similarly to stopwords, in the sense that I can have a predefined list that is read, and if a token is on the list, that document is 'flagged' in a different field. Does anyone know of a solution (other than building my own)? If none exists and I end up building my own, would I be doing this in the update processor phase? I am still fairly new to Solr, but from what I've read, that seems to be the best place to look.
>
> Thanks,
>
> Sid
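
For illustration, here is a minimal sketch of the word-list check described in the reply, written in plain Java with no Solr dependency so it can be run on its own. The class name ProfanityFlagger, the field names "text" and "has_profanity", and the sample word list are all hypothetical, and a plain Map stands in for the input document. In a real URP the same logic would presumably sit in processAdd(AddUpdateCommand), reading and writing fields on the document returned by cmd.getSolrInputDocument(); treat that wiring as an assumption to check against the Solr version in use.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Solr: the word-list check that a custom
// UpdateRequestProcessor could run on each incoming document.
final class ProfanityFlagger {

    // Returns true if any token of the text is on the (lower-cased) word list.
    static boolean containsProfanity(String text, Set<String> lowerCaseWords) {
        if (text == null) {
            return false;
        }
        // Lower-case the text, then split on runs of non-letter, non-digit characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (lowerCaseWords.contains(token)) {
                return true;
            }
        }
        return false;
    }

    // Sets a boolean flag field on the document based on the text field.
    // The Map is a stand-in (assumption) for Solr's input document.
    static void flag(Map<String, Object> doc, String textField,
                     String flagField, Set<String> lowerCaseWords) {
        Object value = doc.get(textField);
        String text = (value == null) ? null : value.toString();
        doc.put(flagField, containsProfanity(text, lowerCaseWords));
    }

    public static void main(String[] args) {
        Set<String> words = Set.of("darn", "heck"); // sample list, assumption
        Map<String, Object> doc = new HashMap<>();
        doc.put("text", "Well, DARN it.");
        flag(doc, "text", "has_profanity", words);
        System.out.println(doc.get("has_profanity")); // prints true
    }
}
```

The match here is on whole tokens only, so a listed word embedded in a longer word is not flagged. Whether that is the right behaviour, and whether the list should be loaded from a file as is done for stopwords, depends on the use case.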