manifoldcf-dev mailing list archives

From "Tim Steenbeke (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1584) regex documentation
Date Thu, 21 Feb 2019 15:06:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774146#comment-16774146
] 

Tim Steenbeke commented on CONNECTORS-1584:
-------------------------------------------

{panel:title=Failure notice sent by MAILER-DAEMON@apache.org}
 
Hi. This is the qmail-send program at apache.org.
I'm afraid I wasn't able to deliver your message to the following addresses.
This is a permanent error; I've given up. Sorry it didn't work out.

<user@manifoldcf.apache.org>:
Must be sent from an @apache.org address or a subscriber address or an address in LDAP.

--- Below this line is a copy of the message.

Return-Path: <Tim.Steenbeke@formica.digital>
From: Steenbeke Tim <Tim.Steenbeke@formica.digital>
To: "user@manifoldcf.apache.org" <user@manifoldcf.apache.org>
Subject: Regex support
Date: Mon, 18 Feb 2019 10:35:40 +0000

Dear ManifoldCF support,

What type of regexes do ManifoldCF's include and exclude settings support, and what regex support is there in general?

At the moment I'm using a web repository connection and an Elastic output connection.
I'm trying to exclude URLs that link to documents.
           e.g.: http://website.com/something/path/file.pdf and http://website.com/something/path/file.PDF
The issue I'm having is that the regexes I have found so far don't work case-insensitively, so for every possible case I have to add a new line.
           e.g.: .*.pdf$ and .*.PDF$ and .*.Pdf and ... .

Is it possible to add documentation on what type of regex can be used, or maybe a tool to test a regex and see whether it is supported by ManifoldCF?


kind regards


Tim Steenbeke
Consultant
M: tim.steenbeke@formica.digital
T: +32 497 03 66 69

www.formica.digital


 {panel}

> regex documentation
> -------------------
>
>                 Key: CONNECTORS-1584
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1584
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>    Affects Versions: ManifoldCF 2.12
>            Reporter: Tim Steenbeke
>            Priority: Minor
>
> What type of regexes do ManifoldCF's include and exclude settings support, and what regex support is there in general?
> At the moment I'm using a web repository connection and an Elastic output connection.
>  I'm trying to exclude URLs that link to documents.
>           e.g. website.com/document/path/this.pdf and website.com/document/path/other.PDF
> The issue I'm having is that the regexes I have found so far don't work case-insensitively, so for every possible case I have to add a new line.
>             e.g.:
> {code:java}
> .*.pdf$ and .*.PDF$ and .*.Pdf and ... .{code}
> Is it possible to add documentation on what type of regex can be used, or maybe a tool to test a regex and see whether it is supported by ManifoldCF?
> I tried mailing this question to [user@manifoldcf.apache.org|mailto:user@manifoldcf.apache.org], but this mail address returns a failure notice.
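
On the case-sensitivity question above, here is a minimal sketch. It assumes the Web connector passes its include/exclude patterns to java.util.regex and applies them with a "find"-style match. ManifoldCF is written in Java, but neither point is confirmed in this thread. The class and method names (CaseInsensitiveExclude, isExcluded) are hypothetical and exist only for illustration. Under those assumptions, the inline flag (?i) would make one pattern cover .pdf, .PDF, .Pdf and so on, and escaping the dot avoids matching names that merely end in "pdf".

```java
import java.util.regex.Pattern;

public class CaseInsensitiveExclude {
    // (?i) switches on case-insensitive matching for the whole pattern.
    // \\. in Java source is the regex \. and matches a literal dot only.
    static final Pattern PDF = Pattern.compile("(?i).*\\.pdf$");

    // Hypothetical helper: true when the URL would be caught by the pattern.
    static boolean isExcluded(String url) {
        return PDF.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "website.com/document/path/this.pdf",
            "website.com/document/path/other.PDF",
            "website.com/document/path/page.html",
            "website.com/notapdf"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + isExcluded(url));
        }
    }
}
```

If the assumption holds, the single line to enter in the exclude field would be (?i).*\.pdf$ with one backslash; the doubled backslash above is only Java string escaping. The sketch can also serve as a small test harness: compile it, swap in a candidate pattern, and check the printed results before adding the pattern to a job.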



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
