From user-return-5712-archive-asf-public=cust-asf.ponee.io@manifoldcf.apache.org Thu Feb 21 02:17:22 2019
Mailing-List: contact user-help@manifoldcf.apache.org; run by ezmlm
Reply-To: user@manifoldcf.apache.org
MIME-Version: 1.0
From: Kayak28
Date: Thu, 21 Feb 2019 11:17:00 +0900
Subject:
To: user@manifoldcf.apache.org
Content-Type: multipart/alternative; boundary="00000000000070f35d05825e1250"

--00000000000070f35d05825e1250
Content-Type: text/plain; charset="UTF-8"

Hello, folks:

I have a question about crawling and scraping in ManifoldCF.
I want to perform the following sequence of tasks using MCF:

1. Crawl data from a RESTful API
2. Scrape the data
3. Insert the data into Apache Solr

In this case, is this how I need to set up ManifoldCF?

1. Define an output connector to access the RESTful API (using the Web crawler connector or the Generic connector?)
2. Define a transformer connector to scrape the HTML (using the html-extractor transformer connector?)
3. Define the output connector to be Solr

Or do I have to use other software, such as Apache NiFi, to control the sequence of these tasks?

I would appreciate any comments and replies.

Sincerely,
Kaya

--00000000000070f35d05825e1250
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--00000000000070f35d05825e1250--