From: Stanley Iriele
To: user@couchdb.apache.org
Date: Wed, 9 Oct 2013 11:57:42 -0700
Subject: Re: crawl webpages into couchdb

We're in an "I use CouchDB and I have a question" group... which can be a
little ambiguous at times.

On Oct 9, 2013 11:26 AM, "Mark Deibert" wrote:

> Are we in a CouchDB group or a "web crawler" apps group? :-/
>
> On Wed, Oct 9, 2013 at 2:09 PM, Chad Cross wrote:
>
> > Affi,
> >
> > CouchDB doesn't natively solve the web crawling issue. I'm currently
> > experimenting with Scrapy (http://scrapy.org) for web crawling, but I
> > haven't advanced enough to start pushing my crawling data into CouchDB.
> > Maybe some users out there have experience with Scrapy and CouchDB?
> >
> > -Chad
> >
> > On Wed, Oct 9, 2013 at 1:45 PM, Brad Rhoads wrote:
> >
> > > Or better yet, casperjs.
> > >
> > > On Oct 7, 2013 3:37 PM, "Mark Hahn" wrote:
> > >
> > > > Use node, phantomjs, and the nano couchdb driver.
> > > >
> > > > On Mon, Oct 7, 2013 at 2:24 PM, affi wrote:
> > > >
> > > > > hi,
> > > > > i am a beginner at couchdb and am learning it for a uni project.
> > > > > i have watched many tutorials on JSON and understand how to add
> > > > > documents, but i don't understand how to crawl webpages and store
> > > > > them in the couchdb database.
> > > > > would definitely appreciate some help with this.
> > > > > thanks
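
For the original question, here is a minimal sketch of the fetch-and-store flow. It uses only the Python standard library and none of the tools named in the thread (Scrapy, casperjs, phantomjs, nano). It relies on CouchDB's plain HTTP API, where a PUT to /{db}/{docid} with a JSON body creates a document. The database URL, the "type" and "fetched_at" fields, and the helper names are assumptions made for illustration. It also assumes that the database already exists and that the server accepts unauthenticated writes.

```python
import hashlib
import json
import urllib.request
from datetime import datetime, timezone


def page_to_doc(url, html):
    """Build a JSON-serializable CouchDB document for one crawled page."""
    return {
        # URLs contain slashes, so a SHA-1 hash of the URL serves as the ID.
        "_id": hashlib.sha1(url.encode("utf-8")).hexdigest(),
        "type": "page",  # hypothetical field, useful for filtering in views
        "url": url,
        "html": html,
        "fetched_at": datetime.now(timezone.utc).isoformat(),  # hypothetical
    }


def fetch(url):
    """Download a page and decode it to text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def save(doc, db_url):
    """PUT the document to {db_url}/{_id} and return CouchDB's JSON reply."""
    req = urllib.request.Request(
        f"{db_url}/{doc['_id']}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example use (assumed local database named "pages"):
#   url = "http://example.com/"
#   print(save(page_to_doc(url, fetch(url)), "http://localhost:5984/pages"))
```

Saving the same URL a second time fails with a 409 conflict, because updating an existing document requires its current _rev. A real crawler would read the existing document first and include that _rev in the new one.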