From: Søren Hilmer
Organization: wideTrail
To: couchdb-dev@incubator.apache.org
Subject: Fulltext description for wiki
Date: Tue, 1 Apr 2008 23:57:16 +0200

Hi,

I did the attached write-up of the fulltext interface for the wiki. I have tested it using a local MoinMoin installation, but glitches may occur.

Have fun,
Søren

-- 
Søren Hilmer, M.Sc., M.Crypt.
wideTrail             Phone: +45 25481225
Pilevænget 41         Email: sh@widetrail.dk
DK-8961 Allingåbro    Web: www.widetrail.dk

Attachment: lucenefulltextforwiki.txt

== Fulltext Indexing and Searching ==

CouchDB provides an interface that facilitates the integration of fulltext search engines. In addition, CouchDB supplies a reference implementation of this interface using [http://lucene.apache.org Lucene].

=== Index interface ===

CouchDB uses stdio to interface with the search engine: whenever a document is changed, the name of the database containing the document is sent to stdout, that is, to the standard input of the indexer process.

CouchDB does not expect to receive anything on stdin (read: it will crash if it does), so the indexer must not write anything to its standard output.

==== Setup ====

The indexer is started by CouchDB using the command line specified in the couch.ini configuration parameter:
{{{
DbUpdateNotificationProcess
}}}

=== Search interface ===

CouchDB again uses stdio to interface with the searcher part.
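As a rough illustration of this stdio contract, here is a minimal Java sketch of a searcher process. It assumes, based on the couch_ft_query:execute example in this section, that CouchDB writes the database name and the query string each on its own line, and that whatever the process prints back is shown as the result. The class name `EchoSearcher` and the `search` placeholder are hypothetical; a real searcher would query its index (for example with Lucene) instead of echoing the request. How a searcher should mark the end of a multi-line result is not specified here, so this sketch simply prints one line per query.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;

/**
 * Hypothetical searcher skeleton for FullTextSearchQueryServer.
 * Assumption: CouchDB sends the database name and the query string,
 * each terminated by a newline, and reads back what this process prints.
 */
public class EchoSearcher {

    // Placeholder for a real index lookup (e.g. via Lucene); it only echoes the request.
    static String search(String db, String query) {
        return "db=" + db + " query=" + query;
    }

    // Reads (database, query) line pairs until stdin is closed.
    static void serve(BufferedReader in, PrintStream out) throws IOException {
        String db;
        while ((db = in.readLine()) != null) {
            String query = in.readLine();
            if (query == null) {
                break; // input ended in the middle of a request
            }
            out.println(search(db, query));
            out.flush(); // flush so the waiting caller sees the result immediately
        }
    }

    public static void main(String[] args) throws IOException {
        serve(new BufferedReader(new InputStreamReader(System.in)), System.out);
    }
}
```

Keeping `serve` separate from `main` lets the loop be exercised with in-memory streams instead of a live CouchDB.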
Currently this interface is not exposed through Futon, so to try it out you need to start CouchDB with the interactive option -i to get an Erlang shell. From there, you can run search queries such as:
{{{
couch_ft_query:execute("database", "+ query +string").
}}}
For this example, the string "database\n" followed by "+ query +string\n" is transmitted to stdout, that is, to the standard input of the searcher process.
The result of the search is received through stdin (the searcher's standard output) and displayed in the shell. Currently the format of the result is not specified and is left to the particular search engine.

==== Setup ====

The searcher is started by CouchDB using the command line specified in the couch.ini configuration parameter:
{{{
FullTextSearchQueryServer
}}}

=== Lucene reference implementation ===

==== Use of special design document ====

A database to be indexed must contain a special design document in this format:
{{{
{
 "_id":"_design/fulltextsearch",
 "_rev":"123",
 "fulltext_options": {
   "views": {
     "names" : {"index":"view-value", "return":"document"},
     "cities": {"index":"view-key", "return":"view"}
   }
 }
}
}}}
The Lucene indexer uses the views defined in this document to guide the indexing process. In this example, the views "names" and "cities" must also be defined in the database.
Lucene will index the view value ("view-value") of the "names" view and return documents as search results; for the "cities" view, it will index the view key ("view-key") and return the view in search results.
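To make the two options concrete, the following Java sketch models the `fulltext_options` of the example design document and shows which part of a view row would be indexed and whether hits are reported as documents. The names `FulltextOptions`, `ViewOption`, `textToIndex` and `returnsDocument` are hypothetical and not taken from the Lucene indexer's source. The sketch treats view keys and values as plain strings and leaves out JSON parsing and all Lucene calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of "fulltext_options" from _design/fulltextsearch. */
public class FulltextOptions {

    /** Settings for one view: what to index and what to return. */
    static final class ViewOption {
        final String index; // "view-key" or "view-value"
        final String ret;   // "document" or "view"

        ViewOption(String index, String ret) {
            this.index = index;
            this.ret = ret;
        }

        /** True if hits should be reported as documents rather than view results. */
        boolean returnsDocument() {
            return "document".equals(ret);
        }
    }

    /** Picks the text to index from a view row (key, value) according to the option. */
    static String textToIndex(ViewOption opt, String key, String value) {
        if ("view-key".equals(opt.index)) {
            return key;
        }
        if ("view-value".equals(opt.index)) {
            return value;
        }
        throw new IllegalArgumentException("unknown index option: " + opt.index);
    }

    public static void main(String[] args) {
        // Same options as the example design document.
        Map<String, ViewOption> views = new LinkedHashMap<String, ViewOption>();
        views.put("names", new ViewOption("view-value", "document"));
        views.put("cities", new ViewOption("view-key", "view"));

        ViewOption names = views.get("names");
        ViewOption cities = views.get("cities");
        System.out.println(textToIndex(names, "k1", "Alice") + " " + names.returnsDocument());
        System.out.println(textToIndex(cities, "Aarhus", "v1") + " " + cities.returnsDocument());
    }
}
```

Running `main` prints `Alice true` and then `Aarhus false`: the "names" row contributes its value and returns documents, while the "cities" row contributes its key and returns the view.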
For info on views in CouchDB, see: Self:Views

==== Dependencies ====

The Lucene indexer depends on the following projects' .jar files to work:
 * couchdb4j.jar (see below)
 * commons-beanutils.jar
 * commons-codec-1.3.jar
 * commons-collections.jar
 * commons-httpclient-3.1.jar
 * commons-lang.jar
 * commons-logging-1.1.jar
 * ezmorph-1.0.3.jar
 * json-lib-2.0-jdk15.jar
 * lucene-core-2.3.1.jar

Note: all the couchdb4j dependencies (as you can see, some have no version info supplied) can probably be checked out easily from the couchdb4j repository (see below).

Note: at the time of writing, couchdb4j needs to be patched with the patches attached to issues 6 and 8 on the couchdb4j issue tracker: http://code.google.com/p/couchdb4j/issues/list
So check out trunk, patch it, and build it.

At least Java version 5 is needed.

==== Compiling ====

The Lucene indexer is not built as part of CouchDB. You need to:
 * Set up a Java development environment (at least version 5).
 * Check out the CouchDB source.
 * Change directory to src/fulltext/lucene
 * Compile using javac with a CLASSPATH containing the needed dependencies (listed above).
 * Do: jar cf !CouchLucene.jar *.class

As a result, you should get a file !CouchLucene.jar to include in your CLASSPATH at runtime.

==== Runtime setup ====

You need a path to your Java runtime (at least version 5).
You have to set up your Java CLASSPATH to contain all the .jar files listed in the dependency list; alternatively, you can specify it on the command line defined for the .ini options, like:
{{{
FullTextSearchQueryServer=java -cp /path/to/couchdb4j/lib/couchdb4j.jar:... LuceneSearcher
DbUpdateNotificationProcess=java -cp /path/to/couchdb4j/lib/couchdb4j.jar:... LuceneIndexer
}}}
Note: the above example works on Unix-like OSes, where the classpath separator is ":"; on Windows it is ";".
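Because the classpath separator is platform-specific, one option is to generate the two couch.ini lines instead of writing them by hand. The sketch below is a hypothetical helper (`IniLines` is not part of CouchDB) that joins the jar paths given on its command line with `java.io.File.pathSeparator`. It sticks to Java 5 features and does not quote paths, so paths containing spaces would still need extra care.

```java
import java.io.File;

/** Hypothetical helper: prints couch.ini lines with a platform-correct classpath. */
public class IniLines {

    // Joins jar paths with the local separator (":" on Unix-like systems, ";" on Windows).
    static String classpath(String... jars) {
        StringBuilder sb = new StringBuilder();
        for (String jar : jars) {
            if (sb.length() > 0) {
                sb.append(File.pathSeparator);
            }
            sb.append(jar);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // args: paths to the dependency jars and to CouchLucene.jar
        String cp = classpath(args);
        System.out.println("FullTextSearchQueryServer=java -cp " + cp + " LuceneSearcher");
        System.out.println("DbUpdateNotificationProcess=java -cp " + cp + " LuceneIndexer");
    }
}
```

For example, `java IniLines /path/to/couchdb4j/lib/couchdb4j.jar /path/to/CouchLucene.jar` prints both lines with the jars joined by the local separator, ready to paste into couch.ini.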