From: Heiko Henning
Date: Fri, 12 Jun 2009 09:54:27 +0200
Message-ID: <4A3209B3.90907@freenet.de>
To: user@couchdb.apache.org
Subject: is this feasible with couchDb

Hello,

I have just listened to the CCC exp podcast and enjoyed it, and I find your database very interesting. I would like to use it for http://www.jepaa.com/ and want to ask whether the following is realistic.

One page looks like this:

    {
      "domain" : "anzeigenmarkt.tel",
      "txt" : "bli bla blub",
      "contact" : [
        { "name" : "Anfrage", "ort" : "work", "data" : "info@domain.de", "type" : "mail" },
        { "name" : "Support", "ort" : "work", "data" : "support@domain.de", "type" : "mail" }
      ],
      "location" : { "lat" : 145.32423432, "lon" : 232.232 },
      "keywords" : [
        { "ul" : "Geschäft", "fn" : "Max", "ln" : "Muster", "nn" : "musti", "st" : "mustergasse" },
        { "ul" : "Privat", "fn" : "Max", "ln" : "Muster", "nn" : "musti", "st" : "dorfstrasse" }
      ]
    }

There are around 300,000 of these documents.

Now I would like to have a full-text index with ratings:

    var stopwordfilter = ['the', 'or', 'for', 'them' /* ..... */];

    // Split on whitespace and punctuation, then drop empty strings and stop words.
    function splitIt(txt) {
        return txt.split(/[\s,.!?\-_]+/).filter(function (word) {
            return word !== '' && stopwordfilter.indexOf(word) === -1;
        });
    }

    var domainParts = page.domain.split('.');
    domainParts.reverse();

    var points = 5;
    // The loop headers were damaged in the archive ("i0 ; l--"); the ones
    // below are a reconstruction. The initialization of basePoints was in
    // the lost part and is not restored here.
    for (var i = 1; i < domainParts.length; i++) {
        // Index the domain part and each of its shorter prefixes.
        for (var l = domainParts[i].length; l > 0; l--) {
            addToIndex(domainParts[i].substr(0, l), points * basePoints);
            basePoints--;
        }
        points = points / 2;
    }

    var words = splitIt(page.txt);
    // Starts at 0 so that the first word is indexed as well.
    for (var i = 0; i < words.length; i++) {
        for (var l = words[i].length; l > 0; l--) {
            addToIndex(words[i].substr(0, l), points);
            points--;
        }
    }

So this is a full-text index over all the text in the page, but with different points per word: a word in the domain gives 15 points, a word in txt gives 10 points, and so on. If a word is matched only as a part, it gives 90% of the points, and so on.
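The weighting described above (15 points for a word in the domain, 10 for a word in txt, 90% for a partial match) could be sketched as a CouchDB-style map function. This is only a sketch under stated assumptions: the names FIELD_POINTS, PREFIX_FACTOR, MIN_PREFIX, tokenize, emitWithPrefixes and mapPage are made up for illustration, applying the 90% reduction once per dropped character is just one reading of "and so on", and emit is stubbed locally so that the snippet runs outside CouchDB.

```javascript
// Hypothetical weights taken from the prose: 15 points per domain word, 10 per txt word.
var FIELD_POINTS = { domain: 15, txt: 10 };
var PREFIX_FACTOR = 0.9; // assumption: each dropped character keeps 90% of the points
var MIN_PREFIX = 3;      // assumption: prefixes shorter than this are not indexed
var STOPWORDS = ['the', 'or', 'for', 'them'];

// Local stand-in for CouchDB's emit(key, value) so the sketch runs on its own.
// CouchDB itself adds the document id to every view row.
var rows = [];
function emit(key, value) {
    rows.push({ key: key, value: value });
}

// Lower-case the text, split on whitespace and punctuation, drop empties and stop words.
function tokenize(text) {
    return text.toLowerCase().split(/[\s,.!?\-_]+/).filter(function (word) {
        return word !== '' && STOPWORDS.indexOf(word) === -1;
    });
}

// Emit the full word, then each shorter prefix with 90% of the previous points.
function emitWithPrefixes(word, points) {
    var shortest = Math.min(MIN_PREFIX, word.length);
    var current = points;
    for (var len = word.length; len >= shortest; len--) {
        emit(word.substr(0, len), current);
        current = current * PREFIX_FACTOR;
    }
}

// In CouchDB this body would live in a design document as `function (doc) { ... }`.
function mapPage(doc) {
    tokenize(doc.domain).forEach(function (word) {
        emitWithPrefixes(word, FIELD_POINTS.domain);
    });
    tokenize(doc.txt).forEach(function (word) {
        emitWithPrefixes(word, FIELD_POINTS.txt);
    });
}

mapPage({ _id: 'page1', domain: 'anzeigenmarkt.tel', txt: 'bli bla blub' });
```

After the call, rows holds one entry per indexed term or prefix, for example the key "blub" with the full txt weight and the key "blu" with 90% of it. Indexing every prefix multiplies the number of rows per document, so MIN_PREFIX is the knob to watch with around 300,000 pages.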
Now to the search.

Search string: "the wet green grass"

It should search for

    "wet" or "green" or "grass"
    and page.location.lat between 100 and 130
    and page.location.lon between 200 and 220

The ordering should work like this: if "wet" and "grass" are both found in one page, their ratings are multiplied. The results are then ordered by the total rating per page, descending.

Is this realistic for CouchDB, and will it perform well?

Kind regards,
Heiko

Please excuse my poor English; you will find the German version here:
http://mail-archives.apache.org/mod_mbox/couchdb-user/200906.mbox/%3C4A314597.7060102@freenet.de%3E
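The search and ordering rules above can be sketched in plain JavaScript. This is a sketch under assumptions, not a statement about what CouchDB can do in a single query: it assumes the index rows for the search terms have already been fetched, and the row shape (id, term, points, lat, lon) and the names searchTerms and search are hypothetical.

```javascript
var STOPWORDS = ['the', 'or', 'for', 'them'];

// Split the search string into lower-case terms and drop stop words.
function searchTerms(query) {
    return query.toLowerCase().split(/\s+/).filter(function (word) {
        return word !== '' && STOPWORDS.indexOf(word) === -1;
    });
}

// rows: array of { id, term, points, lat, lon } (hypothetical index rows).
// box:  { latMin, latMax, lonMin, lonMax } (the location filter).
function search(rows, query, box) {
    var terms = searchTerms(query);
    var ratings = Object.create(null); // page id -> combined rating

    rows.forEach(function (row) {
        var inBox = row.lat >= box.latMin && row.lat <= box.latMax &&
                    row.lon >= box.lonMin && row.lon <= box.lonMax;
        if (!inBox || terms.indexOf(row.term) === -1) {
            return;
        }
        // The first hit sets the rating; every further hit on the same page multiplies it.
        ratings[row.id] = (row.id in ratings) ? ratings[row.id] * row.points : row.points;
    });

    // Order pages by combined rating, highest first.
    return Object.keys(ratings).map(function (id) {
        return { id: id, rating: ratings[id] };
    }).sort(function (a, b) {
        return b.rating - a.rating;
    });
}

var rows = [
    { id: 'a', term: 'wet',   points: 10, lat: 110, lon: 210 },
    { id: 'a', term: 'grass', points: 10, lat: 110, lon: 210 },
    { id: 'b', term: 'green', points: 15, lat: 120, lon: 205 },
    { id: 'c', term: 'grass', points: 15, lat: 145, lon: 232 } // outside the box
];
var results = search(rows, 'the wet green grass',
                     { latMin: 100, latMax: 130, lonMin: 200, lonMax: 220 });
```

With this sample data, page "a" matches both "wet" and "grass", so its two ratings are multiplied and it sorts ahead of page "b"; page "c" is dropped by the location filter. Doing the combination in the client keeps the index itself simple, at the cost of transferring all matching rows for each term.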