Date: Mon, 5 Sep 2011 10:54:42 +0100
From: Rory Franklin
To: user@couchdb.apache.org
Subject: Re: couchdb-lucene indexing issues

Got it - I understand how that works now and my search is returning the correct results now.

Thanks again!

-- 
Rory

On Monday, 5 September 2011 at 10:38, Robert Newson wrote:

> The analyzer setting is a top-level item as documented in the README here;
>
> https://github.com/rnewson/couchdb-lucene
>
> B.
>
> On 5 September 2011 10:14, Rory Franklin wrote:
> > I've modified my original index in CouchDB to be the following, but not having any joy with things being broken up in to tokens:
> >
> > {
> >   "_id": "_design/foo",
> >   "_rev": "19-da99913ce4cdd421903d0d48f9a40cc3",
> >   "fulltext": {
> >     "by_metadata": {
> >       "index": "function(doc) {
> >         var ret=new Document();
> >         if (doc['type'] == 'CSAsset' && doc['deleted'] != true) {
> >           for (var i in doc.metadata) {
> >             if(doc.metadata[i]['key'] == 'Title') {
> >               ret.add(doc.metadata[i]['value'].toLowerCase(), {'field':'sort_title', 'store':'yes', 'index' : 'not_analyzed'});
> >             }
> >             ret.add(doc.metadata[i]['value'],{ 'field' : doc.metadata[i]['key'].toLowerCase(), 'analyzer' : 'simple' });
> >             ret.add(doc.metadata[i]['value'], { 'analyzer' : 'simple' });
> >           }
> >           for (var i in doc.partitions) {
> >             ret.add(doc.partitions[i].partition_id,{'field':'partition'});
> >             ret.add(doc.partitions[i].partition_id);
> >           }
> >           ret.add(doc['created_at'], {'field':'sort_created_at', 'store':'yes', 'index' : 'not_analyzed'});
> >           return ret;
> >         } else {
> >           return null;
> >         }
> >       }"
> >     }
> >   }
> > }
> >
> > I've opened the index up in Luke and going to the Documents tab and doing reconstruct & edit on a particular document shows that the fields aren't being split up in to separate tokens.
> >
> > --
> > Rory
> >
> > On Saturday, 3 September 2011 at 17:12, Robert Newson wrote:
> >
> > > "For instance, searching for the term "wonderland" should return back
> > > a document where there is a field with the value
> > > "some_wonderland_example" but it doesn't."
> > >
> > > It shouldn't and doesn't. :)
> > >
> > > 'some_wonderland_example' is a single token when tokenized by the
> > > default StandardAnalyzer. If instead you specify "analyzer":"simple",
> > > you will find that it is 3 tokens, and your search should work.
> > >
> > > B.
> > >
> > > On 3 September 2011 16:06, Rory Franklin wrote:
> > > > I'm using couchdb-lucene to index a list of fields (user defined) in a document using the following design document:
> > > >
> > > > {
> > > >   "_id": "_design/foo",
> > > >   "_rev": "16-dcd0d39369c35b3d74ceef13a388826f",
> > > >   "fulltext": {
> > > >     "by_metadata": {
> > > >       "index": "function(doc) {
> > > >         var ret=new Document();
> > > >         if (doc['type'] == 'CSAsset' && doc['deleted'] != true) {
> > > >           for (var i in doc.metadata) {
> > > >             if(doc.metadata[i]['key'] == 'Title') {
> > > >               ret.add(doc.metadata[i]['value'].toLowerCase(), {'field':'sort_title', 'store':'yes', 'index' : 'not_analyzed'});
> > > >             }
> > > >             ret.add(doc.metadata[i]['value'],{'field':doc.metadata[i]['key'].toLowerCase() });
> > > >             ret.add(doc.metadata[i]['value']);
> > > >           }
> > > >           for (var i in doc.partitions) {
> > > >             ret.add(doc.partitions[i].partition_id,{'field':'partition'});
> > > >             ret.add(doc.partitions[i].partition_id);
> > > >           }
> > > >           ret.add(doc['created_at'], {'field':'sort_created_at', 'store':'yes', 'index' : 'not_analyzed'});
> > > >           return ret;
> > > >         } else {
> > > >           return null;
> > > >         }
> > > >       }"
> > > >     }
> > > >   }
> > > > }
> > > >
> > > > (I've formatted the definition so that it's not all on one line for readability here)
> > > >
> > > > However, when using the by_metadata view it doesn't appear to be breaking the values up when there are underscores. For instance, searching for the term "wonderland" should return back a document where there is a field with the value "some_wonderland_example" but it doesn't. It returns the document if I search for the full term.
> > > >
> > > > I'm just wondering whether I'm defining the index incorrectly? (of course, feel free to point out if I'm doing anything else glaringly obviously wrong too!)
> > > >
> > > > Rory
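
For reference, here is a minimal sketch of the fix the thread arrives at. It assumes the view layout described in the couchdb-lucene README, where "analyzer" sits beside "index" inside the view rather than in the per-field options passed to ret.add. The simpleTokens helper is hypothetical and not part of couchdb-lucene. It only approximates, for ASCII text, how Lucene's simple analyzer lowercases and splits on non-letters. The index function is also shortened for illustration.

```javascript
// Hypothetical helper (not part of couchdb-lucene): approximates Lucene's
// "simple" analyzer for ASCII text by lowercasing the input and splitting
// on any run of non-letter characters, dropping empty pieces.
function simpleTokens(text) {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter(function (token) { return token.length > 0; });
}

// Sketch of the design document: "analyzer" is a sibling of "index" within
// the by_metadata view. The index function here is a shortened stand-in for
// the one posted in the thread.
var designDoc = {
  _id: "_design/foo",
  fulltext: {
    by_metadata: {
      analyzer: "simple",
      index: "function(doc) { var ret = new Document(); " +
             "for (var i in doc.metadata) { ret.add(doc.metadata[i]['value']); } " +
             "return ret; }"
    }
  }
};

// With underscores treated as separators, "wonderland" becomes its own token.
console.log(simpleTokens("some_wonderland_example"));
// prints [ 'some', 'wonderland', 'example' ]
```

Setting the analyzer once at the view level means every field added by the index function is tokenized the same way, which is why a search for "wonderland" can then match.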