Date: Mon, 16 Mar 2009 01:49:50 -0700 (PDT)
From: "Knut Anders Hatlen (JIRA)"
To: derby-dev@db.apache.org
Subject: [jira] Commented: (DERBY-472) Full Text Indexing / Full Text Search
Message-ID: <1648807821.1237193390482.JavaMail.jira@brutus>
In-Reply-To: <1731265072.1122415098848.JavaMail.jira@ajax.apache.org>

[ https://issues.apache.org/jira/browse/DERBY-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682251#action_12682251 ]

Knut Anders Hatlen commented on
DERBY-472:
------------------------------------------

I'm not aware of any documentation of the replication protocol, but my understanding is that it just uses ObjectOutput.writeObject()/ObjectInput.readObject() to transport ReplicationMessage objects (http://db.apache.org/derby/javadoc/engine/org/apache/derby/impl/store/replication/net/ReplicationMessage.html). The interesting messages are the ones where type=TYPE_LOG. Those messages contain the transaction log records, which have the same format as the files in the log directory in the database. The replication protocol doesn't know the meaning of the transaction log records; it just forwards the raw bytes to the recovery subsystem. I think the format of the transaction logs is also based on writeObject()/readObject(), so the easiest way to learn the protocol is probably to study the writeExternal() and readExternal() methods of the different log operation classes listed at the bottom of this page: http://db.apache.org/derby/papers/recovery.html (of course, you don't need to know the exact format if you use readObject()/writeObject() yourself).

> Full Text Indexing / Full Text Search
> -------------------------------------
>
> Key: DERBY-472
> URL: https://issues.apache.org/jira/browse/DERBY-472
> Project: Derby
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 10.0.2.0
> Environment: All environments
> Reporter: Rick Hillegas
>
> Efficiently support full text search of string datatyped columns. Mag Gam raised this issue on the user's mailing list on 24 July 2005; the email thread is titled 'Full Text Indexing'. Mag wants to see something akin to the functionality in tsearch2 (http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/). Dan points out that we may be able to re-use index building technology exposed by the Apache Lucene project (http://lucene.apache.org/).
> Presumably we want to build inverted indexes on all string datatyped columns: CHAR, VARCHAR, LONG VARCHAR, CLOB, and their national variants (when they are implemented). We should consider the following additional issues when specifying this feature:
> 1) Do we also want to support text search on XML columns?
> 2) Which human languages do we support initially? Each language has its own rules for lexing words and its own list of "noise" words which should not be indexed. Hopefully, we can plug in some existing packages of lexers and noise filters. We should encourage users to donate additional lexers/filters.
> 3) The CREATE INDEX syntax (for these new inverted indexes) should let us bind a lexing human language to a string-datatyped column.
> 4) How do we express the search condition? For case-sensitive searches we can get away with boolean expressions built out of standard LIKE clauses. However, in my opinion, case-sensitive searches are an edge case. The more useful situation is a case-insensitive search. Can we get away with introducing a non-standard function here or do we need to push a proposal through the standards committees? Even more useful and non-standard are fuzzy searches, which tolerate bad spellers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
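
To illustrate the transport described in the comment above (objects sent with writeObject() and read back with readObject(), with the log records carried as opaque raw bytes), here is a minimal sketch in Java. It is not Derby code: SketchMessage, its TYPE_LOG/TYPE_OTHER values, and the send()/receive() helpers are hypothetical stand-ins, and an in-memory byte array takes the place of the network connection. The real ReplicationMessage class may well be structured differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ReplicationSketch {

    /** Hypothetical stand-in for a replication message; NOT Derby's ReplicationMessage. */
    static final class SketchMessage implements Serializable {
        private static final long serialVersionUID = 1L;

        // Assumed type codes, chosen only for this illustration.
        static final int TYPE_LOG = 1;
        static final int TYPE_OTHER = 2;

        private final int type;
        private final byte[] payload; // opaque bytes; the transport never interprets them

        SketchMessage(int type, byte[] payload) {
            this.type = type;
            this.payload = payload.clone();
        }

        int getType() { return type; }

        byte[] getPayload() { return payload.clone(); }
    }

    /** Sender side: serialize the whole message with writeObject(). */
    static byte[] send(SketchMessage msg) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
            out.writeObject(msg);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return wire.toByteArray();
    }

    /** Receiver side: rebuild the message with readObject(). */
    static SketchMessage receive(byte[] wireBytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(wireBytes))) {
            return (SketchMessage) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] rawLogRecords = {0x0A, 0x0B, 0x0C}; // placeholder bytes, not a real log record
        SketchMessage received =
            receive(send(new SketchMessage(SketchMessage.TYPE_LOG, rawLogRecords)));

        // A receiver would hand TYPE_LOG payloads on unchanged, without parsing them.
        if (received.getType() == SketchMessage.TYPE_LOG) {
            System.out.println("log payload bytes: " + received.getPayload().length);
        }
    }
}
```

The point of the sketch is only that both ends need to agree on the message class, not on a hand-written wire format; whoever consumes the payload is the one that has to understand the log record layout.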