Subject: RE: Help with Numeric Range
From: Todd Nine
Reply-To: todd@spidertracks.co.nz
To: Uwe Schindler
Cc: java-user@lucene.apache.org
Date: Thu, 24 Jun 2010 11:59:57 +1200

Hi Uwe,

Thank you for your help, it is greatly appreciated. Unfortunately, all of my tests fail except for RangeInclusive. I've changed the precision step to 6 as per your recommendation. I had it at the maximum to rule out the precision step as the cause of the test failures.

Essentially, all keys in Cassandra are UTF-8 keys. In Lucandra, the keys are constructed in the following way:

1. Get the token stream for the field. In this case it's a NumericTokenStream with (numeric,valSize=64,precisionStep=6).
2. For all tokens in the stream, create a UTF-8 string in the following format: \uffff
3. Set the term frequency to 1.

This gives us a list of tokens, each prefixed with the field name and the delimiter. Then, for each term from above, we create a key of the format \uffff\uffff and write it to the TermInfo column family.

After debugging the implementation of LucandraTermEnum, I found that it correctly returns values that should match my numeric range query. However, I never get the results in the TopDocs result set after they're handed back to the numeric range query object. Any ideas why this is happening?

Thanks,
Todd

On Wed, 2010-06-23 at 08:53 +0200, Uwe Schindler wrote:
> Hi Todd,
>
> I am not sure if I understand your problem correctly. I am not familiar with Lucandra/Cassandra at all, but if Lucandra implements the IndexWriter and IndexReader according to the documentation, numeric queries should work.
> A NumericField internally creates a TokenStream and "analyzes" the number into several tokens, which are somewhat "half binary" (they are terms consisting of characters in the full 0..127 range for optimal UTF-8 compression with 3.x versions of Lucene). The exact encoding can be found in the NumericUtils class and its javadocs.
>
> About your test case: the test looks good, so does it fail? If yes, where is the problem? You can also look into Lucene's test TestNumericRangeQuery64 for more examples, or modify its @BeforeClass to build a Lucandra index instead.
>
> The test does one thing that is not intended to be done like that:
> numeric = new NumericField("long", Integer.MAX_VALUE, Store.YES, true);
>
> You are using MAX_VALUE as the precision step, which would slow down all queries to the speed of old-style TermRangeQueries. It is always better to stick with the default of 4, which creates 64 bits / 4 precStep = 16 terms per value. Alternatively, for longs, 6 is a good precision step (see the NumericRangeQuery documentation). MAX_VALUE is only intended for fields that are not used for numeric ranges but, e.g., only for sorting. precisionStep is a performance tuning parameter; it has nothing to do with better or worse precision on terms or with different query results. If you are using NumericRangeQuery with this large precStep, you are not using the numeric features at all, so your test should not behave differently from a conventional TermRangeQuery with padded terms.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Todd Nine [mailto:todd@spidertracks.co.nz]
> > Sent: Wednesday, June 23, 2010 7:53 AM
> > To: java-user@lucene.apache.org
> > Subject: Help with Numeric Range
> >
> > Hi all,
> > I'm new to Lucene, as well as Cassandra. I'm working on the Lucandra project to modify it to add some extra functionality.
> > It hasn't been fully tested with range queries, so I've created some tests and contributed them. You can view my source here:
> >
> > http://github.com/tnine/Lucandra/blob/master/test/lucandra/NumericRangeTests.java
> >
> > First, is this a sensible test? I'm specifically testing the case of longs where I need millisecond precision on my searches.
> >
> > Second, I see that numeric fields are built via terms. I think the issue lies in the encoding of these terms into bytes for the Cassandra keys. Can anyone point me to some documentation on numeric queries and terms, and how they are encoded at the byte level based on the precision?
> >
> > Thanks,
> > Todd
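
For reference, the arithmetic in Uwe's reply (64 bits / precisionStep terms per value) can be sketched in plain Java without any Lucene dependency. This is only an illustration under stated assumptions: the class and method names (PrecisionStepSketch, termsPerValue, shiftedPrefixes) are hypothetical, and the shifted values are raw longs, not the prefix-coded term bytes that NumericUtils actually produces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or Lucandra.
public class PrecisionStepSketch {

    // Counts one term per shift 0, step, 2*step, ... while the shift is below 64 bits.
    static int termsPerValue(int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        int count = 0;
        // long counter so that very large steps cannot overflow
        for (long shift = 0; shift < 64; shift += precisionStep) {
            count++;
        }
        return count;
    }

    // Returns the value with its lowest `shift` bits dropped, once per shift.
    // Simplification: real terms also encode the shift and use a sortable form.
    static List<Long> shiftedPrefixes(long value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<Long> prefixes = new ArrayList<>();
        for (long shift = 0; shift < 64; shift += precisionStep) {
            prefixes.add(value >>> (int) shift);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        System.out.println(termsPerValue(4));                 // 16
        System.out.println(termsPerValue(6));                 // 11
        System.out.println(termsPerValue(Integer.MAX_VALUE)); // 1
        // A millisecond timestamp at progressively lower precision
        System.out.println(shiftedPrefixes(1277337597000L, 6));
    }
}
```

With a step of Integer.MAX_VALUE only the full-precision term remains, which matches the point that such a field gains nothing over a conventional term range.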