Mailing-List: contact java-commits-help@lucene.apache.org; run by ezmlm
Reply-To: java-dev@lucene.apache.org
Subject: svn commit: r810923 - in /lucene/java/trunk/contrib: collation/src/java/org/apache/lucene/collation/ db/bdb-je/src/java/org/apache/lucene/store/je/ db/bdb/src/java/org/apache/lucene/store/db/ misc/src/java/org/apache/lucene/queryParser/analyzing/ misc/...
Date: Thu, 03 Sep 2009 13:02:17 -0000
To: java-commits@lucene.apache.org
From: rmuir@apache.org
X-Mailer: svnmailer-1.0.8
Message-Id: <20090903130217.A86A923888E5@eris.apache.org>
X-Virus-Checked: Checked by ClamAV on apache.org

Author: rmuir
Date: Thu Sep 3 13:02:16 2009
New Revision: 810923

URL: http://svn.apache.org/viewvc?rev=810923&view=rev
Log:
LUCENE-1876: add missing package.html to some contribs

Added:
    lucene/java/trunk/contrib/collation/src/java/org/apache/lucene/collation/package.html (with props)
    lucene/java/trunk/contrib/db/bdb-je/src/java/org/apache/lucene/store/je/package.html (with props)
    lucene/java/trunk/contrib/db/bdb/src/java/org/apache/lucene/store/db/package.html (with props)
    lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/analyzing/package.html (with props)
    lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/complexPhrase/package.html (with props)
    lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/precedence/package.html (with props)
    lucene/java/trunk/contrib/spatial/src/java/org/apache/lucene/spatial/geohash/package.html (with props)
    lucene/java/trunk/contrib/spatial/src/java/org/apache/lucene/spatial/tier/package.html (with props)
    lucene/java/trunk/contrib/wikipedia/src/java/org/apache/lucene/wikipedia/analysis/package.html (with props)
    lucene/java/trunk/contrib/xml-query-parser/src/java/org/apache/lucene/xmlparser/package.html (with props)

Added: lucene/java/trunk/contrib/collation/src/java/org/apache/lucene/collation/package.html
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/collation/src/java/org/apache/lucene/collation/package.html?rev=810923&view=auto
==============================================================================
--- lucene/java/trunk/contrib/collation/src/java/org/apache/lucene/collation/package.html (added)
+++ lucene/java/trunk/contrib/collation/src/java/org/apache/lucene/collation/package.html Thu Sep 3 13:02:16 2009
@@ -0,0 +1,182 @@
Lucene Collation Package + + +

+  CollationKeyFilter and ICUCollationKeyFilter
+  convert each token into its binary CollationKey using the
+  provided Collator, and then encode the CollationKey
+  as a String using
+  {@link org.apache.lucene.util.IndexableBinaryStringTools}, to allow it to be
+  stored as an index term.
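
The conversion described above can be sketched with the JDK alone. This is a minimal illustration, not the filter's implementation: the class name CollationKeySketch and the hex encoding are assumptions standing in for IndexableBinaryStringTools, chosen only because two lowercase hex digits per byte keep String order identical to unsigned byte order.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

final class CollationKeySketch {

    // Hypothetical stand-in for IndexableBinaryStringTools: encodes the
    // CollationKey bytes as lowercase hex so that comparing the resulting
    // Strings gives the same order as comparing the key bytes.
    static String encode(Collator collator, String token) {
        CollationKey key = collator.getCollationKey(token);
        byte[] bytes = key.toByteArray();
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        String apple = encode(collator, "apple");
        String banana = encode(collator, "Banana");
        // Raw String order puts "Banana" first (uppercase sorts before lowercase);
        // the encoded keys follow the collator and put "apple" first.
        System.out.println("raw:     " + Integer.signum("apple".compareTo("Banana")));
        System.out.println("encoded: " + Integer.signum(apple.compareTo(banana)));
    }
}
```

Because the encoded terms sort the way the Collator does, ordinary term comparison at search time reproduces the collated order without consulting the Collator again.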

+

+  ICUCollationKeyFilter depends on ICU4J 4.0 to produce the
+  CollationKeys. icu4j-collation-4.0.jar,
+  a trimmed-down version of icu4j-4.0.jar that contains only the
+  code and data needed to support collation, is included in Lucene's Subversion
+  repository at contrib/collation/lib/.

+ +

Use Cases

+ +
+  • Efficient sorting of terms in languages that use non-Unicode character
+    orderings. (Lucene Sort using a Locale can be very slow.)
+  • Efficient range queries over fields that contain terms in languages that
+    use non-Unicode character orderings. (Range queries using a Locale can be
+    very slow.)
+  • Effective Locale-specific normalization (case differences, diacritics, etc.).
+    ({@link org.apache.lucene.analysis.LowerCaseFilter} and
+    {@link org.apache.lucene.analysis.ASCIIFoldingFilter} provide these services
+    in a generic way that doesn't take into account locale-specific needs.)
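
The normalization use case rests on collation strength, which can be tried with java.text.Collator alone. The sketch below is an illustration under assumptions: the class StrengthSketch and its sameAt helper are hypothetical names, and the Turkish result depends on the locale data shipped with the JDK, so no output is claimed for it.

```java
import java.text.Collator;
import java.util.Locale;

final class StrengthSketch {

    // Hypothetical helper: true when the collator for the given locale cannot
    // tell the two strings apart at the given strength.
    static boolean sameAt(int strength, Locale locale, String a, String b) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // PRIMARY strength ignores case differences; TERTIARY does not.
        System.out.println(sameAt(Collator.PRIMARY, Locale.ENGLISH, "abc", "ABC"));
        System.out.println(sameAt(Collator.TERTIARY, Locale.ENGLISH, "abc", "ABC"));
        // The Turkish pair used later in this package description
        // (U+0131 is the dotless i).
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println(sameAt(Collator.PRIMARY, turkish, "DIGY", "d\u0131gy"));
    }
}
```

Strings that compare as equal at the chosen strength produce equal CollationKeys, which is why they end up as the same index term.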
+ +

Example Usages

+ +

Farsi Range Queries

+
+  // "fa" Locale is not supported by Sun JDK 1.4 or 1.5
+  Collator collator = Collator.getInstance(new Locale("ar"));
+  CollationKeyAnalyzer analyzer = new CollationKeyAnalyzer(collator);
+  RAMDirectory ramDir = new RAMDirectory();
+  IndexWriter writer = new IndexWriter
+    (ramDir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
+  Document doc = new Document();
+  doc.add(new Field("content", "\u0633\u0627\u0628", 
+                    Field.Store.YES, Field.Index.ANALYZED));
+  writer.addDocument(doc);
+  writer.close();
+  IndexSearcher is = new IndexSearcher(ramDir, true);
+
+  // The AnalyzingQueryParser in Lucene's contrib allows terms in range queries
+  // to be passed through an analyzer - Lucene's standard QueryParser does not
+  // allow this.
+  AnalyzingQueryParser aqp = new AnalyzingQueryParser("content", analyzer);
+  aqp.setLowercaseExpandedTerms(false);
+  
+  // Unicode order would include U+0633 in [ U+062F - U+0698 ], but Farsi
+  // orders the U+0698 character before the U+0633 character, so the single
+  // indexed Term above should NOT be returned by a ConstantScoreRangeQuery
+  // with a Farsi Collator (or an Arabic one for the case when Farsi is not
+  // supported).
+  ScoreDoc[] result
+    = is.search(aqp.parse("[ \u062F TO \u0698 ]"), null, 1000).scoreDocs;
+  assertEquals("The index Term should not be included.", 0, result.length);
+
+ +
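
The range check in the example above comes down to comparing in collator order rather than code-unit order. The following sketch uses only the JDK; CollatedRangeSketch and its two helpers are hypothetical names, and the collated result for the Arabic collator depends on the JDK's locale data, so the program prints it without asserting a value.

```java
import java.text.Collator;
import java.util.Locale;

final class CollatedRangeSketch {

    // Inclusive range test in the collator's order.
    static boolean inRange(Collator collator, String lower, String upper, String term) {
        return collator.compare(lower, term) <= 0 && collator.compare(term, upper) <= 0;
    }

    // The same inclusive test in raw UTF-16 code-unit order (String.compareTo).
    static boolean inCodeUnitRange(String lower, String upper, String term) {
        return lower.compareTo(term) <= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        Collator arabic = Collator.getInstance(Locale.forLanguageTag("ar"));
        String lower = "\u062F";
        String upper = "\u0698";
        String term = "\u0633";
        System.out.println("code-unit order: " + inCodeUnitRange(lower, upper, term));
        System.out.println("collated order:  " + inRange(arabic, lower, upper, term));
    }
}
```

Indexing collation keys moves this comparison to index time: the range query then compares encoded terms directly instead of calling the Collator for every candidate term.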

Danish Sorting

+
+  Analyzer analyzer 
+    = new CollationKeyAnalyzer(Collator.getInstance(new Locale("da", "dk")));
+  RAMDirectory indexStore = new RAMDirectory();
+  IndexWriter writer = new IndexWriter 
+    (indexStore, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
+  String[] tracer = new String[] { "A", "B", "C", "D", "E" };
+  String[] data = new String[] { "HAT", "HUT", "H\u00C5T", "H\u00D8T", "HOT" };
+  String[] sortedTracerOrder = new String[] { "A", "E", "B", "D", "C" };
+  for (int i = 0 ; i < data.length ; ++i) {
+    Document doc = new Document();
+    doc.add(new Field("tracer", tracer[i], Field.Store.YES, Field.Index.NO));
+    doc.add(new Field("contents", data[i], Field.Store.NO, Field.Index.ANALYZED));
+    writer.addDocument(doc);
+  }
+  writer.close();
+  Searcher searcher = new IndexSearcher(indexStore, true);
+  Sort sort = new Sort();
+  sort.setSort(new SortField("contents", SortField.STRING));
+  Query query = new MatchAllDocsQuery();
+  ScoreDoc[] result = searcher.search(query, null, 1000, sort).scoreDocs;
+  for (int i = 0 ; i < result.length ; ++i) {
+    Document doc = searcher.doc(result[i].doc);
+    assertEquals(sortedTracerOrder[i], doc.getValues("tracer")[0]);
+  }
+
+ +

Turkish Case Normalization

+
+  Collator collator = Collator.getInstance(new Locale("tr", "TR"));
+  collator.setStrength(Collator.PRIMARY);
+  Analyzer analyzer = new CollationKeyAnalyzer(collator);
+  RAMDirectory ramDir = new RAMDirectory();
+  IndexWriter writer = new IndexWriter
+    (ramDir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
+  Document doc = new Document();
+  doc.add(new Field("contents", "DIGY", Field.Store.NO, Field.Index.ANALYZED));
+  writer.addDocument(doc);
+  writer.close();
+  IndexSearcher is = new IndexSearcher(ramDir, true);
+  QueryParser parser = new QueryParser("contents", analyzer);
+  Query query = parser.parse("d\u0131gy");   // U+0131: dotless i
+  ScoreDoc[] result = is.search(query, null, 1000).scoreDocs;
+  assertEquals("The index Term should be included.", 1, result.length);
+
+ +

Caveats and Comparisons

+

+  WARNING: Make sure you use exactly the same
+  Collator at index and query time -- CollationKeys
+  are only comparable when produced by
+  the same Collator. Since {@link java.text.RuleBasedCollator}s
+  are not independently versioned, it is unsafe to search against stored
+  CollationKeys unless the following are exactly the same (best
+  practice is to store this information with the index and check that they
+  remain the same at query time):

+
+  1. JVM vendor
+  2. JVM version, including patch version
+  3. The language (and country and variant, if specified) of the Locale
+     used when constructing the collator via
+     {@link java.text.Collator#getInstance(java.util.Locale)}.
+  4. The collation strength used - see {@link java.text.Collator#setStrength(int)}
+
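
One way to follow the best practice above is to record the four listed items as a single string next to the index and compare it before searching. This is a sketch under assumptions: CollatorFingerprint, its methods, and the "|" separated format are hypothetical, and where the string is persisted is left to the application.

```java
import java.text.Collator;
import java.util.Locale;

final class CollatorFingerprint {

    // Hypothetical helper: joins JVM vendor, JVM version, the Locale
    // (language, country, variant) and the collation strength.
    static String of(Locale locale, int strength) {
        return String.join("|",
            System.getProperty("java.vendor"),
            System.getProperty("java.version"),
            locale.toString(),
            Integer.toString(strength));
    }

    // Fails fast when the query-time environment differs from what was recorded.
    static void check(String stored, Locale locale, int strength) {
        String current = of(locale, strength);
        if (!stored.equals(current)) {
            throw new IllegalStateException(
                "Collator settings changed: indexed with " + stored + ", now " + current);
        }
    }

    public static void main(String[] args) {
        Locale locale = Locale.forLanguageTag("da-DK");
        Collator collator = Collator.getInstance(locale);
        // At index time: persist this string alongside the index.
        String stored = of(locale, collator.getStrength());
        // At query time: passes here because nothing changed in between.
        check(stored, locale, collator.getStrength());
        System.out.println(stored);
    }
}
```

A mismatch means the stored CollationKeys may no longer be comparable with freshly generated ones, so the safe response is to reindex rather than search.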

+  ICUCollationKeyFilter uses ICU4J's Collator, which
+  makes its version available, thus allowing collation to be versioned
+  independently from the JVM. ICUCollationKeyFilter is also
+  significantly faster and generates significantly shorter keys than
+  CollationKeyFilter. See
+  http://site.icu-project.org/charts/collation-icu4j-sun for key
+  generation timing and key length comparisons between ICU4J and
+  java.text.Collator over several languages.

+

+  CollationKeys generated by java.text.Collators are
+  not compatible with those generated by ICU Collators. Specifically, if
+  you use CollationKeyFilter to generate index terms, do not use
+  ICUCollationKeyFilter on the query side, or vice versa.

+
+
Propchange: lucene/java/trunk/contrib/collation/src/java/org/apache/lucene/collation/package.html
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/java/trunk/contrib/db/bdb-je/src/java/org/apache/lucene/store/je/package.html
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/db/bdb-je/src/java/org/apache/lucene/store/je/package.html?rev=810923&view=auto
==============================================================================
--- lucene/java/trunk/contrib/db/bdb-je/src/java/org/apache/lucene/store/je/package.html (added)
+++ lucene/java/trunk/contrib/db/bdb-je/src/java/org/apache/lucene/store/je/package.html Thu Sep 3 13:02:16 2009
@@ -0,0 +1,22 @@
+Berkeley DB Java Edition based implementation of {@link org.apache.lucene.store.Directory Directory}.

Propchange: lucene/java/trunk/contrib/db/bdb-je/src/java/org/apache/lucene/store/je/package.html
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/java/trunk/contrib/db/bdb/src/java/org/apache/lucene/store/db/package.html
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/db/bdb/src/java/org/apache/lucene/store/db/package.html?rev=810923&view=auto
==============================================================================
--- lucene/java/trunk/contrib/db/bdb/src/java/org/apache/lucene/store/db/package.html (added)
+++ lucene/java/trunk/contrib/db/bdb/src/java/org/apache/lucene/store/db/package.html Thu Sep 3 13:02:16 2009
@@ -0,0 +1,22 @@
+Berkeley DB 4.3 based implementation of {@link org.apache.lucene.store.Directory Directory}.

Propchange: lucene/java/trunk/contrib/db/bdb/src/java/org/apache/lucene/store/db/package.html
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/analyzing/package.html
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/analyzing/package.html?rev=810923&view=auto
==============================================================================
--- lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/analyzing/package.html (added)
+++ lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/analyzing/package.html Thu Sep 3 13:02:16 2009
@@ -0,0 +1,22 @@
+QueryParser that passes Fuzzy-, Prefix-, Range-, and WildcardQuerys through the given analyzer.

Propchange: lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/analyzing/package.html
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/complexPhrase/package.html
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/complexPhrase/package.html?rev=810923&view=auto
==============================================================================
--- lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/complexPhrase/package.html (added)
+++ lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/complexPhrase/package.html Thu Sep 3 13:02:16 2009
@@ -0,0 +1,22 @@
+QueryParser which permits complex phrase query syntax eg "(john jon jonathan~) peters*"

Propchange: lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/complexPhrase/package.html
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/precedence/package.html
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/precedence/package.html?rev=810923&view=auto
==============================================================================
--- lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/precedence/package.html (added)
+++ lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/precedence/package.html Thu Sep 3 13:02:16 2009
@@ -0,0 +1,22 @@
+QueryParser designed to handle operator precedence in a more sensible fashion than the default QueryParser.

Propchange: lucene/java/trunk/contrib/misc/src/java/org/apache/lucene/queryParser/precedence/package.html
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/java/trunk/contrib/spatial/src/java/org/apache/lucene/spatial/geohash/package.html
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/spatial/src/java/org/apache/lucene/spatial/geohash/package.html?rev=810923&view=auto
==============================================================================
--- lucene/java/trunk/contrib/spatial/src/java/org/apache/lucene/spatial/geohash/package.html (added)
+++ lucene/java/trunk/contrib/spatial/src/java/org/apache/lucene/spatial/geohash/package.html Thu Sep 3 13:02:16 2009
@@ -0,0 +1,22 @@
+Support for Geohash encoding, decoding, and filtering.

Propchange: lucene/java/trunk/contrib/spatial/src/java/org/apache/lucene/spatial/geohash/package.html
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/java/trunk/contrib/spatial/src/java/org/apache/lucene/spatial/tier/package.html
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/spatial/src/java/org/apache/lucene/spatial/tier/package.html?rev=810923&view=auto
==============================================================================
--- lucene/java/trunk/contrib/spatial/src/java/org/apache/lucene/spatial/tier/package.html (added)
+++ lucene/java/trunk/contrib/spatial/src/java/org/apache/lucene/spatial/tier/package.html Thu Sep 3 13:02:16 2009
@@ -0,0 +1,22 @@
+Support for filtering based upon geographic location.

Propchange: lucene/java/trunk/contrib/spatial/src/java/org/apache/lucene/spatial/tier/package.html
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/java/trunk/contrib/wikipedia/src/java/org/apache/lucene/wikipedia/analysis/package.html
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/wikipedia/src/java/org/apache/lucene/wikipedia/analysis/package.html?rev=810923&view=auto
==============================================================================
--- lucene/java/trunk/contrib/wikipedia/src/java/org/apache/lucene/wikipedia/analysis/package.html (added)
+++ lucene/java/trunk/contrib/wikipedia/src/java/org/apache/lucene/wikipedia/analysis/package.html Thu Sep 3 13:02:16 2009
@@ -0,0 +1,22 @@
+Tokenizer that is aware of Wikipedia syntax.

Propchange: lucene/java/trunk/contrib/wikipedia/src/java/org/apache/lucene/wikipedia/analysis/package.html
------------------------------------------------------------------------------
    svn:eol-style = native

Added: lucene/java/trunk/contrib/xml-query-parser/src/java/org/apache/lucene/xmlparser/package.html
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/xml-query-parser/src/java/org/apache/lucene/xmlparser/package.html?rev=810923&view=auto
==============================================================================
--- lucene/java/trunk/contrib/xml-query-parser/src/java/org/apache/lucene/xmlparser/package.html (added)
+++ lucene/java/trunk/contrib/xml-query-parser/src/java/org/apache/lucene/xmlparser/package.html Thu Sep 3 13:02:16 2009
@@ -0,0 +1,22 @@
+Parser that produces Lucene Query objects from XML streams.

Propchange: lucene/java/trunk/contrib/xml-query-parser/src/java/org/apache/lucene/xmlparser/package.html
------------------------------------------------------------------------------
    svn:eol-style = native