lucene-commits mailing list archives

From hoss...@apache.org
Subject svn commit: r1368191 - in /lucene/dev/branches/branch_4x: ./ dev-tools/ lucene/ lucene/analysis/ lucene/analysis/icu/src/java/org/apache/lucene/collation/ lucene/backwards/ lucene/benchmark/ lucene/core/ lucene/demo/ lucene/facet/ lucene/grouping/ luce...
Date Wed, 01 Aug 2012 18:46:05 GMT
Author: hossman
Date: Wed Aug  1 18:46:03 2012
New Revision: 1368191

URL: http://svn.apache.org/viewvc?rev=1368191&view=rev
Log:
SOLR-3650: migrate DIH CHANGES.txt (merge r1368190)

Removed:
    lucene/dev/branches/branch_4x/solr/contrib/dataimporthandler/CHANGES.txt
Modified:
    lucene/dev/branches/branch_4x/   (props changed)
    lucene/dev/branches/branch_4x/dev-tools/   (props changed)
    lucene/dev/branches/branch_4x/lucene/   (props changed)
    lucene/dev/branches/branch_4x/lucene/BUILD.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/CHANGES.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/JRE_VERSION_MIGRATION.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/LICENSE.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/MIGRATE.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/NOTICE.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/README.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/analysis/   (props changed)
    lucene/dev/branches/branch_4x/lucene/analysis/icu/src/java/org/apache/lucene/collation/ICUCollationKeyFilterFactory.java   (props changed)
    lucene/dev/branches/branch_4x/lucene/backwards/   (props changed)
    lucene/dev/branches/branch_4x/lucene/benchmark/   (props changed)
    lucene/dev/branches/branch_4x/lucene/build.xml   (props changed)
    lucene/dev/branches/branch_4x/lucene/common-build.xml   (props changed)
    lucene/dev/branches/branch_4x/lucene/core/   (props changed)
    lucene/dev/branches/branch_4x/lucene/demo/   (props changed)
    lucene/dev/branches/branch_4x/lucene/facet/   (props changed)
    lucene/dev/branches/branch_4x/lucene/grouping/   (props changed)
    lucene/dev/branches/branch_4x/lucene/highlighter/   (props changed)
    lucene/dev/branches/branch_4x/lucene/ivy-settings.xml   (props changed)
    lucene/dev/branches/branch_4x/lucene/join/   (props changed)
    lucene/dev/branches/branch_4x/lucene/licenses/   (props changed)
    lucene/dev/branches/branch_4x/lucene/memory/   (props changed)
    lucene/dev/branches/branch_4x/lucene/misc/   (props changed)
    lucene/dev/branches/branch_4x/lucene/module-build.xml   (props changed)
    lucene/dev/branches/branch_4x/lucene/queries/   (props changed)
    lucene/dev/branches/branch_4x/lucene/queryparser/   (props changed)
    lucene/dev/branches/branch_4x/lucene/sandbox/   (props changed)
    lucene/dev/branches/branch_4x/lucene/site/   (props changed)
    lucene/dev/branches/branch_4x/lucene/spatial/   (props changed)
    lucene/dev/branches/branch_4x/lucene/suggest/   (props changed)
    lucene/dev/branches/branch_4x/lucene/test-framework/   (props changed)
    lucene/dev/branches/branch_4x/lucene/tools/   (props changed)
    lucene/dev/branches/branch_4x/solr/   (props changed)
    lucene/dev/branches/branch_4x/solr/CHANGES.txt   (contents, props changed)
    lucene/dev/branches/branch_4x/solr/LICENSE.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/NOTICE.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/README.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/build.xml   (props changed)
    lucene/dev/branches/branch_4x/solr/cloud-dev/   (props changed)
    lucene/dev/branches/branch_4x/solr/common-build.xml   (props changed)
    lucene/dev/branches/branch_4x/solr/contrib/   (props changed)
    lucene/dev/branches/branch_4x/solr/contrib/dataimporthandler/README.txt
    lucene/dev/branches/branch_4x/solr/core/   (props changed)
    lucene/dev/branches/branch_4x/solr/dev-tools/   (props changed)
    lucene/dev/branches/branch_4x/solr/example/   (props changed)
    lucene/dev/branches/branch_4x/solr/lib/   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/httpclient-LICENSE-ASL.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/httpclient-NOTICE.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/httpcore-LICENSE-ASL.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/httpcore-NOTICE.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/httpmime-LICENSE-ASL.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/httpmime-NOTICE.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/scripts/   (props changed)
    lucene/dev/branches/branch_4x/solr/solrj/   (props changed)
    lucene/dev/branches/branch_4x/solr/test-framework/   (props changed)
    lucene/dev/branches/branch_4x/solr/testlogging.properties   (props changed)
    lucene/dev/branches/branch_4x/solr/webapp/   (props changed)

Modified: lucene/dev/branches/branch_4x/solr/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/CHANGES.txt?rev=1368191&r1=1368190&r2=1368191&view=diff
==============================================================================
--- lucene/dev/branches/branch_4x/solr/CHANGES.txt (original)
+++ lucene/dev/branches/branch_4x/solr/CHANGES.txt Wed Aug  1 18:46:03 2012
@@ -705,6 +705,13 @@ Bug Fixes
 * SOLR-3470: contrib/clustering: custom Carrot2 tokenizer and stemmer factories
   are respected now (Stanislaw Osinski, Dawid Weiss)
 
+* SOLR-3430: Added a new DIH test against a real SQL database.  Fixed problems 
+  revealed by this new test related to  the expanded cache support added to 
+  3.6/SOLR-2382 (James Dyer)
+             
+* SOLR-1958: When using the MailEntityProcessor, import would fail if 
+  fetchMailsSince was not specified. (Max Lynch via James Dyer) 
+
 
 Other Changes
 ----------------------
@@ -858,7 +865,13 @@ Other Changes
 * SOLR-3534: The Dismax and eDismax query parsers will fall back on the 'df' parameter
   when 'qf' is absent.  And if neither is present nor the schema default search field
   then an exception will be thrown now. (dsmiley)
-  
+
+* SOLR-3262: The "threads" feature of DIH is removed (deprecated in Solr 3.6) 
+  (James Dyer)
+
+* SOLR-3422: Refactored DIH internal data classes.  All entities in 
+  data-config.xml must have a name (James Dyer)
+ 
 Documentation
 ----------------------
 
@@ -894,6 +907,17 @@ Bug Fixes:
 * SOLR-3470: contrib/clustering: custom Carrot2 tokenizer and stemmer factories
   are respected now (Stanislaw Osinski, Dawid Weiss)
 
+* SOLR-3360: More DIH bug fixes for the deprecated "threads" parameter.  
+  (Mikhail Khludnev, Claudio R, via James Dyer)
+
+* SOLR-3430: Added a new DIH test against a real SQL database.  Fixed problems 
+  revealed by this new test related to the expanded cache support added to 
+  3.6/SOLR-2382 (James Dyer)
+
+* SOLR-3336: SolrEntityProcessor substitutes most variables at query time.
+  (Michael Kroh, Lance Norskog, via Martijn van Groningen)
+
+
 ==================  3.6.0  ==================
 More information about this release, including any errata related to the 
 release notes, upgrade instructions, or other changes may be found online at:
@@ -1046,6 +1070,27 @@ New Features
   auto detector cannot detect encoding, especially the text file is too short 
   to detect encoding. (koji)
 
+* SOLR-1499: Added SolrEntityProcessor that imports data from another Solr core
+  or instance based on a specified query.
+  (Lance Norskog, Erik Hatcher, Pulkit Singhal, Ahmet Arslan, Luca Cavanna, 
+  Martijn van Groningen)
+
+* SOLR-3190: Minor improvements to SolrEntityProcessor. Add more consistency 
+  between solr parameters and parameters used in SolrEntityProcessor and 
+  ability to specify a custom HttpClient instance.
+  (Luca Cavanna via Martijn van Groningen)
+
+* SOLR-2382: Added pluggable cache support to DIH so that any Entity can be 
+  made cache-able by adding the "cacheImpl" parameter.  Include 
+  "SortedMapBackedCache" to provide in-memory caching (as previously this was 
+  the only option when using CachedSqlEntityProcessor).  Users can provide 
+  their own implementations of DIHCache for other caching strategies.  
+  Deprecate CachedSqlEntityProcessor in favor of specifing "cacheImpl" with
+  SqlEntityProcessor.  Make SolrWriter implement DIHWriter and allow the 
+  possibility of pluggable Writers (DIH writing to something other than Solr). 
+  (James Dyer, Noble Paul)
+
+
 Optimizations
 ----------------------
 * SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New parameter
@@ -1292,6 +1337,10 @@ Other Changes
   extracting request handler and are willing to use java 6, just add the jar. 
   (rmuir)
 
+* SOLR-3142: DIH Imports no longer default optimize to true, instead false. 
+  If you want to force all segments to be merged into one, you can specify 
+  this parameter yourself. NOTE: this can be very expensive operation and 
+  usually does not make sense for delta-imports.  (Robert Muir)
 
 Build
 ----------------------
@@ -1389,6 +1438,9 @@ Bug Fixes
   a wrong number of collation results in the response.
   (Bastiaan Verhoef, James Dyer via Simon Willnauer)
 
+* SOLR-2875: Fix the incorrect url in DIH example tika-data-config.xml 
+  (Shinichiro Abe via koji)
+
  Other Changes
 ----------------------
 
@@ -1581,6 +1633,24 @@ Bug Fixes
 * SOLR-2692: contrib/clustering: Typo in param name fixed: "carrot.fragzise" 
   changed to "carrot.fragSize" (Stanislaw Osinski).
 
+* SOLR-2644: When using DIH with threads=2 the default logging is set too high
+  (Bill Bell via shalin)
+
+* SOLR-2492: DIH does not commit if only deletes are processed 
+  (James Dyer via shalin)
+
+* SOLR-2186: DataImportHandler's multi-threaded option throws NPE 
+  (Lance Norskog, Frank Wesemann, shalin)
+
+* SOLR-2655: DIH multi threaded mode does not resolve attributes correctly 
+  (Frank Wesemann, shalin)
+
+* SOLR-2695: DIH: Documents are collected in unsynchronized list in 
+  multi-threaded debug mode (Michael McCandless, shalin)
+
+* SOLR-2668: DIH multithreaded mode does not rollback on errors from 
+  EntityProcessor (Frank Wesemann, shalin)
+
  Other Changes
 ----------------------
 
@@ -1693,6 +1763,9 @@ Bug Fixes
 * SOLR-2581: UIMAToSolrMapper wrongly instantiates Type with reflection.
   (Tommaso Teofili via koji)
 
+* SOLR-2551: Check dataimport.properties for write access (if delta-import is 
+  supported in DIH configuration) before starting an import (C S, shalin)
+
 Other Changes
 ----------------------
 
@@ -2137,6 +2210,30 @@ New Features
 
 * SOLR-2237: Added StempelPolishStemFilterFactory to contrib/analysis-extras (rmuir)
 
+* SOLR-1525: allow DIH to refer to core properties (noble)
+
+* SOLR-1547: DIH TemplateTransformer copy objects more intelligently when the 
+  template is a single variable (noble)
+
+* SOLR-1627: DIH VariableResolver should be fetched just in time (noble)
+
+* SOLR-1583: DIH Create DataSources that return InputStream (noble)
+
+* SOLR-1358: Integration of Tika and DataImportHandler (Akshay Ukey, noble)
+
+* SOLR-1654: TikaEntityProcessor example added DIHExample 
+  (Akshay Ukey via noble)
+
+* SOLR-1678: Move onError handling to DIH framework (noble)
+
+* SOLR-1352: Multi-threaded implementation of DIH (noble)
+
+* SOLR-1721: Add explicit option to run DataImportHandler in synchronous mode 
+  (Alexey Serba via noble)
+
+* SOLR-1737: Added FieldStreamDataSource (noble)
+
+
 Optimizations
 ----------------------
 
@@ -2162,6 +2259,9 @@ Optimizations
   SolrIndexSearcher.doc(int, Set<String>) method b/c it can use the document 
   cache (gsingers)
 
+* SOLR-2200: Improve the performance of DataImportHandler for large 
+  delta-import updates. (Mark Waddle via rmuir)
+
 Bug Fixes
 ----------------------
 * SOLR-1769: Solr 1.4 Replication - Repeater throwing NullPointerException (Jörgen Rydenius via noble)
@@ -2424,6 +2524,61 @@ Bug Fixes
   does not properly use the same iterator instance. 
   (Christoph Brill, Mark Miller)
 
+* SOLR-1638: Fixed NullPointerException during DIH import if uniqueKey is not 
+  specified in schema (Akshay Ukey via shalin)
+
+* SOLR-1639: Fixed misleading error message when dataimport.properties is not 
+  writable (shalin)
+
+* SOLR-1598: DIH: Reader used in PlainTextEntityProcessor is not explicitly 
+  closed (Sascha Szott via noble)
+
+* SOLR-1759: DIH: $skipDoc was not working correctly 
+  (Gian Marco Tagliani via noble)
+
+* SOLR-1762: DIH: DateFormatTransformer does not work correctly with 
+  non-default locale dates (tommy chheng via noble)
+
+* SOLR-1757: DIH multithreading sometimes throws NPE (noble)
+
+* SOLR-1766: DIH with threads enabled doesn't respond to the abort command 
+  (Michael Henson via noble)
+
+* SOLR-1767: dataimporter.functions.escapeSql() does not escape backslash 
+  character (Sean Timm via noble)
+
+* SOLR-1811: formatDate should use the current NOW value always 
+  (Sean Timm via noble)
+
+* SOLR-1794: Dataimport of CLOB fields fails when getCharacterStream() is 
+  defined in a superclass. (Gunnar Gauslaa Bergem via rmuir)
+
+* SOLR-2057: DataImportHandler never calls UpdateRequestProcessor.finish()
+  (Drew Farris via koji)
+
+* SOLR-1973: Empty fields in XML update messages confuse DataImportHandler. 
+  (koji)
+
+* SOLR-2221: Use StrUtils.parseBool() to get values of boolean options in DIH.
+  true/on/yes (for TRUE) and false/off/no (for FALSE) can be used for 
+  sub-options (debug, verbose, synchronous, commit, clean, optimize) for 
+  full/delta-import commands. (koji)
+
+* SOLR-2310: DIH: getTimeElapsedSince() returns incorrect hour value when 
+  the elapse is over 60 hours (tom liu via koji)
+
+* SOLR-2252: DIH: When a child entity in nested entities is rootEntity="true", 
+  delta-import doesn't work. (koji)
+
+* SOLR-2330: solrconfig.xml files in example-DIH are broken. (Matt Parker, koji)
+
+* SOLR-1191: resolve DataImportHandler deltaQuery column against pk when pk
+  has a prefix (e.g. pk="book.id" deltaQuery="select id from ..."). More
+  useful error reporting when no match found (previously failed with a
+  NullPointerException in log and no clear user feedback). (gthb via yonik)
+
+* SOLR-2116: Fix TikaConfig classloader bug in TikaEntityProcessor
+  (Martijn van Groningen via hossman)
 
 Other Changes
 ----------------------
@@ -2557,6 +2712,12 @@ Other Changes
 * SOLR-1813: Add ICU4j to contrib/extraction libs and add tests for Arabic 
   extraction (Robert Muir via gsingers)
 
+* SOLR-1821: Fix TimeZone-dependent test failure in TestEvaluatorBag.
+  (Chris Male via rmuir)
+
+* SOLR-2367: Reduced noise in test output by ensuring the properties file 
+  can be written. (Gunnlaugur Thor Briem via rmuir)
+
 Build
 ----------------------
 
@@ -2641,6 +2802,33 @@ error.  See SOLR-1410 for more informati
  * RussianLowerCaseFilterFactory
  * RussianLetterTokenizerFactory
 
+DIH: Evaluator API has been changed in a non back-compatible way. Users who 
+have developed custom Evaluators will need to change their code according to 
+the new API for it to work. See SOLR-996 for details.
+
+DIH: The formatDate evaluator's syntax has been changed. The new syntax is 
+formatDate(<variable>, '<format_string>'). For example, 
+formatDate(x.date, 'yyyy-MM-dd'). In the old syntax, the date string was 
+written without a single-quotes. The old syntax has been deprecated and will 
+be removed in 1.5, until then, using the old syntax will log a warning.
+
+DIH: The Context API has been changed in a non back-compatible way. In 
+particular, the Context.currentProcess() method now returns a String 
+describing the type of the current import process instead of an int. 
+Similarily, the public constants in Context viz. FULL_DUMP, DELTA_DUMP and 
+FIND_DELTA are changed to a String type. See SOLR-969 for details.
+
+DIH: The EntityProcessor API has been simplified by moving logic for applying 
+transformers and handling multi-row outputs from Transformers into an 
+EntityProcessorWrapper class. The EntityProcessor#destroy is now called once 
+per parent-row at the end of row (end of data). A new method 
+EntityProcessor#close is added which is called at the end of import.
+
+DIH: In Solr 1.3, if the last_index_time was not available (first import) and 
+a delta-import was requested, a full-import was run instead. This is no longer 
+the case. In Solr 1.4 delta import is run with last_index_time as the epoch 
+date (January 1, 1970, 00:00:00 GMT) if last_index_time is not available.
+
 Versions of Major Components
 ----------------------------
 Apache Lucene 2.9.1  (r832363  on 2.9 branch)
@@ -2932,6 +3120,141 @@ New Features
 86. SOLR-1274: Added text serialization output for extractOnly 
     (Peter Wolanin, gsingers)  
 
+87. SOLR-768: DIH: Set last_index_time variable in full-import command.
+    (Wojtek Piaseczny, Noble Paul via shalin)
+
+88. SOLR-811: Allow a "deltaImportQuery" attribute in SqlEntityProcessor 
+    which is used for delta imports instead of DataImportHandler manipulating 
+    the SQL itself. (Noble Paul via shalin)
+
+89. SOLR-842:  Better error handling in DataImportHandler with options to 
+    abort, skip and continue imports. (Noble Paul, shalin)
+
+90. SOLR-833: DIH: A DataSource to read data from a field as a reader. This 
+    can be used, for example, to read XMLs residing as CLOBs or BLOBs in 
+    databases. (Noble Paul via shalin)
+
+91. SOLR-887: A DIH Transformer to strip HTML tags. (Ahmed Hammad via shalin)
+
+92. SOLR-886: DataImportHandler should rollback when an import fails or it is 
+    aborted (shalin)
+
+93. SOLR-891: A DIH Transformer to read strings from Clob type. 
+    (Noble Paul via shalin)
+
+94. SOLR-812: Configurable JDBC settings in JdbcDataSource including optimized 
+    defaults for read only mode. (David Smiley, Glen Newton, shalin)
+
+95. SOLR-910: Add a few utility commands to the DIH admin page such as full 
+    import, delta import, status, reload config. (Ahmed Hammad via shalin)
+
+96. SOLR-938: Add event listener API for DIH import start and end.
+    (Kay Kay, Noble Paul via shalin)
+
+97. SOLR-801: DIH: Add support for configurable pre-import and post-import 
+    delete query per root-entity. (Noble Paul via shalin)
+
+98. SOLR-988: Add a new scope for session data stored in Context to store 
+    objects across imports. (Noble Paul via shalin)
+
+99. SOLR-980: A PlainTextEntityProcessor which can read from any 
+    DataSource<Reader> and output a String. 
+    (Nathan Adams, Noble Paul via shalin)
+
+100.SOLR-1003: XPathEntityprocessor must allow slurping all text from a given 
+    xml node and its children. (Noble Paul via shalin)
+
+101.SOLR-1001: Allow variables in various attributes of RegexTransformer, 
+    HTMLStripTransformer and NumberFormatTransformer.
+    (Fergus McMenemie, Noble Paul, shalin)
+
+102.SOLR-989: DIH: Expose running statistics from the Context API.
+    (Noble Paul, shalin)
+
+103.SOLR-996: DIH: Expose Context to Evaluators. (Noble Paul, shalin)
+
+104.SOLR-783: DIH: Enhance delta-imports by maintaining separate 
+    last_index_time for each entity. (Jon Baer, Noble Paul via shalin)
+
+105.SOLR-1033: Current entity's namespace is made available to all DIH 
+    Transformers. This allows one to use an output field of TemplateTransformer
+    in other transformers, among other things.
+    (Fergus McMenemie, Noble Paul via shalin)
+
+106.SOLR-1066: New methods in DIH Context to expose Script details. 
+    ScriptTransformer changed to read scripts through the new API methods.
+    (Noble Paul via shalin)
+
+107.SOLR-1062: A DIH LogTransformer which can log data in a given template 
+    format. (Jon Baer, Noble Paul via shalin)
+
+108.SOLR-1065: A DIH ContentStreamDataSource which can accept HTTP POST data 
+    in a content stream. This can be used to push data to Solr instead of 
+    just pulling it from DB/Files/URLs. (Noble Paul via shalin)
+
+109.SOLR-1061: Improve DIH RegexTransformer to create multiple columns from 
+    regex groups. (Noble Paul via shalin)
+
+110.SOLR-1059: Special DIH flags introduced for deleting documents by query or 
+    id, skipping rows and stopping further transforms. Use $deleteDocById, 
+    $deleteDocByQuery for deleting by id and query respectively.  Use $skipRow 
+    to skip the current row but continue with the document. Use $stopTransform 
+    to stop further transformers. New methods are introduced in Context for 
+    deleting by id and query. (Noble Paul, Fergus McMenemie, shalin)
+
+111.SOLR-1076: JdbcDataSource should resolve DIH variables in all its 
+    configuration parameters. (shalin)
+
+112.SOLR-1055: Make DIH JdbcDataSource easily extensible by making the 
+    createConnectionFactory method protected and return a 
+    Callable<Connection> object. (Noble Paul, shalin)
+
+113.SOLR-1058: DIH: JdbcDataSource can lookup javax.sql.DataSource using JNDI. 
+    Use a jndiName attribute to specify the location of the data source.
+    (Jason Shepherd, Noble Paul via shalin)
+
+114.SOLR-1083: A DIH Evaluator for escaping query characters. 
+    (Noble Paul, shalin)
+
+115.SOLR-934: A MailEntityProcessor to enable indexing mails from 
+    POP/IMAP sources into a solr index. (Preetam Rao, shalin)
+
+116.SOLR-1060: A DIH LineEntityProcessor which can stream lines of text from a 
+    given file to be indexed directly or for processing with transformers and
+    child entities.
+    (Fergus McMenemie, Noble Paul, shalin)
+
+117.SOLR-1127: Add support for DIH field name to be templatized.
+    (Noble Paul, shalin)
+
+118.SOLR-1092: Added a new DIH command named 'import' which does not 
+    automatically clean the index. This is useful and more appropriate when one
+    needs to import only some of the entities.
+    (Noble Paul via shalin)
+              
+119.SOLR-1153: DIH 'deltaImportQuery' is honored on child entities as well 
+    (noble) 
+
+120.SOLR-1230: Enhanced dataimport.jsp to work with all DataImportHandler 
+    request handler configurations, rather than just a hardcoded /dataimport 
+    handler. (ehatcher)
+              
+121.SOLR-1235: disallow period (.) in DIH entity names (noble)
+
+122.SOLR-1234: Multiple DIH does not work because all of them write to 
+    dataimport.properties. Use the handler name as the properties file name 
+    (noble)
+
+123.SOLR-1348: Support binary field type in convertType logic in DIH 
+    JdbcDataSource (shalin)
+
+124.SOLR-1406: DIH: Make FileDataSource and FileListEntityProcessor to be more 
+    extensible (Luke Forehand, shalin)
+
+125.SOLR-1437: DIH: XPathEntityProcessor can deal with xpath syntaxes such as 
+    //tagname , /root//tagname (Fergus McMenemie via noble)
+
+
 Optimizations
 ----------------------
  1. SOLR-374: Use IndexReader.reopen to save resources by re-using parts of the
@@ -2989,6 +3312,21 @@ Optimizations
 17. SOLR-1296: Enables setting IndexReader's termInfosIndexDivisor via a new attribute to StandardIndexReaderFactory.  Enables
     setting termIndexInterval to IndexWriter via SolrIndexConfig. (Jason Rutherglen, hossman, gsingers)
 
+18. SOLR-846: DIH: Reduce memory consumption during delta import by removing 
+    keys when used (Ricky Leung, Noble Paul via shalin)
+
+19. SOLR-974: DataImportHandler skips commit if no data has been updated.
+    (Wojtek Piaseczny, shalin)
+
+20. SOLR-1004: DIH: Check for abort more frequently during delta-imports.
+    (Marc Sturlese, shalin)
+
+21. SOLR-1098: DIH DateFormatTransformer can cache the format objects.
+    (Noble Paul via shalin)
+
+22. SOLR-1465: Replaced string concatenations with StringBuilder append 
+    calls in DIH XPathRecordReader. (Mark Miller, shalin)
+
 Bug Fixes
 ----------------------
  1. SOLR-774: Fixed logging level display (Sean Timm via Otis Gospodnetic)
@@ -3206,6 +3544,103 @@ Bug Fixes
     caused an error to be returned, although the deletes were
     still executed.  (asmodean via yonik)
 
+76. SOLR-800: Deep copy collections to avoid ConcurrentModificationException 
+    in XPathEntityprocessor while streaming
+    (Kyle Morrison, Noble Paul via shalin)
+
+77. SOLR-823: Request parameter variables ${dataimporter.request.xxx} are not 
+    resolved in DIH (Mck SembWever, Noble Paul, shalin)
+
+78. SOLR-728: Add synchronization to avoid race condition of multiple DIH 
+    imports working concurrently (Walter Ferrara, shalin)
+
+79. SOLR-742: Add ability to create dynamic fields with custom 
+    DataImportHandler transformers (Wojtek Piaseczny, Noble Paul, shalin)
+
+80. SOLR-832: Rows parameter is not honored in DIH non-debug mode and can 
+    abort a running import in debug mode. (Akshay Ukey, shalin)
+
+81. SOLR-838: The DIH VariableResolver obtained from a DataSource's context 
+    does not have current data. (Noble Paul via shalin)
+
+82. SOLR-864: DataImportHandler does not catch and log Errors (shalin)
+
+83. SOLR-873: Fix case-sensitive field names and columns (Jon Baer, shalin)
+
+84. SOLR-893: Unable to delete documents via SQL and deletedPkQuery with 
+    deltaimport (Dan Rosher via shalin)
+
+85. SOLR-888: DIH DateFormatTransformer cannot convert non-string type
+    (Amit Nithian via shalin)
+
+86. SOLR-841: DataImportHandler should throw exception if a field does not 
+    have column attribute (Michael Henson, shalin)
+
+87. SOLR-884: CachedSqlEntityProcessor should check if the cache key is 
+    present in the query results (Noble Paul via shalin)
+
+88. SOLR-985: Fix thread-safety issue with DIH TemplateString for concurrent 
+    imports with multiple cores. (Ryuuichi Kumai via shalin)
+
+89. SOLR-999: DIH XPathRecordReader fails on XMLs with nodes mixed with 
+    CDATA content. (Fergus McMenemie, Noble Paul via shalin)
+
+90. SOLR-1000: DIH FileListEntityProcessor should not apply fileName filter to 
+    directory names. (Fergus McMenemie via shalin)
+
+91. SOLR-1009: Repeated column names result in duplicate values. 
+    (Fergus McMenemie, Noble Paul via shalin)
+
+92. SOLR-1017: Fix DIH thread-safety issue with last_index_time for concurrent 
+    imports in multiple cores due to unsafe usage of SimpleDateFormat by 
+    multiple threads. (Ryuuichi Kumai via shalin)
+
+93. SOLR-1024: Calling abort on DataImportHandler import commits data instead 
+    of calling rollback. (shalin)
+
+94. SOLR-1037: DIH should not add null values in a row returned by 
+    EntityProcessor to documents. (shalin)
+
+95. SOLR-1040: DIH XPathEntityProcessor fails with an xpath like 
+    /feed/entry/link[@type='text/html']/@href (Noble Paul via shalin)
+
+96. SOLR-1042: Fix memory leak in DIH by making TemplateString non-static 
+    member in VariableResolverImpl (Ryuuichi Kumai via shalin)
+
+97. SOLR-1053: IndexOutOfBoundsException in DIH SolrWriter.getResourceAsString 
+    when size of data-config.xml is a multiple of 1024 bytes.
+    (Herb Jiang via shalin)
+
+98. SOLR-1077: IndexOutOfBoundsException with useSolrAddSchema in DIH 
+    XPathEntityProcessor. (Sam Keen, Noble Paul via shalin)
+
+99. SOLR-1080: DIH RegexTransformer should not replace if regex is not matched.
+    (Noble Paul, Fergus McMenemie via shalin)
+
+100.SOLR-1090: DataImportHandler should load the data-config.xml using UTF-8 
+    encoding. (Rui Pereira, shalin)
+
+101.SOLR-1146: ConcurrentModificationException in DataImporter.getStatusMessages
+    (Walter Ferrara, Noble Paul via shalin)
+
+102.SOLR-1229: Fixes for DIH deletedPkQuery, particularly when using 
+    transformed Solr unique id's
+    (Lance Norskog, Noble Paul via ehatcher)
+              
+103.SOLR-1286: Fix the IH commit parameter always defaulting to "true" even 
+    if "false" is explicitly passed in. (Jay Hill, Noble Paul via ehatcher)
+            
+104.SOLR-1323: Reset XPathEntityProcessor's $hasMore/$nextUrl when fetching 
+    next URL (noble, ehatcher)
+
+105.SOLR-1450: DIH: Jdbc connection properties such as batchSize are not 
+    applied if the driver jar is placed in solr_home/lib.
+    (Steve Sun via shalin)
+
+106.SOLR-1474: DIH Delta-import should run even if last_index_time is not set.
+    (shalin)
+
+
 Other Changes
 ----------------------
  1. Upgraded to Lucene 2.4.0 (yonik)
@@ -3353,6 +3788,55 @@ Other Changes
     for discussion on language detection.
     See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers)
 
+53. SOLR-782: DIH: Refactored SolrWriter to make it a concrete class and 
+    removed wrappers over SolrInputDocument.  Refactored to load Evaluators 
+    lazily. Removed multiple document nodes in the configuration xml. Removed 
+    support for 'default' variables, they are automatically available as 
+    request parameters. (Noble Paul via shalin)
+
+54. SOLR-964: DIH: XPathEntityProcessor now ignores DTD validations
+    (Fergus McMenemie, Noble Paul via shalin)
+
+55. SOLR-1029: DIH: Standardize Evaluator parameter parsing and added helper 
+    functions for parsing all evaluator parameters in a standard way.
+    (Noble Paul, shalin)
+
+56. SOLR-1081: Change DIH EventListener to be an interface so that components 
+    such as an EntityProcessor or a Transformer can act as an event listener.
+    (Noble Paul, shalin)
+
+57. SOLR-1027: DIH: Alias the 'dataimporter' namespace to a shorter name 'dih'.
+    (Noble Paul via shalin)
+
+58. SOLR-1084: Better error reporting when DIH entity name is a reserved word 
+    and data-config.xml root node is not <dataConfig>.
+    (Noble Paul via shalin)
+
+59. SOLR-1087: Deprecate 'where' attribute in CachedSqlEntityProcessor in 
+    favor of cacheKey and cacheLookup. (Noble Paul via shalin)
+
+60. SOLR-969: Change the FULL_DUMP, DELTA_DUMP, FIND_DELTA constants in DIH 
+    Context to String.  Change Context.currentProcess() to return a string 
+    instead of an integer.  (Kay Kay, Noble Paul, shalin)
+
+61. SOLR-1120: Simplified DIH EntityProcessor API by moving logic for applying 
+    transformers and handling multi-row outputs from Transformers into an 
+    EntityProcessorWrapper class. The behavior of the method 
+    EntityProcessor#destroy has been modified to be called once per parent-row 
+    at the end of row. A new method EntityProcessor#close is added which is 
+    called at the end of import. A new method 
+    Context#getResolvedEntityAttribute is added which returns the resolved 
+    value of an entity's attribute. Introduced a DocWrapper which takes care 
+    of maintaining document level session variables.
+    (Noble Paul, shalin)
+
+62. SOLR-1265: Add DIH variable resolving for URLDataSource properties like 
+    baseUrl.  (Chris Eldredge via ehatcher)
+
+63. SOLR-1269: Better error messages from DIH JdbcDataSource when JDBC Driver 
+    name or SQL is incorrect. (ehatcher, shalin)
+
+
 Build
 ----------------------
  1. SOLR-776: Added in ability to sign artifacts via Ant for releases (gsingers)
@@ -3378,6 +3862,10 @@ Documentation
 
  3. SOLR-1409: Added Solr Powered By Logos    
 
+ 4. SOLR-1369: Add HSQLDB Jar to example-DIH, unzip database and update 
+    instructions.
+
+
 ================== Release 1.3.0 ==================
 
 Upgrading from Solr 1.2
@@ -3723,7 +4211,10 @@ New Features
 71. SOLR-1129 : Support binding dynamic fields to beans in SolrJ (Avlesh Singh , noble)
 
 72. SOLR-920 : Cache and reuse IndexSchema . A new attribute added in solr.xml called 'shareSchema' (noble)
-    
+
+73. SOLR-700: DIH: Allow configurable locales through a locale attribute in 
+    fields for NumberFormatTransformer. (Stefan Oestreicher, shalin)
+
 Changes in runtime behavior
  1. SOLR-559: use Lucene updateDocument, deleteDocuments methods.  This
     removes the maxBufferedDeletes parameter added by SOLR-310 as Lucene
@@ -3938,6 +4429,18 @@ Bug Fixes
 
 50. SOLR-749: Allow QParser and ValueSourceParsers to be extended with same name (hossman, gsingers)
 
+51. SOLR-704: DIH NumberFormatTransformer can silently ignore part of the 
+    string while parsing. Now it tries to use the complete string for parsing. 
+    Failure to do so will result in an exception.
+    (Stefan Oestreicher via shalin)
+
+52. SOLR-729: DIH Context.getDataSource(String) gives current entity's 
+    DataSource instance regardless of argument. (Noble Paul, shalin)
+
+53. SOLR-726: DIH: Jdbc Drivers and DataSources fail to load if placed in 
+    multicore sharedLib or core's lib directory.
+    (Walter Ferrara, Noble Paul, shalin)
+
 Other Changes
  1. SOLR-135: Moved common classes to org.apache.solr.common and altered the
     build scripts to make two jars: apache-solr-1.3.jar and 

Modified: lucene/dev/branches/branch_4x/solr/contrib/dataimporthandler/README.txt
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/contrib/dataimporthandler/README.txt?rev=1368191&r1=1368190&r2=1368191&view=diff
==============================================================================
--- lucene/dev/branches/branch_4x/solr/contrib/dataimporthandler/README.txt (original)
+++ lucene/dev/branches/branch_4x/solr/contrib/dataimporthandler/README.txt Wed Aug  1 18:46:03 2012
@@ -1,3 +1,12 @@
+                    Apache Solr - DataImportHandler
+
+Introduction
+------------
+DataImportHandler is a data import tool for Solr which makes importing data from Databases, XML files and
+HTTP data sources quick and easy.
+
+Important Note
+--------------
 Although Solr strives to be agnostic of the Locale where the server is
 running, some code paths in DataImportHandler are known to depend on the
 System default Locale, Timezone, or Charset.  It is recommended that when
