<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>java-user@lucene.apache.org Archives</title>
<link rel="self" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/?format=atom"/>
<link href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/"/>
<id>http://mail-archives.apache.org/mod_mbox/lucene-java-user/</id>
<updated>2009-12-10T00:28:33Z</updated>
<entry>
<title>RE: heap memory issues when sorting by a string field</title>
<author><name>Toke Eskildsen &lt;te@statsbiblioteket.dk&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c2E6A89A648463A4EBF093A9062C16683013484F9F742@SBMAILBOX1.sb.statsbiblioteket.dk%3e"/>
<id>urn:uuid:%3c2E6A89A648463A4EBF093A9062C16683013484F9F742@SBMAILBOX1-sb-statsbiblioteket-dk%3e</id>
<updated>2009-12-09T23:46:04Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Thanks for the heads-up, TCK. The Dietz &amp; Sleator article I found at 
http://www.cs.cmu.edu/~sleator/papers/maintaining-order.pdf
looks very interesting.

String sorting in Lucene is indeed fairly expensive and we've experimented with
two solutions to this, none of which are golden bullets.

1) Store the order, not the content.

Like the Lucene default sorter, we fetch all the terms for the sort field on first
sort. The docIDs are mapped to the terms and an complete sort is performed 
(by using a custom Collator: We want to avoid using CollationKeys as they
take too much memory. But that's another story). When the sort is finished,
we have an integer-array where each entry is the relative position for a given
document (referenced by docID). We then de-reference the loaded terms
so that the memory is freed.

Determining the order of two documents after the initial build is extremely
simple: order[docIDa] - order[docIDb] (or maybe it was the other way 
around, I forget).

Cons: Large temporarily memory overhead upon initial sort, though not as 
much as using CollationKeys. Slower initial sort than Lucene build-in.
Pros: Very fast subsequent sorts, very low memory-overhead when running.
Joker: This approach could be used to perform a post-processing on a 
generated index and storing the orders as a file along with the index, then 
pushing the complete package to searchers. This would lower the memory
requirements for the searchers significantly.


2) As above, but use a multipass order builder.

For the first sort, the order must be calculated: A sliding window sorter is created 
with a buffer of a fixed size. The window slides through all terms for the sort field 
and extracts the top X. The order-array is updated for the documents that has 
one of these terms. The sliding is repeated multiple times, where terms ordered 
before the last term of the previous iteration are ignored.

Cons: _Very_ slow (too slow in the current implementation) order build.
Pros: Same as above.
Joker: The buffer size determines memory use vs. order build time.


The multipass approach looks promising, but requires more work to get to a
usable state. Right now it takes minutes to build the order-array for half a 
million documents, with a buffer size requiring 5 iterations. If I ever get it to
work, I'll be sure to share it.

Regards,
Toke Eskildsen

________________________________________
From: TCK [moonwatcher32329@gmail.com]
Sent: 09 December 2009 22:58
To: java-user@lucene.apache.org
Subject: Re: heap memory issues when sorting by a string field

Thanks Mike for opening this jira ticket and for your patch. Explicitly
removing the entry from the WHM definitely does reduce the number of GC
cycles taken to free the huge StringIndex objects that get created when
doing a sort by a string field.

But I'm still trying to figure out why it is necessary for lucene to load
into memory the sorted-by field of every single document in the index (which
is what makes the StringIndex objects so big) on the first query. Why isn't
it sufficient to load the field values for only the documents that match the
given search criteria? Perhaps by using an order-maintenance data structure
(Dietz&amp;Sleator, or Bender et al) coupled with a balanced search tree,
instead of the simple lookup and order arrays currently used in StringIndex,
we can dynamically grow it as needed by successive queries rather than
loading everything on the first query.

Apologies if this is a naive question... I'm a newbie to the lucene code and
can't wait for the new lucene book to come out:-)

-TCK




On Tue, Dec 8, 2009 at 5:43 AM, Michael McCandless &lt;
lucene@mikemccandless.com&gt; wrote:

&gt; I've opened LUCENE-2135.
&gt;
&gt; Mike
&gt;
&gt; On Tue, Dec 8, 2009 at 5:36 AM, Michael McCandless
&gt; &lt;lucene@mikemccandless.com&gt; wrote:
&gt; &gt; This is a rather disturbing implementation detail of WeakHashMap, that
&gt; &gt; it needs the one extra step (invoking one of its methods) for its weak
&gt; &gt; keys to be reclaimable.
&gt; &gt;
&gt; &gt; Maybe on IndexReader.close(), Lucene should go and evict all entries
&gt; &gt; in the FieldCache associated with that reader.  Ie, step through the
&gt; &gt; sub-readers, and if they are truly closed as well (not shared w/ other
&gt; &gt; readers), evict.  I'll open an issue.
&gt; &gt;
&gt; &gt; Even in TCK's code fragment, it's not until the final line is done
&gt; &gt; executing, that the cache key even loses all hard references, because
&gt; &gt; it's that line that assigns to sSearcher, replacing the strong
&gt; &gt; reference to the old searcher.  Inserting sSearcher = null prior to
&gt; &gt; that would drop the hard reference sooner, but because of this impl
&gt; &gt; detail of WeakHashMap, something would still have to touch it (eg, a
&gt; &gt; warmup query that hits the field cache) before it's reclaimable.
&gt; &gt;
&gt; &gt; Mike
&gt; &gt;
&gt; &gt; On Mon, Dec 7, 2009 at 7:38 PM, Tom Hill &lt;solr-list@worldware.com&gt;
&gt; wrote:
&gt; &gt;&gt; Hi -
&gt; &gt;&gt;
&gt; &gt;&gt; If I understand correctly, WeakHashMap does not free the memory for the
&gt; &gt;&gt; value (cached data) when the key is nulled, or even when the key is
&gt; garbage
&gt; &gt;&gt; collected.
&gt; &gt;&gt;
&gt; &gt;&gt; It requires one more step: a method on WeakHashMap must be called to
&gt; allow
&gt; &gt;&gt; it to release its hard reference to the cached data. It appears that
&gt; most
&gt; &gt;&gt; methods in WeakHashMap end up calling expungeStaleEntries, which will
&gt; clear
&gt; &gt;&gt; the hard reference. But you have to call some method on the map, before
&gt; the
&gt; &gt;&gt; memory is eligible for garbage collection.
&gt; &gt;&gt;
&gt; &gt;&gt; So it requires four stages to free the cached data. Null the key; A GC
&gt; to
&gt; &gt;&gt; release the weak reference to the key; A call to some method on the map;
&gt; &gt;&gt; Then the next GC cycle should free the value.
&gt; &gt;&gt;
&gt; &gt;&gt; So it seems possible that you could end up with double memory usage for
&gt; a
&gt; &gt;&gt; time. If you don't have a GC between the time that you close the old
&gt; reader,
&gt; &gt;&gt; and you start to load the field cache entry for the next reader, then
&gt; the
&gt; &gt;&gt; key may still be hanging around uncollected.
&gt; &gt;&gt;
&gt; &gt;&gt; At that point, it may run a GC when you allocate the new cache, but
&gt; that's
&gt; &gt;&gt; only the first GC. It can't free the cached data until after the next
&gt; call
&gt; &gt;&gt; to expungeStaleEntries, so for a while you have both caches around.
&gt; &gt;&gt;
&gt; &gt;&gt; This extra usage could cause things to move into tenured space. Could
&gt; this
&gt; &gt;&gt; be causing your problem?
&gt; &gt;&gt;
&gt; &gt;&gt; Workaround would be to cause some method to be called on the
&gt; WeakHashMap.
&gt; &gt;&gt; You don't want to call get(), since that will try to populate the cache.
&gt; &gt;&gt; Maybe if you tried putting a small value to the cache, and doing a GC,
&gt; and
&gt; &gt;&gt; see if your memory drops then.
&gt; &gt;&gt;
&gt; &gt;&gt;
&gt; &gt;&gt; Tom
&gt; &gt;&gt;
&gt; &gt;&gt;
&gt; &gt;&gt;
&gt; &gt;&gt; On Mon, Dec 7, 2009 at 1:48 PM, TCK &lt;moonwatcher32329@gmail.com&gt; wrote:
&gt; &gt;&gt;
&gt; &gt;&gt;&gt; Thanks for the response. But I'm definitely calling close() on the old
&gt; &gt;&gt;&gt; reader and opening a new one (not using reopen). Also, to simplify the
&gt; &gt;&gt;&gt; analysis, I did my test with a single-threaded requester to eliminate
&gt; any
&gt; &gt;&gt;&gt; concurrency issues.
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt; I'm doing:
&gt; &gt;&gt;&gt; sSearcher.getIndexReader().close();
&gt; &gt;&gt;&gt; sSearcher.close(); // this actually seems to be a no-op
&gt; &gt;&gt;&gt; IndexReader newIndexReader = IndexReader.open(newDirectory);
&gt; &gt;&gt;&gt; sSearcher = new IndexSearcher(newIndexReader);
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt; Btw, isn't it bad practice anyway to have an unbounded cache? Are there
&gt; any
&gt; &gt;&gt;&gt; plans to replace the HashMaps used for the innerCaches with an actual
&gt; &gt;&gt;&gt; size-bounded cache with some eviction policy (perhaps EhCache or
&gt; something)
&gt; &gt;&gt;&gt; ?
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt; Thanks again,
&gt; &gt;&gt;&gt; TCK
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt; On Mon, Dec 7, 2009 at 4:37 PM, Erick Erickson &lt;
&gt; erickerickson@gmail.com
&gt; &gt;&gt;&gt; &gt;wrote:
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt; &gt; What this sounds like is that you're not really closing your
&gt; &gt;&gt;&gt; &gt; readers even though you think you are. Sorting indeed uses up
&gt; &gt;&gt;&gt; &gt; significant memory when it populates internal caches and keeps
&gt; &gt;&gt;&gt; &gt; it around for later use (which is one of the reasons that warming
&gt; &gt;&gt;&gt; &gt; queries matter). But if you really do close the reader, I'm pretty
&gt; &gt;&gt;&gt; &gt; sure the memory should be GC-able.
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; One thing that trips people up is IndexReader.reopen(). If it
&gt; &gt;&gt;&gt; &gt; returns a reader different than the original, you *must* close the
&gt; &gt;&gt;&gt; &gt; old one. If you don't, the old reader is still hanging around and
&gt; &gt;&gt;&gt; &gt; memory won't be returne.... An example from the Javadocs...
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt;  IndexReader reader = ...
&gt; &gt;&gt;&gt; &gt;  ...
&gt; &gt;&gt;&gt; &gt;  IndexReader new = r.reopen();
&gt; &gt;&gt;&gt; &gt;  if (new != reader) {
&gt; &gt;&gt;&gt; &gt;   ...     // reader was reopened
&gt; &gt;&gt;&gt; &gt;   reader.close();
&gt; &gt;&gt;&gt; &gt;  }
&gt; &gt;&gt;&gt; &gt;  reader = new;
&gt; &gt;&gt;&gt; &gt;  ...
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; If this is irrelevant, could you post your close/open
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; code?
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; HTH
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; Erick
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; On Mon, Dec 7, 2009 at 4:27 PM, TCK &lt;moonwatcher32329@gmail.com&gt;
&gt; wrote:
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; &gt; Hi,
&gt; &gt;&gt;&gt; &gt; &gt; I'm having heap memory issues when I do lucene queries involving
&gt; &gt;&gt;&gt; sorting
&gt; &gt;&gt;&gt; &gt; by
&gt; &gt;&gt;&gt; &gt; &gt; a string field. Such queries seem to load a lot of data in to
the
&gt; heap.
&gt; &gt;&gt;&gt; &gt; &gt; Moreover lucene seems to hold on to references to this data even
&gt; after
&gt; &gt;&gt;&gt; &gt; the
&gt; &gt;&gt;&gt; &gt; &gt; index reader has been closed and a full GC has been run. Some
of
&gt; the
&gt; &gt;&gt;&gt; &gt; &gt; consequences of this are that in my generational heap configuration
&gt; a
&gt; &gt;&gt;&gt; lot
&gt; &gt;&gt;&gt; &gt; &gt; of
&gt; &gt;&gt;&gt; &gt; &gt; memory gets promoted to tenured space each time I close the old
&gt; index
&gt; &gt;&gt;&gt; &gt; &gt; reader
&gt; &gt;&gt;&gt; &gt; &gt; and after opening and querying using a new one, and the tenured
&gt; space
&gt; &gt;&gt;&gt; &gt; &gt; eventually gets fragmented causing a lot of promotion failures
&gt; &gt;&gt;&gt; resulting
&gt; &gt;&gt;&gt; &gt; in
&gt; &gt;&gt;&gt; &gt; &gt; jvm hangs while the jvm does stop-the-world GCs.
&gt; &gt;&gt;&gt; &gt; &gt;
&gt; &gt;&gt;&gt; &gt; &gt; Does anyone know any workarounds to avoid these memory issues
when
&gt; &gt;&gt;&gt; doing
&gt; &gt;&gt;&gt; &gt; &gt; such lucene queries?
&gt; &gt;&gt;&gt; &gt; &gt;
&gt; &gt;&gt;&gt; &gt; &gt; My profiling showed that even after a full GC lucene is holding
on
&gt; to a
&gt; &gt;&gt;&gt; &gt; lot
&gt; &gt;&gt;&gt; &gt; &gt; of references to field value data notably via the
&gt; &gt;&gt;&gt; &gt; &gt; FieldCacheImpl/ExtendedFieldCacheImpl. I noticed that the
&gt; WeakHashMap
&gt; &gt;&gt;&gt; &gt; &gt; readerCaches are using unbounded HashMaps as the innerCaches and
I
&gt; used
&gt; &gt;&gt;&gt; &gt; &gt; reflection to replace these innerCaches with dummy empty HashMaps,
&gt; but
&gt; &gt;&gt;&gt; &gt; &gt; still
&gt; &gt;&gt;&gt; &gt; &gt; I'm seeing the same behavior. I wondered if anyone has gone through
&gt; &gt;&gt;&gt; these
&gt; &gt;&gt;&gt; &gt; &gt; same issues before and would offer any advice.
&gt; &gt;&gt;&gt; &gt; &gt;
&gt; &gt;&gt;&gt; &gt; &gt; Thanks a lot,
&gt; &gt;&gt;&gt; &gt; &gt; TCK
&gt; &gt;&gt;&gt; &gt; &gt;
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;
&gt; &gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: heap memory issues when sorting by a string field</title>
<author><name>Michael McCandless &lt;lucene@mikemccandless.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c9ac0c6aa0912091444g4dc0b0f9n19485948aee01d49@mail.gmail.com%3e"/>
<id>urn:uuid:%3c9ac0c6aa0912091444g4dc0b0f9n19485948aee01d49@mail-gmail-com%3e</id>
<updated>2009-12-09T22:44:12Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
It's not that it's "necessary" -- this is just how Lucene's sorting
has always worked ;)  But, it's just software!  You could whip up a
patch...

I'm not familiar with the order-maintenance problem &amp; solutions
offhand, but it certainly sounds interesting.

One issue is that loading only certain values on demand (from disk)
can be costly, eg it requires multiple disk head seeks (though, SSDs
will help here, though it's still maybe several order of magnitudes
slower than RAM (as long as the RAM isn't swapped out).

I think Lucene could cut back on the RAM usage by more compactly
representing the values.  Making separate String per value is alot of
overhead...

Mike

On Wed, Dec 9, 2009 at 4:58 PM, TCK &lt;moonwatcher32329@gmail.com&gt; wrote:
&gt; Thanks Mike for opening this jira ticket and for your patch. Explicitly
&gt; removing the entry from the WHM definitely does reduce the number of GC
&gt; cycles taken to free the huge StringIndex objects that get created when
&gt; doing a sort by a string field.
&gt;
&gt; But I'm still trying to figure out why it is necessary for lucene to load
&gt; into memory the sorted-by field of every single document in the index (which
&gt; is what makes the StringIndex objects so big) on the first query. Why isn't
&gt; it sufficient to load the field values for only the documents that match the
&gt; given search criteria? Perhaps by using an order-maintenance data structure
&gt; (Dietz&amp;Sleator, or Bender et al) coupled with a balanced search tree,
&gt; instead of the simple lookup and order arrays currently used in StringIndex,
&gt; we can dynamically grow it as needed by successive queries rather than
&gt; loading everything on the first query.
&gt;
&gt; Apologies if this is a naive question... I'm a newbie to the lucene code and
&gt; can't wait for the new lucene book to come out:-)
&gt;
&gt; -TCK
&gt;
&gt;
&gt;
&gt;
&gt; On Tue, Dec 8, 2009 at 5:43 AM, Michael McCandless &lt;
&gt; lucene@mikemccandless.com&gt; wrote:
&gt;
&gt;&gt; I've opened LUCENE-2135.
&gt;&gt;
&gt;&gt; Mike
&gt;&gt;
&gt;&gt; On Tue, Dec 8, 2009 at 5:36 AM, Michael McCandless
&gt;&gt; &lt;lucene@mikemccandless.com&gt; wrote:
&gt;&gt; &gt; This is a rather disturbing implementation detail of WeakHashMap, that
&gt;&gt; &gt; it needs the one extra step (invoking one of its methods) for its weak
&gt;&gt; &gt; keys to be reclaimable.
&gt;&gt; &gt;
&gt;&gt; &gt; Maybe on IndexReader.close(), Lucene should go and evict all entries
&gt;&gt; &gt; in the FieldCache associated with that reader.  Ie, step through the
&gt;&gt; &gt; sub-readers, and if they are truly closed as well (not shared w/ other
&gt;&gt; &gt; readers), evict.  I'll open an issue.
&gt;&gt; &gt;
&gt;&gt; &gt; Even in TCK's code fragment, it's not until the final line is done
&gt;&gt; &gt; executing, that the cache key even loses all hard references, because
&gt;&gt; &gt; it's that line that assigns to sSearcher, replacing the strong
&gt;&gt; &gt; reference to the old searcher.  Inserting sSearcher = null prior to
&gt;&gt; &gt; that would drop the hard reference sooner, but because of this impl
&gt;&gt; &gt; detail of WeakHashMap, something would still have to touch it (eg, a
&gt;&gt; &gt; warmup query that hits the field cache) before it's reclaimable.
&gt;&gt; &gt;
&gt;&gt; &gt; Mike
&gt;&gt; &gt;
&gt;&gt; &gt; On Mon, Dec 7, 2009 at 7:38 PM, Tom Hill &lt;solr-list@worldware.com&gt;
&gt;&gt; wrote:
&gt;&gt; &gt;&gt; Hi -
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt; If I understand correctly, WeakHashMap does not free the memory for the
&gt;&gt; &gt;&gt; value (cached data) when the key is nulled, or even when the key is
&gt;&gt; garbage
&gt;&gt; &gt;&gt; collected.
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt; It requires one more step: a method on WeakHashMap must be called to
&gt;&gt; allow
&gt;&gt; &gt;&gt; it to release its hard reference to the cached data. It appears that
&gt;&gt; most
&gt;&gt; &gt;&gt; methods in WeakHashMap end up calling expungeStaleEntries, which will
&gt;&gt; clear
&gt;&gt; &gt;&gt; the hard reference. But you have to call some method on the map, before
&gt;&gt; the
&gt;&gt; &gt;&gt; memory is eligible for garbage collection.
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt; So it requires four stages to free the cached data. Null the key; A GC
&gt;&gt; to
&gt;&gt; &gt;&gt; release the weak reference to the key; A call to some method on the map;
&gt;&gt; &gt;&gt; Then the next GC cycle should free the value.
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt; So it seems possible that you could end up with double memory usage for
&gt;&gt; a
&gt;&gt; &gt;&gt; time. If you don't have a GC between the time that you close the old
&gt;&gt; reader,
&gt;&gt; &gt;&gt; and you start to load the field cache entry for the next reader, then
&gt;&gt; the
&gt;&gt; &gt;&gt; key may still be hanging around uncollected.
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt; At that point, it may run a GC when you allocate the new cache, but
&gt;&gt; that's
&gt;&gt; &gt;&gt; only the first GC. It can't free the cached data until after the next
&gt;&gt; call
&gt;&gt; &gt;&gt; to expungeStaleEntries, so for a while you have both caches around.
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt; This extra usage could cause things to move into tenured space. Could
&gt;&gt; this
&gt;&gt; &gt;&gt; be causing your problem?
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt; Workaround would be to cause some method to be called on the
&gt;&gt; WeakHashMap.
&gt;&gt; &gt;&gt; You don't want to call get(), since that will try to populate the cache.
&gt;&gt; &gt;&gt; Maybe if you tried putting a small value to the cache, and doing a GC,
&gt;&gt; and
&gt;&gt; &gt;&gt; see if your memory drops then.
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt; Tom
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt; On Mon, Dec 7, 2009 at 1:48 PM, TCK &lt;moonwatcher32329@gmail.com&gt; wrote:
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt;&gt; Thanks for the response. But I'm definitely calling close() on the old
&gt;&gt; &gt;&gt;&gt; reader and opening a new one (not using reopen). Also, to simplify the
&gt;&gt; &gt;&gt;&gt; analysis, I did my test with a single-threaded requester to eliminate
&gt;&gt; any
&gt;&gt; &gt;&gt;&gt; concurrency issues.
&gt;&gt; &gt;&gt;&gt;
&gt;&gt; &gt;&gt;&gt; I'm doing:
&gt;&gt; &gt;&gt;&gt; sSearcher.getIndexReader().close();
&gt;&gt; &gt;&gt;&gt; sSearcher.close(); // this actually seems to be a no-op
&gt;&gt; &gt;&gt;&gt; IndexReader newIndexReader = IndexReader.open(newDirectory);
&gt;&gt; &gt;&gt;&gt; sSearcher = new IndexSearcher(newIndexReader);
&gt;&gt; &gt;&gt;&gt;
&gt;&gt; &gt;&gt;&gt; Btw, isn't it bad practice anyway to have an unbounded cache? Are there
&gt;&gt; any
&gt;&gt; &gt;&gt;&gt; plans to replace the HashMaps used for the innerCaches with an actual
&gt;&gt; &gt;&gt;&gt; size-bounded cache with some eviction policy (perhaps EhCache or
&gt;&gt; something)
&gt;&gt; &gt;&gt;&gt; ?
&gt;&gt; &gt;&gt;&gt;
&gt;&gt; &gt;&gt;&gt; Thanks again,
&gt;&gt; &gt;&gt;&gt; TCK
&gt;&gt; &gt;&gt;&gt;
&gt;&gt; &gt;&gt;&gt;
&gt;&gt; &gt;&gt;&gt;
&gt;&gt; &gt;&gt;&gt;
&gt;&gt; &gt;&gt;&gt; On Mon, Dec 7, 2009 at 4:37 PM, Erick Erickson &lt;
&gt;&gt; erickerickson@gmail.com
&gt;&gt; &gt;&gt;&gt; &gt;wrote:
&gt;&gt; &gt;&gt;&gt;
&gt;&gt; &gt;&gt;&gt; &gt; What this sounds like is that you're not really closing your
&gt;&gt; &gt;&gt;&gt; &gt; readers even though you think you are. Sorting indeed uses up
&gt;&gt; &gt;&gt;&gt; &gt; significant memory when it populates internal caches and keeps
&gt;&gt; &gt;&gt;&gt; &gt; it around for later use (which is one of the reasons that warming
&gt;&gt; &gt;&gt;&gt; &gt; queries matter). But if you really do close the reader, I'm pretty
&gt;&gt; &gt;&gt;&gt; &gt; sure the memory should be GC-able.
&gt;&gt; &gt;&gt;&gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt; One thing that trips people up is IndexReader.reopen(). If it
&gt;&gt; &gt;&gt;&gt; &gt; returns a reader different than the original, you *must* close
the
&gt;&gt; &gt;&gt;&gt; &gt; old one. If you don't, the old reader is still hanging around and
&gt;&gt; &gt;&gt;&gt; &gt; memory won't be returne.... An example from the Javadocs...
&gt;&gt; &gt;&gt;&gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt;  IndexReader reader = ...
&gt;&gt; &gt;&gt;&gt; &gt;  ...
&gt;&gt; &gt;&gt;&gt; &gt;  IndexReader new = r.reopen();
&gt;&gt; &gt;&gt;&gt; &gt;  if (new != reader) {
&gt;&gt; &gt;&gt;&gt; &gt;   ...     // reader was reopened
&gt;&gt; &gt;&gt;&gt; &gt;   reader.close();
&gt;&gt; &gt;&gt;&gt; &gt;  }
&gt;&gt; &gt;&gt;&gt; &gt;  reader = new;
&gt;&gt; &gt;&gt;&gt; &gt;  ...
&gt;&gt; &gt;&gt;&gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt; If this is irrelevant, could you post your close/open
&gt;&gt; &gt;&gt;&gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt; code?
&gt;&gt; &gt;&gt;&gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt; HTH
&gt;&gt; &gt;&gt;&gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt; Erick
&gt;&gt; &gt;&gt;&gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt; On Mon, Dec 7, 2009 at 4:27 PM, TCK &lt;moonwatcher32329@gmail.com&gt;
&gt;&gt; wrote:
&gt;&gt; &gt;&gt;&gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt; &gt; Hi,
&gt;&gt; &gt;&gt;&gt; &gt; &gt; I'm having heap memory issues when I do lucene queries involving
&gt;&gt; &gt;&gt;&gt; sorting
&gt;&gt; &gt;&gt;&gt; &gt; by
&gt;&gt; &gt;&gt;&gt; &gt; &gt; a string field. Such queries seem to load a lot of data in
to the
&gt;&gt; heap.
&gt;&gt; &gt;&gt;&gt; &gt; &gt; Moreover lucene seems to hold on to references to this data
even
&gt;&gt; after
&gt;&gt; &gt;&gt;&gt; &gt; the
&gt;&gt; &gt;&gt;&gt; &gt; &gt; index reader has been closed and a full GC has been run. Some
of
&gt;&gt; the
&gt;&gt; &gt;&gt;&gt; &gt; &gt; consequences of this are that in my generational heap configuration
&gt;&gt; a
&gt;&gt; &gt;&gt;&gt; lot
&gt;&gt; &gt;&gt;&gt; &gt; &gt; of
&gt;&gt; &gt;&gt;&gt; &gt; &gt; memory gets promoted to tenured space each time I close the
old
&gt;&gt; index
&gt;&gt; &gt;&gt;&gt; &gt; &gt; reader
&gt;&gt; &gt;&gt;&gt; &gt; &gt; and after opening and querying using a new one, and the tenured
&gt;&gt; space
&gt;&gt; &gt;&gt;&gt; &gt; &gt; eventually gets fragmented causing a lot of promotion failures
&gt;&gt; &gt;&gt;&gt; resulting
&gt;&gt; &gt;&gt;&gt; &gt; in
&gt;&gt; &gt;&gt;&gt; &gt; &gt; jvm hangs while the jvm does stop-the-world GCs.
&gt;&gt; &gt;&gt;&gt; &gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt; &gt; Does anyone know any workarounds to avoid these memory issues
when
&gt;&gt; &gt;&gt;&gt; doing
&gt;&gt; &gt;&gt;&gt; &gt; &gt; such lucene queries?
&gt;&gt; &gt;&gt;&gt; &gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt; &gt; My profiling showed that even after a full GC lucene is holding
on
&gt;&gt; to a
&gt;&gt; &gt;&gt;&gt; &gt; lot
&gt;&gt; &gt;&gt;&gt; &gt; &gt; of references to field value data notably via the
&gt;&gt; &gt;&gt;&gt; &gt; &gt; FieldCacheImpl/ExtendedFieldCacheImpl. I noticed that the
&gt;&gt; WeakHashMap
&gt;&gt; &gt;&gt;&gt; &gt; &gt; readerCaches are using unbounded HashMaps as the innerCaches
and I
&gt;&gt; used
&gt;&gt; &gt;&gt;&gt; &gt; &gt; reflection to replace these innerCaches with dummy empty HashMaps,
&gt;&gt; but
&gt;&gt; &gt;&gt;&gt; &gt; &gt; still
&gt;&gt; &gt;&gt;&gt; &gt; &gt; I'm seeing the same behavior. I wondered if anyone has gone
through
&gt;&gt; &gt;&gt;&gt; these
&gt;&gt; &gt;&gt;&gt; &gt; &gt; same issues before and would offer any advice.
&gt;&gt; &gt;&gt;&gt; &gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt; &gt; Thanks a lot,
&gt;&gt; &gt;&gt;&gt; &gt; &gt; TCK
&gt;&gt; &gt;&gt;&gt; &gt; &gt;
&gt;&gt; &gt;&gt;&gt; &gt;
&gt;&gt; &gt;&gt;&gt;
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: NearSpansUnordered payloads not returning all the time</title>
<author><name>Michael McCandless &lt;lucene@mikemccandless.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c9ac0c6aa0912091425i6f6b51f3p9b8be471a952336d@mail.gmail.com%3e"/>
<id>urn:uuid:%3c9ac0c6aa0912091425i6f6b51f3p9b8be471a952336d@mail-gmail-com%3e</id>
<updated>2009-12-09T22:25:29Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Yes, you found it!  Is that what you're hitting?

I don't know of a workaround though... this is just how SpanQuery
currently works...

Mike

On Wed, Dec 9, 2009 at 4:56 PM, Jason Rutherglen
&lt;jason.rutherglen@gmail.com&gt; wrote:
&gt; Mike,
&gt;
&gt; Is this the thread?
&gt;
&gt; http://www.lucidimagination.com/search/document/1e87d488a904b89f/spannearquery_s_spans_payloads#8103efdc9705a763
&gt;
&gt; Maybe we need a recommended workaround for this?
&gt;
&gt; Jason
&gt;
&gt; On Wed, Dec 9, 2009 at 1:17 PM, Michael McCandless
&gt; &lt;lucene@mikemccandless.com&gt; wrote:
&gt;&gt; That sounds familiar... try to track down the last thread maybe?
&gt;&gt;
&gt;&gt; I think it was this: if the payload was already retrieved for a prior
&gt;&gt; span then the current span won't be able to retrieve it, so even
&gt;&gt; though you know a payload falls within the span you're looking at, you
&gt;&gt; won't get it back, if it already fell on a prior span.
&gt;&gt;
&gt;&gt; Mike
&gt;&gt;
&gt;&gt; On Wed, Dec 9, 2009 at 11:25 AM, Jason Rutherglen
&gt;&gt; &lt;jason.rutherglen@gmail.com&gt; wrote:
&gt;&gt;&gt; Right we're getting the spans, however it's just the payloads that are
&gt;&gt;&gt; missing, randomly...
&gt;&gt;&gt;
&gt;&gt;&gt; On Wed, Dec 9, 2009 at 2:23 AM, Michael McCandless
&gt;&gt;&gt; &lt;lucene@mikemccandless.com&gt; wrote:
&gt;&gt;&gt;&gt; There was a thread a while back about how span queries don't enumerate
&gt;&gt;&gt;&gt; every possible span, but I can't remember if that included sometimes
&gt;&gt;&gt;&gt; missing payloads...
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Mike
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; On Tue, Dec 8, 2009 at 7:34 PM, Jason Rutherglen
&gt;&gt;&gt;&gt; &lt;jason.rutherglen@gmail.com&gt; wrote:
&gt;&gt;&gt;&gt;&gt; Howdy,
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; I am wondering if anyone has seen
&gt;&gt;&gt;&gt;&gt; NearSpansUnordered.getPayload() not return payloads that are
&gt;&gt;&gt;&gt;&gt; verifiably accessible via IR.termPositions? It's a bit confusing
&gt;&gt;&gt;&gt;&gt; because most of the time they're returned properly.
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; I suspect the payload logic gets tripped up in
&gt;&gt;&gt;&gt;&gt; NearSpansUnordered. I'll put together a test case, however the
&gt;&gt;&gt;&gt;&gt; difficulty is that we're only seeing the issue with largish 800
&gt;&gt;&gt;&gt;&gt; MB indexes, which could make the test case a little crazy.
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; Jason
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; ---------------------------------------------------------------------
&gt;&gt;&gt;&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt;&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; ---------------------------------------------------------------------
&gt;&gt;&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; ---------------------------------------------------------------------
&gt;&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>RE: MatchAllDocsQuery and InstantiatedIndex on Lucene 2.9.1</title>
<author><name>&quot;Uwe Schindler&quot; &lt;uwe@thetaphi.de&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c467107581D194DA996B89CCA45E93D31@VEGA%3e"/>
<id>urn:uuid:%3c467107581D194DA996B89CCA45E93D31@VEGA%3e</id>
<updated>2009-12-09T22:22:33Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
This is a bug in InstantiatedIndex. The termDoc(null) was added to get all
documents. This was never implemented in Instantiated Index. Can you open an
issue?

There maybe other queries fail because of this (e.g.
FieldCacheRangeFilter,...).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

&gt; -----Original Message-----
&gt; From: Jason Fennell [mailto:jdfennell@gmail.com]
&gt; Sent: Wednesday, December 09, 2009 7:48 PM
&gt; To: java-user@lucene.apache.org
&gt; Subject: MatchAllDocsQuery and InstantiatedIndex on Lucene 2.9.1
&gt; 
&gt; I'm trying to upgrade our application from Lucene 2.4.1 to Lucene 2.9.1.
&gt; I've been using an InstantiatedIndex to do a bunch of unit testing, but am
&gt; running into a some problems with Lucene 2.9.1.
&gt; In particular, when I try to run a MatchAllDocsQuery on my
&gt; InstantiatedIndex
&gt; (which worked fine on 2.4.1) a NullPointerException is raised.  I think I
&gt; tracked down the problem in the Lucene source.
&gt; 
&gt; In the 2.9.1 MatchAllDocsQuery, the MatchAllScorer retrieves termDocs from
&gt; the IndexReader passed to it with
&gt; 
&gt; reader.termDocs(null)
&gt; 
&gt; which I assume is supposed to return all termDocs.  However, tracing this
&gt; call down
&gt; 
&gt; IndexReader.termDocs(term) calls InstantiatedTermDocs.seek(
&gt; term) calls InstantiatedIndex.findTerm(term)
&gt; 
&gt; which is implemented as
&gt; 
&gt; InstantiatedTerm findTerm(Term term) {
&gt;     return findTerm(term.field(), term.text());
&gt; }
&gt; 
&gt; which, since term is null, results in a NullPointerException.  This seems
&gt; to
&gt; me like an bug either in the MatchAllDocsQuery implementation (the version
&gt; in 2.4.1 did not use termDocs, so did not pass through this null), or a
&gt; bug
&gt; in the implementation of InstantiatedIndex.
&gt; 
&gt; Any suggestions on what I can do about this?  I definitely can't get rid
&gt; of
&gt; the MatchAllDocsQuery and don't really want to move back to a slow
&gt; RAMDirectory.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: heap memory issues when sorting by a string field</title>
<author><name>TCK &lt;moonwatcher32329@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c17022a310912091358y20f4eebbt7bcd0253cb8add04@mail.gmail.com%3e"/>
<id>urn:uuid:%3c17022a310912091358y20f4eebbt7bcd0253cb8add04@mail-gmail-com%3e</id>
<updated>2009-12-09T21:58:05Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Thanks Mike for opening this jira ticket and for your patch. Explicitly
removing the entry from the WHM definitely does reduce the number of GC
cycles taken to free the huge StringIndex objects that get created when
doing a sort by a string field.

But I'm still trying to figure out why it is necessary for lucene to load
into memory the sorted-by field of every single document in the index (which
is what makes the StringIndex objects so big) on the first query. Why isn't
it sufficient to load the field values for only the documents that match the
given search criteria? Perhaps by using an order-maintenance data structure
(Dietz&amp;Sleator, or Bender et al) coupled with a balanced search tree,
instead of the simple lookup and order arrays currently used in StringIndex,
we can dynamically grow it as needed by successive queries rather than
loading everything on the first query.

Apologies if this is a naive question... I'm a newbie to the lucene code and
can't wait for the new lucene book to come out:-)

-TCK




On Tue, Dec 8, 2009 at 5:43 AM, Michael McCandless &lt;
lucene@mikemccandless.com&gt; wrote:

&gt; I've opened LUCENE-2135.
&gt;
&gt; Mike
&gt;
&gt; On Tue, Dec 8, 2009 at 5:36 AM, Michael McCandless
&gt; &lt;lucene@mikemccandless.com&gt; wrote:
&gt; &gt; This is a rather disturbing implementation detail of WeakHashMap, that
&gt; &gt; it needs the one extra step (invoking one of its methods) for its weak
&gt; &gt; keys to be reclaimable.
&gt; &gt;
&gt; &gt; Maybe on IndexReader.close(), Lucene should go and evict all entries
&gt; &gt; in the FieldCache associated with that reader.  Ie, step through the
&gt; &gt; sub-readers, and if they are truly closed as well (not shared w/ other
&gt; &gt; readers), evict.  I'll open an issue.
&gt; &gt;
&gt; &gt; Even in TCK's code fragment, it's not until the final line is done
&gt; &gt; executing, that the cache key even loses all hard references, because
&gt; &gt; it's that line that assigns to sSearcher, replacing the strong
&gt; &gt; reference to the old searcher.  Inserting sSearcher = null prior to
&gt; &gt; that would drop the hard reference sooner, but because of this impl
&gt; &gt; detail of WeakHashMap, something would still have to touch it (eg, a
&gt; &gt; warmup query that hits the field cache) before it's reclaimable.
&gt; &gt;
&gt; &gt; Mike
&gt; &gt;
&gt; &gt; On Mon, Dec 7, 2009 at 7:38 PM, Tom Hill &lt;solr-list@worldware.com&gt;
&gt; wrote:
&gt; &gt;&gt; Hi -
&gt; &gt;&gt;
&gt; &gt;&gt; If I understand correctly, WeakHashMap does not free the memory for the
&gt; &gt;&gt; value (cached data) when the key is nulled, or even when the key is
&gt; garbage
&gt; &gt;&gt; collected.
&gt; &gt;&gt;
&gt; &gt;&gt; It requires one more step: a method on WeakHashMap must be called to
&gt; allow
&gt; &gt;&gt; it to release its hard reference to the cached data. It appears that
&gt; most
&gt; &gt;&gt; methods in WeakHashMap end up calling expungeStaleEntries, which will
&gt; clear
&gt; &gt;&gt; the hard reference. But you have to call some method on the map, before
&gt; the
&gt; &gt;&gt; memory is eligible for garbage collection.
&gt; &gt;&gt;
&gt; &gt;&gt; So it requires four stages to free the cached data. Null the key; A GC
&gt; to
&gt; &gt;&gt; release the weak reference to the key; A call to some method on the map;
&gt; &gt;&gt; Then the next GC cycle should free the value.
&gt; &gt;&gt;
&gt; &gt;&gt; So it seems possible that you could end up with double memory usage for
&gt; a
&gt; &gt;&gt; time. If you don't have a GC between the time that you close the old
&gt; reader,
&gt; &gt;&gt; and you start to load the field cache entry for the next reader, then
&gt; the
&gt; &gt;&gt; key may still be hanging around uncollected.
&gt; &gt;&gt;
&gt; &gt;&gt; At that point, it may run a GC when you allocate the new cache, but
&gt; that's
&gt; &gt;&gt; only the first GC. It can't free the cached data until after the next
&gt; call
&gt; &gt;&gt; to expungeStaleEntries, so for a while you have both caches around.
&gt; &gt;&gt;
&gt; &gt;&gt; This extra usage could cause things to move into tenured space. Could
&gt; this
&gt; &gt;&gt; be causing your problem?
&gt; &gt;&gt;
&gt; &gt;&gt; Workaround would be to cause some method to be called on the
&gt; WeakHashMap.
&gt; &gt;&gt; You don't want to call get(), since that will try to populate the cache.
&gt; &gt;&gt; Maybe if you tried putting a small value to the cache, and doing a GC,
&gt; and
&gt; &gt;&gt; see if your memory drops then.
&gt; &gt;&gt;
&gt; &gt;&gt;
&gt; &gt;&gt; Tom
&gt; &gt;&gt;
&gt; &gt;&gt;
&gt; &gt;&gt;
&gt; &gt;&gt; On Mon, Dec 7, 2009 at 1:48 PM, TCK &lt;moonwatcher32329@gmail.com&gt; wrote:
&gt; &gt;&gt;
&gt; &gt;&gt;&gt; Thanks for the response. But I'm definitely calling close() on the old
&gt; &gt;&gt;&gt; reader and opening a new one (not using reopen). Also, to simplify the
&gt; &gt;&gt;&gt; analysis, I did my test with a single-threaded requester to eliminate
&gt; any
&gt; &gt;&gt;&gt; concurrency issues.
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt; I'm doing:
&gt; &gt;&gt;&gt; sSearcher.getIndexReader().close();
&gt; &gt;&gt;&gt; sSearcher.close(); // this actually seems to be a no-op
&gt; &gt;&gt;&gt; IndexReader newIndexReader = IndexReader.open(newDirectory);
&gt; &gt;&gt;&gt; sSearcher = new IndexSearcher(newIndexReader);
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt; Btw, isn't it bad practice anyway to have an unbounded cache? Are there
&gt; any
&gt; &gt;&gt;&gt; plans to replace the HashMaps used for the innerCaches with an actual
&gt; &gt;&gt;&gt; size-bounded cache with some eviction policy (perhaps EhCache or
&gt; something)
&gt; &gt;&gt;&gt; ?
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt; Thanks again,
&gt; &gt;&gt;&gt; TCK
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt; On Mon, Dec 7, 2009 at 4:37 PM, Erick Erickson &lt;
&gt; erickerickson@gmail.com
&gt; &gt;&gt;&gt; &gt;wrote:
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;&gt; &gt; What this sounds like is that you're not really closing your
&gt; &gt;&gt;&gt; &gt; readers even though you think you are. Sorting indeed uses up
&gt; &gt;&gt;&gt; &gt; significant memory when it populates internal caches and keeps
&gt; &gt;&gt;&gt; &gt; it around for later use (which is one of the reasons that warming
&gt; &gt;&gt;&gt; &gt; queries matter). But if you really do close the reader, I'm pretty
&gt; &gt;&gt;&gt; &gt; sure the memory should be GC-able.
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; One thing that trips people up is IndexReader.reopen(). If it
&gt; &gt;&gt;&gt; &gt; returns a reader different than the original, you *must* close the
&gt; &gt;&gt;&gt; &gt; old one. If you don't, the old reader is still hanging around and
&gt; &gt;&gt;&gt; &gt; memory won't be returne.... An example from the Javadocs...
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt;  IndexReader reader = ...
&gt; &gt;&gt;&gt; &gt;  ...
&gt; &gt;&gt;&gt; &gt;  IndexReader new = r.reopen();
&gt; &gt;&gt;&gt; &gt;  if (new != reader) {
&gt; &gt;&gt;&gt; &gt;   ...     // reader was reopened
&gt; &gt;&gt;&gt; &gt;   reader.close();
&gt; &gt;&gt;&gt; &gt;  }
&gt; &gt;&gt;&gt; &gt;  reader = new;
&gt; &gt;&gt;&gt; &gt;  ...
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; If this is irrelevant, could you post your close/open
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; code?
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; HTH
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; Erick
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; On Mon, Dec 7, 2009 at 4:27 PM, TCK &lt;moonwatcher32329@gmail.com&gt;
&gt; wrote:
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt; &gt; &gt; Hi,
&gt; &gt;&gt;&gt; &gt; &gt; I'm having heap memory issues when I do lucene queries involving
&gt; &gt;&gt;&gt; sorting
&gt; &gt;&gt;&gt; &gt; by
&gt; &gt;&gt;&gt; &gt; &gt; a string field. Such queries seem to load a lot of data in to
the
&gt; heap.
&gt; &gt;&gt;&gt; &gt; &gt; Moreover lucene seems to hold on to references to this data even
&gt; after
&gt; &gt;&gt;&gt; &gt; the
&gt; &gt;&gt;&gt; &gt; &gt; index reader has been closed and a full GC has been run. Some
of
&gt; the
&gt; &gt;&gt;&gt; &gt; &gt; consequences of this are that in my generational heap configuration
&gt; a
&gt; &gt;&gt;&gt; lot
&gt; &gt;&gt;&gt; &gt; &gt; of
&gt; &gt;&gt;&gt; &gt; &gt; memory gets promoted to tenured space each time I close the old
&gt; index
&gt; &gt;&gt;&gt; &gt; &gt; reader
&gt; &gt;&gt;&gt; &gt; &gt; and after opening and querying using a new one, and the tenured
&gt; space
&gt; &gt;&gt;&gt; &gt; &gt; eventually gets fragmented causing a lot of promotion failures
&gt; &gt;&gt;&gt; resulting
&gt; &gt;&gt;&gt; &gt; in
&gt; &gt;&gt;&gt; &gt; &gt; jvm hangs while the jvm does stop-the-world GCs.
&gt; &gt;&gt;&gt; &gt; &gt;
&gt; &gt;&gt;&gt; &gt; &gt; Does anyone know any workarounds to avoid these memory issues
when
&gt; &gt;&gt;&gt; doing
&gt; &gt;&gt;&gt; &gt; &gt; such lucene queries?
&gt; &gt;&gt;&gt; &gt; &gt;
&gt; &gt;&gt;&gt; &gt; &gt; My profiling showed that even after a full GC lucene is holding
on
&gt; to a
&gt; &gt;&gt;&gt; &gt; lot
&gt; &gt;&gt;&gt; &gt; &gt; of references to field value data notably via the
&gt; &gt;&gt;&gt; &gt; &gt; FieldCacheImpl/ExtendedFieldCacheImpl. I noticed that the
&gt; WeakHashMap
&gt; &gt;&gt;&gt; &gt; &gt; readerCaches are using unbounded HashMaps as the innerCaches and
I
&gt; used
&gt; &gt;&gt;&gt; &gt; &gt; reflection to replace these innerCaches with dummy empty HashMaps,
&gt; but
&gt; &gt;&gt;&gt; &gt; &gt; still
&gt; &gt;&gt;&gt; &gt; &gt; I'm seeing the same behavior. I wondered if anyone has gone through
&gt; &gt;&gt;&gt; these
&gt; &gt;&gt;&gt; &gt; &gt; same issues before and would offer any advice.
&gt; &gt;&gt;&gt; &gt; &gt;
&gt; &gt;&gt;&gt; &gt; &gt; Thanks a lot,
&gt; &gt;&gt;&gt; &gt; &gt; TCK
&gt; &gt;&gt;&gt; &gt; &gt;
&gt; &gt;&gt;&gt; &gt;
&gt; &gt;&gt;&gt;
&gt; &gt;&gt;
&gt; &gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: NearSpansUnordered payloads not returning all the time</title>
<author><name>Jason Rutherglen &lt;jason.rutherglen@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c85d3c3b60912091356t6c260c43gcaf0ede53d7a8f98@mail.gmail.com%3e"/>
<id>urn:uuid:%3c85d3c3b60912091356t6c260c43gcaf0ede53d7a8f98@mail-gmail-com%3e</id>
<updated>2009-12-09T21:56:38Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Mike,

Is this the thread?

http://www.lucidimagination.com/search/document/1e87d488a904b89f/spannearquery_s_spans_payloads#8103efdc9705a763

Maybe we need a recommended workaround for this?

Jason

On Wed, Dec 9, 2009 at 1:17 PM, Michael McCandless
&lt;lucene@mikemccandless.com&gt; wrote:
&gt; That sounds familiar... try to track down the last thread maybe?
&gt;
&gt; I think it was this: if the payload was already retrieved for a prior
&gt; span then the current span won't be able to retrieve it, so even
&gt; though you know a payload falls within the span you're looking at, you
&gt; won't get it back, if it already fell on a prior span.
&gt;
&gt; Mike
&gt;
&gt; On Wed, Dec 9, 2009 at 11:25 AM, Jason Rutherglen
&gt; &lt;jason.rutherglen@gmail.com&gt; wrote:
&gt;&gt; Right we're getting the spans, however it's just the payloads that are
&gt;&gt; missing, randomly...
&gt;&gt;
&gt;&gt; On Wed, Dec 9, 2009 at 2:23 AM, Michael McCandless
&gt;&gt; &lt;lucene@mikemccandless.com&gt; wrote:
&gt;&gt;&gt; There was a thread a while back about how span queries don't enumerate
&gt;&gt;&gt; every possible span, but I can't remember if that included sometimes
&gt;&gt;&gt; missing payloads...
&gt;&gt;&gt;
&gt;&gt;&gt; Mike
&gt;&gt;&gt;
&gt;&gt;&gt; On Tue, Dec 8, 2009 at 7:34 PM, Jason Rutherglen
&gt;&gt;&gt; &lt;jason.rutherglen@gmail.com&gt; wrote:
&gt;&gt;&gt;&gt; Howdy,
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; I am wondering if anyone has seen
&gt;&gt;&gt;&gt; NearSpansUnordered.getPayload() not return payloads that are
&gt;&gt;&gt;&gt; verifiably accessible via IR.termPositions? It's a bit confusing
&gt;&gt;&gt;&gt; because most of the time they're returned properly.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; I suspect the payload logic gets tripped up in
&gt;&gt;&gt;&gt; NearSpansUnordered. I'll put together a test case, however the
&gt;&gt;&gt;&gt; difficulty is that we're only seeing the issue with largish 800
&gt;&gt;&gt;&gt; MB indexes, which could make the test case a little crazy.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Jason
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; ---------------------------------------------------------------------
&gt;&gt;&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; ---------------------------------------------------------------------
&gt;&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: NearSpansUnordered payloads not returning all the time</title>
<author><name>Michael McCandless &lt;lucene@mikemccandless.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c9ac0c6aa0912091317v242bb434uf748a9b508f7fb0b@mail.gmail.com%3e"/>
<id>urn:uuid:%3c9ac0c6aa0912091317v242bb434uf748a9b508f7fb0b@mail-gmail-com%3e</id>
<updated>2009-12-09T21:17:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
That sounds familiar... try to track down the last thread maybe?

I think it was this: if the payload was already retrieved for a prior
span then the current span won't be able to retrieve it, so even
though you know a payload falls within the span you're looking at, you
won't get it back, if it already fell on a prior span.

Mike

On Wed, Dec 9, 2009 at 11:25 AM, Jason Rutherglen
&lt;jason.rutherglen@gmail.com&gt; wrote:
&gt; Right we're getting the spans, however it's just the payloads that are
&gt; missing, randomly...
&gt;
&gt; On Wed, Dec 9, 2009 at 2:23 AM, Michael McCandless
&gt; &lt;lucene@mikemccandless.com&gt; wrote:
&gt;&gt; There was a thread a while back about how span queries don't enumerate
&gt;&gt; every possible span, but I can't remember if that included sometimes
&gt;&gt; missing payloads...
&gt;&gt;
&gt;&gt; Mike
&gt;&gt;
&gt;&gt; On Tue, Dec 8, 2009 at 7:34 PM, Jason Rutherglen
&gt;&gt; &lt;jason.rutherglen@gmail.com&gt; wrote:
&gt;&gt;&gt; Howdy,
&gt;&gt;&gt;
&gt;&gt;&gt; I am wondering if anyone has seen
&gt;&gt;&gt; NearSpansUnordered.getPayload() not return payloads that are
&gt;&gt;&gt; verifiably accessible via IR.termPositions? It's a bit confusing
&gt;&gt;&gt; because most of the time they're returned properly.
&gt;&gt;&gt;
&gt;&gt;&gt; I suspect the payload logic gets tripped up in
&gt;&gt;&gt; NearSpansUnordered. I'll put together a test case, however the
&gt;&gt;&gt; difficulty is that we're only seeing the issue with largish 800
&gt;&gt;&gt; MB indexes, which could make the test case a little crazy.
&gt;&gt;&gt;
&gt;&gt;&gt; Jason
&gt;&gt;&gt;
&gt;&gt;&gt; ---------------------------------------------------------------------
&gt;&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Converting HitCollector to Collector</title>
<author><name>Max Lynch &lt;ihasmax@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c3836ec640912091058o6366d748vfb04ce855edad19b@mail.gmail.com%3e"/>
<id>urn:uuid:%3c3836ec640912091058o6366d748vfb04ce855edad19b@mail-gmail-com%3e</id>
<updated>2009-12-09T18:58:35Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,
I have a HitCollector that processes all hits from a query.  I want all
hits, not the top N hits.  I am converting my HitCollector to a Collector
for Lucene 3.0.0, and I'm a little confused by the new interface.

I assume that I can implement by new Collector much like the code on the API
Docs:

 Searcher searcher = new IndexSearcher(indexReader);
 final BitSet bits = new BitSet(indexReader.maxDoc());
 searcher.search(query, new Collector() {
   private int docBase;

   *// ignore scorer*
   public void setScorer(Scorer scorer) {
   }

   *// accept docs out of order (for a BitSet it doesn't matter)*
   public boolean acceptsDocsOutOfOrder() {
     return true;
   }

   public void collect(int doc) {
     bits.set(doc + docBase);
   }

   public void setNextReader(IndexReader reader, int docBase) {
     this.docBase = docBase;
   }
 });


But I'm confused what the docBasing is (I need to get fields from each
document from my index searcher).  Do I need to use the doc base or
setNextReader?  Also, what is the purpose of acceptsDocsOutOfOrder?  I
see the docs note on it but I'm not sure how I could apply that or if
I should care about it.

Thanks,
Max


</pre>
</div>
</content>
</entry>
<entry>
<title>MatchAllDocsQuery and InstantiatedIndex on Lucene 2.9.1</title>
<author><name>Jason Fennell &lt;jdfennell@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c9fad81fd0912091048y8bf5fc1qe9bd408a99220eae@mail.gmail.com%3e"/>
<id>urn:uuid:%3c9fad81fd0912091048y8bf5fc1qe9bd408a99220eae@mail-gmail-com%3e</id>
<updated>2009-12-09T18:48:00Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I'm trying to upgrade our application from Lucene 2.4.1 to Lucene 2.9.1.
I've been using an InstantiatedIndex to do a bunch of unit testing, but am
running into a some problems with Lucene 2.9.1.
In particular, when I try to run a MatchAllDocsQuery on my InstantiatedIndex
(which worked fine on 2.4.1) a NullPointerException is raised.  I think I
tracked down the problem in the Lucene source.

In the 2.9.1 MatchAllDocsQuery, the MatchAllScorer retrieves termDocs from
the IndexReader passed to it with

reader.termDocs(null)

which I assume is supposed to return all termDocs.  However, tracing this
call down

IndexReader.termDocs(term) calls InstantiatedTermDocs.seek(
term) calls InstantiatedIndex.findTerm(term)

which is implemented as

InstantiatedTerm findTerm(Term term) {
    return findTerm(term.field(), term.text());
}

which, since term is null, results in a NullPointerException.  This seems to
me like an bug either in the MatchAllDocsQuery implementation (the version
in 2.4.1 did not use termDocs, so did not pass through this null), or a bug
in the implementation of InstantiatedIndex.

Any suggestions on what I can do about this?  I definitely can't get rid of
the MatchAllDocsQuery and don't really want to move back to a slow
RAMDirectory.


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: index reader for multiple indexes</title>
<author><name>David Causse &lt;dcausse@spotter.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c20091209175442.GF4942@spotter-dclnx%3e"/>
<id>urn:uuid:%3c20091209175442-GF4942@spotter-dclnx%3e</id>
<updated>2009-12-09T17:54:42Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On Fri, Oct 02, 2009 at 11:40:09PM -0700, m.harig wrote:
&gt; 
&gt; Thanks Uwe Schindler ,
&gt; 
&gt;       If i use an IndexReader[] to use MultiReader  , will it be thread
&gt; safe? because i've to reopen my IndexReader to check whether my index is
&gt; updated or not . In this case how do i handle it? please suggest me

Hi,

AFAIK readers are cloned when they are used inside MultiReader, so I
guess reopening a single reader outside MultiReader has no effect and
reopen on MultiReader reopen all the readers.

So I think there is no way to reopen one specific reader inside a
MultiReader without a new instance of the MultiReader.

IMHO the easy way is to use MultiReader as a normal reader (with
default constructor MultiReader(IndexReader... readers) ), forget the
original readers and use reopen on the MultiReader. It should be
thread-safe.
You should check the sources for more complicated usage.

&gt; 
&gt; 
&gt; 
&gt; Uwe Schindler wrote:
&gt; &gt; 
&gt; &gt; Use MultiReader which is an IndexReader on top of various
&gt; &gt; Sub-IndexReaders.
&gt; &gt; 
&gt; &gt; -----
&gt; &gt; Uwe Schindler
&gt; &gt; H.-H.-Meier-Allee 63, D-28213 Bremen
&gt; &gt; http://www.thetaphi.de
&gt; &gt; eMail: uwe@thetaphi.de
&gt; &gt; 
&gt; &gt;&gt; -----Original Message-----
&gt; &gt;&gt; From: m.harig [mailto:m.harig@gmail.com]
&gt; &gt;&gt; Sent: Friday, October 02, 2009 6:52 PM
&gt; &gt;&gt; To: java-user@lucene.apache.org
&gt; &gt;&gt; Subject: index reader for multiple indexes
&gt; &gt;&gt; 
&gt; &gt;&gt; 
&gt; &gt;&gt; hello all ,
&gt; &gt;&gt; 
&gt; &gt;&gt;        am merging more than one indexes to search a document , how do i
&gt; &gt;&gt; use
&gt; &gt;&gt; IndexReader here to open multiple indexes? (since IndexReader will open
&gt; &gt;&gt; one
&gt; &gt;&gt; directory at a time) could any1 please suggest me?
&gt; &gt;&gt; 
&gt; &gt;&gt; 
&gt; &gt;&gt; --
&gt; &gt;&gt; View this message in context: http://www.nabble.com/index-reader-for-
&gt; &gt;&gt; multiple-indexes-tp25716741p25716741.html
&gt; &gt;&gt; Sent from the Lucene - Java Users mailing list archive at Nabble.com.
&gt; &gt;&gt; 
&gt; &gt;&gt; 
&gt; &gt;&gt; ---------------------------------------------------------------------
&gt; &gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; &gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt; &gt; 
&gt; &gt; 
&gt; &gt; 
&gt; &gt; ---------------------------------------------------------------------
&gt; &gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; &gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt; &gt; 
&gt; &gt; 
&gt; &gt; 
&gt; 
&gt; -- 
&gt; View this message in context: http://www.nabble.com/index-reader-for-multiple-indexes-tp25716741p25726159.html
&gt; Sent from the Lucene - Java Users mailing list archive at Nabble.com.
&gt; 
&gt; 
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt; 

-- 
David Causse
Spotter
http://www.spotter.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: NearSpansUnordered payloads not returning all the time</title>
<author><name>Jason Rutherglen &lt;jason.rutherglen@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c85d3c3b60912090825k39da2c37q2d507a5dce21ade0@mail.gmail.com%3e"/>
<id>urn:uuid:%3c85d3c3b60912090825k39da2c37q2d507a5dce21ade0@mail-gmail-com%3e</id>
<updated>2009-12-09T16:25:18Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Right we're getting the spans, however it's just the payloads that are
missing, randomly...

On Wed, Dec 9, 2009 at 2:23 AM, Michael McCandless
&lt;lucene@mikemccandless.com&gt; wrote:
&gt; There was a thread a while back about how span queries don't enumerate
&gt; every possible span, but I can't remember if that included sometimes
&gt; missing payloads...
&gt;
&gt; Mike
&gt;
&gt; On Tue, Dec 8, 2009 at 7:34 PM, Jason Rutherglen
&gt; &lt;jason.rutherglen@gmail.com&gt; wrote:
&gt;&gt; Howdy,
&gt;&gt;
&gt;&gt; I am wondering if anyone has seen
&gt;&gt; NearSpansUnordered.getPayload() not return payloads that are
&gt;&gt; verifiably accessible via IR.termPositions? It's a bit confusing
&gt;&gt; because most of the time they're returned properly.
&gt;&gt;
&gt;&gt; I suspect the payload logic gets tripped up in
&gt;&gt; NearSpansUnordered. I'll put together a test case, however the
&gt;&gt; difficulty is that we're only seeing the issue with largish 800
&gt;&gt; MB indexes, which could make the test case a little crazy.
&gt;&gt;
&gt;&gt; Jason
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>RE: Index file compatibility and a migration plan to lucene 3</title>
<author><name>&quot;Rob Staveley \(Tom\)&quot; &lt;rstaveley@seseit.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c01a501ca78ea$32168ee0$9643aca0$@com%3e"/>
<id>urn:uuid:%3c01a501ca78ea$32168ee0$9643aca0$@com%3e</id>
<updated>2009-12-09T16:10:56Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
&gt; Don't you have a playground to properly test your changes

Yes, I'll be doing a practice run in a DEV cluster. It is the practice run that I'm planning
at this point. 

Many thanks for your pointers, Danil. 

-----Original Message-----
From: Danil Å¢ORIN [mailto:torindan@gmail.com] 
Sent: 09 December 2009 16:03
To: java-user@lucene.apache.org
Subject: Re: Index file compatibility and a migration plan to lucene 3

There are a LOT of deprecated stuff in 2.9.1 (but it's still there)
and your code should run as it is
(however there are some changes in behavior, so read carefully CHANGES.txt)

In 3.0 this old stuff is removed.
Your production readers may not even start (which I guess is more
painful than 2 step transition)

The safest way is to upgrade to 2.9.1, fix ALL deprecation warnings,
and only then move on to 3.0.

Don't you have a playground to properly test your changes before
deploying to production ?

On Wed, Dec 9, 2009 at 17:55, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt; wrote:
&gt; COMPRESS is supported (only deprecated) in 2.9.1, so I'm expecting them to be supported
http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/document/Field.Store.html#COMPRESS
&gt;
&gt; I guess I should expect optimize() to increase the size of the index as compressed fields
are expanded as it says in item 2 at http://lucene.apache.org/java/3_0_0/changes/Changes.html#3.0.0.changes_in_runtime_behavior
&gt;
&gt; That must mean that 3.0.0 can read 2.9.1 indexes, but I'm wondering if I shouldn't simply
upgrade the readers straight from 2.3.1 to 3.0.0 now with an immediate optimize handling the
conversion. Can I safely assume that 3.0.0 is able to read 2.3.1?
&gt;
&gt; Making code changes to the readers in production is tricky in my infrastructure and making
one transition rather than two is very desirable.
&gt;
&gt; -----Original Message-----
&gt; From: Danil Å¢ORIN [mailto:torindan@gmail.com]
&gt; Sent: 09 December 2009 15:15
&gt; To: java-user@lucene.apache.org
&gt; Subject: Re: Index file compatibility and a migration plan to lucene 3
&gt;
&gt; 2nd point can be simply archived by an optimize (which will read old
&gt; segments and will create a new one)
&gt; But I'm not sure how it handles compressed fields.
&gt;
&gt;
&gt; On Wed, Dec 9, 2009 at 16:50, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt; wrote:
&gt;&gt; Thanks, Danil. I think you've saved me a lot of time. Weiwei too - converting rather
than reindexing everything, which will save a lot of time.
&gt;&gt;
&gt;&gt; So, I should do this:
&gt;&gt;
&gt;&gt; 1. Convert readers to 2.9.1, which should be able to read any 2.x index including
the existing 2.3.1 indexes
&gt;&gt; 2. Convert writers to 2.9.1, using Weiwei's idea (converting the index with a 2.9.1
reader+writer conversion utility) to save some time.
&gt;&gt; 3. Have the writers push converted indexes to the readers using the existing production
infrastructure
&gt;&gt; 4. Like (9.) in my original plan. [Go through my index writers and index reader clients
and systematically purge all of the Field.Store.COMPRESS fields and migrate to an explicit
CompressionTools approach where applicable and no compression where applicable. During this
phase I'll expect to have CompressionTools-compressed fields coexisting with their Field.Store.COMPRESS
predecessors, where index reader client use of Field.Store.COMPRESS is in transit to the explicit
decompression approach.]
&gt;&gt; 5. Convert the readers to 3.0.0, which should be able to read 2.9.1, if there are
no compressed fields (??)
&gt;&gt; 6. Convert the writers to 3.0.0
&gt;&gt;
&gt;&gt;
&gt;&gt;
&gt;&gt; -----Original Message-----
&gt;&gt; From: Danil Å¢ORIN [mailto:torindan@gmail.com]
&gt;&gt; Sent: 09 December 2009 13:20
&gt;&gt; To: java-user@lucene.apache.org
&gt;&gt; Subject: Re: Index file compatibility and a migration plan to lucene 3
&gt;&gt;
&gt;&gt; You NEED to update your readers first, or else they will be unable to
&gt;&gt; read files created by newer version.
&gt;&gt; And trust me, there are changes in index format from 2.3 -&gt; 2.9
&gt;&gt;
&gt;&gt; On Wed, Dec 9, 2009 at 15:11, Weiwei Wang &lt;ww.wang.cs@gmail.com&gt; wrote:
&gt;&gt;&gt; Hi, Rob,
&gt;&gt;&gt; I read
&gt;&gt;&gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats and
&gt;&gt;&gt; found no compatibility guarantee for IndexWriter between different version.
&gt;&gt;&gt;
&gt;&gt;&gt; You can run your idea as a test and see the output.
&gt;&gt;&gt; If it doesn't work, i suggest you convert your index to new version as I
&gt;&gt;&gt; said in the last post.
&gt;&gt;&gt;
&gt;&gt;&gt; You can develop a convert tool to do this job automatically(that what i have
&gt;&gt;&gt; done).
&gt;&gt;&gt;
&gt;&gt;&gt; If you do not have full access to the data center, you can read(readonly
&gt;&gt;&gt; mode is preferred) from the data center(through nfs or something like that)
&gt;&gt;&gt; and write to your local disk.
&gt;&gt;&gt;
&gt;&gt;&gt; When all converting is done, you can copy the new index to the data center
&gt;&gt;&gt; with the help of the administrator.
&gt;&gt;&gt;
&gt;&gt;&gt; On Wed, Dec 9, 2009 at 8:42 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt;wrote:
&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Thanks for the swift response, Weiwei.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; In my deployment, my index readers are in a data centre and therefore more
&gt;&gt;&gt;&gt; difficult to upgrade than the writers. That's why I wanted to start with
the
&gt;&gt;&gt;&gt; writers rather than the readers. I realise that it looks the wrong way round
&gt;&gt;&gt;&gt; and http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formatseffectively
says that changing the reader first is a better idea for most
&gt;&gt;&gt;&gt; situations, but I wanted to know if writer first would work for me for 2.3.1
&gt;&gt;&gt;&gt; -&gt; 3.0.0.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; -----Original Message-----
&gt;&gt;&gt;&gt; From: Weiwei Wang [mailto:ww.wang.cs@gmail.com]
&gt;&gt;&gt;&gt; Sent: 09 December 2009 12:21
&gt;&gt;&gt;&gt; To: java-user@lucene.apache.org
&gt;&gt;&gt;&gt; Subject: Re: Index file compatibility and a migration plan to lucene 3
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Iā€™ve finished a upgrade from 2.4.1 to 3.0.0
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; What I do is like this:
&gt;&gt;&gt;&gt; 1. Upgrade my user-defined analyzer, tokenizer and filter to 3.0.0
&gt;&gt;&gt;&gt; 2. Use a 3.0.0 IndexReader to read the old version index and then use a
&gt;&gt;&gt;&gt; 3.0.0 IndexWriter to write all the documents into a new index
&gt;&gt;&gt;&gt; 3. Update QueryPaser to 3.0.0
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; I've redeployed my system and it works fine now.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; On Wed, Dec 9, 2009 at 8:13 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com
&gt;&gt;&gt;&gt; &gt;wrote:
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; &gt; I have Lucene 2.3.1 code and indexes deployed in production in a
&gt;&gt;&gt;&gt; &gt; distributed
&gt;&gt;&gt;&gt; &gt; system and would like to bring everything up to date with 3.0.0 via
&gt;&gt;&gt;&gt; 2.9.1.
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt; Here's my migration plan:
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt; 1. Add a index writer which generates a 2.9.1 "test" index
&gt;&gt;&gt;&gt; &gt; 2. Have that "test" index writer push that 2.9.1 "test" index into
&gt;&gt;&gt;&gt; &gt; production and see if the distributed 2.3.1 index readers can cope with
&gt;&gt;&gt;&gt; it
&gt;&gt;&gt;&gt; &gt; OK
&gt;&gt;&gt;&gt; &gt; 3. Upgrade my index writers to 2.9.1 (still using evil compressed fields)
&gt;&gt;&gt;&gt; -
&gt;&gt;&gt;&gt; &gt; we shall have 2.9.1 writers and 2.3.1 readers during this phase. See
if
&gt;&gt;&gt;&gt; it
&gt;&gt;&gt;&gt; &gt; works.
&gt;&gt;&gt;&gt; &gt; 4. Upgrade my index readers to 2.9.1 (still using evil compressed fields,
&gt;&gt;&gt;&gt; &gt; but with support for explicit use of CompressionTools for decompression,
&gt;&gt;&gt;&gt; &gt; where fields have been explicitly compressed with CompressionTools -
the
&gt;&gt;&gt;&gt; &gt; application will knows which need decompression)
&gt;&gt;&gt;&gt; &gt; 5. Add a CompressionTools to my "test" index writer, generating
&gt;&gt;&gt;&gt; explicitly
&gt;&gt;&gt;&gt; &gt; compressed fields in the 2.9.1 "test" index
&gt;&gt;&gt;&gt; &gt; 6. Test explicit decompression for relevant fields with CompressionTools
&gt;&gt;&gt;&gt; in
&gt;&gt;&gt;&gt; &gt; my 2.9.1 "test" index in my index readers
&gt;&gt;&gt;&gt; &gt; 7. Upgrade my "test" index writer to 3.0.0
&gt;&gt;&gt;&gt; &gt; 8. Have that "test" index writer push that 3.0.0 "test" index into
&gt;&gt;&gt;&gt; &gt; production and see if the distributed 2.9.1 index readers can cope with
&gt;&gt;&gt;&gt; it
&gt;&gt;&gt;&gt; &gt; OK
&gt;&gt;&gt;&gt; &gt; 9. Go through my index writers and index reader clients and
&gt;&gt;&gt;&gt; systematically
&gt;&gt;&gt;&gt; &gt; purge all of the Field.Store.COMPRESS fields and migrate to an explicit
&gt;&gt;&gt;&gt; &gt; CompressionTools approach where applicable and no compression where
&gt;&gt;&gt;&gt; &gt; applicable. During this phase I'll expect to have
&gt;&gt;&gt;&gt; &gt; CompressionTools-compressed fields coexisting with their
&gt;&gt;&gt;&gt; &gt; Field.Store.COMPRESS predecessors, where index reader client use of
&gt;&gt;&gt;&gt; &gt; Field.Store.COMPRESS is in transit to the explicit decompression
&gt;&gt;&gt;&gt; approach.
&gt;&gt;&gt;&gt; &gt; 10. Upgrade my index writers to 3.0.0
&gt;&gt;&gt;&gt; &gt; 11. Upgrade my index readers to 3.0.0
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt; I've simplified this a bit, because I shan't really be testing straight
&gt;&gt;&gt;&gt; off
&gt;&gt;&gt;&gt; &gt; in production(!) - I'll test the migration plan in a test cluster first;
&gt;&gt;&gt;&gt; &gt; but
&gt;&gt;&gt;&gt; &gt; this gives you the idea about the path.
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt; I wanted to know if I should expect problems with this plan. I'm
&gt;&gt;&gt;&gt; depending
&gt;&gt;&gt;&gt; &gt; on newer writers generating indexes for older readers and 3 is a major
&gt;&gt;&gt;&gt; &gt; number upgrade. It looks like I can get away with this in version 3,
but
&gt;&gt;&gt;&gt; &gt; that's by no means a guarantee according to
&gt;&gt;&gt;&gt; &gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt; Does this sound like a good plan?
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt; ---------------------------------------------------------------------
&gt;&gt;&gt;&gt; &gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt;&gt; &gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; --
&gt;&gt;&gt;&gt; Weiwei Wang
&gt;&gt;&gt;&gt; Alex Wang
&gt;&gt;&gt;&gt; ēˇ‹å·¨å·¨
&gt;&gt;&gt;&gt; Room 403, Mengmin Wei Building
&gt;&gt;&gt;&gt; Computer Science Department
&gt;&gt;&gt;&gt; Gulou Campus of Nanjing University
&gt;&gt;&gt;&gt; Nanjing, P.R.China, 210093
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; ---------------------------------------------------------------------
&gt;&gt;&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; --
&gt;&gt;&gt; Weiwei Wang
&gt;&gt;&gt; Alex Wang
&gt;&gt;&gt; ēˇ‹å·¨å·¨
&gt;&gt;&gt; Room 403, Mengmin Wei Building
&gt;&gt;&gt; Computer Science Department
&gt;&gt;&gt; Gulou Campus of Nanjing University
&gt;&gt;&gt; Nanjing, P.R.China, 210093
&gt;&gt;&gt;
&gt;&gt;&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;&gt;&gt;
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Index file compatibility and a migration plan to lucene 3</title>
<author><name>=?UTF-8?B?RGFuaWwgxaJPUklO?= &lt;torindan@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c2ffb6d060912090802s5796bf34ja2f419a2faa6f906@mail.gmail.com%3e"/>
<id>urn:uuid:%3c2ffb6d060912090802s5796bf34ja2f419a2faa6f906@mail-gmail-com%3e</id>
<updated>2009-12-09T16:02:57Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
There are a LOT of deprecated stuff in 2.9.1 (but it's still there)
and your code should run as it is
(however there are some changes in behavior, so read carefully CHANGES.txt)

In 3.0 this old stuff is removed.
Your production readers may not even start (which I guess is more
painful than 2 step transition)

The safest way is to upgrade to 2.9.1, fix ALL deprecation warnings,
and only then move on to 3.0.

Don't you have a playground to properly test your changes before
deploying to production ?

On Wed, Dec 9, 2009 at 17:55, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt; wrote:
&gt; COMPRESS is supported (only deprecated) in 2.9.1, so I'm expecting them to be supported
http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/document/Field.Store.html#COMPRESS
&gt;
&gt; I guess I should expect optimize() to increase the size of the index as compressed fields
are expanded as it says in item 2 at http://lucene.apache.org/java/3_0_0/changes/Changes.html#3.0.0.changes_in_runtime_behavior
&gt;
&gt; That must mean that 3.0.0 can read 2.9.1 indexes, but I'm wondering if I shouldn't simply
upgrade the readers straight from 2.3.1 to 3.0.0 now with an immediate optimize handling the
conversion. Can I safely assume that 3.0.0 is able to read 2.3.1?
&gt;
&gt; Making code changes to the readers in production is tricky in my infrastructure and making
one transition rather than two is very desirable.
&gt;
&gt; -----Original Message-----
&gt; From: Danil Å¢ORIN [mailto:torindan@gmail.com]
&gt; Sent: 09 December 2009 15:15
&gt; To: java-user@lucene.apache.org
&gt; Subject: Re: Index file compatibility and a migration plan to lucene 3
&gt;
&gt; 2nd point can be simply archived by an optimize (which will read old
&gt; segments and will create a new one)
&gt; But I'm not sure how it handles compressed fields.
&gt;
&gt;
&gt; On Wed, Dec 9, 2009 at 16:50, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt; wrote:
&gt;&gt; Thanks, Danil. I think you've saved me a lot of time. Weiwei too - converting rather
than reindexing everything, which will save a lot of time.
&gt;&gt;
&gt;&gt; So, I should do this:
&gt;&gt;
&gt;&gt; 1. Convert readers to 2.9.1, which should be able to read any 2.x index including
the existing 2.3.1 indexes
&gt;&gt; 2. Convert writers to 2.9.1, using Weiwei's idea (converting the index with a 2.9.1
reader+writer conversion utility) to save some time.
&gt;&gt; 3. Have the writers push converted indexes to the readers using the existing production
infrastructure
&gt;&gt; 4. Like (9.) in my original plan. [Go through my index writers and index reader clients
and systematically purge all of the Field.Store.COMPRESS fields and migrate to an explicit
CompressionTools approach where applicable and no compression where applicable. During this
phase I'll expect to have CompressionTools-compressed fields coexisting with their Field.Store.COMPRESS
predecessors, where index reader client use of Field.Store.COMPRESS is in transit to the explicit
decompression approach.]
&gt;&gt; 5. Convert the readers to 3.0.0, which should be able to read 2.9.1, if there are
no compressed fields (??)
&gt;&gt; 6. Convert the writers to 3.0.0
&gt;&gt;
&gt;&gt;
&gt;&gt;
&gt;&gt; -----Original Message-----
&gt;&gt; From: Danil Å¢ORIN [mailto:torindan@gmail.com]
&gt;&gt; Sent: 09 December 2009 13:20
&gt;&gt; To: java-user@lucene.apache.org
&gt;&gt; Subject: Re: Index file compatibility and a migration plan to lucene 3
&gt;&gt;
&gt;&gt; You NEED to update your readers first, or else they will be unable to
&gt;&gt; read files created by newer version.
&gt;&gt; And trust me, there are changes in index format from 2.3 -&gt; 2.9
&gt;&gt;
&gt;&gt; On Wed, Dec 9, 2009 at 15:11, Weiwei Wang &lt;ww.wang.cs@gmail.com&gt; wrote:
&gt;&gt;&gt; Hi, Rob,
&gt;&gt;&gt; I read
&gt;&gt;&gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats and
&gt;&gt;&gt; found no compatibility guarantee for IndexWriter between different version.
&gt;&gt;&gt;
&gt;&gt;&gt; You can run your idea as a test and see the output.
&gt;&gt;&gt; If it doesn't work, i suggest you convert your index to new version as I
&gt;&gt;&gt; said in the last post.
&gt;&gt;&gt;
&gt;&gt;&gt; You can develop a convert tool to do this job automatically(that what i have
&gt;&gt;&gt; done).
&gt;&gt;&gt;
&gt;&gt;&gt; If you do not have full access to the data center, you can read(readonly
&gt;&gt;&gt; mode is preferred) from the data center(through nfs or something like that)
&gt;&gt;&gt; and write to your local disk.
&gt;&gt;&gt;
&gt;&gt;&gt; When all converting is done, you can copy the new index to the data center
&gt;&gt;&gt; with the help of the administrator.
&gt;&gt;&gt;
&gt;&gt;&gt; On Wed, Dec 9, 2009 at 8:42 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt;wrote:
&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Thanks for the swift response, Weiwei.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; In my deployment, my index readers are in a data centre and therefore more
&gt;&gt;&gt;&gt; difficult to upgrade than the writers. That's why I wanted to start with
the
&gt;&gt;&gt;&gt; writers rather than the readers. I realise that it looks the wrong way round
&gt;&gt;&gt;&gt; and http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formatseffectively
says that changing the reader first is a better idea for most
&gt;&gt;&gt;&gt; situations, but I wanted to know if writer first would work for me for 2.3.1
&gt;&gt;&gt;&gt; -&gt; 3.0.0.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; -----Original Message-----
&gt;&gt;&gt;&gt; From: Weiwei Wang [mailto:ww.wang.cs@gmail.com]
&gt;&gt;&gt;&gt; Sent: 09 December 2009 12:21
&gt;&gt;&gt;&gt; To: java-user@lucene.apache.org
&gt;&gt;&gt;&gt; Subject: Re: Index file compatibility and a migration plan to lucene 3
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Iā€™ve finished a upgrade from 2.4.1 to 3.0.0
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; What I do is like this:
&gt;&gt;&gt;&gt; 1. Upgrade my user-defined analyzer, tokenizer and filter to 3.0.0
&gt;&gt;&gt;&gt; 2. Use a 3.0.0 IndexReader to read the old version index and then use a
&gt;&gt;&gt;&gt; 3.0.0 IndexWriter to write all the documents into a new index
&gt;&gt;&gt;&gt; 3. Update QueryPaser to 3.0.0
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; I've redeployed my system and it works fine now.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; On Wed, Dec 9, 2009 at 8:13 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com
&gt;&gt;&gt;&gt; &gt;wrote:
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; &gt; I have Lucene 2.3.1 code and indexes deployed in production in a
&gt;&gt;&gt;&gt; &gt; distributed
&gt;&gt;&gt;&gt; &gt; system and would like to bring everything up to date with 3.0.0 via
&gt;&gt;&gt;&gt; 2.9.1.
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt; Here's my migration plan:
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt; 1. Add a index writer which generates a 2.9.1 "test" index
&gt;&gt;&gt;&gt; &gt; 2. Have that "test" index writer push that 2.9.1 "test" index into
&gt;&gt;&gt;&gt; &gt; production and see if the distributed 2.3.1 index readers can cope with
&gt;&gt;&gt;&gt; it
&gt;&gt;&gt;&gt; &gt; OK
&gt;&gt;&gt;&gt; &gt; 3. Upgrade my index writers to 2.9.1 (still using evil compressed fields)
&gt;&gt;&gt;&gt; -
&gt;&gt;&gt;&gt; &gt; we shall have 2.9.1 writers and 2.3.1 readers during this phase. See
if
&gt;&gt;&gt;&gt; it
&gt;&gt;&gt;&gt; &gt; works.
&gt;&gt;&gt;&gt; &gt; 4. Upgrade my index readers to 2.9.1 (still using evil compressed fields,
&gt;&gt;&gt;&gt; &gt; but with support for explicit use of CompressionTools for decompression,
&gt;&gt;&gt;&gt; &gt; where fields have been explicitly compressed with CompressionTools -
the
&gt;&gt;&gt;&gt; &gt; application will knows which need decompression)
&gt;&gt;&gt;&gt; &gt; 5. Add a CompressionTools to my "test" index writer, generating
&gt;&gt;&gt;&gt; explicitly
&gt;&gt;&gt;&gt; &gt; compressed fields in the 2.9.1 "test" index
&gt;&gt;&gt;&gt; &gt; 6. Test explicit decompression for relevant fields with CompressionTools
&gt;&gt;&gt;&gt; in
&gt;&gt;&gt;&gt; &gt; my 2.9.1 "test" index in my index readers
&gt;&gt;&gt;&gt; &gt; 7. Upgrade my "test" index writer to 3.0.0
&gt;&gt;&gt;&gt; &gt; 8. Have that "test" index writer push that 3.0.0 "test" index into
&gt;&gt;&gt;&gt; &gt; production and see if the distributed 2.9.1 index readers can cope with
&gt;&gt;&gt;&gt; it
&gt;&gt;&gt;&gt; &gt; OK
&gt;&gt;&gt;&gt; &gt; 9. Go through my index writers and index reader clients and
&gt;&gt;&gt;&gt; systematically
&gt;&gt;&gt;&gt; &gt; purge all of the Field.Store.COMPRESS fields and migrate to an explicit
&gt;&gt;&gt;&gt; &gt; CompressionTools approach where applicable and no compression where
&gt;&gt;&gt;&gt; &gt; applicable. During this phase I'll expect to have
&gt;&gt;&gt;&gt; &gt; CompressionTools-compressed fields coexisting with their
&gt;&gt;&gt;&gt; &gt; Field.Store.COMPRESS predecessors, where index reader client use of
&gt;&gt;&gt;&gt; &gt; Field.Store.COMPRESS is in transit to the explicit decompression
&gt;&gt;&gt;&gt; approach.
&gt;&gt;&gt;&gt; &gt; 10. Upgrade my index writers to 3.0.0
&gt;&gt;&gt;&gt; &gt; 11. Upgrade my index readers to 3.0.0
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt; I've simplified this a bit, because I shan't really be testing straight
&gt;&gt;&gt;&gt; off
&gt;&gt;&gt;&gt; &gt; in production(!) - I'll test the migration plan in a test cluster first;
&gt;&gt;&gt;&gt; &gt; but
&gt;&gt;&gt;&gt; &gt; this gives you the idea about the path.
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt; I wanted to know if I should expect problems with this plan. I'm
&gt;&gt;&gt;&gt; depending
&gt;&gt;&gt;&gt; &gt; on newer writers generating indexes for older readers and 3 is a major
&gt;&gt;&gt;&gt; &gt; number upgrade. It looks like I can get away with this in version 3,
but
&gt;&gt;&gt;&gt; &gt; that's by no means a guarantee according to
&gt;&gt;&gt;&gt; &gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt; Does this sound like a good plan?
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt; ---------------------------------------------------------------------
&gt;&gt;&gt;&gt; &gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt;&gt; &gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt; &gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; --
&gt;&gt;&gt;&gt; Weiwei Wang
&gt;&gt;&gt;&gt; Alex Wang
&gt;&gt;&gt;&gt; ēˇ‹å·¨å·¨
&gt;&gt;&gt;&gt; Room 403, Mengmin Wei Building
&gt;&gt;&gt;&gt; Computer Science Department
&gt;&gt;&gt;&gt; Gulou Campus of Nanjing University
&gt;&gt;&gt;&gt; Nanjing, P.R.China, 210093
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; ---------------------------------------------------------------------
&gt;&gt;&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; --
&gt;&gt;&gt; Weiwei Wang
&gt;&gt;&gt; Alex Wang
&gt;&gt;&gt; ēˇ‹å·¨å·¨
&gt;&gt;&gt; Room 403, Mengmin Wei Building
&gt;&gt;&gt; Computer Science Department
&gt;&gt;&gt; Gulou Campus of Nanjing University
&gt;&gt;&gt; Nanjing, P.R.China, 210093
&gt;&gt;&gt;
&gt;&gt;&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;&gt;&gt;
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>RE: Index file compatibility and a migration plan to lucene 3</title>
<author><name>&quot;Rob Staveley \(Tom\)&quot; &lt;rstaveley@seseit.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c018601ca78e7$fb6243a0$f226cae0$@com%3e"/>
<id>urn:uuid:%3c018601ca78e7$fb6243a0$f226cae0$@com%3e</id>
<updated>2009-12-09T15:55:05Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
COMPRESS is supported (only deprecated) in 2.9.1, so I'm expecting them to be supported http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/document/Field.Store.html#COMPRESS

I guess I should expect optimize() to increase the size of the index as compressed fields
are expanded as it says in item 2 at http://lucene.apache.org/java/3_0_0/changes/Changes.html#3.0.0.changes_in_runtime_behavior


That must mean that 3.0.0 can read 2.9.1 indexes, but I'm wondering if I shouldn't simply
upgrade the readers straight from 2.3.1 to 3.0.0 now with an immediate optimize handling the
conversion. Can I safely assume that 3.0.0 is able to read 2.3.1?

Making code changes to the readers in production is tricky in my infrastructure and making
one transition rather than two is very desirable.

-----Original Message-----
From: Danil Å¢ORIN [mailto:torindan@gmail.com] 
Sent: 09 December 2009 15:15
To: java-user@lucene.apache.org
Subject: Re: Index file compatibility and a migration plan to lucene 3

2nd point can be simply archived by an optimize (which will read old
segments and will create a new one)
But I'm not sure how it handles compressed fields.


On Wed, Dec 9, 2009 at 16:50, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt; wrote:
&gt; Thanks, Danil. I think you've saved me a lot of time. Weiwei too - converting rather
than reindexing everything, which will save a lot of time.
&gt;
&gt; So, I should do this:
&gt;
&gt; 1. Convert readers to 2.9.1, which should be able to read any 2.x index including the
existing 2.3.1 indexes
&gt; 2. Convert writers to 2.9.1, using Weiwei's idea (converting the index with a 2.9.1 reader+writer
conversion utility) to save some time.
&gt; 3. Have the writers push converted indexes to the readers using the existing production
infrastructure
&gt; 4. Like (9.) in my original plan. [Go through my index writers and index reader clients
and systematically purge all of the Field.Store.COMPRESS fields and migrate to an explicit
CompressionTools approach where applicable and no compression where applicable. During this
phase I'll expect to have CompressionTools-compressed fields coexisting with their Field.Store.COMPRESS
predecessors, where index reader client use of Field.Store.COMPRESS is in transit to the explicit
decompression approach.]
&gt; 5. Convert the readers to 3.0.0, which should be able to read 2.9.1, if there are no
compressed fields (??)
&gt; 6. Convert the writers to 3.0.0
&gt;
&gt;
&gt;
&gt; -----Original Message-----
&gt; From: Danil Å¢ORIN [mailto:torindan@gmail.com]
&gt; Sent: 09 December 2009 13:20
&gt; To: java-user@lucene.apache.org
&gt; Subject: Re: Index file compatibility and a migration plan to lucene 3
&gt;
&gt; You NEED to update your readers first, or else they will be unable to
&gt; read files created by newer version.
&gt; And trust me, there are changes in index format from 2.3 -&gt; 2.9
&gt;
&gt; On Wed, Dec 9, 2009 at 15:11, Weiwei Wang &lt;ww.wang.cs@gmail.com&gt; wrote:
&gt;&gt; Hi, Rob,
&gt;&gt; I read
&gt;&gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats and
&gt;&gt; found no compatibility guarantee for IndexWriter between different version.
&gt;&gt;
&gt;&gt; You can run your idea as a test and see the output.
&gt;&gt; If it doesn't work, i suggest you convert your index to new version as I
&gt;&gt; said in the last post.
&gt;&gt;
&gt;&gt; You can develop a convert tool to do this job automatically(that what i have
&gt;&gt; done).
&gt;&gt;
&gt;&gt; If you do not have full access to the data center, you can read(readonly
&gt;&gt; mode is preferred) from the data center(through nfs or something like that)
&gt;&gt; and write to your local disk.
&gt;&gt;
&gt;&gt; When all converting is done, you can copy the new index to the data center
&gt;&gt; with the help of the administrator.
&gt;&gt;
&gt;&gt; On Wed, Dec 9, 2009 at 8:42 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt;wrote:
&gt;&gt;
&gt;&gt;&gt; Thanks for the swift response, Weiwei.
&gt;&gt;&gt;
&gt;&gt;&gt; In my deployment, my index readers are in a data centre and therefore more
&gt;&gt;&gt; difficult to upgrade than the writers. That's why I wanted to start with the
&gt;&gt;&gt; writers rather than the readers. I realise that it looks the wrong way round
&gt;&gt;&gt; and http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formatseffectively
says that changing the reader first is a better idea for most
&gt;&gt;&gt; situations, but I wanted to know if writer first would work for me for 2.3.1
&gt;&gt;&gt; -&gt; 3.0.0.
&gt;&gt;&gt;
&gt;&gt;&gt; -----Original Message-----
&gt;&gt;&gt; From: Weiwei Wang [mailto:ww.wang.cs@gmail.com]
&gt;&gt;&gt; Sent: 09 December 2009 12:21
&gt;&gt;&gt; To: java-user@lucene.apache.org
&gt;&gt;&gt; Subject: Re: Index file compatibility and a migration plan to lucene 3
&gt;&gt;&gt;
&gt;&gt;&gt; Iā€™ve finished a upgrade from 2.4.1 to 3.0.0
&gt;&gt;&gt;
&gt;&gt;&gt; What I do is like this:
&gt;&gt;&gt; 1. Upgrade my user-defined analyzer, tokenizer and filter to 3.0.0
&gt;&gt;&gt; 2. Use a 3.0.0 IndexReader to read the old version index and then use a
&gt;&gt;&gt; 3.0.0 IndexWriter to write all the documents into a new index
&gt;&gt;&gt; 3. Update QueryPaser to 3.0.0
&gt;&gt;&gt;
&gt;&gt;&gt; I've redeployed my system and it works fine now.
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; On Wed, Dec 9, 2009 at 8:13 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com
&gt;&gt;&gt; &gt;wrote:
&gt;&gt;&gt;
&gt;&gt;&gt; &gt; I have Lucene 2.3.1 code and indexes deployed in production in a
&gt;&gt;&gt; &gt; distributed
&gt;&gt;&gt; &gt; system and would like to bring everything up to date with 3.0.0 via
&gt;&gt;&gt; 2.9.1.
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; Here's my migration plan:
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; 1. Add a index writer which generates a 2.9.1 "test" index
&gt;&gt;&gt; &gt; 2. Have that "test" index writer push that 2.9.1 "test" index into
&gt;&gt;&gt; &gt; production and see if the distributed 2.3.1 index readers can cope with
&gt;&gt;&gt; it
&gt;&gt;&gt; &gt; OK
&gt;&gt;&gt; &gt; 3. Upgrade my index writers to 2.9.1 (still using evil compressed fields)
&gt;&gt;&gt; -
&gt;&gt;&gt; &gt; we shall have 2.9.1 writers and 2.3.1 readers during this phase. See if
&gt;&gt;&gt; it
&gt;&gt;&gt; &gt; works.
&gt;&gt;&gt; &gt; 4. Upgrade my index readers to 2.9.1 (still using evil compressed fields,
&gt;&gt;&gt; &gt; but with support for explicit use of CompressionTools for decompression,
&gt;&gt;&gt; &gt; where fields have been explicitly compressed with CompressionTools - the
&gt;&gt;&gt; &gt; application will knows which need decompression)
&gt;&gt;&gt; &gt; 5. Add a CompressionTools to my "test" index writer, generating
&gt;&gt;&gt; explicitly
&gt;&gt;&gt; &gt; compressed fields in the 2.9.1 "test" index
&gt;&gt;&gt; &gt; 6. Test explicit decompression for relevant fields with CompressionTools
&gt;&gt;&gt; in
&gt;&gt;&gt; &gt; my 2.9.1 "test" index in my index readers
&gt;&gt;&gt; &gt; 7. Upgrade my "test" index writer to 3.0.0
&gt;&gt;&gt; &gt; 8. Have that "test" index writer push that 3.0.0 "test" index into
&gt;&gt;&gt; &gt; production and see if the distributed 2.9.1 index readers can cope with
&gt;&gt;&gt; it
&gt;&gt;&gt; &gt; OK
&gt;&gt;&gt; &gt; 9. Go through my index writers and index reader clients and
&gt;&gt;&gt; systematically
&gt;&gt;&gt; &gt; purge all of the Field.Store.COMPRESS fields and migrate to an explicit
&gt;&gt;&gt; &gt; CompressionTools approach where applicable and no compression where
&gt;&gt;&gt; &gt; applicable. During this phase I'll expect to have
&gt;&gt;&gt; &gt; CompressionTools-compressed fields coexisting with their
&gt;&gt;&gt; &gt; Field.Store.COMPRESS predecessors, where index reader client use of
&gt;&gt;&gt; &gt; Field.Store.COMPRESS is in transit to the explicit decompression
&gt;&gt;&gt; approach.
&gt;&gt;&gt; &gt; 10. Upgrade my index writers to 3.0.0
&gt;&gt;&gt; &gt; 11. Upgrade my index readers to 3.0.0
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; I've simplified this a bit, because I shan't really be testing straight
&gt;&gt;&gt; off
&gt;&gt;&gt; &gt; in production(!) - I'll test the migration plan in a test cluster first;
&gt;&gt;&gt; &gt; but
&gt;&gt;&gt; &gt; this gives you the idea about the path.
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; I wanted to know if I should expect problems with this plan. I'm
&gt;&gt;&gt; depending
&gt;&gt;&gt; &gt; on newer writers generating indexes for older readers and 3 is a major
&gt;&gt;&gt; &gt; number upgrade. It looks like I can get away with this in version 3, but
&gt;&gt;&gt; &gt; that's by no means a guarantee according to
&gt;&gt;&gt; &gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; Does this sound like a good plan?
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; ---------------------------------------------------------------------
&gt;&gt;&gt; &gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt; &gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt;
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; --
&gt;&gt;&gt; Weiwei Wang
&gt;&gt;&gt; Alex Wang
&gt;&gt;&gt; ēˇ‹å·¨å·¨
&gt;&gt;&gt; Room 403, Mengmin Wei Building
&gt;&gt;&gt; Computer Science Department
&gt;&gt;&gt; Gulou Campus of Nanjing University
&gt;&gt;&gt; Nanjing, P.R.China, 210093
&gt;&gt;&gt;
&gt;&gt;&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; ---------------------------------------------------------------------
&gt;&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;
&gt;&gt;
&gt;&gt; --
&gt;&gt; Weiwei Wang
&gt;&gt; Alex Wang
&gt;&gt; ēˇ‹å·¨å·¨
&gt;&gt; Room 403, Mengmin Wei Building
&gt;&gt; Computer Science Department
&gt;&gt; Gulou Campus of Nanjing University
&gt;&gt; Nanjing, P.R.China, 210093
&gt;&gt;
&gt;&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Index file compatibility and a migration plan to lucene 3</title>
<author><name>=?UTF-8?B?RGFuaWwgxaJPUklO?= &lt;torindan@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c2ffb6d060912090715w72e97a85udecf34885e6a4b9@mail.gmail.com%3e"/>
<id>urn:uuid:%3c2ffb6d060912090715w72e97a85udecf34885e6a4b9@mail-gmail-com%3e</id>
<updated>2009-12-09T15:15:25Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
2nd point can be simply archived by an optimize (which will read old
segments and will create a new one)
But I'm not sure how it handles compressed fields.


On Wed, Dec 9, 2009 at 16:50, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt; wrote:
&gt; Thanks, Danil. I think you've saved me a lot of time. Weiwei too - converting rather
than reindexing everything, which will save a lot of time.
&gt;
&gt; So, I should do this:
&gt;
&gt; 1. Convert readers to 2.9.1, which should be able to read any 2.x index including the
existing 2.3.1 indexes
&gt; 2. Convert writers to 2.9.1, using Weiwei's idea (converting the index with a 2.9.1 reader+writer
conversion utility) to save some time.
&gt; 3. Have the writers push converted indexes to the readers using the existing production
infrastructure
&gt; 4. Like (9.) in my original plan. [Go through my index writers and index reader clients
and systematically purge all of the Field.Store.COMPRESS fields and migrate to an explicit
CompressionTools approach where applicable and no compression where applicable. During this
phase I'll expect to have CompressionTools-compressed fields coexisting with their Field.Store.COMPRESS
predecessors, where index reader client use of Field.Store.COMPRESS is in transit to the explicit
decompression approach.]
&gt; 5. Convert the readers to 3.0.0, which should be able to read 2.9.1, if there are no
compressed fields (??)
&gt; 6. Convert the writers to 3.0.0
&gt;
&gt;
&gt;
&gt; -----Original Message-----
&gt; From: Danil Å¢ORIN [mailto:torindan@gmail.com]
&gt; Sent: 09 December 2009 13:20
&gt; To: java-user@lucene.apache.org
&gt; Subject: Re: Index file compatibility and a migration plan to lucene 3
&gt;
&gt; You NEED to update your readers first, or else they will be unable to
&gt; read files created by newer version.
&gt; And trust me, there are changes in index format from 2.3 -&gt; 2.9
&gt;
&gt; On Wed, Dec 9, 2009 at 15:11, Weiwei Wang &lt;ww.wang.cs@gmail.com&gt; wrote:
&gt;&gt; Hi, Rob,
&gt;&gt; I read
&gt;&gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats and
&gt;&gt; found no compatibility guarantee for IndexWriter between different version.
&gt;&gt;
&gt;&gt; You can run your idea as a test and see the output.
&gt;&gt; If it doesn't work, i suggest you convert your index to new version as I
&gt;&gt; said in the last post.
&gt;&gt;
&gt;&gt; You can develop a convert tool to do this job automatically(that what i have
&gt;&gt; done).
&gt;&gt;
&gt;&gt; If you do not have full access to the data center, you can read(readonly
&gt;&gt; mode is preferred) from the data center(through nfs or something like that)
&gt;&gt; and write to your local disk.
&gt;&gt;
&gt;&gt; When all converting is done, you can copy the new index to the data center
&gt;&gt; with the help of the administrator.
&gt;&gt;
&gt;&gt; On Wed, Dec 9, 2009 at 8:42 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt;wrote:
&gt;&gt;
&gt;&gt;&gt; Thanks for the swift response, Weiwei.
&gt;&gt;&gt;
&gt;&gt;&gt; In my deployment, my index readers are in a data centre and therefore more
&gt;&gt;&gt; difficult to upgrade than the writers. That's why I wanted to start with the
&gt;&gt;&gt; writers rather than the readers. I realise that it looks the wrong way round
&gt;&gt;&gt; and http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formatseffectively
says that changing the reader first is a better idea for most
&gt;&gt;&gt; situations, but I wanted to know if writer first would work for me for 2.3.1
&gt;&gt;&gt; -&gt; 3.0.0.
&gt;&gt;&gt;
&gt;&gt;&gt; -----Original Message-----
&gt;&gt;&gt; From: Weiwei Wang [mailto:ww.wang.cs@gmail.com]
&gt;&gt;&gt; Sent: 09 December 2009 12:21
&gt;&gt;&gt; To: java-user@lucene.apache.org
&gt;&gt;&gt; Subject: Re: Index file compatibility and a migration plan to lucene 3
&gt;&gt;&gt;
&gt;&gt;&gt; Iā€™ve finished a upgrade from 2.4.1 to 3.0.0
&gt;&gt;&gt;
&gt;&gt;&gt; What I do is like this:
&gt;&gt;&gt; 1. Upgrade my user-defined analyzer, tokenizer and filter to 3.0.0
&gt;&gt;&gt; 2. Use a 3.0.0 IndexReader to read the old version index and then use a
&gt;&gt;&gt; 3.0.0 IndexWriter to write all the documents into a new index
&gt;&gt;&gt; 3. Update QueryPaser to 3.0.0
&gt;&gt;&gt;
&gt;&gt;&gt; I've redeployed my system and it works fine now.
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; On Wed, Dec 9, 2009 at 8:13 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com
&gt;&gt;&gt; &gt;wrote:
&gt;&gt;&gt;
&gt;&gt;&gt; &gt; I have Lucene 2.3.1 code and indexes deployed in production in a
&gt;&gt;&gt; &gt; distributed
&gt;&gt;&gt; &gt; system and would like to bring everything up to date with 3.0.0 via
&gt;&gt;&gt; 2.9.1.
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; Here's my migration plan:
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; 1. Add a index writer which generates a 2.9.1 "test" index
&gt;&gt;&gt; &gt; 2. Have that "test" index writer push that 2.9.1 "test" index into
&gt;&gt;&gt; &gt; production and see if the distributed 2.3.1 index readers can cope with
&gt;&gt;&gt; it
&gt;&gt;&gt; &gt; OK
&gt;&gt;&gt; &gt; 3. Upgrade my index writers to 2.9.1 (still using evil compressed fields)
&gt;&gt;&gt; -
&gt;&gt;&gt; &gt; we shall have 2.9.1 writers and 2.3.1 readers during this phase. See if
&gt;&gt;&gt; it
&gt;&gt;&gt; &gt; works.
&gt;&gt;&gt; &gt; 4. Upgrade my index readers to 2.9.1 (still using evil compressed fields,
&gt;&gt;&gt; &gt; but with support for explicit use of CompressionTools for decompression,
&gt;&gt;&gt; &gt; where fields have been explicitly compressed with CompressionTools - the
&gt;&gt;&gt; &gt; application will knows which need decompression)
&gt;&gt;&gt; &gt; 5. Add a CompressionTools to my "test" index writer, generating
&gt;&gt;&gt; explicitly
&gt;&gt;&gt; &gt; compressed fields in the 2.9.1 "test" index
&gt;&gt;&gt; &gt; 6. Test explicit decompression for relevant fields with CompressionTools
&gt;&gt;&gt; in
&gt;&gt;&gt; &gt; my 2.9.1 "test" index in my index readers
&gt;&gt;&gt; &gt; 7. Upgrade my "test" index writer to 3.0.0
&gt;&gt;&gt; &gt; 8. Have that "test" index writer push that 3.0.0 "test" index into
&gt;&gt;&gt; &gt; production and see if the distributed 2.9.1 index readers can cope with
&gt;&gt;&gt; it
&gt;&gt;&gt; &gt; OK
&gt;&gt;&gt; &gt; 9. Go through my index writers and index reader clients and
&gt;&gt;&gt; systematically
&gt;&gt;&gt; &gt; purge all of the Field.Store.COMPRESS fields and migrate to an explicit
&gt;&gt;&gt; &gt; CompressionTools approach where applicable and no compression where
&gt;&gt;&gt; &gt; applicable. During this phase I'll expect to have
&gt;&gt;&gt; &gt; CompressionTools-compressed fields coexisting with their
&gt;&gt;&gt; &gt; Field.Store.COMPRESS predecessors, where index reader client use of
&gt;&gt;&gt; &gt; Field.Store.COMPRESS is in transit to the explicit decompression
&gt;&gt;&gt; approach.
&gt;&gt;&gt; &gt; 10. Upgrade my index writers to 3.0.0
&gt;&gt;&gt; &gt; 11. Upgrade my index readers to 3.0.0
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; I've simplified this a bit, because I shan't really be testing straight
&gt;&gt;&gt; off
&gt;&gt;&gt; &gt; in production(!) - I'll test the migration plan in a test cluster first;
&gt;&gt;&gt; &gt; but
&gt;&gt;&gt; &gt; this gives you the idea about the path.
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; I wanted to know if I should expect problems with this plan. I'm
&gt;&gt;&gt; depending
&gt;&gt;&gt; &gt; on newer writers generating indexes for older readers and 3 is a major
&gt;&gt;&gt; &gt; number upgrade. It looks like I can get away with this in version 3, but
&gt;&gt;&gt; &gt; that's by no means a guarantee according to
&gt;&gt;&gt; &gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; Does this sound like a good plan?
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; ---------------------------------------------------------------------
&gt;&gt;&gt; &gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt; &gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt;
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; --
&gt;&gt;&gt; Weiwei Wang
&gt;&gt;&gt; Alex Wang
&gt;&gt;&gt; ēˇ‹å·¨å·¨
&gt;&gt;&gt; Room 403, Mengmin Wei Building
&gt;&gt;&gt; Computer Science Department
&gt;&gt;&gt; Gulou Campus of Nanjing University
&gt;&gt;&gt; Nanjing, P.R.China, 210093
&gt;&gt;&gt;
&gt;&gt;&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; ---------------------------------------------------------------------
&gt;&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;
&gt;&gt;
&gt;&gt; --
&gt;&gt; Weiwei Wang
&gt;&gt; Alex Wang
&gt;&gt; ēˇ‹å·¨å·¨
&gt;&gt; Room 403, Mengmin Wei Building
&gt;&gt; Computer Science Department
&gt;&gt; Gulou Campus of Nanjing University
&gt;&gt; Nanjing, P.R.China, 210093
&gt;&gt;
&gt;&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>RE: Index file compatibility and a migration plan to lucene 3</title>
<author><name>&quot;Rob Staveley \(Tom\)&quot; &lt;rstaveley@seseit.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c015501ca78de$f88d19b0$e9a74d10$@com%3e"/>
<id>urn:uuid:%3c015501ca78de$f88d19b0$e9a74d10$@com%3e</id>
<updated>2009-12-09T14:50:34Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Thanks, Danil. I think you've saved me a lot of time. Weiwei too - converting rather than reindexing
everything, which will save a lot of time.

So, I should do this:

1. Convert readers to 2.9.1, which should be able to read any 2.x index including the existing
2.3.1 indexes
2. Convert writers to 2.9.1, using Weiwei's idea (converting the index with a 2.9.1 reader+writer
conversion utility) to save some time.
3. Have the writers push converted indexes to the readers using the existing production infrastructure
4. Like (9.) in my original plan. [Go through my index writers and index reader clients and
systematically purge all of the Field.Store.COMPRESS fields and migrate to an explicit CompressionTools
approach where applicable and no compression where applicable. During this phase I'll expect
to have CompressionTools-compressed fields coexisting with their Field.Store.COMPRESS predecessors,
where index reader client use of Field.Store.COMPRESS is in transit to the explicit decompression
approach.]
5. Convert the readers to 3.0.0, which should be able to read 2.9.1, if there are no compressed
fields (??)
6. Convert the writers to 3.0.0



-----Original Message-----
From: Danil Å¢ORIN [mailto:torindan@gmail.com] 
Sent: 09 December 2009 13:20
To: java-user@lucene.apache.org
Subject: Re: Index file compatibility and a migration plan to lucene 3

You NEED to update your readers first, or else they will be unable to
read files created by newer version.
And trust me, there are changes in index format from 2.3 -&gt; 2.9

On Wed, Dec 9, 2009 at 15:11, Weiwei Wang &lt;ww.wang.cs@gmail.com&gt; wrote:
&gt; Hi, Rob,
&gt; I read
&gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats and
&gt; found no compatibility guarantee for IndexWriter between different version.
&gt;
&gt; You can run your idea as a test and see the output.
&gt; If it doesn't work, i suggest you convert your index to new version as I
&gt; said in the last post.
&gt;
&gt; You can develop a convert tool to do this job automatically(that what i have
&gt; done).
&gt;
&gt; If you do not have full access to the data center, you can read(readonly
&gt; mode is preferred) from the data center(through nfs or something like that)
&gt; and write to your local disk.
&gt;
&gt; When all converting is done, you can copy the new index to the data center
&gt; with the help of the administrator.
&gt;
&gt; On Wed, Dec 9, 2009 at 8:42 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt;wrote:
&gt;
&gt;&gt; Thanks for the swift response, Weiwei.
&gt;&gt;
&gt;&gt; In my deployment, my index readers are in a data centre and therefore more
&gt;&gt; difficult to upgrade than the writers. That's why I wanted to start with the
&gt;&gt; writers rather than the readers. I realise that it looks the wrong way round
&gt;&gt; and http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formatseffectively
says that changing the reader first is a better idea for most
&gt;&gt; situations, but I wanted to know if writer first would work for me for 2.3.1
&gt;&gt; -&gt; 3.0.0.
&gt;&gt;
&gt;&gt; -----Original Message-----
&gt;&gt; From: Weiwei Wang [mailto:ww.wang.cs@gmail.com]
&gt;&gt; Sent: 09 December 2009 12:21
&gt;&gt; To: java-user@lucene.apache.org
&gt;&gt; Subject: Re: Index file compatibility and a migration plan to lucene 3
&gt;&gt;
&gt;&gt; Iā€™ve finished a upgrade from 2.4.1 to 3.0.0
&gt;&gt;
&gt;&gt; What I do is like this:
&gt;&gt; 1. Upgrade my user-defined analyzer, tokenizer and filter to 3.0.0
&gt;&gt; 2. Use a 3.0.0 IndexReader to read the old version index and then use a
&gt;&gt; 3.0.0 IndexWriter to write all the documents into a new index
&gt;&gt; 3. Update QueryPaser to 3.0.0
&gt;&gt;
&gt;&gt; I've redeployed my system and it works fine now.
&gt;&gt;
&gt;&gt;
&gt;&gt; On Wed, Dec 9, 2009 at 8:13 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com
&gt;&gt; &gt;wrote:
&gt;&gt;
&gt;&gt; &gt; I have Lucene 2.3.1 code and indexes deployed in production in a
&gt;&gt; &gt; distributed
&gt;&gt; &gt; system and would like to bring everything up to date with 3.0.0 via
&gt;&gt; 2.9.1.
&gt;&gt; &gt;
&gt;&gt; &gt; Here's my migration plan:
&gt;&gt; &gt;
&gt;&gt; &gt; 1. Add a index writer which generates a 2.9.1 "test" index
&gt;&gt; &gt; 2. Have that "test" index writer push that 2.9.1 "test" index into
&gt;&gt; &gt; production and see if the distributed 2.3.1 index readers can cope with
&gt;&gt; it
&gt;&gt; &gt; OK
&gt;&gt; &gt; 3. Upgrade my index writers to 2.9.1 (still using evil compressed fields)
&gt;&gt; -
&gt;&gt; &gt; we shall have 2.9.1 writers and 2.3.1 readers during this phase. See if
&gt;&gt; it
&gt;&gt; &gt; works.
&gt;&gt; &gt; 4. Upgrade my index readers to 2.9.1 (still using evil compressed fields,
&gt;&gt; &gt; but with support for explicit use of CompressionTools for decompression,
&gt;&gt; &gt; where fields have been explicitly compressed with CompressionTools - the
&gt;&gt; &gt; application will knows which need decompression)
&gt;&gt; &gt; 5. Add a CompressionTools to my "test" index writer, generating
&gt;&gt; explicitly
&gt;&gt; &gt; compressed fields in the 2.9.1 "test" index
&gt;&gt; &gt; 6. Test explicit decompression for relevant fields with CompressionTools
&gt;&gt; in
&gt;&gt; &gt; my 2.9.1 "test" index in my index readers
&gt;&gt; &gt; 7. Upgrade my "test" index writer to 3.0.0
&gt;&gt; &gt; 8. Have that "test" index writer push that 3.0.0 "test" index into
&gt;&gt; &gt; production and see if the distributed 2.9.1 index readers can cope with
&gt;&gt; it
&gt;&gt; &gt; OK
&gt;&gt; &gt; 9. Go through my index writers and index reader clients and
&gt;&gt; systematically
&gt;&gt; &gt; purge all of the Field.Store.COMPRESS fields and migrate to an explicit
&gt;&gt; &gt; CompressionTools approach where applicable and no compression where
&gt;&gt; &gt; applicable. During this phase I'll expect to have
&gt;&gt; &gt; CompressionTools-compressed fields coexisting with their
&gt;&gt; &gt; Field.Store.COMPRESS predecessors, where index reader client use of
&gt;&gt; &gt; Field.Store.COMPRESS is in transit to the explicit decompression
&gt;&gt; approach.
&gt;&gt; &gt; 10. Upgrade my index writers to 3.0.0
&gt;&gt; &gt; 11. Upgrade my index readers to 3.0.0
&gt;&gt; &gt;
&gt;&gt; &gt; I've simplified this a bit, because I shan't really be testing straight
&gt;&gt; off
&gt;&gt; &gt; in production(!) - I'll test the migration plan in a test cluster first;
&gt;&gt; &gt; but
&gt;&gt; &gt; this gives you the idea about the path.
&gt;&gt; &gt;
&gt;&gt; &gt; I wanted to know if I should expect problems with this plan. I'm
&gt;&gt; depending
&gt;&gt; &gt; on newer writers generating indexes for older readers and 3 is a major
&gt;&gt; &gt; number upgrade. It looks like I can get away with this in version 3, but
&gt;&gt; &gt; that's by no means a guarantee according to
&gt;&gt; &gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats
&gt;&gt; &gt;
&gt;&gt; &gt; Does this sound like a good plan?
&gt;&gt; &gt;
&gt;&gt; &gt;
&gt;&gt; &gt; ---------------------------------------------------------------------
&gt;&gt; &gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; &gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt; &gt;
&gt;&gt; &gt;
&gt;&gt;
&gt;&gt;
&gt;&gt; --
&gt;&gt; Weiwei Wang
&gt;&gt; Alex Wang
&gt;&gt; ēˇ‹å·¨å·¨
&gt;&gt; Room 403, Mengmin Wei Building
&gt;&gt; Computer Science Department
&gt;&gt; Gulou Campus of Nanjing University
&gt;&gt; Nanjing, P.R.China, 210093
&gt;&gt;
&gt;&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;&gt;
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;
&gt;
&gt; --
&gt; Weiwei Wang
&gt; Alex Wang
&gt; ēˇ‹å·¨å·¨
&gt; Room 403, Mengmin Wei Building
&gt; Computer Science Department
&gt; Gulou Campus of Nanjing University
&gt; Nanjing, P.R.China, 210093
&gt;
&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Index file compatibility and a migration plan to lucene 3</title>
<author><name>=?UTF-8?B?RGFuaWwgxaJPUklO?= &lt;torindan@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c2ffb6d060912090519p290aae88idac9252c5dec16ab@mail.gmail.com%3e"/>
<id>urn:uuid:%3c2ffb6d060912090519p290aae88idac9252c5dec16ab@mail-gmail-com%3e</id>
<updated>2009-12-09T13:19:38Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
You NEED to update your readers first, or else they will be unable to
read files created by newer version.
And trust me, there are changes in index format from 2.3 -&gt; 2.9

On Wed, Dec 9, 2009 at 15:11, Weiwei Wang &lt;ww.wang.cs@gmail.com&gt; wrote:
&gt; Hi, Rob,
&gt; I read
&gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats and
&gt; found no compatibility guarantee for IndexWriter between different version.
&gt;
&gt; You can run your idea as a test and see the output.
&gt; If it doesn't work, i suggest you convert your index to new version as I
&gt; said in the last post.
&gt;
&gt; You can develop a convert tool to do this job automatically(that what i have
&gt; done).
&gt;
&gt; If you do not have full access to the data center, you can read(readonly
&gt; mode is preferred) from the data center(through nfs or something like that)
&gt; and write to your local disk.
&gt;
&gt; When all converting is done, you can copy the new index to the data center
&gt; with the help of the administrator.
&gt;
&gt; On Wed, Dec 9, 2009 at 8:42 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt;wrote:
&gt;
&gt;&gt; Thanks for the swift response, Weiwei.
&gt;&gt;
&gt;&gt; In my deployment, my index readers are in a data centre and therefore more
&gt;&gt; difficult to upgrade than the writers. That's why I wanted to start with the
&gt;&gt; writers rather than the readers. I realise that it looks the wrong way round
&gt;&gt; and http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formatseffectively
says that changing the reader first is a better idea for most
&gt;&gt; situations, but I wanted to know if writer first would work for me for 2.3.1
&gt;&gt; -&gt; 3.0.0.
&gt;&gt;
&gt;&gt; -----Original Message-----
&gt;&gt; From: Weiwei Wang [mailto:ww.wang.cs@gmail.com]
&gt;&gt; Sent: 09 December 2009 12:21
&gt;&gt; To: java-user@lucene.apache.org
&gt;&gt; Subject: Re: Index file compatibility and a migration plan to lucene 3
&gt;&gt;
&gt;&gt; Iā€™ve finished a upgrade from 2.4.1 to 3.0.0
&gt;&gt;
&gt;&gt; What I do is like this:
&gt;&gt; 1. Upgrade my user-defined analyzer, tokenizer and filter to 3.0.0
&gt;&gt; 2. Use a 3.0.0 IndexReader to read the old version index and then use a
&gt;&gt; 3.0.0 IndexWriter to write all the documents into a new index
&gt;&gt; 3. Update QueryPaser to 3.0.0
&gt;&gt;
&gt;&gt; I've redeployed my system and it works fine now.
&gt;&gt;
&gt;&gt;
&gt;&gt; On Wed, Dec 9, 2009 at 8:13 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com
&gt;&gt; &gt;wrote:
&gt;&gt;
&gt;&gt; &gt; I have Lucene 2.3.1 code and indexes deployed in production in a
&gt;&gt; &gt; distributed
&gt;&gt; &gt; system and would like to bring everything up to date with 3.0.0 via
&gt;&gt; 2.9.1.
&gt;&gt; &gt;
&gt;&gt; &gt; Here's my migration plan:
&gt;&gt; &gt;
&gt;&gt; &gt; 1. Add a index writer which generates a 2.9.1 "test" index
&gt;&gt; &gt; 2. Have that "test" index writer push that 2.9.1 "test" index into
&gt;&gt; &gt; production and see if the distributed 2.3.1 index readers can cope with
&gt;&gt; it
&gt;&gt; &gt; OK
&gt;&gt; &gt; 3. Upgrade my index writers to 2.9.1 (still using evil compressed fields)
&gt;&gt; -
&gt;&gt; &gt; we shall have 2.9.1 writers and 2.3.1 readers during this phase. See if
&gt;&gt; it
&gt;&gt; &gt; works.
&gt;&gt; &gt; 4. Upgrade my index readers to 2.9.1 (still using evil compressed fields,
&gt;&gt; &gt; but with support for explicit use of CompressionTools for decompression,
&gt;&gt; &gt; where fields have been explicitly compressed with CompressionTools - the
&gt;&gt; &gt; application will knows which need decompression)
&gt;&gt; &gt; 5. Add a CompressionTools to my "test" index writer, generating
&gt;&gt; explicitly
&gt;&gt; &gt; compressed fields in the 2.9.1 "test" index
&gt;&gt; &gt; 6. Test explicit decompression for relevant fields with CompressionTools
&gt;&gt; in
&gt;&gt; &gt; my 2.9.1 "test" index in my index readers
&gt;&gt; &gt; 7. Upgrade my "test" index writer to 3.0.0
&gt;&gt; &gt; 8. Have that "test" index writer push that 3.0.0 "test" index into
&gt;&gt; &gt; production and see if the distributed 2.9.1 index readers can cope with
&gt;&gt; it
&gt;&gt; &gt; OK
&gt;&gt; &gt; 9. Go through my index writers and index reader clients and
&gt;&gt; systematically
&gt;&gt; &gt; purge all of the Field.Store.COMPRESS fields and migrate to an explicit
&gt;&gt; &gt; CompressionTools approach where applicable and no compression where
&gt;&gt; &gt; applicable. During this phase I'll expect to have
&gt;&gt; &gt; CompressionTools-compressed fields coexisting with their
&gt;&gt; &gt; Field.Store.COMPRESS predecessors, where index reader client use of
&gt;&gt; &gt; Field.Store.COMPRESS is in transit to the explicit decompression
&gt;&gt; approach.
&gt;&gt; &gt; 10. Upgrade my index writers to 3.0.0
&gt;&gt; &gt; 11. Upgrade my index readers to 3.0.0
&gt;&gt; &gt;
&gt;&gt; &gt; I've simplified this a bit, because I shan't really be testing straight
&gt;&gt; off
&gt;&gt; &gt; in production(!) - I'll test the migration plan in a test cluster first;
&gt;&gt; &gt; but
&gt;&gt; &gt; this gives you the idea about the path.
&gt;&gt; &gt;
&gt;&gt; &gt; I wanted to know if I should expect problems with this plan. I'm
&gt;&gt; depending
&gt;&gt; &gt; on newer writers generating indexes for older readers and 3 is a major
&gt;&gt; &gt; number upgrade. It looks like I can get away with this in version 3, but
&gt;&gt; &gt; that's by no means a guarantee according to
&gt;&gt; &gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats
&gt;&gt; &gt;
&gt;&gt; &gt; Does this sound like a good plan?
&gt;&gt; &gt;
&gt;&gt; &gt;
&gt;&gt; &gt; ---------------------------------------------------------------------
&gt;&gt; &gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; &gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt; &gt;
&gt;&gt; &gt;
&gt;&gt;
&gt;&gt;
&gt;&gt; --
&gt;&gt; Weiwei Wang
&gt;&gt; Alex Wang
&gt;&gt; ēˇ‹å·¨å·¨
&gt;&gt; Room 403, Mengmin Wei Building
&gt;&gt; Computer Science Department
&gt;&gt; Gulou Campus of Nanjing University
&gt;&gt; Nanjing, P.R.China, 210093
&gt;&gt;
&gt;&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;&gt;
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;
&gt;
&gt; --
&gt; Weiwei Wang
&gt; Alex Wang
&gt; ēˇ‹å·¨å·¨
&gt; Room 403, Mengmin Wei Building
&gt; Computer Science Department
&gt; Gulou Campus of Nanjing University
&gt; Nanjing, P.R.China, 210093
&gt;
&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Index file compatibility and a migration plan to lucene 3</title>
<author><name>Weiwei Wang &lt;ww.wang.cs@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c7d94dcde0912090511k15712ea2i9f306b727df1a69e@mail.gmail.com%3e"/>
<id>urn:uuid:%3c7d94dcde0912090511k15712ea2i9f306b727df1a69e@mail-gmail-com%3e</id>
<updated>2009-12-09T13:11:07Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi, Rob,
I read
http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats and
found no compatibility guarantee for IndexWriter between different version.

You can run your idea as a test and see the output.
If it doesn't work, i suggest you convert your index to new version as I
said in the last post.

You can develop a convert tool to do this job automatically(that what i have
done).

If you do not have full access to the data center, you can read(readonly
mode is preferred) from the data center(through nfs or something like that)
and write to your local disk.

When all converting is done, you can copy the new index to the data center
with the help of the administrator.

On Wed, Dec 9, 2009 at 8:42 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt;wrote:

&gt; Thanks for the swift response, Weiwei.
&gt;
&gt; In my deployment, my index readers are in a data centre and therefore more
&gt; difficult to upgrade than the writers. That's why I wanted to start with the
&gt; writers rather than the readers. I realise that it looks the wrong way round
&gt; and http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formatseffectively
says that changing the reader first is a better idea for most
&gt; situations, but I wanted to know if writer first would work for me for 2.3.1
&gt; -&gt; 3.0.0.
&gt;
&gt; -----Original Message-----
&gt; From: Weiwei Wang [mailto:ww.wang.cs@gmail.com]
&gt; Sent: 09 December 2009 12:21
&gt; To: java-user@lucene.apache.org
&gt; Subject: Re: Index file compatibility and a migration plan to lucene 3
&gt;
&gt; Iā€™ve finished a upgrade from 2.4.1 to 3.0.0
&gt;
&gt; What I do is like this:
&gt; 1. Upgrade my user-defined analyzer, tokenizer and filter to 3.0.0
&gt; 2. Use a 3.0.0 IndexReader to read the old version index and then use a
&gt; 3.0.0 IndexWriter to write all the documents into a new index
&gt; 3. Update QueryPaser to 3.0.0
&gt;
&gt; I've redeployed my system and it works fine now.
&gt;
&gt;
&gt; On Wed, Dec 9, 2009 at 8:13 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com
&gt; &gt;wrote:
&gt;
&gt; &gt; I have Lucene 2.3.1 code and indexes deployed in production in a
&gt; &gt; distributed
&gt; &gt; system and would like to bring everything up to date with 3.0.0 via
&gt; 2.9.1.
&gt; &gt;
&gt; &gt; Here's my migration plan:
&gt; &gt;
&gt; &gt; 1. Add a index writer which generates a 2.9.1 "test" index
&gt; &gt; 2. Have that "test" index writer push that 2.9.1 "test" index into
&gt; &gt; production and see if the distributed 2.3.1 index readers can cope with
&gt; it
&gt; &gt; OK
&gt; &gt; 3. Upgrade my index writers to 2.9.1 (still using evil compressed fields)
&gt; -
&gt; &gt; we shall have 2.9.1 writers and 2.3.1 readers during this phase. See if
&gt; it
&gt; &gt; works.
&gt; &gt; 4. Upgrade my index readers to 2.9.1 (still using evil compressed fields,
&gt; &gt; but with support for explicit use of CompressionTools for decompression,
&gt; &gt; where fields have been explicitly compressed with CompressionTools - the
&gt; &gt; application will knows which need decompression)
&gt; &gt; 5. Add a CompressionTools to my "test" index writer, generating
&gt; explicitly
&gt; &gt; compressed fields in the 2.9.1 "test" index
&gt; &gt; 6. Test explicit decompression for relevant fields with CompressionTools
&gt; in
&gt; &gt; my 2.9.1 "test" index in my index readers
&gt; &gt; 7. Upgrade my "test" index writer to 3.0.0
&gt; &gt; 8. Have that "test" index writer push that 3.0.0 "test" index into
&gt; &gt; production and see if the distributed 2.9.1 index readers can cope with
&gt; it
&gt; &gt; OK
&gt; &gt; 9. Go through my index writers and index reader clients and
&gt; systematically
&gt; &gt; purge all of the Field.Store.COMPRESS fields and migrate to an explicit
&gt; &gt; CompressionTools approach where applicable and no compression where
&gt; &gt; applicable. During this phase I'll expect to have
&gt; &gt; CompressionTools-compressed fields coexisting with their
&gt; &gt; Field.Store.COMPRESS predecessors, where index reader client use of
&gt; &gt; Field.Store.COMPRESS is in transit to the explicit decompression
&gt; approach.
&gt; &gt; 10. Upgrade my index writers to 3.0.0
&gt; &gt; 11. Upgrade my index readers to 3.0.0
&gt; &gt;
&gt; &gt; I've simplified this a bit, because I shan't really be testing straight
&gt; off
&gt; &gt; in production(!) - I'll test the migration plan in a test cluster first;
&gt; &gt; but
&gt; &gt; this gives you the idea about the path.
&gt; &gt;
&gt; &gt; I wanted to know if I should expect problems with this plan. I'm
&gt; depending
&gt; &gt; on newer writers generating indexes for older readers and 3 is a major
&gt; &gt; number upgrade. It looks like I can get away with this in version 3, but
&gt; &gt; that's by no means a guarantee according to
&gt; &gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats
&gt; &gt;
&gt; &gt; Does this sound like a good plan?
&gt; &gt;
&gt; &gt;
&gt; &gt; ---------------------------------------------------------------------
&gt; &gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; &gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt; &gt;
&gt; &gt;
&gt;
&gt;
&gt; --
&gt; Weiwei Wang
&gt; Alex Wang
&gt; ēˇ‹å·¨å·¨
&gt; Room 403, Mengmin Wei Building
&gt; Computer Science Department
&gt; Gulou Campus of Nanjing University
&gt; Nanjing, P.R.China, 210093
&gt;
&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;


-- 
Weiwei Wang
Alex Wang
ēˇ‹å·¨å·¨
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang


</pre>
</div>
</content>
</entry>
<entry>
<title>RE: Index file compatibility and a migration plan to lucene 3</title>
<author><name>&quot;Rob Staveley \(Tom\)&quot; &lt;rstaveley@seseit.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c010401ca78cd$059f3870$10dda950$@com%3e"/>
<id>urn:uuid:%3c010401ca78cd$059f3870$10dda950$@com%3e</id>
<updated>2009-12-09T12:42:05Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Thanks for the swift response, Weiwei.

In my deployment, my index readers are in a data centre and therefore more difficult to upgrade
than the writers. That's why I wanted to start with the writers rather than the readers. I
realise that it looks the wrong way round and http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats
effectively says that changing the reader first is a better idea for most situations, but
I wanted to know if writer first would work for me for 2.3.1 -&gt; 3.0.0.

-----Original Message-----
From: Weiwei Wang [mailto:ww.wang.cs@gmail.com] 
Sent: 09 December 2009 12:21
To: java-user@lucene.apache.org
Subject: Re: Index file compatibility and a migration plan to lucene 3

Iā€™ve finished a upgrade from 2.4.1 to 3.0.0

What I do is like this:
1. Upgrade my user-defined analyzer, tokenizer and filter to 3.0.0
2. Use a 3.0.0 IndexReader to read the old version index and then use a
3.0.0 IndexWriter to write all the documents into a new index
3. Update QueryPaser to 3.0.0

I've redeployed my system and it works fine now.


On Wed, Dec 9, 2009 at 8:13 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt;wrote:

&gt; I have Lucene 2.3.1 code and indexes deployed in production in a
&gt; distributed
&gt; system and would like to bring everything up to date with 3.0.0 via 2.9.1.
&gt;
&gt; Here's my migration plan:
&gt;
&gt; 1. Add a index writer which generates a 2.9.1 "test" index
&gt; 2. Have that "test" index writer push that 2.9.1 "test" index into
&gt; production and see if the distributed 2.3.1 index readers can cope with it
&gt; OK
&gt; 3. Upgrade my index writers to 2.9.1 (still using evil compressed fields) -
&gt; we shall have 2.9.1 writers and 2.3.1 readers during this phase. See if it
&gt; works.
&gt; 4. Upgrade my index readers to 2.9.1 (still using evil compressed fields,
&gt; but with support for explicit use of CompressionTools for decompression,
&gt; where fields have been explicitly compressed with CompressionTools - the
&gt; application will knows which need decompression)
&gt; 5. Add a CompressionTools to my "test" index writer, generating explicitly
&gt; compressed fields in the 2.9.1 "test" index
&gt; 6. Test explicit decompression for relevant fields with CompressionTools in
&gt; my 2.9.1 "test" index in my index readers
&gt; 7. Upgrade my "test" index writer to 3.0.0
&gt; 8. Have that "test" index writer push that 3.0.0 "test" index into
&gt; production and see if the distributed 2.9.1 index readers can cope with it
&gt; OK
&gt; 9. Go through my index writers and index reader clients and systematically
&gt; purge all of the Field.Store.COMPRESS fields and migrate to an explicit
&gt; CompressionTools approach where applicable and no compression where
&gt; applicable. During this phase I'll expect to have
&gt; CompressionTools-compressed fields coexisting with their
&gt; Field.Store.COMPRESS predecessors, where index reader client use of
&gt; Field.Store.COMPRESS is in transit to the explicit decompression approach.
&gt; 10. Upgrade my index writers to 3.0.0
&gt; 11. Upgrade my index readers to 3.0.0
&gt;
&gt; I've simplified this a bit, because I shan't really be testing straight off
&gt; in production(!) - I'll test the migration plan in a test cluster first;
&gt; but
&gt; this gives you the idea about the path.
&gt;
&gt; I wanted to know if I should expect problems with this plan. I'm depending
&gt; on newer writers generating indexes for older readers and 3 is a major
&gt; number upgrade. It looks like I can get away with this in version 3, but
&gt; that's by no means a guarantee according to
&gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats
&gt;
&gt; Does this sound like a good plan?
&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;


-- 
Weiwei Wang
Alex Wang
ēˇ‹å·¨å·¨
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Index file compatibility and a migration plan to lucene 3</title>
<author><name>Weiwei Wang &lt;ww.wang.cs@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c7d94dcde0912090421g693b7834n98ee9b33e5714fa6@mail.gmail.com%3e"/>
<id>urn:uuid:%3c7d94dcde0912090421g693b7834n98ee9b33e5714fa6@mail-gmail-com%3e</id>
<updated>2009-12-09T12:21:08Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Iā€™ve finished a upgrade from 2.4.1 to 3.0.0

What I do is like this:
1. Upgrade my user-defined analyzer, tokenizer and filter to 3.0.0
2. Use a 3.0.0 IndexReader to read the old version index and then use a
3.0.0 IndexWriter to write all the documents into a new index
3. Update QueryPaser to 3.0.0

I've redeployed my system and it works fine now.


On Wed, Dec 9, 2009 at 8:13 PM, Rob Staveley (Tom) &lt;rstaveley@seseit.com&gt;wrote:

&gt; I have Lucene 2.3.1 code and indexes deployed in production in a
&gt; distributed
&gt; system and would like to bring everything up to date with 3.0.0 via 2.9.1.
&gt;
&gt; Here's my migration plan:
&gt;
&gt; 1. Add a index writer which generates a 2.9.1 "test" index
&gt; 2. Have that "test" index writer push that 2.9.1 "test" index into
&gt; production and see if the distributed 2.3.1 index readers can cope with it
&gt; OK
&gt; 3. Upgrade my index writers to 2.9.1 (still using evil compressed fields) -
&gt; we shall have 2.9.1 writers and 2.3.1 readers during this phase. See if it
&gt; works.
&gt; 4. Upgrade my index readers to 2.9.1 (still using evil compressed fields,
&gt; but with support for explicit use of CompressionTools for decompression,
&gt; where fields have been explicitly compressed with CompressionTools - the
&gt; application will knows which need decompression)
&gt; 5. Add a CompressionTools to my "test" index writer, generating explicitly
&gt; compressed fields in the 2.9.1 "test" index
&gt; 6. Test explicit decompression for relevant fields with CompressionTools in
&gt; my 2.9.1 "test" index in my index readers
&gt; 7. Upgrade my "test" index writer to 3.0.0
&gt; 8. Have that "test" index writer push that 3.0.0 "test" index into
&gt; production and see if the distributed 2.9.1 index readers can cope with it
&gt; OK
&gt; 9. Go through my index writers and index reader clients and systematically
&gt; purge all of the Field.Store.COMPRESS fields and migrate to an explicit
&gt; CompressionTools approach where applicable and no compression where
&gt; applicable. During this phase I'll expect to have
&gt; CompressionTools-compressed fields coexisting with their
&gt; Field.Store.COMPRESS predecessors, where index reader client use of
&gt; Field.Store.COMPRESS is in transit to the explicit decompression approach.
&gt; 10. Upgrade my index writers to 3.0.0
&gt; 11. Upgrade my index readers to 3.0.0
&gt;
&gt; I've simplified this a bit, because I shan't really be testing straight off
&gt; in production(!) - I'll test the migration plan in a test cluster first;
&gt; but
&gt; this gives you the idea about the path.
&gt;
&gt; I wanted to know if I should expect problems with this plan. I'm depending
&gt; on newer writers generating indexes for older readers and 3 is a major
&gt; number upgrade. It looks like I can get away with this in version 3, but
&gt; that's by no means a guarantee according to
&gt; http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats
&gt;
&gt; Does this sound like a good plan?
&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;


-- 
Weiwei Wang
Alex Wang
ēˇ‹å·¨å·¨
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang


</pre>
</div>
</content>
</entry>
<entry>
<title>Index file compatibility and a migration plan to lucene 3</title>
<author><name>&quot;Rob Staveley \(Tom\)&quot; &lt;rstaveley@seseit.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c00e601ca78c9$00c4af50$024e0df0$@com%3e"/>
<id>urn:uuid:%3c00e601ca78c9$00c4af50$024e0df0$@com%3e</id>
<updated>2009-12-09T12:13:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I have Lucene 2.3.1 code and indexes deployed in production in a distributed
system and would like to bring everything up to date with 3.0.0 via 2.9.1.

Here's my migration plan:

1. Add a index writer which generates a 2.9.1 "test" index 
2. Have that "test" index writer push that 2.9.1 "test" index into
production and see if the distributed 2.3.1 index readers can cope with it
OK
3. Upgrade my index writers to 2.9.1 (still using evil compressed fields) -
we shall have 2.9.1 writers and 2.3.1 readers during this phase. See if it
works.
4. Upgrade my index readers to 2.9.1 (still using evil compressed fields,
but with support for explicit use of CompressionTools for decompression,
where fields have been explicitly compressed with CompressionTools - the
application will knows which need decompression)
5. Add a CompressionTools to my "test" index writer, generating explicitly
compressed fields in the 2.9.1 "test" index
6. Test explicit decompression for relevant fields with CompressionTools in
my 2.9.1 "test" index in my index readers
7. Upgrade my "test" index writer to 3.0.0 
8. Have that "test" index writer push that 3.0.0 "test" index into
production and see if the distributed 2.9.1 index readers can cope with it
OK
9. Go through my index writers and index reader clients and systematically
purge all of the Field.Store.COMPRESS fields and migrate to an explicit
CompressionTools approach where applicable and no compression where
applicable. During this phase I'll expect to have
CompressionTools-compressed fields coexisting with their
Field.Store.COMPRESS predecessors, where index reader client use of
Field.Store.COMPRESS is in transit to the explicit decompression approach.
10. Upgrade my index writers to 3.0.0 
11. Upgrade my index readers to 3.0.0

I've simplified this a bit, because I shan't really be testing straight off
in production(!) - I'll test the migration plan in a test cluster first; but
this gives you the idea about the path.

I wanted to know if I should expect problems with this plan. I'm depending
on newer writers generating indexes for older readers and 3 is a major
number upgrade. It looks like I can get away with this in version 3, but
that's by no means a guarantee according to
http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats

Does this sound like a good plan?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: NearSpansUnordered payloads not returning all the time</title>
<author><name>Michael McCandless &lt;lucene@mikemccandless.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c9ac0c6aa0912090223t55e9de63l3dde4fad36db96d0@mail.gmail.com%3e"/>
<id>urn:uuid:%3c9ac0c6aa0912090223t55e9de63l3dde4fad36db96d0@mail-gmail-com%3e</id>
<updated>2009-12-09T10:23:05Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
There was a thread a while back about how span queries don't enumerate
every possible span, but I can't remember if that included sometimes
missing payloads...

Mike

On Tue, Dec 8, 2009 at 7:34 PM, Jason Rutherglen
&lt;jason.rutherglen@gmail.com&gt; wrote:
&gt; Howdy,
&gt;
&gt; I am wondering if anyone has seen
&gt; NearSpansUnordered.getPayload() not return payloads that are
&gt; verifiably accessible via IR.termPositions? It's a bit confusing
&gt; because most of the time they're returned properly.
&gt;
&gt; I suspect the payload logic gets tripped up in
&gt; NearSpansUnordered. I'll put together a test case, however the
&gt; difficulty is that we're only seeing the issue with largish 800
&gt; MB indexes, which could make the test case a little crazy.
&gt;
&gt; Jason
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: FileNotFoundException on index</title>
<author><name>Michael McCandless &lt;lucene@mikemccandless.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c9ac0c6aa0912090159s15ab937fp42c8016bab0790ac@mail.gmail.com%3e"/>
<id>urn:uuid:%3c9ac0c6aa0912090159s15ab937fp42c8016bab0790ac@mail-gmail-com%3e</id>
<updated>2009-12-09T09:59:00Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
OK thanks for bringing closure!

Accidentally allowing 2 writers to write to the same index quickly
leads to corruption.  They are like the Betta fish: they fight to the
death, removing each others files, if you put them in the same cage.

Mike

On Wed, Dec 9, 2009 at 1:56 AM, Max Lynch &lt;ihasmax@gmail.com&gt; wrote:
&gt; Hi Mike,
&gt;
&gt; Missed your response on this,
&gt; What I was doing was physically removing index/write.lock if older than 8
&gt; hours, allowing another process of my indexer to run.  I realize in
&gt; hindsight that there is no reason why I should be doing this and it was
&gt; really stupid.  I think I was under the impression one of my pylucene
&gt; processes was hanging.
&gt;
&gt; On Fri, Oct 9, 2009 at 3:44 AM, Michael McCandless &lt;
&gt; lucene@mikemccandless.com&gt; wrote:
&gt;
&gt;&gt; You can use o.a.l.index.CheckIndex to fix the index.  It will remove
&gt;&gt; references to any segments that are missing or have problems during
&gt;&gt; testing.  First run it without -fix to see what problems there are.
&gt;&gt; Then take a backup of the index.  Then run it with -fix.  The index
&gt;&gt; will lose all docs in those segments that it removes.
&gt;&gt;
&gt;&gt; Can you describe what led up to this?  Is it repeatable?
&gt;&gt;
&gt;&gt; Mike
&gt;&gt;
&gt;&gt; On Fri, Oct 9, 2009 at 12:37 AM, Max Lynch &lt;ihasmax@gmail.com&gt; wrote:
&gt;&gt; &gt; Missed your response, thanks Bernd.
&gt;&gt; &gt;
&gt;&gt; &gt; I don't think that's it, since I haven't been executing any commands like
&gt;&gt; &gt; that.  The only thing I could think of is corruption.  I've got the index
&gt;&gt; &gt; backed up in case there is a way to fix it (it won't matter in a week or
&gt;&gt; so
&gt;&gt; &gt; since I cull any documents older than 25 days).
&gt;&gt; &gt;
&gt;&gt; &gt; Is there a way to fix this?
&gt;&gt; &gt;
&gt;&gt; &gt; Thanks.
&gt;&gt; &gt;
&gt;&gt; &gt; On Thu, Oct 8, 2009 at 3:01 AM, Bernd Fondermann &lt;
&gt;&gt; &gt; bernd.fondermann@googlemail.com&gt; wrote:
&gt;&gt; &gt;
&gt;&gt; &gt;&gt; Hi Max
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt; just a guess: maybe you deleted all *.c source files in that area and
&gt;&gt; &gt;&gt; unintentionally deleted this index file, too.
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt;  Bernd
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt; On Fri, Oct 2, 2009 at 17:10, Max Lynch &lt;ihasmax@gmail.com&gt; wrote:
&gt;&gt; &gt;&gt; &gt; I'm getting this error when I try to run my searcher and my indexer:
&gt;&gt; &gt;&gt; &gt;
&gt;&gt; &gt;&gt; &gt; Traceback (most recent call last):
&gt;&gt; &gt;&gt; &gt; self.searcher = lucene.IndexSearcher(self.directory)
&gt;&gt; &gt;&gt; &gt; JavaError: java.io.FileNotFoundException:
&gt;&gt; &gt;&gt; /home/spider/misc/index/_275c.cfs
&gt;&gt; &gt;&gt; &gt; (No such file or directory)
&gt;&gt; &gt;&gt; &gt;
&gt;&gt; &gt;&gt; &gt; I don't know anything about the format of the Lucene index, but I
&gt;&gt; notice
&gt;&gt; &gt;&gt; I
&gt;&gt; &gt;&gt; &gt; have several _275* files from b to j but no c.
&gt;&gt; &gt;&gt; &gt;
&gt;&gt; &gt;&gt; &gt; Any ideas?
&gt;&gt; &gt;&gt; &gt;
&gt;&gt; &gt;&gt; &gt; Thanks.
&gt;&gt; &gt;&gt; &gt;
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt; ---------------------------------------------------------------------
&gt;&gt; &gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; &gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Kindly spot out the reason for this undesired output.</title>
<author><name>DHIVYA M &lt;dhivyakrishnan87@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c386784.67012.qm@web94806.mail.in2.yahoo.com%3e"/>
<id>urn:uuid:%3c386784-67012-qm@web94806-mail-in2-yahoo-com%3e</id>
<updated>2009-12-09T09:57:28Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi all,
 
I have attached the code file and also the output screenshot. The problem is:
 
When i search for a query, it gives out the resultant documents containing the query but,
 
if two documents a and b contains that query,
Result is:
 
Link for a:
contents of a
contents of b
 
Link for b:
contents of a
contents of b
 
kindly go through the attached documents for the code, obtained output and the correct output
screenshots. 
Can anyone pls solve this problem.
I want the contents of the corresponding file alone to be displayed and the repetition should
be avoided.
 
Also how to rank my results / how to implement relevancy ranking.
Am a beginner and have no idea so kindly help me on this context.
 
Thanks,
Dhivya


      The INTERNET now has a personality. YOURS! See your Yahoo! Homepage. http://in.yahoo.com/

</pre>
</div>
</content>
</entry>
<entry>
<title>RE: [VOTE] Push fast-vector-highlighter mvn artifacts for 3.0.0 and 2.9.1</title>
<author><name>&quot;Uwe Schindler&quot; &lt;uwe@thetaphi.de&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c01AF1109550C4822A317DE46CFE66280@VEGA%3e"/>
<id>urn:uuid:%3c01AF1109550C4822A317DE46CFE66280@VEGA%3e</id>
<updated>2009-12-09T09:28:10Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi all,

The missing maven artifacts for the fast-vector-highlighter contrib of
Lucene Java in version 2.9.1 and 3.0.0 are now available at:

http://repo1.maven.org/maven2/org/apache/lucene/
http://repo2.maven.org/maven2/org/apache/lucene/

Uwe

-----
Uwe Schindler
uschindler@apache.org 
Apache Lucene Java Committer
Bremen, Germany
http://lucene.apache.org/java/docs/

&gt; From: Uwe Schindler [mailto:uwe@thetaphi.de]
&gt; Sent: Tuesday, December 08, 2009 10:41 PM
&gt; To: java-dev@lucene.apache.org; general@lucene.apache.org
&gt; Subject: RE: [VOTE] Push fast-vector-highlighter mvn artifacts for 3.0.0
&gt; and 2.9.1
&gt; 
&gt; I got 3 binding votes from Grant, Mike, and Ted (and one from Simon, who
&gt; was
&gt; a big help on Sunday evening when I created the artifacts), so I push the
&gt; maven artifacts onto the rsync repo in few minutes.
&gt; 
&gt; Thanks!
&gt; 
&gt; -----
&gt; Uwe Schindler
&gt; H.-H.-Meier-Allee 63, D-28213 Bremen
&gt; http://www.thetaphi.de
&gt; eMail: uwe@thetaphi.de
&gt; 
&gt; &gt; -----Original Message-----
&gt; &gt; From: Uwe Schindler [mailto:uwe@thetaphi.de]
&gt; &gt; Sent: Tuesday, December 08, 2009 7:03 PM
&gt; &gt; To: java-dev@lucene.apache.org
&gt; &gt; Subject: [VOTE] Push fast-vector-highlighter mvn artifacts for 3.0.0 and
&gt; &gt; 2.9.1
&gt; &gt;
&gt; &gt; Sorry,
&gt; &gt;
&gt; &gt; I initially didn't want to start a vote, as Grant only proposed to
&gt; "maybe
&gt; &gt; start one". But nobody responded (esp. to the questions in this mail) I
&gt; &gt; ask
&gt; &gt; again, an I will start the vote for now.
&gt; &gt;
&gt; &gt;
&gt; ==========================================================================
&gt; &gt; ==
&gt; &gt; Please vote, that the missing artifacts for of fast-verctor-highlighter
&gt; of
&gt; &gt; Lucene Java 2.9.1 and 3.0.0 should be pushed to repoX.maven.org.
&gt; &gt;
&gt; &gt; You can find the artifacts here:
&gt; &gt; http://people.apache.org/~uschindler/staging-area/
&gt; &gt;
&gt; &gt; This dir contains only the maven folder to be copied to maven-rsync dir
&gt; on
&gt; &gt; p.a.o. The top-level version in the maven metadata is 3.0.0, which
&gt; &gt; conforms
&gt; &gt; to the current state on maven (so during merging both folders during
&gt; &gt; build,
&gt; &gt; I set preference to metadata.xml of 3.0.0).
&gt; &gt;
&gt; &gt; All files are signed by my PGP key (even the 2.9.1 ones; that release
&gt; was
&gt; &gt; originally built by Mike McCandless).
&gt; &gt;
&gt; ==========================================================================
&gt; &gt; ==
&gt; &gt;
&gt; &gt; What I additionally found out until now (because Simon nagged me):
&gt; &gt;
&gt; &gt; If you compare the JAR files inside the binary ZIP file from the apache
&gt; &gt; archive and the JAR files directly published on maven (for the other
&gt; &gt; contribs), the MD5s/SHA1s are different even as they are created from
&gt; the
&gt; &gt; same source code (because the timestamps inside the JAR are different,
&gt; for
&gt; &gt; 2.9.1 another JDK compiler/platform was used). This interestingly does
&gt; not
&gt; &gt; apply to lucene-core.jar in 3.0. Because of that I see no problem with
&gt; &gt; this
&gt; &gt; maven release, even that they are not the orginal JAR files from the
&gt; &gt; binary
&gt; &gt; distrib.
&gt; &gt;
&gt; &gt; What is not nice, is that the svn revision number in the manifest is
&gt; &gt; different, but else is exactly the same, see my comments below in
&gt; earlier
&gt; &gt; mails about changing the ant script for showing the SVN rev of the last
&gt; &gt; changed file.
&gt; &gt;
&gt; &gt; So if nobody objects to release these rebuild jar files, all signed by
&gt; my
&gt; &gt; key, I would like to simply put them on the maven-rsync folder.
&gt; &gt;
&gt; &gt; Uwe
&gt; &gt;
&gt; &gt; -----
&gt; &gt; Uwe Schindler
&gt; &gt; H.-H.-Meier-Allee 63, D-28213 Bremen
&gt; &gt; http://www.thetaphi.de
&gt; &gt; eMail: uwe@thetaphi.de
&gt; &gt;
&gt; &gt;
&gt; &gt; &gt; -----Original Message-----
&gt; &gt; &gt; From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
&gt; &gt; &gt; Sent: Tuesday, December 08, 2009 6:48 PM
&gt; &gt; &gt; To: java-dev@lucene.apache.org
&gt; &gt; &gt; Subject: Re: (NAG) Push fast-vector-highlighter mvn artifacts for 3.0
&gt; &gt; and
&gt; &gt; &gt; 2.9
&gt; &gt; &gt;
&gt; &gt; &gt;
&gt; &gt; &gt; : What to do now, any votes on adding the missing maven artifacts for
&gt; &gt; &gt; : fast-vector-highlighter to 2.9.1 and 3.0.0 on the apache maven
&gt; &gt; &gt; reposititory?
&gt; &gt; &gt;
&gt; &gt; &gt; It's not even clear to me that anything special needs to be done
&gt; before
&gt; &gt; &gt; publishing those jars to maven.  2.9.1 and 3.0.0 were already voted on
&gt; &gt; and
&gt; &gt; &gt; released -- including all of the source code in them.
&gt; &gt; &gt;
&gt; &gt; &gt; The safest bet least likely to anger the process gods is just to call
&gt; a
&gt; &gt; &gt; vote (new thread with VOTE in the subject) and cast a vote ...
&gt; &gt; considering
&gt; &gt; &gt; the sources has already been reviewed it should go pretty quick.
&gt; &gt; &gt;
&gt; &gt; &gt; :
&gt; &gt; &gt; : &gt; I rebuilt the maven-dir for 2.9.1 and 3.0.0, merged them (3.0.0 is
&gt; &gt; &gt; top-
&gt; &gt; &gt; : &gt; level
&gt; &gt; &gt; : &gt; version) and extracted only fast-vector-highlighter:
&gt; &gt; &gt; : &gt;
&gt; &gt; &gt; : &gt; http://people.apache.org/~uschindler/staging-area/
&gt; &gt; &gt; : &gt;
&gt; &gt; &gt; : &gt; I will copy this dir to the maven folder on people.a.o, when I got
&gt; &gt; &gt; votes
&gt; &gt; &gt; : &gt; (how many)? At least someone should check the signatures.
&gt; &gt; &gt; : &gt;
&gt; &gt; &gt; : &gt; By the way, we have a small error in our ant build.xml that
&gt; inserts
&gt; &gt; &gt; : &gt; svnversion into the manifest file. This version is not the version
&gt; &gt; of
&gt; &gt; &gt; the
&gt; &gt; &gt; : &gt; last changed item (would be svnversion -c) but the current svn
&gt; &gt; &gt; version,
&gt; &gt; &gt; : &gt; even
&gt; &gt; &gt; : &gt; that I checked out the corresponding tags. It's no problem at all,
&gt; &gt; but
&gt; &gt; &gt; not
&gt; &gt; &gt; : &gt; very nice.
&gt; &gt; &gt; : &gt;
&gt; &gt; &gt; : &gt; Maybe we should change build.xml to call "svnversion -c" in
&gt; future,
&gt; &gt; to
&gt; &gt; &gt; get
&gt; &gt; &gt; : &gt; the real number.
&gt; &gt; &gt; : &gt;
&gt; &gt; &gt; : &gt; Uwe
&gt; &gt; &gt; : &gt;
&gt; &gt; &gt; : &gt; -----
&gt; &gt; &gt; : &gt; Uwe Schindler
&gt; &gt; &gt; : &gt; H.-H.-Meier-Allee 63, D-28213 Bremen
&gt; &gt; &gt; : &gt; http://www.thetaphi.de
&gt; &gt; &gt; : &gt; eMail: uwe@thetaphi.de
&gt; &gt; &gt; : &gt;
&gt; &gt; &gt; : &gt;
&gt; &gt; &gt; : &gt; &gt; -----Original Message-----
&gt; &gt; &gt; : &gt; &gt; From: Grant Ingersoll [mailto:gsingers@apache.org]
&gt; &gt; &gt; : &gt; &gt; Sent: Saturday, December 05, 2009 10:26 PM
&gt; &gt; &gt; : &gt; &gt; To: java-dev@lucene.apache.org
&gt; &gt; &gt; : &gt; &gt; Subject: Re: Push fast-vector-highlighter mvn artifacts for 3.0
&gt; &gt; and
&gt; &gt; &gt; 2.9
&gt; &gt; &gt; : &gt; &gt;
&gt; &gt; &gt; : &gt; &gt; I suppose we could put up the artifacts on a dev site and then
&gt; we
&gt; &gt; &gt; could
&gt; &gt; &gt; : &gt; &gt; vote to release both of them pretty quickly.  I think that
&gt; should
&gt; &gt; be
&gt; &gt; &gt; : &gt; easy
&gt; &gt; &gt; : &gt; &gt; to do, since it pretty much only involves verifying the jar and
&gt; &gt; the
&gt; &gt; &gt; : &gt; &gt; signatures.
&gt; &gt; &gt; : &gt; &gt;
&gt; &gt; &gt; : &gt; &gt; On Dec 5, 2009, at 1:03 PM, Simon Willnauer wrote:
&gt; &gt; &gt; : &gt; &gt;
&gt; &gt; &gt; : &gt; &gt; &gt; hi folks,
&gt; &gt; &gt; : &gt; &gt; &gt; The maven artifacts for fast-vector-highlighter have never
&gt; been
&gt; &gt; &gt; pushed
&gt; &gt; &gt; : &gt; &gt; &gt; since it was released because there were no pom.xml.template
&gt; &gt; &gt; inside
&gt; &gt; &gt; : &gt; &gt; &gt; the module. I added a pom file a day ago in the context of
&gt; &gt; &gt; : &gt; &gt; &gt; LUCENE-2107. I already talked to uwe and grant how to deal
&gt; with
&gt; &gt; &gt; this
&gt; &gt; &gt; : &gt; &gt; &gt; issues and if we should push the artifact for Lucene 2.9 /
&gt; 3.0.
&gt; &gt; &gt; Since
&gt; &gt; &gt; : &gt; &gt; &gt; this is only a metadata file we could consider rebuilding
the
&gt; &gt; &gt; : &gt; &gt; &gt; artefacts and publish them for those releases. I can not
&gt; &gt; remember
&gt; &gt; &gt; that
&gt; &gt; &gt; : &gt; &gt; &gt; anything like that happened before, so we should discuss how
&gt; to
&gt; &gt; &gt; deal
&gt; &gt; &gt; : &gt; &gt; &gt; with this situation and if we should wait until 3.1.
&gt; &gt; &gt; : &gt; &gt; &gt;
&gt; &gt; &gt; : &gt; &gt; &gt; simon
&gt; &gt; &gt; : &gt; &gt; &gt;
&gt; &gt; &gt; : &gt; &gt; &gt; --------------------------------------------------------------
&gt; --
&gt; &gt; --
&gt; &gt; &gt; ---
&gt; &gt; &gt; : &gt; &gt; &gt; To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
&gt; &gt; &gt; : &gt; &gt; &gt; For additional commands, e-mail: java-dev-
&gt; help@lucene.apache.org
&gt; &gt; &gt; : &gt; &gt; &gt;
&gt; &gt; &gt; : &gt; &gt;
&gt; &gt; &gt; : &gt; &gt;
&gt; &gt; &gt; : &gt; &gt;
&gt; &gt; &gt; : &gt; &gt; ----------------------------------------------------------------
&gt; --
&gt; &gt; --
&gt; &gt; &gt; -
&gt; &gt; &gt; : &gt; &gt; To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
&gt; &gt; &gt; : &gt; &gt; For additional commands, e-mail: java-dev-help@lucene.apache.org
&gt; &gt; &gt; : &gt;
&gt; &gt; &gt; : &gt;
&gt; &gt; &gt; : &gt;
&gt; &gt; &gt; : &gt; ------------------------------------------------------------------
&gt; --
&gt; &gt; -
&gt; &gt; &gt; : &gt; To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
&gt; &gt; &gt; : &gt; For additional commands, e-mail: java-dev-help@lucene.apache.org
&gt; &gt; &gt; :
&gt; &gt; &gt; :
&gt; &gt; &gt; :
&gt; &gt; &gt; : --------------------------------------------------------------------
&gt; -
&gt; &gt; &gt; : To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
&gt; &gt; &gt; : For additional commands, e-mail: java-dev-help@lucene.apache.org
&gt; &gt; &gt; :
&gt; &gt; &gt;
&gt; &gt; &gt;
&gt; &gt; &gt;
&gt; &gt; &gt; -Hoss
&gt; &gt; &gt;
&gt; &gt; &gt;
&gt; &gt; &gt; ---------------------------------------------------------------------
&gt; &gt; &gt; To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
&gt; &gt; &gt; For additional commands, e-mail: java-dev-help@lucene.apache.org
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt; &gt; ---------------------------------------------------------------------
&gt; &gt; To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
&gt; &gt; For additional commands, e-mail: java-dev-help@lucene.apache.org
&gt; 



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: HOW to do date range searchi in 3.0</title>
<author><name>Weiwei Wang &lt;ww.wang.cs@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c7d94dcde0912082312k45ab557cj9c44caa694506f40@mail.gmail.com%3e"/>
<id>urn:uuid:%3c7d94dcde0912082312k45ab557cj9c44caa694506f40@mail-gmail-com%3e</id>
<updated>2009-12-09T07:12:21Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Thanks, Uwe. I've found the problem. the updateTime field is lost when i
converted my index from an older version.

Another question, is there any detailed tutorial about Lucene 3.0.0?

2009/12/9 Uwe Schindler &lt;uwe@thetaphi.de&gt;

&gt; How did you index your date?
&gt;
&gt; I would suggest to reindex the date using NumericField! And then query
&gt; using
&gt; NumericRangeQuery. If reindexing is not possible the Query like you have
&gt; done, should work. Please give us examples of how you indexed and how you
&gt; query.
&gt;
&gt; Uwe
&gt;
&gt; -----
&gt; Uwe Schindler
&gt; H.-H.-Meier-Allee 63, D-28213 Bremen
&gt; http://www.thetaphi.de
&gt; eMail: uwe@thetaphi.de
&gt;
&gt; &gt; -----Original Message-----
&gt; &gt; From: Weiwei Wang [mailto:ww.wang.cs@gmail.com]
&gt; &gt; Sent: Wednesday, December 09, 2009 4:23 AM
&gt; &gt; To: java-user@lucene.apache.org
&gt; &gt; Subject: HOW to do date range searchi in 3.0
&gt; &gt;
&gt; &gt; Hi, all
&gt; &gt;      I need to do a date range search like date:[a previous time to null]
&gt; &gt; I used a filter to do this job, the code  is shown below:
&gt; &gt;     Calendar c = Calendar.getInstance();
&gt; &gt;     c.setTimeInMillis(c.getTimeInMillis() -
&gt; &gt; parameter.getRecentUpdateConstraint()
&gt; &gt;         * RosaCrawlerConstants.ONE_DAY_IN_MILLISECOND);
&gt; &gt;     String fromTime = DateTools.dateToString(c.getTime(),
&gt; &gt; DateTools.Resolution.DAY);
&gt; &gt;     Query updateTimeRange = new
&gt; &gt; TermRangeQuery("updateTime",fromTime,null,true,false);
&gt; &gt;     query.add(updateTimeRange, BooleanClause.Occur.MUST);
&gt; &gt;
&gt; &gt; However, it doesn't work as before in version 2.4.1(I'm updating my
&gt; &gt; project
&gt; &gt; from version 2.4.1 to lucenen 3.0.0)
&gt; &gt;
&gt; &gt; Could anybody here offer me a solution?
&gt; &gt; --
&gt; &gt; Weiwei Wang
&gt; &gt; Alex Wang
&gt; &gt; ēˇ‹å·¨å·¨
&gt; &gt; Room 403, Mengmin Wei Building
&gt; &gt; Computer Science Department
&gt; &gt; Gulou Campus of Nanjing University
&gt; &gt; Nanjing, P.R.China, 210093
&gt; &gt;
&gt; &gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang
&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;


-- 
Weiwei Wang
Alex Wang
ēˇ‹å·¨å·¨
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang


</pre>
</div>
</content>
</entry>
<entry>
<title>RE: HOW to do date range searchi in 3.0</title>
<author><name>&quot;Uwe Schindler&quot; &lt;uwe@thetaphi.de&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c6F73E5EFE6B9474AAF68829D340E3944@VEGA%3e"/>
<id>urn:uuid:%3c6F73E5EFE6B9474AAF68829D340E3944@VEGA%3e</id>
<updated>2009-12-09T07:09:24Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
How did you index your date?

I would suggest to reindex the date using NumericField! And then query using
NumericRangeQuery. If reindexing is not possible the Query like you have
done, should work. Please give us examples of how you indexed and how you
query.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

&gt; -----Original Message-----
&gt; From: Weiwei Wang [mailto:ww.wang.cs@gmail.com]
&gt; Sent: Wednesday, December 09, 2009 4:23 AM
&gt; To: java-user@lucene.apache.org
&gt; Subject: HOW to do date range searchi in 3.0
&gt; 
&gt; Hi, all
&gt;      I need to do a date range search like date:[a previous time to null]
&gt; I used a filter to do this job, the code  is shown below:
&gt;     Calendar c = Calendar.getInstance();
&gt;     c.setTimeInMillis(c.getTimeInMillis() -
&gt; parameter.getRecentUpdateConstraint()
&gt;         * RosaCrawlerConstants.ONE_DAY_IN_MILLISECOND);
&gt;     String fromTime = DateTools.dateToString(c.getTime(),
&gt; DateTools.Resolution.DAY);
&gt;     Query updateTimeRange = new
&gt; TermRangeQuery("updateTime",fromTime,null,true,false);
&gt;     query.add(updateTimeRange, BooleanClause.Occur.MUST);
&gt; 
&gt; However, it doesn't work as before in version 2.4.1(I'm updating my
&gt; project
&gt; from version 2.4.1 to lucenen 3.0.0)
&gt; 
&gt; Could anybody here offer me a solution?
&gt; --
&gt; Weiwei Wang
&gt; Alex Wang
&gt; $B2&amp;V[V[(B
&gt; Room 403, Mengmin Wei Building
&gt; Computer Science Department
&gt; Gulou Campus of Nanjing University
&gt; Nanjing, P.R.China, 210093
&gt; 
&gt; Homepage: http://cs.nju.edu.cn/rl/weiweiwang


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: FileNotFoundException on index</title>
<author><name>Max Lynch &lt;ihasmax@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c3836ec640912082256n33210772j3a8840b51f7a6737@mail.gmail.com%3e"/>
<id>urn:uuid:%3c3836ec640912082256n33210772j3a8840b51f7a6737@mail-gmail-com%3e</id>
<updated>2009-12-09T06:56:05Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi Mike,

Missed your response on this,
What I was doing was physically removing index/write.lock if older than 8
hours, allowing another process of my indexer to run.  I realize in
hindsight that there is no reason why I should be doing this and it was
really stupid.  I think I was under the impression one of my pylucene
processes was hanging.

On Fri, Oct 9, 2009 at 3:44 AM, Michael McCandless &lt;
lucene@mikemccandless.com&gt; wrote:

&gt; You can use o.a.l.index.CheckIndex to fix the index.  It will remove
&gt; references to any segments that are missing or have problems during
&gt; testing.  First run it without -fix to see what problems there are.
&gt; Then take a backup of the index.  Then run it with -fix.  The index
&gt; will lose all docs in those segments that it removes.
&gt;
&gt; Can you describe what led up to this?  Is it repeatable?
&gt;
&gt; Mike
&gt;
&gt; On Fri, Oct 9, 2009 at 12:37 AM, Max Lynch &lt;ihasmax@gmail.com&gt; wrote:
&gt; &gt; Missed your response, thanks Bernd.
&gt; &gt;
&gt; &gt; I don't think that's it, since I haven't been executing any commands like
&gt; &gt; that.  The only thing I could think of is corruption.  I've got the index
&gt; &gt; backed up in case there is a way to fix it (it won't matter in a week or
&gt; so
&gt; &gt; since I cull any documents older than 25 days).
&gt; &gt;
&gt; &gt; Is there a way to fix this?
&gt; &gt;
&gt; &gt; Thanks.
&gt; &gt;
&gt; &gt; On Thu, Oct 8, 2009 at 3:01 AM, Bernd Fondermann &lt;
&gt; &gt; bernd.fondermann@googlemail.com&gt; wrote:
&gt; &gt;
&gt; &gt;&gt; Hi Max
&gt; &gt;&gt;
&gt; &gt;&gt; just a guess: maybe you deleted all *.c source files in that area and
&gt; &gt;&gt; unintentionally deleted this index file, too.
&gt; &gt;&gt;
&gt; &gt;&gt;  Bernd
&gt; &gt;&gt;
&gt; &gt;&gt; On Fri, Oct 2, 2009 at 17:10, Max Lynch &lt;ihasmax@gmail.com&gt; wrote:
&gt; &gt;&gt; &gt; I'm getting this error when I try to run my searcher and my indexer:
&gt; &gt;&gt; &gt;
&gt; &gt;&gt; &gt; Traceback (most recent call last):
&gt; &gt;&gt; &gt; self.searcher = lucene.IndexSearcher(self.directory)
&gt; &gt;&gt; &gt; JavaError: java.io.FileNotFoundException:
&gt; &gt;&gt; /home/spider/misc/index/_275c.cfs
&gt; &gt;&gt; &gt; (No such file or directory)
&gt; &gt;&gt; &gt;
&gt; &gt;&gt; &gt; I don't know anything about the format of the Lucene index, but I
&gt; notice
&gt; &gt;&gt; I
&gt; &gt;&gt; &gt; have several _275* files from b to j but no c.
&gt; &gt;&gt; &gt;
&gt; &gt;&gt; &gt; Any ideas?
&gt; &gt;&gt; &gt;
&gt; &gt;&gt; &gt; Thanks.
&gt; &gt;&gt; &gt;
&gt; &gt;&gt;
&gt; &gt;&gt; ---------------------------------------------------------------------
&gt; &gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; &gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt; &gt;&gt;
&gt; &gt;&gt;
&gt; &gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>HOW to do date range searchi in 3.0</title>
<author><name>Weiwei Wang &lt;ww.wang.cs@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c7d94dcde0912081923t36af687eycce4cedd5e85e047@mail.gmail.com%3e"/>
<id>urn:uuid:%3c7d94dcde0912081923t36af687eycce4cedd5e85e047@mail-gmail-com%3e</id>
<updated>2009-12-09T03:23:02Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi, all
     I need to do a date range search like date:[a previous time to null]
I used a filter to do this job, the code  is shown below:
    Calendar c = Calendar.getInstance();
    c.setTimeInMillis(c.getTimeInMillis() -
parameter.getRecentUpdateConstraint()
        * RosaCrawlerConstants.ONE_DAY_IN_MILLISECOND);
    String fromTime = DateTools.dateToString(c.getTime(),
DateTools.Resolution.DAY);
    Query updateTimeRange = new
TermRangeQuery("updateTime",fromTime,null,true,false);
    query.add(updateTimeRange, BooleanClause.Occur.MUST);

However, it doesn't work as before in version 2.4.1(I'm updating my project
from version 2.4.1 to lucenen 3.0.0)

Could anybody here offer me a solution?
-- 
Weiwei Wang
Alex Wang
ēˇ‹å·¨å·¨
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang


</pre>
</div>
</content>
</entry>
<entry>
<title>NearSpansUnordered payloads not returning all the time</title>
<author><name>Jason Rutherglen &lt;jason.rutherglen@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c85d3c3b60912081634j4bfa8cbdm3a3ad5286c3a9b70@mail.gmail.com%3e"/>
<id>urn:uuid:%3c85d3c3b60912081634j4bfa8cbdm3a3ad5286c3a9b70@mail-gmail-com%3e</id>
<updated>2009-12-09T00:34:07Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Howdy,

I am wondering if anyone has seen
NearSpansUnordered.getPayload() not return payloads that are
verifiably accessible via IR.termPositions? It's a bit confusing
because most of the time they're returned properly.

I suspect the payload logic gets tripped up in
NearSpansUnordered. I'll put together a test case, however the
difficulty is that we're only seeing the issue with largish 800
MB indexes, which could make the test case a little crazy.

Jason

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>RE: TopFieldDocCollector and v3.0.0</title>
<author><name>&quot;Uwe Schindler&quot; &lt;uwe@thetaphi.de&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c336011D970CD40A6957FF6840E538E26@VEGA%3e"/>
<id>urn:uuid:%3c336011D970CD40A6957FF6840E538E26@VEGA%3e</id>
<updated>2009-12-08T21:43:53Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Sorry wrong word, Germans often have the problem with English "must". It has
to be to be "but you must not".

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


&gt; -----Original Message-----
&gt; From: Steven A Rowe [mailto:sarowe@syr.edu]
&gt; Sent: Tuesday, December 08, 2009 8:42 PM
&gt; To: java-user@lucene.apache.org
&gt; Subject: RE: TopFieldDocCollector and v3.0.0
&gt; 
&gt; Hi Uwe,
&gt; 
&gt; On 12/08/2009 at 9:40 AM, Uwe Schindler wrote:
&gt; &gt; After the move to 3.0, you can (but you must not) further update
&gt; &gt; your code to use generics, which is not really needed but will
&gt; &gt; remove all compiler warnings.
&gt; 
&gt; This sounds like you're telling people that although they are able to
&gt; update their code to use generics, it is forbidden.
&gt; 
&gt; I'm sure, though, that you mean that they are not required to do so:
&gt; something like "but you need not" rather than "but you must not".
&gt; 
&gt; Steve
&gt; 
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: question related to Indexing</title>
<author><name>Phanindra Reva &lt;reva.phanindra@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3cdc3d8a2a0912081241l1368bb6ahbe894247e8e72a3e@mail.gmail.com%3e"/>
<id>urn:uuid:%3cdc3d8a2a0912081241l1368bb6ahbe894247e8e72a3e@mail-gmail-com%3e</id>
<updated>2009-12-08T20:41:48Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hello Tom and Erick,
                          I am really sorry for posting such a dull
question. Meanwhile I have explored a few other parts of API,
fortunately I have found a place which could exaclty fit for my case.
Thanks for patiently trying to understand my question.. and warning
me.
Bye.

On Tue, Dec 8, 2009 at 9:21 PM, Tom Hill &lt;solr-list@worldware.com&gt; wrote:
&gt; If you tell us WHY you want to do this, rather than HOW you want to do it,
&gt; the chances are much better that someone can help.
&gt;
&gt; What's the business motivation here?  What does the end user want to
&gt; achieve?
&gt;
&gt; Tom
&gt;
&gt; On Tue, Dec 8, 2009 at 8:16 AM, Phanindra Reva &lt;reva.phanindra@gmail.com&gt;wrote:
&gt;
&gt;&gt; Hello,
&gt;&gt;        Thanks for the reply. *strange* was expected. I am trying to
&gt;&gt; store field names as payloads, so I need unedited field names during
&gt;&gt; analysis part. And later my plan is to replace all the field names
&gt;&gt; with a default value and then store the document in the index. So, If
&gt;&gt; its possible to get the reference of the Document after the analysis,
&gt;&gt; I could modify all the field names. Even a way of modifying ( of
&gt;&gt; course , it should be after analysis and before adding to the index )
&gt;&gt; the field-name values that are going to be added to the index will
&gt;&gt; suffice.
&gt;&gt;     I guess.. this time you feel its much more strange, but that's my
&gt;&gt; task for which above mentioned is one way to accomplish.
&gt;&gt; Thanks.
&gt;&gt;
&gt;&gt; On Tue, Dec 8, 2009 at 4:55 PM, Erick Erickson &lt;erickerickson@gmail.com&gt;
&gt;&gt; wrote:
&gt;&gt; &gt; You're right, it *does* seem strange &lt;G&gt;....
&gt;&gt; &gt;
&gt;&gt; &gt; I'm having a really hard time imagining a use-case
&gt;&gt; &gt; for this capability, so it's hard to suggest
&gt;&gt; &gt; an approach. Perhaps you could supply
&gt;&gt; &gt; an outline of your use-case? This may be
&gt;&gt; &gt; an XY problem.
&gt;&gt; &gt;
&gt;&gt; &gt; Best
&gt;&gt; &gt; Erick
&gt;&gt; &gt;
&gt;&gt; &gt; On Tue, Dec 8, 2009 at 10:12 AM, Phanindra Reva &lt;
&gt;&gt; reva.phanindra@gmail.com&gt;wrote:
&gt;&gt; &gt;
&gt;&gt; &gt;&gt; Hello All,
&gt;&gt; &gt;&gt;              I am a newbie using Lucene. To be brief, I am just
&gt;&gt; &gt;&gt; wondering whether is there a point where we get the access to the
&gt;&gt; &gt;&gt; org.apache.lucene.document.Document (which is being indexed at the
&gt;&gt; &gt;&gt; moment)  after the analysing part is completed but exactly before it
&gt;&gt; &gt;&gt; is added to the index. My whole aim is to modify all the field names
&gt;&gt; &gt;&gt; present in the document before its being added to the index, but I
&gt;&gt; &gt;&gt; need those field names un-edited during the analysis part.
&gt;&gt; &gt;&gt;   If it seems strange.. please don't mind.
&gt;&gt; &gt;&gt; Thanks.
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt; ---------------------------------------------------------------------
&gt;&gt; &gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; &gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;&gt;
&gt;&gt; &gt;
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: question related to Indexing</title>
<author><name>Tom Hill &lt;solr-list@worldware.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c660086cb0912081221r3fec9dcbw3bf8d4b4a64ea212@mail.gmail.com%3e"/>
<id>urn:uuid:%3c660086cb0912081221r3fec9dcbw3bf8d4b4a64ea212@mail-gmail-com%3e</id>
<updated>2009-12-08T20:21:36Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
If you tell us WHY you want to do this, rather than HOW you want to do it,
the chances are much better that someone can help.

What's the business motivation here?  What does the end user want to
achieve?

Tom

On Tue, Dec 8, 2009 at 8:16 AM, Phanindra Reva &lt;reva.phanindra@gmail.com&gt;wrote:

&gt; Hello,
&gt;        Thanks for the reply. *strange* was expected. I am trying to
&gt; store field names as payloads, so I need unedited field names during
&gt; analysis part. And later my plan is to replace all the field names
&gt; with a default value and then store the document in the index. So, If
&gt; its possible to get the reference of the Document after the analysis,
&gt; I could modify all the field names. Even a way of modifying ( of
&gt; course , it should be after analysis and before adding to the index )
&gt; the field-name values that are going to be added to the index will
&gt; suffice.
&gt;     I guess.. this time you feel its much more strange, but that's my
&gt; task for which above mentioned is one way to accomplish.
&gt; Thanks.
&gt;
&gt; On Tue, Dec 8, 2009 at 4:55 PM, Erick Erickson &lt;erickerickson@gmail.com&gt;
&gt; wrote:
&gt; &gt; You're right, it *does* seem strange &lt;G&gt;....
&gt; &gt;
&gt; &gt; I'm having a really hard time imagining a use-case
&gt; &gt; for this capability, so it's hard to suggest
&gt; &gt; an approach. Perhaps you could supply
&gt; &gt; an outline of your use-case? This may be
&gt; &gt; an XY problem.
&gt; &gt;
&gt; &gt; Best
&gt; &gt; Erick
&gt; &gt;
&gt; &gt; On Tue, Dec 8, 2009 at 10:12 AM, Phanindra Reva &lt;
&gt; reva.phanindra@gmail.com&gt;wrote:
&gt; &gt;
&gt; &gt;&gt; Hello All,
&gt; &gt;&gt;              I am a newbie using Lucene. To be brief, I am just
&gt; &gt;&gt; wondering whether is there a point where we get the access to the
&gt; &gt;&gt; org.apache.lucene.document.Document (which is being indexed at the
&gt; &gt;&gt; moment)  after the analysing part is completed but exactly before it
&gt; &gt;&gt; is added to the index. My whole aim is to modify all the field names
&gt; &gt;&gt; present in the document before its being added to the index, but I
&gt; &gt;&gt; need those field names un-edited during the analysis part.
&gt; &gt;&gt;   If it seems strange.. please don't mind.
&gt; &gt;&gt; Thanks.
&gt; &gt;&gt;
&gt; &gt;&gt; ---------------------------------------------------------------------
&gt; &gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; &gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt; &gt;&gt;
&gt; &gt;&gt;
&gt; &gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>RE: TopFieldDocCollector and v3.0.0</title>
<author><name>Steven A Rowe &lt;sarowe@syr.edu&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c2D127F11DC79714E9B6A43AC9458147F3276B237@suex07-mbx-03.ad.syr.edu%3e"/>
<id>urn:uuid:%3c2D127F11DC79714E9B6A43AC9458147F3276B237@suex07-mbx-03-ad-syr-edu%3e</id>
<updated>2009-12-08T19:41:34Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi Uwe,

On 12/08/2009 at 9:40 AM, Uwe Schindler wrote:
&gt; After the move to 3.0, you can (but you must not) further update
&gt; your code to use generics, which is not really needed but will
&gt; remove all compiler warnings.

This sounds like you're telling people that although they are able to update their code to
use generics, it is forbidden.

I'm sure, though, that you mean that they are not required to do so: something like "but you
need not" rather than "but you must not".

Steve

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Problem searching field with % as value</title>
<author><name>Ian Lea &lt;ian.lea@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c8c4e68610912080918r1b77afb0t70715111e0338dd8@mail.gmail.com%3e"/>
<id>urn:uuid:%3c8c4e68610912080918r1b77afb0t70715111e0338dd8@mail-gmail-com%3e</id>
<updated>2009-12-08T17:18:23Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
If you store the field unanalyzed it will be indexed as is.  You can
then search for it via a TermQuery, or use QueryParser with
PerFieldAnalyzerWrapper specifying KeywordAnalyzer for the field
containing this character.

Another approach is to replace the % with something easier to work
with.  You could do this yourself or, I think, with MappingCharFilter.

Personally I'd probably replace "%" with "percent" somewhere in my
code with a simple string replacement.


--
Ian.



On Tue, Dec 8, 2009 at 5:02 PM, kanayo &lt;richies4all@gmail.com&gt; wrote:
&gt;
&gt; Thanks for your reply Erick.
&gt;
&gt; In Luke, its also not working. I tried to retrieving values from the field
&gt; by specifying the field as the search field and then specify % as the search
&gt; parameter which using StandardAnalyzer but nothing is displayed. Also while
&gt; Luke shows the query details for other search values, it dosent show query
&gt; details for search value of %.
&gt;
&gt; I think it is not Tokenized in the index. Is there anything else i can do to
&gt; be able to retrieve values from fields comprising of just %?
&gt;
&gt; Thanks for your assistance.
&gt;
&gt;
&gt; Erick Erickson wrote:
&gt;&gt;
&gt;&gt; Try printing out query.toString() to see what's actually being
&gt;&gt; sent to the searcher.
&gt;&gt;
&gt;&gt; You can try the same thing in Luke, specifying StandardAnalyzer
&gt;&gt; to parse queries.
&gt;&gt;
&gt;&gt; Are you sure you're specifying the fields in the query and not just the
&gt;&gt; '%'? That would go against your default field.
&gt;&gt;
&gt;&gt; When you say that you can see the fields in luke, are you storing the
&gt;&gt; field?
&gt;&gt; Because what you may be seeing is the *stored* value rather than the
&gt;&gt; *tokens*.
&gt;&gt; Make sure you're looking at the tokens in Luke..
&gt;&gt;
&gt;&gt; If none of that helps, could you post a code snippet or two (index and
&gt;&gt; query)?
&gt;&gt;
&gt;&gt; Best
&gt;&gt; Erick
&gt;&gt;
&gt;&gt; On Tue, Dec 8, 2009 at 11:04 AM, kanayo &lt;richies4all@gmail.com&gt; wrote:
&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; Hi,
&gt;&gt;&gt;
&gt;&gt;&gt; I am a newbie to lucene. I am using Standard Analyzer in my lucene
&gt;&gt;&gt; project.
&gt;&gt;&gt; I am indexing some fields which may contain only "%" as a field value and
&gt;&gt;&gt; it
&gt;&gt;&gt; indexes fine and i can view the value against the field in the index
&gt;&gt;&gt; using
&gt;&gt;&gt; Luke.
&gt;&gt;&gt;
&gt;&gt;&gt; However when i try to retrieve the same field using indexsearcher and
&gt;&gt;&gt; passing "%" as a query parameter nothing is retrieved. It is simply being
&gt;&gt;&gt; ignored. I have also tried to escape the "%" while searching but still no
&gt;&gt;&gt; results.
&gt;&gt;&gt;
&gt;&gt;&gt; Is there anything am not doing right?
&gt;&gt;&gt;
&gt;&gt;&gt; Thanks in advance for your assistance.
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; --
&gt;&gt;&gt; View this message in context:
&gt;&gt;&gt; http://old.nabble.com/Problem-searching-field-with---as-value-tp26696184p26696184.html
&gt;&gt;&gt; Sent from the Lucene - Java Users mailing list archive at Nabble.com.
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; ---------------------------------------------------------------------
&gt;&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;
&gt;&gt;
&gt;
&gt; --
&gt; View this message in context: http://old.nabble.com/Problem-searching-field-with---as-value-tp26696184p26696993.html
&gt; Sent from the Lucene - Java Users mailing list archive at Nabble.com.
&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Problem searching field with % as value</title>
<author><name>kanayo &lt;richies4all@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c26696993.post@talk.nabble.com%3e"/>
<id>urn:uuid:%3c26696993-post@talk-nabble-com%3e</id>
<updated>2009-12-08T17:02:22Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

Thanks for your reply Erick.

In Luke, its also not working. I tried to retrieving values from the field
by specifying the field as the search field and then specify % as the search
parameter which using StandardAnalyzer but nothing is displayed. Also while
Luke shows the query details for other search values, it dosent show query
details for search value of %.

I think it is not Tokenized in the index. Is there anything else i can do to
be able to retrieve values from fields comprising of just %?

Thanks for your assistance.


Erick Erickson wrote:
&gt; 
&gt; Try printing out query.toString() to see what's actually being
&gt; sent to the searcher.
&gt; 
&gt; You can try the same thing in Luke, specifying StandardAnalyzer
&gt; to parse queries.
&gt; 
&gt; Are you sure you're specifying the fields in the query and not just the
&gt; '%'? That would go against your default field.
&gt; 
&gt; When you say that you can see the fields in luke, are you storing the
&gt; field?
&gt; Because what you may be seeing is the *stored* value rather than the
&gt; *tokens*.
&gt; Make sure you're looking at the tokens in Luke..
&gt; 
&gt; If none of that helps, could you post a code snippet or two (index and
&gt; query)?
&gt; 
&gt; Best
&gt; Erick
&gt; 
&gt; On Tue, Dec 8, 2009 at 11:04 AM, kanayo &lt;richies4all@gmail.com&gt; wrote:
&gt; 
&gt;&gt;
&gt;&gt; Hi,
&gt;&gt;
&gt;&gt; I am a newbie to lucene. I am using Standard Analyzer in my lucene
&gt;&gt; project.
&gt;&gt; I am indexing some fields which may contain only "%" as a field value and
&gt;&gt; it
&gt;&gt; indexes fine and i can view the value against the field in the index
&gt;&gt; using
&gt;&gt; Luke.
&gt;&gt;
&gt;&gt; However when i try to retrieve the same field using indexsearcher and
&gt;&gt; passing "%" as a query parameter nothing is retrieved. It is simply being
&gt;&gt; ignored. I have also tried to escape the "%" while searching but still no
&gt;&gt; results.
&gt;&gt;
&gt;&gt; Is there anything am not doing right?
&gt;&gt;
&gt;&gt; Thanks in advance for your assistance.
&gt;&gt;
&gt;&gt;
&gt;&gt; --
&gt;&gt; View this message in context:
&gt;&gt; http://old.nabble.com/Problem-searching-field-with---as-value-tp26696184p26696184.html
&gt;&gt; Sent from the Lucene - Java Users mailing list archive at Nabble.com.
&gt;&gt;
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt; 
&gt; 

-- 
View this message in context: http://old.nabble.com/Problem-searching-field-with---as-value-tp26696184p26696993.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Problem searching field with % as value</title>
<author><name>Erick Erickson &lt;erickerickson@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c359a92830912080820yed97137n2bc1ed9c4e79a120@mail.gmail.com%3e"/>
<id>urn:uuid:%3c359a92830912080820yed97137n2bc1ed9c4e79a120@mail-gmail-com%3e</id>
<updated>2009-12-08T16:20:01Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Try printing out query.toString() to see what's actually being
sent to the searcher.

You can try the same thing in Luke, specifying StandardAnalyzer
to parse queries.

Are you sure you're specifying the fields in the query and not just the
'%'? That would go against your default field.

When you say that you can see the fields in luke, are you storing the field?
Because what you may be seeing is the *stored* value rather than the
*tokens*.
Make sure you're looking at the tokens in Luke..

If none of that helps, could you post a code snippet or two (index and
query)?

Best
Erick

On Tue, Dec 8, 2009 at 11:04 AM, kanayo &lt;richies4all@gmail.com&gt; wrote:

&gt;
&gt; Hi,
&gt;
&gt; I am a newbie to lucene. I am using Standard Analyzer in my lucene project.
&gt; I am indexing some fields which may contain only "%" as a field value and
&gt; it
&gt; indexes fine and i can view the value against the field in the index using
&gt; Luke.
&gt;
&gt; However when i try to retrieve the same field using indexsearcher and
&gt; passing "%" as a query parameter nothing is retrieved. It is simply being
&gt; ignored. I have also tried to escape the "%" while searching but still no
&gt; results.
&gt;
&gt; Is there anything am not doing right?
&gt;
&gt; Thanks in advance for your assistance.
&gt;
&gt;
&gt; --
&gt; View this message in context:
&gt; http://old.nabble.com/Problem-searching-field-with---as-value-tp26696184p26696184.html
&gt; Sent from the Lucene - Java Users mailing list archive at Nabble.com.
&gt;
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: question related to Indexing</title>
<author><name>Phanindra Reva &lt;reva.phanindra@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3cdc3d8a2a0912080816j31b92663tc0eac853bfdce861@mail.gmail.com%3e"/>
<id>urn:uuid:%3cdc3d8a2a0912080816j31b92663tc0eac853bfdce861@mail-gmail-com%3e</id>
<updated>2009-12-08T16:16:25Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hello,
        Thanks for the reply. *strange* was expected. I am trying to
store field names as payloads, so I need unedited field names during
analysis part. And later my plan is to replace all the field names
with a default value and then store the document in the index. So, If
its possible to get the reference of the Document after the analysis,
I could modify all the field names. Even a way of modifying ( of
course , it should be after analysis and before adding to the index )
the field-name values that are going to be added to the index will
suffice.
     I guess.. this time you feel its much more strange, but that's my
task for which above mentioned is one way to accomplish.
Thanks.

On Tue, Dec 8, 2009 at 4:55 PM, Erick Erickson &lt;erickerickson@gmail.com&gt; wrote:
&gt; You're right, it *does* seem strange &lt;G&gt;....
&gt;
&gt; I'm having a really hard time imagining a use-case
&gt; for this capability, so it's hard to suggest
&gt; an approach. Perhaps you could supply
&gt; an outline of your use-case? This may be
&gt; an XY problem.
&gt;
&gt; Best
&gt; Erick
&gt;
&gt; On Tue, Dec 8, 2009 at 10:12 AM, Phanindra Reva &lt;reva.phanindra@gmail.com&gt;wrote:
&gt;
&gt;&gt; Hello All,
&gt;&gt;              I am a newbie using Lucene. To be brief, I am just
&gt;&gt; wondering whether is there a point where we get the access to the
&gt;&gt; org.apache.lucene.document.Document (which is being indexed at the
&gt;&gt; moment)  after the analysing part is completed but exactly before it
&gt;&gt; is added to the index. My whole aim is to modify all the field names
&gt;&gt; present in the document before its being added to the index, but I
&gt;&gt; need those field names un-edited during the analysis part.
&gt;&gt;   If it seems strange.. please don't mind.
&gt;&gt; Thanks.
&gt;&gt;
&gt;&gt; ---------------------------------------------------------------------
&gt;&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt;&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;&gt;
&gt;&gt;
&gt;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Problem searching field with % as value</title>
<author><name>kanayo &lt;richies4all@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c26696184.post@talk.nabble.com%3e"/>
<id>urn:uuid:%3c26696184-post@talk-nabble-com%3e</id>
<updated>2009-12-08T16:04:42Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

Hi,

I am a newbie to lucene. I am using Standard Analyzer in my lucene project. 
I am indexing some fields which may contain only "%" as a field value and it
indexes fine and i can view the value against the field in the index using
Luke.

However when i try to retrieve the same field using indexsearcher and
passing "%" as a query parameter nothing is retrieved. It is simply being
ignored. I have also tried to escape the "%" while searching but still no
results.

Is there anything am not doing right?

Thanks in advance for your assistance.


-- 
View this message in context: http://old.nabble.com/Problem-searching-field-with---as-value-tp26696184p26696184.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: question related to Indexing</title>
<author><name>Erick Erickson &lt;erickerickson@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/200912.mbox/%3c359a92830912080755t53ed8447v6eb5b753a8282965@mail.gmail.com%3e"/>
<id>urn:uuid:%3c359a92830912080755t53ed8447v6eb5b753a8282965@mail-gmail-com%3e</id>
<updated>2009-12-08T15:55:52Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
You're right, it *does* seem strange &lt;G&gt;....

I'm having a really hard time imagining a use-case
for this capability, so it's hard to suggest
an approach. Perhaps you could supply
an outline of your use-case? This may be
an XY problem.

Best
Erick

On Tue, Dec 8, 2009 at 10:12 AM, Phanindra Reva &lt;reva.phanindra@gmail.com&gt;wrote:

&gt; Hello All,
&gt;              I am a newbie using Lucene. To be brief, I am just
&gt; wondering whether is there a point where we get the access to the
&gt; org.apache.lucene.document.Document (which is being indexed at the
&gt; moment)  after the analysing part is completed but exactly before it
&gt; is added to the index. My whole aim is to modify all the field names
&gt; present in the document before its being added to the index, but I
&gt; need those field names un-edited during the analysis part.
&gt;   If it seems strange.. please don't mind.
&gt; Thanks.
&gt;
&gt; ---------------------------------------------------------------------
&gt; To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
&gt; For additional commands, e-mail: java-user-help@lucene.apache.org
&gt;
&gt;


</pre>
</div>
</content>
</entry>
</feed>
