lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: EnwikiDocMaker
Date Wed, 03 Jun 2009 18:31:00 GMT
Shai, make sure you're able to process the full Wikipedia export, i.e.
that you don't hit that weird issue (with Xerces) from LUCENE-1591, which
caused us to switch to the patched version of Xerces.

Mike

On Wed, Jun 3, 2009 at 2:13 PM, Shai Erera <serera@gmail.com> wrote:
> The current benchmark contains xerces-2.9.1-patched-XERCESJ-1257.jar, and
> its build.xml sets the classpath to include all .jar under the lib folder.
> So it looks like it is part of Benchmark.
>
> Maybe you fail to run it outside benchmark because you don't include it in
> your classpath?
>
> Anyway, I'll switch to Java's SAX parser and, if everything passes, remove
> the Xerces jar from benchmark as part of LUCENE-1595.
>
> Shai
>
> On Wed, Jun 3, 2009 at 7:09 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>>
>> +1
>> Note, Xerces Jar is not in benchmark, AFAICT.  It relies on the fact that
>> Java uses it under the hood.
>> I'm having this really weird situation where I'm using EnwikiDocMaker
>> outside the context of the benchmarker and I'm grasping at straws as to why
>> it is not working.  It seems to be a classpath issue, but is not Lucene
>> related so I'll spare the details.
>> -Grant
>> On Jun 3, 2009, at 5:58 AM, Shai Erera wrote:
>>
>> Then perhaps as part of 1595 I can change it to use Java's XML parser, and
>> test the Enwiki file. If all goes well, we may not need the XERCES jar in
>> benchmark? Anyway, I'll check that too
>>
>> On Wed, Jun 3, 2009 at 1:59 PM, Michael McCandless
>> <lucene@mikemccandless.com> wrote:
>>>
>>> I also don't know why it's specifically using Xerces...
>>>
>>> Mike
>>>
>>> On Wed, Jun 3, 2009 at 4:26 AM, Shai Erera <serera@gmail.com> wrote:
>>> > Grant, note that I'm changing the DocMakers in LUCENE-1595, including
>>> > this one. So whatever the decision is following your question, I can do
>>> > it as part of this issue, since that code will no longer be in
>>> > EnwikiDocMaker.
>>> >
>>> > Regarding your question, I don't know why it should depend on Xerces
>>> > (rather than the default Java XML parser, I assume?)
>>> >
>>> > Shai
>>> >
>>> > On Wed, Jun 3, 2009 at 2:48 AM, Grant Ingersoll <gsingers@apache.org>
>>> > wrote:
>>> >>
>>> >> Is there a reason the EnwikiDocMaker assumes Xerces for the SAX
>>> >> parser? Line 96.
>>> >>
>>> >> Thanks,
>>> >> Grant
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> >> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>> >>
>>> >
>>> >
>>>
>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>
>
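
For readers following the thread: the change Shai proposes (using Java's own SAX parser rather than naming Xerces explicitly) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code from EnwikiDocMaker or from the LUCENE-1595 patch; the thread never quotes line 96. The class JaxpSaxSketch, the handler TitleHandler, and the method parseTitles are hypothetical names, and the tiny XML string merely stands in for a Wikipedia export. The only real API used is the standard JAXP/SAX one (SAXParserFactory, XMLReader, DefaultHandler).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class JaxpSaxSketch {

    /** Hypothetical handler: collects the text of every <title> element. */
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<String>();
        private final StringBuilder text = new StringBuilder();
        private boolean inTitle;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default factory is not namespace-aware, so match on qName.
            if ("title".equals(qName)) {
                inTitle = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so accumulate.
            if (inTitle) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(text.toString());
                inTitle = false;
            }
        }
    }

    /** Parses the XML with whatever SAX parser the JVM provides via JAXP. */
    static List<String> parseTitles(String xml) throws Exception {
        // No parser class name is hard-coded; JAXP picks the implementation.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        XMLReader reader = factory.newSAXParser().getXMLReader();
        TitleHandler handler = new TitleHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a Wikipedia export; the real file is far larger.
        String xml = "<mediawiki><page><title>Anarchism</title></page>"
                   + "<page><title>Autism</title></page></mediawiki>";
        System.out.println(parseTitles(xml));
    }
}
```

As Mike notes at the top of the thread, a small test like this says nothing about whether the JDK's bundled parser gets through the full export without the LUCENE-1591 problem; that still has to be checked against the real file.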
