lucene-dev mailing list archives

From Grant Ingersoll <gsing...@syr.edu>
Subject Re: Taking a step back
Date Thu, 11 May 2006 19:56:19 GMT
I mostly just started this thread b/c I thought there should be some 
coordination on the Java side of things given the number of changes 
being proposed.  I agree that we shouldn't be driven by the other ports, 
but I don't think we should totally dismiss them either.  If we can meet 
our goals while still keeping our friends happy, then I am all for it. 

Robert Engels wrote:
> 1) That is my point. In this case, they are not copying the impl, they are
> requesting changes to the format.
>
> I just think there are better ways of doing interoperability than file
> formats. In almost all cases where I've encountered (or built !) systems
> that did integration based on a known file format, it bit me in the ass in
> the end... and/or severely limited the ability of myself or others to
> change...
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Thursday, May 11, 2006 2:29 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Taking a step back
>
>
> I don't want to get into this (so I'm replying!?), but I just want to point
> out 2 things:
> 1) So far we've never had a situation where Java Lucene was held back
> because of interoperability.  Ports tend to copy the implementation and
> adapt to Java Lucene.
> 2) Solr already does the HTTP server thing that you are describing, I
> believe.
>
> Otis
>
> ----- Original Message ----
> From: Robert Engels <rengels@ix.netcom.com>
> To: java-dev@lucene.apache.org
> Sent: Thursday, May 11, 2006 1:37:17 PM
> Subject: RE: Taking a step back
>
> I disagree with that a bit. I have found that certain languages lend
> themselves far better to certain file formats (that is, if an operation is
> very efficient to perform in a particular language, using a file format that
> allows the usage of that operation directly will often lead to much better
> performance). This is often true with byte ordering on particular hardware
> platforms. That is the whole reason this is an issue. Others can read the
> modified UTF, it is just not as efficient for them !
>
> But more importantly, I don't think Lucene (or others) should be "held back"
> attempting to adhere to a standardized file format.
>
> Take databases for example. Many available. All use different file formats,
> but all can be accessed with (pretty much) standardized SQL (using different
> drivers).
>
> I think Lucene could offer a similar approach at the API level, maybe an
> embedded TCP/IP interface / command processor (similar to an HTTP server).
>
> You are always going to have interoperability issues (sometimes even when
> using Java, but rarely), so I say dump the burden on the others, and just
> make Lucene the best Java search engine possible.
>
> Without starting some sort of flame war, I can't think of any advantages to
> not running a Java version of Lucene, but, that is just my opinion. It would
> be fairly straightforward to convert all of Lucene to C, and provide a Java
> binding, but why???
>
>
>
> -----Original Message-----
> From: Marvin Humphrey [mailto:marvin@rectangular.com]
> Sent: Thursday, May 11, 2006 12:08 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Taking a step back
>
>
>
> On May 10, 2006, at 8:02 AM, Robert Engels wrote:
>
>   
>> The file format issue, however, is a non-issue. If you want
>> interoperability between systems, do it via remote invocation and IIOP,
>> or some HTTP interface. This is far easier to control, especially
>> through version change cycles - otherwise all platforms need to be
>> updated together - which is very hard to do (unless you are using Java
>> with WORA !).
>>
>> I also don't understand why Lucene doesn't focus on being THE JAVA
>> search engine. I think anything that detracts from that moving forward
>> should be out of scope.
>>     
>
> I really don't relish the prospect that this might degenerate into a
> language argument, but I think it falls to me to respond, since the
> patch I submitted on Monday opens up a lot of possibilities for interop.
>
> I don't necessarily disagree.
>
> Abandoning all attempts at interop has its advantages.  One
> unfortunate albeit unavoidable aspect of Lucene is that it is tightly
> bound to its file format.  In a perfect world, the file reading/
> writing apparatus would be modular: the index would be read into
> memory using a plugin, manipulated, then saved using another plugin.
> That doesn't work, obviously, because indexes are commonly too large
> to be read into available RAM, and so the I/O stuff is scattered over
> the entire library, which makes maintaining compatibility laborious.
>
> However, Lucene has to make some effort to track its file format
> definition, so that it may live up to the commitments for backwards-
> compatibility codified earlier in this thread.  This is currently
> done using the File Formats document (though that document is
> incomplete and buggy).  There's not much difference between
> supporting the files written by an earlier version of Lucene and
> supporting the files written by another implementation of Lucene
> which adhere to the same spec.
>
> The only question is whether there are Java-specific optimizations
> which are so advantageous that they outweigh the benefits of
> interchange.  There is no inherent advantage in using Modified UTF-8
> over standard UTF-8, and the UTF-8 code I supplied actually speeds up
> Lucene by a couple percent because it simplifies some conditionals --
> all of the performance hit comes from using a bytecount as the String
> prefix.  I have good reasons to believe that this can go away, not
> the least of which is I've actually written a working implementation
> in Perl/C which uses bytecounts and I know where all the bottlenecks
> are.
>
> There are also advantages to keeping the file format public, both for
> Java Lucene and for the larger Apache Lucene project.  Of course
> there's the raw usefulness of interchange.  For instance, it
> might be nice to whip up a little script in Perl or Ruby which works
> with your existing rig -- especially if there's a CPAN module that
> offers functionality you need which isn't available yet in Java, or
> you'd benefit from a near-instantaneous startup time.
>
> But more important, I'd argue, is that having all implementations
> share a common file format means that all the authors have an
> amplified interest in coordinating, communicating, and contributing.
> Just as learning new languages, programming or natural, broadens an
> individual's horizons, so does working out an implementation based on
> Lucene's data structures in another language lead to fresh thinking.
> The more cross-pollination of ideas from various authors and by
> proxy, their extended communities, the more all of the sub-projects
> gain and the faster Apache Lucene as a whole advances.
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>   
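
For anyone following the Modified UTF-8 versus standard UTF-8 point in 
Marvin's message quoted above, here is a minimal Java sketch of how the two 
encodings differ. It is not code from the patch under discussion. The class 
and method names (Utf8Comparison, modifiedUtf8, standardUtf8) are made up 
for illustration, and it uses the JDK's DataOutputStream.writeUTF as a 
stand-in for a Modified UTF-8 writer rather than Lucene's own I/O classes. 
The two encodings only diverge for U+0000 and for characters outside the 
Basic Multilingual Plane, so the sample string contains one of each.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class Utf8Comparison {

    // Modified UTF-8 bytes as produced by the JDK's DataOutputStream.writeUTF.
    // writeUTF emits a 2-byte length header first, which is stripped here.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // writeUTF rejects strings whose encoded form exceeds 65535 bytes.
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Standard UTF-8 bytes.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // 'a', U+0000, and U+1D11E (a character outside the BMP).
    static String sample() {
        return new StringBuilder()
                .append('a')
                .append((char) 0)
                .appendCodePoint(0x1D11E)
                .toString();
    }

    public static void main(String[] args) {
        String s = sample();
        // Java char count: the supplementary character is a surrogate pair.
        System.out.println("chars:          " + s.length());
        // Standard UTF-8: 1 + 1 + 4 bytes.
        System.out.println("standard UTF-8: " + standardUtf8(s).length + " bytes");
        // Modified UTF-8: 1 + 2 (U+0000 as C0 80) + 6 (two 3-byte surrogates).
        System.out.println("modified UTF-8: " + modifiedUtf8(s).length + " bytes");
    }
}
```

For this string a character-count prefix and a byte-count prefix give 
different numbers (4 chars, 6 bytes in standard UTF-8), which is why 
switching the String prefix to a bytecount is a separate change from 
switching the encoding itself.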

-- 

Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
335 Hinds Hall 
Syracuse, NY 13244 

http://www.cnlp.org 
Voice:  315-443-5484 
Fax: 315-443-6886 



