lucenenet-user mailing list archives

From "Granroth, Neal V." <neal.granr...@thermofisher.com>
Subject RE: Lucene.NET Community Status
Date Thu, 04 Nov 2010 14:34:29 GMT
Lots of great points.  However, moving towards idiomatic .Net code is unwise and unnecessary,
as has been pointed out by George and others.

- Neal

-----Original Message-----
From: Troy Howard [mailto:thoward37@gmail.com] 
Sent: Wednesday, November 03, 2010 10:47 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: Lucene.NET Community Status

All,

I'm entering this conversation late as well. I'll apologize in advance, as I
know this will be lengthy.

Briefly, I'll list my "credentials" and reasons for concern here:

 - I've been using Lucene.Net for many years since the early versions and
have built significant products for my company using it. Those products are
a core source of our revenue, which is measured in the millions of dollars. The
success of my company's products is directly dependent on the success of
the Lucene.Net project.

 - I run software development at my company and make the final decisions
about what we do and how we use our resources. The developers here work on
open source code on our clock. I would like to have them start doing this
for Lucene.Net. We have very smart and productive people who could be a huge
asset to this project. I hope that the opportunity to leverage my company's
team will not be bypassed by the people running this project.

 - I have hacked extensively on the Lucene.Net internals to improve
performance in our product and have been manually maintaining our local
branch, merging in changes from the main project. I feel I have enough
knowledge of both the CS theory behind search engines and this codebase in
particular to not be intimidated by any aspect of the needs of this project.

 - I started a similar kind of open source project in that it is a .Net
implementation of an existing C++ open source project and struggled with the
"syntactic port" vs "conceptual port" issue, and so have perspective to
provide on that discussion.


Relationship To ASF and Lucene
-----------------------------------------------

I'd like to address one thing upfront: This should definitely remain an
Apache Software Foundation project. As Grant and George have stated clearly
and accurately, this is a huge benefit for this project in terms of its
credibility. This is not just because the name is well respected. It's
because of WHY the Apache name is so well respected: the processes and
values of the Foundation set excellent standards which encourage excellent
code. This is not just my opinion, but can be objectively proven by the
enormous success of the Apache projects. Complying with ASF's standards may
be difficult, but it's extremely valuable.

I feel that Grant's recommendation of attempting to become a TLP at Apache
is the wrong direction. This should remain part of the Lucene project. It
does not differ from Lucene in any substantial way and thus doesn't warrant
being separate.

Also, there was some mention of Lucene's file format and maintaining that
compatibility. This is essential. If this ever changes, Lucene.Net will be
useless. Being cross-platform and having a very stable on-disk format is one
of its most compelling aspects.


Microsoft's Interest and Involvement
---------------------------------------------------

Another thing to mention: Phil Haack and Scott Hanselman, while both are
Microsoft employees, are more than just representatives of the company they
work for. They are both outstanding advocates of open source software and
have been instrumental in the change of attitude that Microsoft has shown in
recent years towards this community. The fact that they have shown interest
in this issue doesn't mean Microsoft is interested, it means that this is a
significant issue for the .Net open source community. The fact
that they work for Microsoft means that they may be able to leverage
resources and wield clout from that vantage point that can benefit our
community greatly.

Regarding the question "What can Microsoft do to help?"... I'll take a
somewhat radical stance here.

We need Visual J# not to have been abandoned... We need IronJava, like
IronPython or IronRuby. We need a native, MS developed and supported, fully
optimized and performant compiler for plain old Java code that runs on the
.Net runtime and exposes Java libraries to other .Net languages like F#, C#,
VB, etc.

There is a huge wealth of open source Java code out there, much of it in the
Apache project archives, which would all be "ported" at once. Currently our
community only gets access to Lucene.Net and iTextSharp and a few other
libraries where dedicated people like George put in hard hours of direct
syntax porting to implement these things in C#.

We need more than that.

I need Hadoop to run in .Net, along with HDFS, HBase, Solr, Nutch, Tika, and
everything else in that ecosystem.

My company is actually at a critical point now, where we are considering
abandoning .Net/WCF as our service layer platform, and switching to Java, so
that we can leverage those excellent Java projects. Our business needs
demand that we have what Hadoop does. It will be easier for me to migrate my
application code to Java than to attempt to find equivalent functionality in
the existing .Net world or write my own framework, or port Hadoop.

So, if there was ONE thing that Microsoft could do to *significantly* help
the .Net developer community, it would be providing a *real* implementation
of IronJava, which would completely obviate the need to port code and simply
allow those libraries and applications to run in .Net natively.

That said, assuming that Visual J# remains "retired" (see:
http://msdn.microsoft.com/en-us/vjsharp/default ) this project is one of the
few things we .Net developers have to work with.


Java or .Net Code Idioms
-------------------------------------

I agree that moving to a codebase that is more .Net idiomatic will improve
both the experience of end users of Lucene.Net and the level of involvement
that we can get from the community. To put it simply, right now, hacking on
the Lucene.Net core code means you must understand Java idioms well, and how
to translate those to .Net. This is a skill set which is somewhat uncommon.

The "direct port" methodology also leads to code that is not fully optimized
for .Net. I have changed our local branch in a number of significant ways,
and improved performance significantly by doing so. I didn't change APIs, I
just changed the implementations to be more appropriate for .Net, and
included generics.

The test suite provided with Lucene/Lucene.Net is a great benefit in that
regard, and helped me ensure that my changes didn't break functionality.
That said, the project needs to improve in this regard. The classes
themselves need to be implemented in a more "testable" manner. Using abstract
base classes instead of interfaces makes the code less mockable and thus less
testable. It also makes it harder to implement customized components into
the system. There are a number of things that are sealed or internal that do
not need to be.
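
As an illustration of the testability point above, here is a minimal sketch
of an interface used as a seam for a test double. It is written in Java
purely for illustration; TermSource, TermCounter and InterfaceSeamDemo are
hypothetical names and do not exist in Lucene or Lucene.Net.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration only; none of these exist in Lucene or Lucene.Net.

// The dependency is expressed as an interface, so a test can supply its own implementation.
interface TermSource {
    List<String> terms(String text);
}

// The code under test depends only on the interface, not on a concrete or sealed class.
final class TermCounter {
    private final TermSource source;

    TermCounter(TermSource source) {
        this.source = source;
    }

    int count(String text) {
        return source.terms(text).size();
    }
}

class InterfaceSeamDemo {
    public static void main(String[] args) {
        // A hand-written stub replaces the real implementation; no mocking library is needed.
        TermSource stub = text -> Arrays.asList("fixed", "terms");
        System.out.println(new TermCounter(stub).count("ignored input")); // prints 2
    }
}
```

Had TermSource been a sealed (final) class or exposed only non-overridable
members, the stub above could not have been substituted.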

Lucene (for Java) was awesome because it ran well as managed code and was
elegant and efficient in Java's environment. Any port of Lucene should
*retain those features* as well. The library should make sense and be
implemented in the most elegant and efficient way that it can be on the
platform it's implemented on. Lucene.Net should not be a port of Java Lucene
to .Net, it should be an *implementation* of Lucene running in .Net. Porting
implies line-for-line similarity. Implementing just implies that the
features are all represented.

For that reason, I support moving to a more idiomatic .Net implementation,
verified by the unit tests. The argument that "it will require smart people"
to understand the core code -- that's a *GOOD* requirement. If you don't
understand how it works, conceptually, perhaps you should not be attempting
to implement it. Merely porting or auto-converting code that "seems to
be the same" and "passes the unit tests", without really understanding the
details is not a safe way to ensure correct operation. What if there was a
subtle difference between the two syntaxes which led to differing (i.e.
incorrect) behaviour in some scenarios? What if the unit tests didn't cover
that scenario?
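
One well-known example of such a subtle difference, offered here only as an
illustration and not as something taken from the Lucene.Net codebase: Java's
String.substring(begin, end) takes an exclusive end index, while .NET's
String.Substring(startIndex, length) takes a length. The sketch below is in
Java; dotNetStyle is a hypothetical helper that emulates the .NET semantics.

```java
class SubstringPortingDemo {
    // Java semantics: the second argument is an exclusive end index.
    static String javaStyle(String s, int begin, int end) {
        return s.substring(begin, end);
    }

    // Hypothetical helper emulating .NET's Substring(startIndex, length):
    // the second argument is a length, not an end index.
    static String dotNetStyle(String s, int start, int length) {
        return s.substring(start, start + length);
    }

    public static void main(String[] args) {
        // The same two arguments give different results under the two conventions.
        System.out.println(javaStyle("lucene", 2, 4));   // prints "ce"
        System.out.println(dotNetStyle("lucene", 2, 4)); // prints "cene"
    }
}
```

A line-for-line conversion that keeps the arguments unchanged compiles in
both languages, so only a test that exercises the call would catch it.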

Regarding the help and support provided by the Lucene community, and the
books and examples that provide code samples: changing to a more .Net
idiomatic codebase, even if that meant top level API changes, would not be a
substantial issue that would prevent a .Net developer from understanding
example code written in Java. If the API is *basically* the same, but uses
foo.Size instead of foo.getSize()/foo.setSize() or List<T> instead of
ArrayList... those differences are minor and will not
cause significant issues for grokking cross-language examples. People will
still get it... and .Net developers will be much happier.
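
To make the comparison above concrete, here is a small sketch of the Java
side of those idioms, with the .Net-style equivalents noted in comments. Foo
and AccessorIdiomDemo are hypothetical names used only for illustration, and
the raw Java ArrayList is shown as a rough analogue of .Net's non-generic
ArrayList, not as a claim about how Lucene itself is written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration; not part of Lucene or Lucene.Net.
class Foo {
    private int size;

    // Java idiom: explicit accessor methods.
    // A .Net-idiomatic version would expose a property instead,
    // read as foo.Size and assigned as foo.Size = 3.
    int getSize() { return size; }
    void setSize(int size) { this.size = size; }
}

class AccessorIdiomDemo {
    // Untyped collection: elements come back as Object, so a cast is required.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String firstUntyped() {
        ArrayList items = new ArrayList();
        items.add("term");
        return (String) items.get(0);
    }

    // Typed collection (comparable to List<T>): the compiler checks the element type.
    static String firstTyped() {
        List<String> items = new ArrayList<>();
        items.add("term");
        return items.get(0);
    }

    public static void main(String[] args) {
        Foo foo = new Foo();
        foo.setSize(3);
        System.out.println(foo.getSize());  // prints 3
        System.out.println(firstUntyped()); // prints "term"
        System.out.println(firstTyped());   // prints "term"
    }
}
```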


So, the takeaway is:
- My team and I will help hack on Lucene.Net and get paid to do it
- Lucene.Net should not change project status
- Microsoft should implement IronJava
- Moving towards idiomatic .Net code is the direction the project should go
and is not that big of a deal


Also, as a side note: we're hiring in the Portland, Oregon area, and could
use developers who know Lucene.Net, and want to hack on it on the clock.
Send me your resume.


Thanks,

Troy Howard
Director of Software Development | discover-e Legal, LLC |
thoward37@gmail.com
