From: "Granroth, Neal V."
To: "lucene-net-user@lucene.apache.org"
Date: Thu, 4 Nov 2010 07:34:29 -0700
Subject: RE: Lucene.NET Community Status

Lots of great points. However, moving towards idiomatic .Net code is neither wise nor necessary, as has been pointed out by George and others.

- Neal

-----Original Message-----
From: Troy Howard [mailto:thoward37@gmail.com]
Sent: Wednesday, November 03, 2010 10:47 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: Lucene.NET Community Status

All,

I'm entering this conversation late as well. I'll apologize in advance, as I know this will be lengthy.

Briefly, I'll list my "credentials" and reasons for concern here:

- I've been using Lucene.Net for many years, since the early versions, and have built significant products for my company using it. Those products are a core source of our revenue, which is measured in the millions of dollars. The success of my company's products is directly dependent on the success of the Lucene.Net project.
- I run software development at my company and make the final decisions about what we do and how we use our resources. The developers here work on open source code on our clock. I would like to have them start doing this for Lucene.Net. We have very smart and productive people who could be a huge asset to this project. I hope that the opportunity to leverage my company's team will not be bypassed by the people running this project.

- I have hacked extensively on the Lucene.Net internals to improve performance in our product and have been manually maintaining our local branch, merging in changes from the main project. I feel I have enough knowledge of both the CS theory behind search engines and this codebase in particular to not be intimidated by any aspect of the needs of this project.

- I started a similar kind of open source project, in that it is a .Net implementation of an existing C++ open source project, and struggled with the "syntactic port" vs "conceptual port" issue, and so have perspective to provide on that discussion.

Relationship To ASF and Lucene
------------------------------

I'd like to address one thing upfront: This should definitely remain an Apache Software Foundation project. As Grant and George have stated clearly and accurately, this is a huge benefit for this project in terms of its credibility. This is not just because the name is well respected. It's because of WHY the Apache name is so well respected: the processes and values of the Foundation set excellent standards which encourage excellent code. This is not just my opinion, but can be objectively proven by the enormous success of the Apache projects. Complying with ASF's standards may be difficult, but it's extremely valuable.

I feel that Grant's recommendation of attempting to become a TLP at Apache is the wrong direction. This should remain part of the Lucene project.
It is not distinct from Lucene in any substantial way and thus doesn't warrant being separate.

Also, there was some mention of Lucene's file format and maintaining that compatibility. This is essential. If this ever changes, Lucene.Net will be useless. Being cross-platform and having a very stable on-disk format is one of its most compelling aspects.

Microsoft's Interest and Involvement
------------------------------------

Another thing to mention: Phil Haack and Scott Hanselman, while both are Microsoft employees, are more than just representatives of the company they work for. They are both outstanding advocates of open source software and have been instrumental in the change of attitude that Microsoft has shown in recent years towards this community. The fact that they have shown interest in this issue doesn't mean Microsoft is interested; it means that this is a significant issue for the .Net open source community. The fact that they work for Microsoft means that they may be able to leverage resources and wield clout from that vantage point that can benefit our community greatly.

Regarding the question "What can Microsoft do to help?"... I'll take a somewhat radical stance here. We need Visual J# not to have been abandoned... We need IronJava, like IronPython or IronRuby. We need a native, MS-developed and supported, fully optimized and performant compiler for plain old Java code that runs on the .Net runtime and exposes Java libraries to other .Net languages like F#, C#, VB, etc.

There is a huge wealth of open source Java code out there, much of it in the Apache project archives, which would all be "ported" at once. Currently our community only gets access to Lucene.Net and iTextSharp and a few other libraries where dedicated people like George put in hard hours of direct syntax porting to implement these things in C#. We need more than that.
I need Hadoop to run in .Net, and HDFS, HBase, Solr, Nutch, Tika, and everything else in that ecosystem.

My company is actually at a critical point now, where we are considering abandoning .Net/WCF as our service layer platform and switching to Java, so that we can leverage those excellent Java projects. Our business needs demand that we have what Hadoop does. It will be easier for me to migrate my application code to Java than to attempt to find equivalent functionality in the existing .Net world, write my own framework, or port Hadoop.

So, if there was ONE thing that Microsoft could do to *significantly* help the .Net developer community, it would be providing a *real* implementation of IronJava, which would completely obviate the need to port code and simply allow those libraries and applications to run in .Net natively.

That said, assuming that Visual J# remains "retired" (see: http://msdn.microsoft.com/en-us/vjsharp/default ), this project is one of the few things we .Net developers have to work with.

Java or .Net Code Idioms
------------------------

I agree that moving to a codebase that is more .Net idiomatic will improve both the experience of end users of Lucene.Net and the level of involvement that we can get from the community. To put it simply, right now, hacking on the Lucene.Net core code means you must understand Java idioms well, and how to translate those to .Net. This is a skill set which is somewhat uncommon.

The "direct port" methodology also leads to code that is not fully optimized for .Net. I have changed our local branch in a number of significant ways, and improved performance significantly by doing so. I didn't change APIs; I just changed the implementations to be more appropriate for .Net, and included generics. The test suite provided with Lucene/Lucene.Net is a great benefit in that regard, and helped me ensure that my changes didn't break functionality.
That said, the project needs to improve in this regard. The classes themselves need to be implemented in a more "testable" manner. Using abstract base classes instead of interfaces makes the code less mockable and thus less testable. It also makes it harder to plug customized components into the system. There are a number of things that are sealed or internal that do not need to be.

Lucene (for Java) was awesome because it ran well as managed code and was elegant and efficient in Java's environment. Any port of Lucene should *retain those features* as well. The library should make sense and be implemented in the most elegant and efficient way that it can be on the platform it's implemented on. Lucene.Net should not be a port of Java Lucene to .Net; it should be an *implementation* of Lucene running in .Net. Porting implies line-for-line similarity. Implementing just implies that the features are all represented.

For that reason, I support moving to a more idiomatic .Net implementation, verified by the unit tests. The argument that "it will require smart people" to understand the core code -- that's a *GOOD* requirement. If you don't understand how it works, conceptually, perhaps you should not be attempting to implement it. Merely porting or auto-converting code that "seems to be the same" and "passes the unit tests", without really understanding the details, is not a safe way to ensure correct operation. What if there was a subtle difference between the two syntaxes which led to differing (i.e. incorrect) behaviour in some scenarios? What if the unit tests didn't cover that scenario?

Regarding the help and support provided by the Lucene community, and the books and examples that provide code samples: changing to a more .Net idiomatic codebase, even if that meant top-level API changes, would not be a substantial issue that would prevent a .Net developer from understanding example code written in Java.
If the API is *basically* the same, but uses foo.Size instead of foo.getSize()/foo.setSize() or List instead of ArrayList... those differences are minor and will not cause significant issues for grokking cross-language examples. People will still get it... and .Net developers will be much happier.

So, the takeaway is:

- My team and I will help hack on Lucene.Net and get paid to do it
- Lucene.Net should not change project status
- Microsoft should implement IronJava
- Moving towards idiomatic .Net code is the direction the project should go and is not that big of a deal

Also, as a side note: we're hiring in the Portland, Oregon area, and could use developers who know Lucene.Net and want to hack on it on the clock. Send me your resume.

Thanks,
Troy Howard
Director of Software Development | discover-e Legal, LLC | thoward37@gmail.com