lucene-lucene-net-dev mailing list archives

From Peter Mateja <peter.mat...@gmail.com>
Subject Re: [jira] Commented: (LUCENENET-380) Evaluate Sharpen as a port tool
Date Mon, 10 Jan 2011 22:43:22 GMT
The amount of custom work required for the conversion is starting to concern
me a bit.  Well, to clarify, the work itself doesn't concern me, but rather
I'm worried that this is going to make a purely automated conversion process
very difficult to pull off and probably very fragile.  The devil is
definitely in the details.

What are thoughts concerning how we can begin to tackle this?

How many of these issues can be handled by Sharpen, or a modified, custom
version of Sharpen?  What items are best handled by a pre/post processor?

A number of the items DIGY listed (thanks!) seem to fall under the scope of
"code intent", vs pure syntactical mapping.  I'd suggest that it's
unrealistic to expect any conversion tool to manage those types of issues.

Perhaps a process such as the following should be our initial draft:

1) Start with Lucene.Java source, initially the latest 3.0.3 release.
2) Make specific hand coded changes to the java source code to assist with
certain automated conversion issues.  These changes should be expressed as a
set of patch files, to be automatically applied to the java source on
subsequent iterations of this process.  Any patch rejections should break
the build.  These patches should be maintained as new code updates come from
the java source.
3) Run an automated conversion tool (Sharpen most likely.)
4) Perform any desired post processing to modify the source code structure,
setup project / solution files, etc.  Essentially, get the project into a
state that it's loadable by Visual Studio.  At this point there will be
errors (lots of them.)  The output of this step should be checked in as the
raw conversion source.
5) Make changes to the converted C# code, including necessary helper
classes, in order to fix all the remaining issues alluded to by DIGY.  Also,
run any automated post processing, such as Resharper code formatting (the
formatting settings should be standardized across the project to ensure
normalized and repeatable refactorings), inline docs tweaks, etc.  These
changes should also be expressed as a set of patch files, to be
automatically applied to the raw conversion source on subsequent iterations
of this process.  Any patch rejections should break the build.  These
patches should represent the bulk of the efforts of the Lucene.Net core dev
team.  The output of this step should be checked in as the official
Lucene.Net source code.
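
The "any patch rejections should break the build" rule in steps 2 and 5 amounts to a fail-fast pipeline.  A minimal sketch of that idea follows, assuming each step can report success or failure as a boolean.  The class name ConversionPipeline and the step names are hypothetical, and the real steps (applying patches, running Sharpen, and so on) are stubbed out with lambdas rather than implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical driver: runs named steps in order and stops at the first failure.
public class ConversionPipeline {
    // LinkedHashMap keeps the steps in the order they were registered.
    private final Map<String, BooleanSupplier> steps = new LinkedHashMap<>();

    public ConversionPipeline step(String name, BooleanSupplier action) {
        steps.put(name, action);
        return this;
    }

    // Returns the name of the first failing step, or null if every step succeeded.
    // Steps after a failure are never run, which is what "break the build" means here.
    public String run() {
        for (Map.Entry<String, BooleanSupplier> e : steps.entrySet()) {
            if (!e.getValue().getAsBoolean()) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The lambdas are placeholders; a real step would shell out to the
        // patch tool or the converter and report whether it succeeded.
        String failed = new ConversionPipeline()
            .step("apply-java-patches", () -> true)
            .step("run-converter", () -> true)
            .step("post-process", () -> true)
            .step("apply-csharp-patches", () -> false) // simulated patch rejection
            .run();
        System.out.println(failed == null ? "build ok" : "build broken at: " + failed);
    }
}
```

Returning the failing step's name, rather than throwing, keeps the sketch small; a real build would more likely exit with a non-zero status.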

This entire process needs to be checked into a conversion process branch.
After the initial build of this system, workflow would be split into the
following 2 vectors:
A) On java source changes (probably at a coarser level than individual
commits), steps 1-4 would be run to build a new base raw conversion source.
With the java changes, it's possible that changes to the patch files in
step 2 would be required.  Then step 5 would be run to create the official
Lucene.Net source.  Again, fixes to the patches may be in order depending on
the complexity of the original java changes.
B) Most other changes would be considered C#-side specific.  This might
involve platform specific bug fixes, desired code refactorings, etc.  These
changes would be made based on the current checked in Lucene.Net source, and
the patch files for step 5 would be updated to reflect those changes.

Conversion process changes would fall outside the scope of standard
development, being fairly disruptive.

Of course, this process does complicate the development / maintenance
process quite a bit, by making many more vectors of change.  And, I'm aware
that what I've blathered on about here has probably already been discussed,
but I wanted to get some discussion going.  Thoughts?

Peter Mateja
peter.mateja@gmail.com



On Sun, Jan 9, 2011 at 4:09 PM, Digy <digydigy@gmail.com> wrote:

> Having "buildable" & "clean" code is just a beginning and should not
> result in a loss of know-how.
> Before trying to fix the bugs of the output of these tools, everyone should
> see how they were fixed in Lucene.Net 2.9.2.
> There is no need to reinvent the wheel.
>
> Here is a quick list of tips & tricks as far as I can remember.
>
> * Decimal separator is not always ".", some locales use "," (while parsing
> float/double).
> * "Set" in Java accepts "null" as argument.  A null-control is needed while
> porting.
> * ReadResolve should be ported by implementing the interface
> "System.Runtime.Serialization.IObjectReference"
>        public Object
> GetRealObject(System.Runtime.Serialization.StreamingContext context)
>        {
>            return ReadResolve();
>        }
> * .NET emits "\ufffd" as invalid char but java as "\x00"
> * Use StringComparer.Ordinal while comparing strings.
> * FIPS compliance.  use SHA1 instead of MD5
> * Use "System.Runtime.Serialization.OnDeserialized" attribute on
> Serializable classes.
>        void OnDeserialized(System.Runtime.Serialization.StreamingContext
> context)
>        {
>            -----
>        }
> * Use "System.IO.Path.DirectorySeparatorChar" or "Path.Combine" instead of
> using "\\". (causes problems on Mono)
> * Iteration problems.  "if (i.MoveNext()){...}" can not be used (in a while
> loop)  to detect the end of the list.
> * Port of TreeSet. TreeSet in Java sorts its contents based on the default
> Comparator of the items, but the ArrayList does not.
> * Unexpected results when writing custom analyzers. Override
> Read,ReadBlock,ReadLine,Peek,ReadToEnd in ReusableStringReader.
> * Multi-dimensional arrays: "length" in Java returns the size of the first
> dimension only. In C# "Length" returns the total number of elements in all
> dimensions.
> * Copy private fields in the class' "Clone" method.
> * Don't forget: base-36-encoding is used in filenames.
> * Use "if (dataLen <=0 )" instead of  "if (dataLen == -1)" to detect end of
> stream.
> * Case insensitivity. Don't use public names such as "text" and "Text" in a
> single class (problem for VB users).
> * Use ThreadClass in SupportClass.cs instead of System.Threading.Thread
> * Use "System.Text.Encoding.UTF8" instead of "System.Text.Encoding.ASCII"
> * ">>>" is already implemented in SupportClass.
> * Threshold differences between .NET & Java while comparing floats/doubles.
> ----Use also these classes:
> * There is a good implementation of WeakHashTable in SupportClass. (needs
> "Generics")
> * There is a very fast LRU cache impl. (SimpleLRUCache). (needs "Generics")
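
Several of the tips above depend on how the Java side behaves.  The sketch below checks four of those behaviors in plain Java: array length, base-36 encoding, TreeSet ordering, and locale-dependent decimal separators.  The class and method names are made up for illustration, and the matching .NET calls are deliberately not shown.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical demo class; it only exercises standard JDK behavior.
public class JavaBehaviourDemo {
    // For an array of arrays, length is the size of the first dimension only.
    static int firstDimension(int[][] a) {
        return a.length;
    }

    // Sum of all row lengths: the figure a rectangular array's total count would give.
    static int totalElements(int[][] a) {
        int n = 0;
        for (int[] row : a) {
            n += row.length;
        }
        return n;
    }

    // Base-36 text form of a number (Character.MAX_RADIX is 36).
    static String base36(long value) {
        return Long.toString(value, Character.MAX_RADIX);
    }

    // Parses a decimal string using the given locale's separators.
    static double parse(String text, Locale locale) throws ParseException {
        return NumberFormat.getInstance(locale).parse(text).doubleValue();
    }

    // TreeSet iterates in natural order (and drops duplicates); ArrayList keeps insertion order.
    static List<String> sortedCopy(List<String> items) {
        return new ArrayList<>(new TreeSet<>(items));
    }

    public static void main(String[] args) throws ParseException {
        int[][] grid = new int[3][4];
        System.out.println(firstDimension(grid) + " rows, " + totalElements(grid) + " elements");
        System.out.println(base36(35) + " " + base36(36));
        System.out.println(parse("1,5", Locale.GERMANY) + " " + parse("1.5", Locale.US));
        System.out.println(sortedCopy(Arrays.asList("b", "a", "c")));
    }
}
```

Each of these is a candidate for a unit test on the C# side, so that a regenerated port cannot silently change the behavior.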
>
>
> PS: This is not a complete list and there may be many others from other
> contributors to Lucene.Net
>
> DIGY
>
>
>
>
> -----Original Message-----
> From: Peter Mateja [mailto:peter.mateja@gmail.com]
> Sent: Friday, January 07, 2011 7:53 PM
> To: lucene-net-dev@lucene.apache.org
> Subject: Re: [jira] Commented: (LUCENENET-380) Evaluate Sharpen as a port
> tool
>
> Nice work Alex!
>
> Not that this represents a solution, but I did load up the core source from
> your conversion into a VS2010 project, then ran Resharper's code cleanup on
> it.
>
> This process took care of all the unused 'using Java.*' references, cleaned
> up formatting, etc.  However, I'm still seeing a good many things that need
> work:
>
> 1) ICloseable -> IDisposable, including refactoring of the implementation
> from Close() to Dispose() (and also considering any additional refactoring
> of the Disposable pattern.)
> 2) IFieldCache is marked as an interface, but has tons of static fields,
> subclasses and interfaces.  This may be ok in Java, but not in C#.  Not sure
> what the best course of action here might be... perhaps create an abstract
> base class called FieldCache or FieldCacheBase to house this stuff, and pull
> out the nested classes / interfaces into their own files.
> 3) Use of a generic WeakReference<>, which doesn't exist in generic form in
> the .Net Framework.  This is something which could either be refactored or
> implemented as generic.
> 4) ICloneable interface not implemented (see IndexInput.cs)
> 5) Unsigned bitwise shift assignment operator doesn't exist in C#.  See
> IndexOutput.cs, WriteVInt() method.  The line i >>>= 7; in java flags an
> error in C#.  I'm not entirely sure, but I believe this can safely be
> converted to: i >>= 7; in this case, especially given the comment that
> negative numbers are not supported.
> 6) Use of Java DecimalFormat class.  An appropriate .Net replacement should
> be easily substituted with some refactoring of the code.
> 7) Use of Runtime.IdentityHashCode().  Not sure how necessary this is.
> 8) Java specific value type parsing calls should be refactored to .Net
> (e.g. double.ParseDouble() => double.Parse())
> 9) Use of the java ReadResolve() object serialization pattern needs to be
> analyzed / refactored (see FieldCache.DefaultByteParser (or in the
> translated version, IFieldCache._IByteParser)).
> 10) Use of Sharpen references.
> 11) Use of Java's NumberFormatException... should be refactored to use an
> appropriate standard exception type (perhaps FormatException, though I'm not
> sure this is appropriate) or create an internal Exception class for this
> case.
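
On item 5, the difference between the two shifts is easy to see on the Java side.  The sketch below uses a seven-bits-per-byte encoding loop written in the style the item describes; it is an illustration, not a copy of the Lucene source, and the class and method names are hypothetical.  For non-negative values ">>" and ">>>" give the same result, which is why the substitution is plausible when negative inputs are ruled out.  For negative values they diverge.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo of logical vs arithmetic right shift in a VInt-style encoder.
public class VIntShiftDemo {
    // Emits seven payload bits per byte, setting the high bit while more bytes follow.
    static byte[] encodeVInt(int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            // Logical shift zero-fills from the left, so i eventually reaches a
            // value below 0x80 and the loop ends, even for negative input.
            i >>>= 7;
        }
        out.write(i);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Same result for a non-negative value.
        System.out.println((300 >>> 7) + " " + (300 >> 7));
        // Different results for a negative value: the arithmetic shift
        // sign-extends, so -1 >> 7 is still -1.
        System.out.println((-1 >>> 7) + " " + (-1 >> 7));
        System.out.println(encodeVInt(300).length + " bytes for 300");
        System.out.println(encodeVInt(-1).length + " bytes for -1");
    }
}
```

Because -1 >> 7 stays at -1, the same loop written with an arithmetic shift would never terminate on a negative input.  The plain ">>=" conversion is therefore only safe if callers never pass negative values.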
>
> There are plenty more build issues... I need to put this down for the rest of
> the day, so I thought I'd at least get this out to the list.
>
> Peter Mateja
> peter.mateja@gmail.com
>
>
>
> On Fri, Jan 7, 2011 at 9:34 AM, Neal Granroth (JIRA) <jira@apache.org> wrote:
>
> >
> >    [ https://issues.apache.org/jira/browse/LUCENENET-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978816#action_12978816 ]
> >
> > Neal Granroth commented on LUCENENET-380:
> > -----------------------------------------
> >
> > Thanks Alex,
> >
> > What would be the plan for handling the Sharpen artifacts that prevent the
> > converted code from being built by the .NET SDK compiler?
> >
> > Do you envision a post-conversion script to strip out statements like:
> > using Java.Lang
> > using Java.IO
> >
> > and replace Sharpen-specific classes with standard .NET classes:
> > Sharpen.Collections.*
> > Sharpen.Runtime.*
> >
> >
> >
> > > Evaluate Sharpen as a port tool
> > > -------------------------------
> > >
> > >                 Key: LUCENENET-380
> > >                 URL: https://issues.apache.org/jira/browse/LUCENENET-380
> > >             Project: Lucene.Net
> > >          Issue Type: Task
> > >            Reporter: George Aroush
> > >         Attachments: 3.0.2_JavaToCSharpConverter_AfterPostProcessing.zip,
> > > 3.0.2_JavaToCSharpConverter_NoPostProcessing.zip, IndexWriter.java,
> > > Lucene.Net.3_0_3_Sharpen20110106.zip, Lucene.Net.Sharpen20101104.zip,
> > > Lucene.Net.Sharpen20101114.zip, NIOFSDirectory.java, QueryParser.java,
> > > TestBufferedIndexInput.java, TestDateFilter.java
> > >
> > >
> > > This task is to evaluate Sharpen as a port tool for Lucene.Net.
> > > The files to be evaluated are attached.  We need to run those files
> > > (which are off Java Lucene 2.9.2) against Sharpen and compare the result
> > > against JLCA result.
> >
> > --
> > This message is automatically generated by JIRA.
> > -
> > You can reply to this email to add a comment to the issue online.
> >
> >
>
>
