lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2308) Separately specify a field's type
Date Wed, 31 Aug 2011 13:24:09 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094514#comment-13094514 ]

Uwe Schindler commented on LUCENE-2308:
---------------------------------------

I am on the opposite side:

In general the constructor of the immutable class is hidden (package-private or private, depending
on the class hierarchy), so nobody outside can call it. The only API the user sees is the builder.
That way we have only *one* API and one usage style.

Builder patterns can be formatted very nicely, and it does not matter whether people do:

{code:java}
Field.Builder b = new Field.Builder();
b.setFoo();
b.setBar();
Field f = b.build();
{code}

versus

{code:java}
Field f = new Field.Builder()
 .setFoo()
 .setBar()
 .build();
{code}

The second, chained form is even more readable, and that is why *I* prefer builders. A so-called
"telescoping constructor" is the antipattern because it's completely unreadable, as Java lacks
named parameters (the best example is WordDelimiterFilter, that one is horrible - a typical
candidate for a WordDelimiterFilter.Builder subclass). The chained code is also more natural than
the first form for stack-based machines like the JVM and the x86 processors. The return value of
the previous call already resides on the stack after the method returns, so instead of popping it
and pushing it again, it can stay there and you simply add the parameters of the next method
call. This also leads to very elegant bytecode, for which HotSpot has optimizations :-)
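
To make the readability point concrete, here is a minimal sketch. DelimiterConfig and its three flags are hypothetical names invented for illustration; they are not the real WordDelimiterFilter API. The telescoping call appears only in a comment, the builder form is what actually compiles.

```java
// Hypothetical example class; names are illustrative and not Lucene API.
final class DelimiterConfig {
    final boolean splitOnCaseChange;
    final boolean splitOnNumerics;
    final boolean preserveOriginal;

    // A public telescoping constructor would be called like
    //   new DelimiterConfig(true, false, true)
    // and the call site would not say which flag is which.
    // Here the constructor is hidden, so the builder is the only way in.
    private DelimiterConfig(Builder b) {
        this.splitOnCaseChange = b.splitOnCaseChange;
        this.splitOnNumerics = b.splitOnNumerics;
        this.preserveOriginal = b.preserveOriginal;
    }

    static final class Builder {
        private boolean splitOnCaseChange;
        private boolean splitOnNumerics;
        private boolean preserveOriginal;

        // Each setter returns this, which is what allows chaining.
        Builder setSplitOnCaseChange(boolean v) { splitOnCaseChange = v; return this; }
        Builder setSplitOnNumerics(boolean v) { splitOnNumerics = v; return this; }
        Builder setPreserveOriginal(boolean v) { preserveOriginal = v; return this; }

        DelimiterConfig build() { return new DelimiterConfig(this); }
    }
}

class DelimiterConfigDemo {
    public static void main(String[] args) {
        // Every option is named at the call site; unset options keep their default (false).
        DelimiterConfig c = new DelimiterConfig.Builder()
            .setSplitOnCaseChange(true)
            .setPreserveOriginal(true)
            .build();
        System.out.println(c.splitOnCaseChange + " " + c.splitOnNumerics + " " + c.preserveOriginal);
    }
}
```

With three booleans the telescoping call is already hard to read; with a dozen options it only gets worse, while the builder call stays self-describing.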

About code duplication: In the hidden ctor of the immutable class you can make a clone of
the builder and keep it in a private final field inside the instance. This clone then holds the
unmodifiable instance state.
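
A minimal sketch of that idea, assuming a hypothetical FieldSettings class with two flags (these names are made up for illustration and are not the API proposed in the patches). The fields are declared once, in the builder, and the immutable class only delegates to its private copy.

```java
// Hypothetical names; a sketch of the idea, not the Lucene Field API.
final class FieldSettings {
    // The private copy of the builder is the only state of the instance.
    // No reference to it ever escapes, so it cannot be modified from outside.
    private final Builder state;

    // Hidden ctor: only Builder.build() calls it.
    private FieldSettings(Builder b) {
        this.state = b.copy(); // defensive copy, later changes to b are not visible here
    }

    boolean isStored() { return state.stored; }
    boolean isIndexed() { return state.indexed; }

    static final class Builder {
        // The options are declared only here, not again in FieldSettings.
        private boolean stored;
        private boolean indexed;

        Builder setStored(boolean v) { stored = v; return this; }
        Builder setIndexed(boolean v) { indexed = v; return this; }

        private Builder copy() {
            Builder c = new Builder();
            c.stored = stored;
            c.indexed = indexed;
            return c;
        }

        FieldSettings build() { return new FieldSettings(this); }
    }
}
```

Reusing the builder after build() is safe in this sketch, because each instance holds its own copy.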

About the number of objects (yes, we have the builder object, possibly a clone of it as suggested
above, and finally the immutable object): Worrying about the number of objects is really nonsense
here, as all of them will be created in the Eden space and disappear as soon as the loop/method exits.
You can try autoboxing with a recent Java VM - in most cases you would see no slowdown caused
by autoboxing. These are problems from pre-2000 when we had Java 1.1.

Uwe

> Separately specify a field's type
> ---------------------------------
>
>                 Key: LUCENE-2308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2308
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, LUCENE-2308-12.patch,
> LUCENE-2308-13.patch, LUCENE-2308-14.patch, LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch,
> LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, LUCENE-2308-20.patch, LUCENE-2308-21.patch,
> LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, LUCENE-2308-7.patch,
> LUCENE-2308-8.patch, LUCENE-2308-9.patch, LUCENE-2308-branch.patch, LUCENE-2308-final.patch,
> LUCENE-2308-ltc.patch, LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, LUCENE-2308-merge-3.patch,
> LUCENE-2308.branchdiffs, LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch,
> LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch
>
>
> This came up from discussions on IRC.  I'm summarizing here...
> Today when you make a Field to add to a document you can set things
> such as indexed or not, stored or not, analyzed or not, details like omitTfAP,
> omitNorms, index term vectors (separately controlling
> offsets/positions), etc.
> I think we should factor these out into a new class (FieldType?).
> Then you could re-use this FieldType instance across multiple fields.
> The Field instance would still hold the actual value.
> We could then do per-field analyzers by adding a setAnalyzer on the
> FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
> for per-field codecs (with flex), where we now have
> PerFieldCodecWrapper).
> This would NOT be a schema!  It's just refactoring what we already
> specify today.  EG it's not serialized into the index.
> This has been discussed before, and I know Michael Busch opened a more
> ambitious (I think?) issue.  I think this is a good first baby step.  We could
> consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
> off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
