commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Carman <ja...@carmanconsulting.com>
Subject Re: [Math] Improving combinationIterator?
Date Tue, 27 Aug 2013 12:02:03 GMT
I can imagine it in CollectionUtils if it's generified.

On Tue, Aug 27, 2013 at 2:12 AM, Phil Steitz <phil.steitz@gmail.com> wrote:
> On 8/26/13 4:37 AM, Gilles wrote:
>> On Sun, 25 Aug 2013 17:14:37 -0700, Phil Steitz wrote:
>>> On 8/25/13 9:59 AM, Gilles wrote:
>>>> On Sun, 25 Aug 2013 09:19:41 -0700, Phil Steitz wrote:
>>>>> On 8/25/13 8:10 AM, Gilles wrote:
>>>>>> On Sat, 24 Aug 2013 21:55:36 -0000, psteitz@apache.org wrote:
>>>>>>
>>>>>>> [...]
>>>>>>
>>>>>> I wonder whether the utility should involve the creation of
>>>>>> an instance of a bona fide "Combination" class.
>>>>>> I.e. instead of a "naked"
>>>>>>   Iterator<int[]> combinationIterator(int n, int k)
>>>>>> there would be
>>>>>>   Iterator<Combination> combinationIterator(int n, int k)
>>>>>> where "Combination" would provide an accessor
>>>>>>   public int[] toArray() {
>>>>>>     // ...
>>>>>>   }
>>>>>
>>>>> I need the "naked arrays" generated very quickly with minimal
>>>>> overhead for the K-S use case.
>>>>
>>>> The only "overhead" would be the creation of an instance for
>>>> wrapping the "int[]".
>>>> [As allocation is fairly cheap in Java, that should not be a
>>>> problem.]
>>>>
>>>>> My use case uses the arrays
>>>>> directly, but others could use the indices to extract objects from
>>>>> lists.  For example,  what I thought would be natural would be to
>>>>> add
>>>>> public static <T> Iterator<List<T>> combinationsIterator(List<T>
>>>>> list, int k)
>>>>
>>>> Would this be the base implementation, from which your case would
>>>> be implemented by retrieving "Integer" from the list and
>>>> copy/convert
>>>> them into an "int[]"?
>>>
>>> No, the opposite actually.  The base impl is what exists now.  It
>>> works efficiently directly on an int[] array to compute the indices
>>> within the source list (in my use case, {0, 1, ..., n-1} which is
>>> actually canonical).  Then if you want to source subsets of objects
>>> from a list of objects, you can compute the int[] array that
>>> actually *is* the combination and extract the corresponding elements
>>> from the list.
>>
>> I got it.
>>
>>>>
>>>>>
>>>>> What you would get out of this is element lists.  And the
>>>>> implementation could postpone copy / creation until the indices
>>>>> have
>>>>> been selected using the existing method.
>>>>
>>>> I don't follow ...
>>>
>>> See above.  I am probably still not explaining clearly what I have
>>> in mind.  To me, a "combination" is a subset of the integers {0,
>>> ..., n-1}.  Technically it is a set, not a list, but there is a
>>> natural order available and enumeration can work easier if you use
>>> that order and adopt the convention that you represent combinations
>>> as ordered lists of integers.  The LexicographicCombinationIterator
>>> that I committed does that, using primitive arrays to represent the
>>> lists.  The iterator itself uses a primitive array of int[] to
>>> maintain state and create successive iterates.  Use of a single
>>> primitive array eliminates all object creation / garbage collection,
>>> which is not necessary to just enumerate the combinations.  Once you
>>> have a "true" combination represented as an array of int[], you can
>>> use it to specify a "combination"  (really sample or subset) of a
>>> list of any kind of object.
>>
>> So, IIUC it would be useful to have a utility like
>> -----
>>  public static List<T> getSample(List<T> elements,
>>                                  int[] combination) {
>>    final List<T> samples = new ArrayList<T>(combination.length);
>>    for (int index : combination) {
>>      samples.add(elements.get(index));
>>    }
>>  }
>
> Yes, that is what the List version would use.  Not sure how
> generically useful it would be, but I can imagine other use cases.
> It might be best to call it filter or subList or something and put
> it in MathArrays.  Not sure where it would belong as a generic utility.
>
>
>> -----
>>
>>>>
>>>>>  I am not yet sold on the
>>>>> value of "Combination" as an abstraction, since what you really
>>>>> use
>>>>> is just the list of selected elements.
>>>>
>>>> Maybe "Combination" is not worth implementing indeed; but what
>>>> about
>>>> a "Combinations" class:
>>>>
>>>> public class Combinations implements Iterable<int[]> {
>>>>   private final int n;
>>>>   private final int k;
>>>>
>>>>   public Combinations(int n, int k) {
>>>>     // ...
>>>>   }
>>>>
>>>>   public Iterator<int[]> iterator() {
>>>>     return new LexicographicCombinationIterator(n, k);
>>>>   }
>>>>
>>>>   public Comparator<int[]> comparator() {
>>>>     return new LexicographicCombinationComparator(n, k);
>>>>   }
>>>>
>>>>   private static class LexicographicCombinationIterator implements
>>>> Iterator<int[]> {
>>>>     // [Your implementation]
>>>>   }
>>>>
>>>>   private static class LexicographicCombinationComparator
>>>> implements Comparator<int[]> {
>>>>       public int compare(int[], int[]) {
>>>>       // ... using e.g. "lexNorm" implementation.
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> Being "Iterable" allows nice usage in a "for" loop:
>>>>
>>>> for (int[] combination : new Combinations(4, 2)) {
>>>>   // ... etc.
>>>> }
>>>
>>> I thought about that and that is what I meant by the original
>>> suggestion that we possibly make the
>>> LexicographicCombinationsIterator a public class.  I just don't see
>>> the practical use cases for it as a standalone class.
>>
>> The "for"-loop example above uses the iterator functionality but
>> in a much less verbose way (no explicit calls to "hasNext()" and
>> "next()").
>> This is standard practice for everything that can be handled as a
>> collection.
>>
>> Then, we can also have your iterator factory method:
>> -----
>> public static Iterator<int[]> combinationsIterator(int n, int k) {
>>   return new Combinations(4, 2).iterator();
>> }
>> -----
>>
>> Then, we also don't hide some code in "test" where it could be made
>> visible and be used elsewhere (cf. "lexNorm").
>
> OK, I see your point.
>>
>>>  What I have
>>> immediate practical need for is just a way to get an Iterator<int[]>
>>> based on <n,k>.  As you note above, it is the Iterator that is
>>> useful.  I can imagine use cases for the object-list version
>>> public static <T> Iterator<List<T>> combinationsIterator(List<T>
>>> list, int k ) though.  If we do extract a public class, I would
>>> prefer that it be called something like CombinationsIterator.
>>
>> I think that a design based on the "Iterable" interface is
>> preferrable.
>> From it you can always get the iterator functionality (by
>> definition of
>> the interface), while the reverse is not true.
>
> Agreed.
>>
>>> If we
>>> have other iteration strategies that we want to implement, we could
>>> make it an abstract parent for LexicographicCombinationsIterator.  I
>>> would recommend holding off complicating the design / structure
>>> though until we have these though.
>>
>> Sorry to insist but it is not a complication, it is a
>> _simplification_ ;-)
>> Instead of a specifically named method ("combinationsIterator"),
>> we rely
>> on the standard API ("Iterable") of the Java language.
>
> Yeah, I get it now.  Go for it or open a ticket so we don't forget.
>
> Phil
>>
>>
>> Gilles
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message