accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3633) Please provide information on implementing custom iterators in the documentation
Date Tue, 10 Mar 2015 14:51:38 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354999#comment-14354999
] 

Keith Turner commented on ACCUMULO-3633:
----------------------------------------

[~elserj], some thoughts from reading over it. If you make a second revision, would you mind putting
it on RB?

 * init section: iterator configuration can come from places other than the table configuration.
 * Instantiation section: could also mention the per-table classpath. Could also mention that
one should not seek the source in init().
 * Iterator design section: a List is mentioned; could also mention a Tree of iterators, and
use the Tree terminology in the deepCopy section. Could mention that deepCopy() should be called
in init().
 * Explain isolation: data sources do not change until the top iterator returns a key/value. Could
move the [HTML isolation iterator documentation|https://github.com/apache/accumulo/blob/1.6.2/docs/src/main/resources/isolation.html]
to this new section in the user manual.
 * In addition to combiner and filter, could mention the transforming iterator.
 * next section: does it have to be a cached key/value?
 * Could explain why cross-row operations are not recommended.
 * hasTop section: Java iterators have nothing similar to hasTop.
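
To illustrate the hasTop point above, here is a minimal sketch of the cursor-style contract, in contrast to java.util.Iterator, where next() both returns an element and advances. Everything in it (HasTopSketch, TopCursor, the String keys and the TreeMap backing) is a hypothetical simplification for illustration, not Accumulo's SortedKeyValueIterator API.

```java
import java.util.*;

class HasTopSketch {
    /** Hypothetical, simplified cursor: seek() positions it, hasTop()/getTop*() peek, next() advances. */
    static class TopCursor {
        private final NavigableMap<String, String> data;
        private Iterator<Map.Entry<String, String>> it;
        private Map.Entry<String, String> top; // null means "no top entry"

        TopCursor(NavigableMap<String, String> data) { this.data = data; }

        // Position at the first key >= startKey. Before seek() is called there is no top.
        void seek(String startKey) {
            it = data.tailMap(startKey, true).entrySet().iterator();
            next();
        }

        boolean hasTop() { return top != null; }
        String getTopKey() { return top.getKey(); }     // peek, does not advance
        String getTopValue() { return top.getValue(); } // peek, does not advance
        void next() { top = it.hasNext() ? it.next() : null; }
    }

    // Reads every entry from 'start' onward using the seek/hasTop/getTop/next pattern.
    static List<String> readAll(TopCursor c, String start) {
        List<String> out = new ArrayList<>();
        c.seek(start);
        while (c.hasTop()) {
            // The top entry can be inspected any number of times before moving on.
            out.add(c.getTopKey() + "=" + c.getTopValue());
            c.next();
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> m = new TreeMap<>();
        m.put("a", "1");
        m.put("b", "2");
        m.put("c", "3");
        System.out.println(readAll(new TopCursor(m), "b")); // prints [b=2, c=3]
    }
}
```

With java.util.Iterator, hasNext() asks whether a later call to next() would succeed. Here hasTop() asks whether the cursor currently sits on an entry, which is why a wrapping iterator can look at its source's top key and value before deciding whether to expose or skip it.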

Some code like the following showing how tserver will call iterators for a scan may be useful.

{code:java}

 List<KeyValue> batch = new ArrayList<>();
 Range range = /* range from client */;
 while(!overSizeLimit(batch)){
   // (re)build the iterator stack on top of the current system data sources
   SKVI source = systemIterator();
   for(SKVI iter : iterators){
     iter.init(source, opts, env);
     source = iter;
   }

   //read a batch of data to return to client
   SKVI topIter = source; // the last iterator initialized is the top of the stack
   topIter.seek(range, ...);

   while(topIter.hasTop() && !overSizeLimit(batch)){
     key = topIter.getTopKey();
     val = topIter.getTopValue();
     batch.add(new KeyValue(key, val));
     if(systemDataSourcesChanged()){
       //code does not show isolation case, which will keep using same data sources
       //until a row boundary is hit
       //resume just after the last key returned, using a freshly built stack
       range = new Range(key, false, range.getEndKey(), range.isEndKeyInclusive());
       break;
     }
     topIter.next();
   }
   if(!topIter.hasTop()) break; // range exhausted, nothing more to read
 }
 //return batch of key values to client
{code}
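
The same control flow can be exercised in a small, self-contained simulation. This is only a sketch under stated assumptions: Skvi, SnapshotSource, SkipEmpty, Result and the writesAfterKey parameter are hypothetical stand-ins (String keys, TreeMap snapshots), not Accumulo classes, and a write to the table is used to stand in for "system data sources changed". It shows the stack being rebuilt and re-seeked just past the last key returned.

```java
import java.util.*;

class ScanLoopSketch {
    /** Hypothetical, simplified stand-in for a SortedKeyValueIterator over String keys. */
    interface Skvi {
        void init(Skvi source);
        void seek(String startKey, boolean inclusive);
        boolean hasTop();
        String getTopKey();
        String getTopValue();
        void next();
    }

    /** Bottom of the stack: reads a fixed snapshot of the table. */
    static class SnapshotSource implements Skvi {
        private final NavigableMap<String, String> data;
        private Iterator<Map.Entry<String, String>> it;
        private Map.Entry<String, String> top;
        SnapshotSource(NavigableMap<String, String> data) { this.data = data; }
        public void init(Skvi source) { /* no source below the bottom of the stack */ }
        public void seek(String startKey, boolean inclusive) {
            it = data.tailMap(startKey, inclusive).entrySet().iterator();
            next();
        }
        public boolean hasTop() { return top != null; }
        public String getTopKey() { return top.getKey(); }
        public String getTopValue() { return top.getValue(); }
        public void next() { top = it.hasNext() ? it.next() : null; }
    }

    /** Filter-style iterator: hides entries whose value is empty. */
    static class SkipEmpty implements Skvi {
        private Skvi source;
        public void init(Skvi source) { this.source = source; } // note: no seek in init
        public void seek(String startKey, boolean inclusive) { source.seek(startKey, inclusive); skip(); }
        public boolean hasTop() { return source.hasTop(); }
        public String getTopKey() { return source.getTopKey(); }
        public String getTopValue() { return source.getTopValue(); }
        public void next() { source.next(); skip(); }
        private void skip() {
            while (source.hasTop() && source.getTopValue().isEmpty()) source.next();
        }
    }

    static class Result {
        final List<String> batch = new ArrayList<>();
        int stacksBuilt = 0;
    }

    // writesAfterKey: after the given key is returned, apply these writes to the table,
    // which this sketch treats as "the data sources changed".
    static Result scan(NavigableMap<String, String> table,
                       Map<String, Map<String, String>> writesAfterKey, int batchLimit) {
        Result r = new Result();
        String start = "";
        boolean inclusive = true;
        while (r.batch.size() < batchLimit) {
            // Build a fresh stack over a snapshot of the current data.
            Skvi source = new SnapshotSource(new TreeMap<>(table));
            Skvi top = new SkipEmpty();
            top.init(source);
            r.stacksBuilt++;
            top.seek(start, inclusive);

            while (top.hasTop() && r.batch.size() < batchLimit) {
                String key = top.getTopKey();
                r.batch.add(key + "=" + top.getTopValue());
                Map<String, String> writes = writesAfterKey.get(key);
                if (writes != null) {
                    table.putAll(writes);   // data sources changed
                    start = key;            // resume just after the last key returned
                    inclusive = false;
                    break;
                }
                top.next();
            }
            if (!top.hasTop()) break; // range exhausted
        }
        return r;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("a", "1");
        table.put("b", "");
        table.put("c", "3");
        Map<String, Map<String, String>> writes = new HashMap<>();
        writes.put("a", Collections.singletonMap("b2", "9"));
        Result r = scan(table, writes, 10);
        System.out.println(r.batch + " stacks=" + r.stacksBuilt); // prints [a=1, b2=9, c=3] stacks=2
    }
}
```

In this run the entry b2 is written after a has been returned; because the stack is rebuilt over the new data and seeked with an exclusive start key, b2 shows up in the batch and a is not returned twice.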

> Please provide information on implementing custom iterators in the documentation
> --------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-3633
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3633
>             Project: Accumulo
>          Issue Type: Wish
>          Components: docs
>    Affects Versions: 1.6.0
>         Environment: Centos 6.5, Accumulo 1.6.0, CDH 5. 
>            Reporter: Vaibhav Thapliyal
>            Assignee: Josh Elser
>              Labels: documentation
>             Fix For: 1.7.0
>
>         Attachments: 0001-ACCUMULO-3633-User-manual-chapter-on-custom-iterator.patch
>
>
> Dear all,
> Could you please provide documentation on creating custom iterators? For example,
> explain what the methods in SortedKeyValueIterator do and how to override them.
> Please also explain how these methods are executed (which class calls them when
> the iterator runs).
> I would appreciate it if these changes were made in future documentation, as this would
> help developers who are new to Accumulo quickly get started writing their own custom
> iterators, which are an essential part of Accumulo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
