accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Abnormal behaviour of custom iterator in getting entries
Date Tue, 16 Jun 2015 05:37:23 GMT
To enable remote debugging, add the following to ACCUMULO_TSERVER_OPTS 
in accumulo-env.sh:

-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8888

In this case, you would then use port 8888 in Eclipse to start a Remote 
Java Application debugging session. Your TServer would need to be 
running locally to do this. If it's running on a remote host, you could 
forward the port to your machine through an SSH tunnel.

--

One problem with your iterator is that you are not returning your data 
in sorted order. This is a very bad idea as it violates the contract 
of the SortedKeyValueIterator interface and will cause you trouble in 
the future.
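
To make the ordering point concrete, here is a minimal plain-Java sketch. It uses no Accumulo classes: the class name SortedEmitSketch and its helpers are made up for illustration, and String ordering only stands in for Key ordering (real Keys sort on the bytes of row, column family, column qualifier, and so on). It shows that copying the holder map into a TreeMap yields ascending row ids, whereas a HashMap makes no ordering promise at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: plain Java, no Accumulo types. All names are hypothetical.
class SortedEmitSketch {

    // Returns the map's keys in ascending order by copying them into a TreeMap.
    static List<String> sortedRowIds(Map<String, Integer> holder) {
        return new ArrayList<>(new TreeMap<>(holder).keySet());
    }

    // True if every element is strictly greater than the one before it.
    static boolean isStrictlyAscending(List<String> rowIds) {
        for (int i = 1; i < rowIds.size(); i++) {
            if (rowIds.get(i - 1).compareTo(rowIds.get(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> holder = new HashMap<>();
        holder.put("row_c", 3);
        holder.put("row_a", 1);
        holder.put("row_b", 2);

        // HashMap iteration order is unspecified, so this may or may not be sorted.
        List<String> hashOrder = new ArrayList<>(holder.keySet());
        System.out.println("HashMap order ascending? " + isStrictlyAscending(hashOrder));

        // TreeMap iteration order is always ascending by key.
        System.out.println("TreeMap order: " + sortedRowIds(holder));
    }
}
```

Swapping the HashMap for a TreeMap would only fix the emit order of the buffered results; it does not address the memory concern discussed below.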

I'm not certain if this is why you are having problems with the 
BatchScanner -- I would have thought this would be problematic with both 
the Scanner and the BatchScanner. You may have just hit a set of 
conditions under which the Scanner happened not to fail when it 
should have.

The omitted code in your myFunction() is a little scary too. You do not 
want to consume all of the data in the Range at one time as you will 
cause the server to run out of memory. SKVIs are meant to be run over 
data in your table _without_ keeping all of the data in memory. Think 
more of iterators as functions being applied to a stream of Keys and Values.
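
As a rough illustration of the "function applied to a stream" idea, here is a plain-Java sketch that uses java.util.Iterator as a stand-in for an SKVI. The class name StreamingFilterSketch is made up, and this is not the Accumulo API. The wrapper keeps at most one look-ahead element in memory, no matter how much data the source holds.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Illustration only: a streaming filter that buffers a single element.
// java.util.Iterator stands in for an SKVI; all names are hypothetical.
class StreamingFilterSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> keep;
    private T top;          // the one buffered element, if any
    private boolean hasTop; // whether 'top' currently holds a match

    StreamingFilterSketch(Iterator<T> source, Predicate<T> keep) {
        this.source = source;
        this.keep = keep;
        advance();
    }

    // Pulls from the source until the next matching element or the end.
    private void advance() {
        hasTop = false;
        top = null;
        while (source.hasNext()) {
            T candidate = source.next();
            if (keep.test(candidate)) {
                top = candidate;
                hasTop = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasTop;
    }

    @Override
    public T next() {
        if (!hasTop) {
            throw new NoSuchElementException();
        }
        T result = top;
        advance();
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6);
        Iterator<Integer> evens =
                new StreamingFilterSketch<Integer>(input.iterator(), n -> n % 2 == 0);
        while (evens.hasNext()) {
            System.out.println(evens.next());
        }
    }
}
```

The same shape applies to an SKVI: do the per-entry work as the source advances instead of draining the whole Range in seek().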

You can buffer small amounts of data in memory in an iterator (for 
example, buffering a row is fairly common); however, this requires 
sufficient memory on the tablet server to hold any single row. If you 
have a very large row (say, one with 100k key-values in it), you can 
run out of memory.
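
One way to picture row buffering with a guard is the plain-Java sketch below. Everything in it is an assumption made for illustration: entries are modelled as "row:column" strings from an already-sorted source, and the class name RowBufferSketch and the maxEntriesPerRow limit are invented here, not Accumulo settings. It buffers one row at a time and fails loudly if a row grows past the limit instead of exhausting the heap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustration only: buffers one row at a time from a sorted stream of
// "row:column" strings. All names and the size limit are hypothetical.
class RowBufferSketch {
    private final Iterator<String> source;
    private final int maxEntriesPerRow;
    private String pending; // single look-ahead entry, or null at end of data

    RowBufferSketch(Iterator<String> source, int maxEntriesPerRow) {
        this.source = source;
        this.maxEntriesPerRow = maxEntriesPerRow;
        this.pending = source.hasNext() ? source.next() : null;
    }

    // Extracts the row part; assumes every entry contains a ':' separator.
    static String rowOf(String entry) {
        return entry.substring(0, entry.indexOf(':'));
    }

    boolean hasNextRow() {
        return pending != null;
    }

    // Returns all entries of the next row, or throws if the row is too large.
    List<String> nextRow() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        String row = rowOf(pending);
        List<String> buffer = new ArrayList<>();
        while (pending != null && rowOf(pending).equals(row)) {
            if (buffer.size() >= maxEntriesPerRow) {
                throw new IllegalStateException(
                        "row " + row + " exceeds buffer limit of " + maxEntriesPerRow);
            }
            buffer.add(pending);
            pending = source.hasNext() ? source.next() : null;
        }
        return buffer;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("a:1", "a:2", "b:1");
        RowBufferSketch rows = new RowBufferSketch(entries.iterator(), 1000);
        while (rows.hasNextRow()) {
            System.out.println(rows.nextRow());
        }
    }
}
```

Only one row (plus one look-ahead entry) is held at any time, so memory use is bounded by the largest row rather than by the whole Range.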

madhvi wrote:
> Thanks Josh.
>
> Outline of my code is:
>
> public class TestIterator extends WrappingIterator {
>
> HashMap<String, Integer> holder = new HashMap<>();
> private Iterator<Map.Entry<String, Integer>> entries=null;
> private Entry<String, Integer> entry=null;
> private Key emitKey;
> private Value emitValue;
>
> @Override
> public void seek(Range range, Collection<ByteSequence> columnFamilies,
> boolean inclusive) throws IOException {
> super.seek(range, columnFamilies, inclusive);
> myFunction();
> }
>
> myFunction()
> {
> while(super.hasTop())
> {
> //matched the condition and put values to holder map.
> }
> entries = holder.entrySet().iterator();//iterate the map holder.
> }
>
> @Override
> public Key getTopKey() {
> return emitKey;
> }
>
> @Override
> public Value getTopValue() {
> return emitValue;
> }
>
> @Override
> public boolean hasTop() {
> return entries.hasNext();
> }
>
> @Override
> public void next() throws IOException {
> try{
> entry = entries.next();
> //put the keys of map to rowid and values of map to columnqualifier through emitKey
> emitKey = new Key(new Text(entry.getKey()), new Text(), new Text(String.valueOf(entry.getValue())));
> //return 1 in emitValue.
> emitValue = new Value("1".getBytes());
> }
> catch(Exception e)
> {
> e.printStackTrace();
> }
> }
> }
>
> This code returns results when using a Scanner, but not when using a
> BatchScanner.
> Also, how do I enable a remote debugger in Accumulo?
>
> Thanks
> Madhvi
>
> On Monday 15 June 2015 09:21 PM, Josh Elser wrote:
>> It's hard to remotely debug an iterator, especially when we don't know
>> what it's doing. If you can post the code, that would help
>> tremendously. Instead of dumping values to a text file, you may fare
>> better by attaching a remote debugger to the TabletServer and setting
>> a breakpoint on your SKVI.
>>
>> The only thing I can say is that a Scanner and BatchScanner should
>> return the same data, but the invocations in the server to fetch that
>> data are performed differently. It's likely that due to the
>> differences in the implementations, you uncovered a bug in your iterator.
>>
>> One common pitfall is incorrectly handling something we refer to as a
>> "re-seek". Hypothetically, take a query scanning over [0, 9], and we
>> have one key per number in the range (10 keys).
>>
>> As the name implies, the BatchScanner fetches batches from a server,
>> and suppose that after 3 keys, the server-side buffer fills up. Thus,
>> the client will get keys [0,2]. In the server, the next time you fetch
>> a batch, a new instance of the iterator will be constructed (via
>> deepCopy()). seek() will then be called, but with a new range that
>> excludes the data that was already returned. Thus, your
>> iterator would be seeked with (2,9] instead of [0,9] again.
>>
>> I can't say whether or not you're actually hitting this case, but it's
>> a common pitfall that affects devs.
>>
>> madhvi wrote:
>>> @josh
>>> If seek had been called after hasTop and getTopKey, that should
>>> also appear in the call hierarchy,
>>> because I have written the whole function call hierarchy to a file.
>>> So is the problem that I call myFunction() in seek?
>>> After seek, getTopKey and getTopValue, and then hasTop and next, should
>>> be called, but what happens is that getTopValue is sometimes called and
>>> sometimes not. This happens when I read entries through a BatchScanner.
>>> getTopValue is called when scanning through a Scanner. Applying the
>>> same iterator with a Scanner and a BatchScanner, I get entries
>>> returned through the Scanner but no entries returned through the
>>> BatchScanner.
>>>
>>> So can you please explain.
>
