When we verified that all data had been inserted, we found that some of it was missing. We re-inserted the missing data, and for some chunks we were told that all rows were already present, i.e. Impala reported something like "Modified: 0 rows, nnnnnnn errors". Running the verification again now shows that the Kudu table is complete. So, even though we did not actually insert any data for some chunks, a count(*) over these chunks now returns a different value.
How did you verify that all the data was inserted and how did you find some data missing? I'm wondering if it's possible that the initial "missing" data was data that Kudu was still in the process of inserting (albeit slowly, due to memory backpressure or somesuch).
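For what it's worth, one way to make that kind of check repeatable is to record the per-chunk counts and diff them against the expected counts from the source. Below is a minimal sketch, assuming each chunk has some identifier and that the counts were already fetched separately (e.g. with a count(*) per chunk through Impala, which is not shown here). The function name and the dict layout are hypothetical, not part of Kudu or Impala.

```python
from typing import Dict, List, Tuple


def find_mismatched_chunks(
    expected: Dict[str, int], actual: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Return (chunk_id, expected_count, actual_count) for chunks whose counts differ.

    `expected` holds the row counts from the source system, `actual` the
    counts observed in the Kudu table. Both mappings are assumptions made
    for this sketch; how they are populated is up to the caller.
    """
    mismatches = []
    for chunk_id in sorted(expected):
        # A chunk that is absent on the Kudu side is treated as having 0 rows.
        got = actual.get(chunk_id, 0)
        if got != expected[chunk_id]:
            mismatches.append((chunk_id, expected[chunk_id], got))
    return mismatches


if __name__ == "__main__":
    # Hypothetical counts, purely for illustration.
    source_counts = {"chunk-001": 1000, "chunk-002": 2500}
    kudu_counts = {"chunk-001": 1000, "chunk-002": 2400}
    for chunk_id, want, got in find_mismatched_chunks(source_counts, kudu_counts):
        print(f"{chunk_id}: expected {want}, found {got}")
```

Running the same comparison twice, some time apart and without inserting anything in between, would show whether the counts for a chunk are still changing on their own.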
Now to my question: will data be inconsistent if we restart Kudu after seeing soft memory limit warnings?
Your data should be consistently written, even with those warnings. AFAIK they would cause a bit of slowness, not incorrect results.
Is there a way to tell when it is safe to restart Kudu to avoid these issues? Should we use any special procedure when restarting (e.g. restart only the tablet servers, restart only one tablet server at a time, or something like that)?