drill-dev mailing list archives

From Paul Rogers <prog...@mapr.com>
Subject Re: Data types
Date Fri, 27 Jan 2017 07:11:22 GMT
Looks like I gave you advice that was a bit off. The method you want is one of the following:

            this.buffer = fragmentContext.getManagedBuffer();

The above allocates a 256-byte buffer. You can initially allocate a larger one:

            this.buffer = fragmentContext.getManagedBuffer(4096);

Or, to reallocate:

           buffer = fragmentContext.replace(buffer, 8192);

Again, I’ve not used these methods myself, but it seems they might do the trick.
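
Putting those together, the pattern in a reader might look like this rough sketch (untested; ensureCapacity() is just a name I made up for the helper):

            private FragmentContext fragmentContext;   // saved in setup()
            private DrillBuf buffer;

            public void setup(FragmentContext context) {
              this.fragmentContext = context;
              this.buffer = context.getManagedBuffer();   // 256-byte managed buffer
            }

            private void ensureCapacity(int needed) {
              if (buffer.capacity() < needed) {
                // replace() releases the old managed buffer and returns a larger
                // one, so the fragment keeps ownership of the memory either way.
                buffer = fragmentContext.replace(buffer, needed);
              }
            }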

- Paul

> On Jan 26, 2017, at 9:51 PM, Charles Givre <cgivre@gmail.com> wrote:
> 
> Thanks!  I’m hoping to submit a PR eventually once I have this all done.  I tried your changes and now I’m getting this error:
> 
> 0: jdbc:drill:zk=local> select * from dfs.client.`small.misolog`;
> Error: DATA_READ ERROR: Tried to remove unmanaged buffer.
> 
> Fragment 0:0
> 
> [Error Id: 52fc846a-1d94-4300-bcb4-7000d0949b3c on charless-mbp-2.fios-router.home:31010] (state=,code=0)
> 
> 
> 
> 
>> On Jan 26, 2017, at 23:08, Paul Rogers <progers@mapr.com> wrote:
>> 
>> Hi Charles,
>> 
>> Very cool plugin!
>> 
>> My knowledge in this area is a bit sketchy… That said, the problem appears to be that the code does not grow the DrillBuf to ensure it has sufficient capacity. Try calling reallocIfNeeded(), something like this:
>> 
>>      this.buffer = this.buffer.reallocIfNeeded(stringLength);
>>      this.buffer.setBytes(0, bytes, 0, stringLength);
>>      map.varChar(fieldName).writeVarChar(0, stringLength, buffer);
>> 
>> Then, comment out the 256 length hack and see if it works.
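>> 
>> (One caveat: reallocIfNeeded() returns the possibly-reallocated buffer rather than always growing in place, which is why the snippet assigns the result back to this.buffer.)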
>> 
>> To avoid memory fragmentation, maybe change your loop as follows:
>> 
>>           int maxRecords = MAX_RECORDS_PER_BATCH;
>>           int maxWidth = 256;
>>           while (recordCount < maxRecords && (line = this.reader.readLine()) != null) {
>>           …
>>              if (stringLength > maxWidth) {
>>                 maxWidth = stringLength;
>>                 maxRecords = 16 * 1024 * 1024 / maxWidth;
>>              }
>>           }
>> 
>> The above is not perfect (the last record added might be much larger than the others, causing the corresponding vector to grow larger than 16 MB), but the occasional large vector should be OK.
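>> 
>> Fleshed out, the loop might look like the sketch below (recordCount, reader, and the vector-writing step stand in for your code; the constants are just illustrative):
>> 
>>           int maxRecords = MAX_RECORDS_PER_BATCH;   // e.g. start at the usual 64K
>>           int maxWidth = 256;                       // running max field width in bytes
>>           String line;
>>           while (recordCount < maxRecords && (line = this.reader.readLine()) != null) {
>>              // needs java.nio.charset.StandardCharsets
>>              int stringLength = line.getBytes(StandardCharsets.UTF_8).length;
>>              if (stringLength > maxWidth) {
>>                 maxWidth = stringLength;
>>                 // shrink the row budget so the widest vector stays near 16 MB
>>                 maxRecords = 16 * 1024 * 1024 / maxWidth;
>>              }
>>              // ... parse the line and write the fields into the vectors ...
>>              recordCount++;
>>           }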
>> 
>> Thanks,
>> 
>> - Paul
>> 
>> On Jan 26, 2017, at 5:31 PM, Charles Givre <cgivre@gmail.com> wrote:
>> 
>> Hi Paul,
>> Would you mind taking a look at my code?  I’m wondering if I’m doing this correctly.  Just for context, I’m working on a generic log file reader for Drill (https://github.com/cgivre/drill-logfile-plugin), and I encountered some errors when working with fields that were > 256 characters long.  It isn’t a storage plugin, but it extends the EasyFormatPlugin.
>> 
>> I added some code to truncate the strings to 256 chars, and it worked.  Before that, it was throwing the errors shown below:
>> 
>> 
>> 
>> Error: DATA_READ ERROR: index: 0, length: 430 (expected: range(0, 256))
>> 
>> Fragment 0:0
>> 
>> [Error Id: b2250326-f983-440c-a73c-4ef4a6cf3898 on charless-mbp-2.fios-router.home:31010] (state=,code=0)
>> 
>> 
>> The query that generated this was just a SELECT * FROM dfs.`file`.  Also, how do I set the size of each row batch?
>> Thank you for your help.
>> — C
>> 
>> 
>> if (m.find()) {
>>     for (int i = 1; i <= m.groupCount(); i++) {
>>         //TODO Add option for date fields
>>         String fieldName = fieldNames.get(i - 1);
>>         String fieldValue = m.group(i);
>> 
>>         if (fieldValue == null) {
>>             fieldValue = "";
>>         }
>>         byte[] bytes = fieldValue.getBytes("UTF-8");
>> 
>>         //Added this and it worked….
>>         int stringLength = bytes.length;
>>         if (stringLength > 256) {
>>             stringLength = 256;
>>         }
>> 
>>         this.buffer.setBytes(0, bytes, 0, stringLength);
>>         map.varChar(fieldName).writeVarChar(0, stringLength, buffer);
>>     }
>> }
>> 
>> 
>> 
>> 
>> On Jan 26, 2017, at 20:20, Paul Rogers <progers@mapr.com> wrote:
>> 
>> Hi Charles,
>> 
>> The Varchar column can hold any length of data. We’ve recently been working on tests that have columns up to 8K in length.
>> 
>> The one caveat is that, when working with data larger than 256 bytes, you must be extremely careful in your reader. The out-of-the-box text reader always reads 64K rows per batch. This (due to various issues) can cause memory fragmentation and OOM errors when used with columns greater than 256 bytes in width.
>> 
>> If you are developing your own storage plugin, then adjust the size of each row batch so that no single vector is larger than 16 MB. Then you can use columns of any size.
>> 
>> Suppose your logs contain text lines up to, say, 1 KB in size. This means that each record batch your reader produces must contain fewer than 16 MB / 1 KB per row = 16K rows (rather than the usual 64K).
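>> 
>> As a quick sanity check in code (the constant and names here are just illustrative):
>> 
>>      int maxVectorBytes = 16 * 1024 * 1024;      // 16 MB per value vector
>>      int rowsPerBatch = maxVectorBytes / 1024;   // 1 KB rows -> 16,384 rows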
>> 
>> Once the data is in the Varchar column, the rest of Drill should “just work” on that data.
>> 
>> - Paul
>> 
>> On Jan 26, 2017, at 4:11 PM, Charles Givre <cgivre@gmail.com> wrote:
>> 
>> I’m working on a plugin to read log files and the data has some long strings.  Is there a data type that can hold strings longer than 256 characters?
>> Thanks,
>> — Charles
>> 
>> 
>> 
> 
