kudu-user mailing list archives

From Amit Adhau <amit.ad...@globant.com>
Subject Re: Kudu Storage size mismatch
Date Mon, 25 Apr 2016 18:07:58 GMT
Thanks a lot, Todd, for the quick response.

My answers to your questions are inline (in green), along with a few more questions:

On Mon, Apr 25, 2016 at 10:57 PM, Todd Lipcon <todd@cloudera.com> wrote:

> Hi Amit,
>
> Answers inline below:
>
> On Mon, Apr 25, 2016 at 10:12 AM, Amit Adhau <amit.adhau@globant.com>
> wrote:
>
>> Hi Kudu Team,
>>
>> I have some questions about the Kudu storage structure.
>>
>> A few days ago, we restored a backup of the Kudu metadata and
>> data (almost 200 GB), with the loss of some data.
>>
>>
> Do you mean that you took a backup of the Kudu data folders using normal
> Linux backup tools like rsync/tar/etc? Was this just a test of a backup and
> restore scenario, or did you experience some problem with Kudu and
> therefore have to restore from some backup?
>

    Yes, we backed up the data folders, and for the metadata we backed up the
master's directory, using a simple Linux copy command. No, it was neither a
backup-and-restore test nor a Kudu issue. We had serious issues with CDH and
with the server partitions on the Kudu master and slave servers, and we were
forced to reinstall everything, including Kudu. (We did not get a chance to
take a backup through Impala either, as it was also not working.) At the same
time, we had to preserve the Kudu data, so we backed up the folders as
mentioned.


>> At present, if we look at the Kudu tablet server dashboard, none of the
>> parameters, such as overall mem-trackers (Memory (detail)), overall memz
>> (Memory (total)), or the overall tablets' On-disk size, exceeds 4-5 GB.
>> However, the tablet server folder sizes are as follows:
>>
>>
> The 'on disk size' is listed per tablet, so if you sum up all of those,
> you should have a total which is similar to the amount of data in the data/
> directories. Is that not the case?
>

    No. As mentioned, the data folder size is 185 GB, but the sum of the 'on
disk size' values does not exceed even 4-5 GB. That is why we are wondering
why there is such a huge gap (~180 GB) between the data folder and the
on-disk size. Can you please suggest anything? This would help us with proper
Kudu storage management in production.


> You can also check on a per-tablet-server basis how much space is used on
> disk by looking at the 'log_block_manager_bytes_under_management' metric.
> This is exposed in Cloudera Manager, or you can visit a URL like:
>
> http://my-tablet-server:8051/metrics?metrics=bytes_under
>
> which will dump the metric in JSON.
>

We get the following value for the metric:

    "name": "log_block_manager_bytes_under_management",
    "value": 2421794

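To read this metric programmatically rather than by eye, here is a minimal Python sketch. It assumes the /metrics endpoint returns a JSON array of entity objects, each carrying a "metrics" list of {"name", "value"} entries (consistent with the fragment above, but not confirmed in this thread), and it reuses the hypothetical host name from Todd's example URL. For reference, 2421794 bytes is roughly 2.4 MB (about 2.3 MiB).

```python
import json
import urllib.request

# Hypothetical host taken from the example URL in this thread.
METRICS_URL = "http://my-tablet-server:8051/metrics?metrics=bytes_under"
METRIC_NAME = "log_block_manager_bytes_under_management"


def sum_metric(entities, name):
    """Sum every metric called `name` across all entities in a /metrics dump.

    Assumes the dump is a list of entity objects, each with a "metrics"
    list of {"name": ..., "value": ...} dictionaries.
    """
    total = 0
    for entity in entities:
        for metric in entity.get("metrics", []):
            if metric.get("name") == name:
                total += metric.get("value", 0)
    return total


def fetch_metrics(url=METRICS_URL, timeout=10):
    """Download and decode the JSON metrics dump from a tablet server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def human_bytes(n):
    """Format a byte count using binary (1024-based) units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:.1f} {unit}"
        n /= 1024
```

Usage against a live server would be `human_bytes(sum_metric(fetch_metrics(), METRIC_NAME))`. Summing across entities is a precaution in case the metric appears under more than one entity; with a single server entity it simply returns that one value.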
>
>
>> Instance - 4.0k
>> Data - 185GB
>> Wals - 3.5GB
>> Tablet-meta - 472k
>> consensus-meta - 204k
>>
>> Questions:
>>
>> 1. Is the data folder the one that holds the actual data? If yes, should
>> the overall tablets' On-disk size on the tablet server dashboard be in sync
>> with the data folder size or not?
>>
>
> One thing to note here is that the design of our on-disk storage uses
> sparse files. In other words, the total logical size of the data files can
> be much larger than the actual size. Depending on which backup and restore
> process you've used, it's possible that you ended up restoring a non-sparse
> file in place of the original sparse file, which would make it increase in
> size substantially.
>
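To illustrate the difference between logical and allocated size that sparse files introduce, the following minimal Python sketch (Unix only, since it relies on `st_blocks`) creates a file by seeking past a large hole before writing, then compares the apparent size (`st_size`) with the space actually allocated (`st_blocks` × 512). On filesystems that support sparse files, such as ext4 or XFS, the allocated figure should be far smaller. The file name and sizes are arbitrary choices for the demo. The same comparison can be made on a real directory with GNU `du -sh` versus `du -sh --apparent-size`.

```python
import os
import tempfile


def make_sparse_file(path, hole_bytes, payload=b"x" * 4096):
    """Create a file whose first `hole_bytes` bytes are a hole, followed by `payload`."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)  # seeking past EOF without writing leaves a hole
        f.write(payload)


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for `path`.

    st_blocks counts 512-byte units on Linux and most Unix systems.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "sparse.demo")
    make_sparse_file(demo, 16 * 1024 * 1024)
    apparent, allocated = sizes(demo)
    print(f"apparent={apparent} bytes, allocated={allocated} bytes")
```

If a copy tool writes out the holes as real zero bytes, the allocated size of the copy grows to roughly match its apparent size, which is the effect Todd describes.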

   The backup and restore were done using the Linux copy command, but would
the increase really be that substantial, a gap of almost 180 GB? Is there a
way to interpret the .metadata and .data files correctly so that we can
either remove unwanted data or shrink it somehow?
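One way to narrow down where such a gap comes from is to total the apparent and allocated sizes of the files under the data directory, grouped by extension (.data versus .metadata). The Python sketch below does that; the path in the usage line is a hypothetical example modeled on the paths elsewhere in this thread, not a documented location. A large apparent size with a much smaller allocated size points to sparse files; if the two are close, the space really is allocated on disk.

```python
import os
from collections import defaultdict


def usage_by_suffix(root):
    """Walk `root` and total [apparent, allocated] bytes per file extension.

    Allocated bytes come from st_blocks (512-byte units, Unix only).
    """
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            suffix = os.path.splitext(name)[1] or "(none)"
            totals[suffix][0] += st.st_size
            totals[suffix][1] += st.st_blocks * 512
    return dict(totals)


def report(root):
    """Print one line per extension with apparent and allocated totals."""
    for suffix, (apparent, allocated) in sorted(usage_by_suffix(root).items()):
        print(f"{suffix:>10}  apparent={apparent:>15,}  allocated={allocated:>15,}")
```

Usage, with a hypothetical path: `report("/data/1/kudu/data")`. The sketch only reads file metadata; it does not interpret or modify the Kudu files themselves.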


>> 2. What is the significance of the 'instance' and 'consensus-meta' folders?
>>
>
> 'instance' is a file which uniquely identifies the server that is managing
> that storage directory. This allows us to detect when a drive has been
> reformatted or moved. You can dump the contents with the kudu-pbc-dump
> utility:
>
> [todd@vd0340 ~]$ sudo -u kudu kudu-pbc-dump  /data/1/kudu/instance
> Message 0
> -------
> uuid: "06b13a52b994419f986eee72165c5a0f"
> format_stamp: "Formatted at 2015-10-07 22:33:54 on vd0340.halxg.cloudera.com"
>
>
> The 'consensus-meta' files are bits of metadata that are necessary for the
> Raft consensus algorithm. They keep the last committed consensus
> 'configuration'. A simplified explanation of this is that it's the list of
> servers which are replicas of that tablet. You can also use the
> kudu-pbc-dump tool to dump these:
>
> [todd@vd0340 ~]$ sudo -u kudu kudu-pbc-dump /data/14/kudu/consensus-meta/43364d26e2cf4bb48db13b7b29f459be
> Message 0
> -------
> committed_config {
>   opid_index: -1
>   local: false
>   peers {
>     permanent_uuid: "2fb5cdac22b0418bb2df456906e42eb4"
>     member_type: VOTER
>     last_known_addr {
>       host: "vd0238.halxg.cloudera.com"
>       port: 7050
>     }
>   }
>   peers {
>     permanent_uuid: "3c305734ab9d4e0ebfbd0def74841a5d"
>     member_type: VOTER
>     last_known_addr {
>       host: "vd0240.halxg.cloudera.com"
>       port: 7050
>     }
>   }
>   peers {
>     permanent_uuid: "06b13a52b994419f986eee72165c5a0f"
>     member_type: VOTER
>     last_known_addr {
>       host: "vd0340.halxg.cloudera.com"
>       port: 7050
>     }
>   }
> }
> current_term: 21
> voted_for: "06b13a52b994419f986eee72165c5a0f"
>
>
>
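To pull the replica list out of a consensus-meta dump like the one above, a small regex-based Python sketch can extract each peer's UUID, member type, and last known address from the text that kudu-pbc-dump prints. It assumes exactly the field layout shown above and is not a general protobuf text-format parser; `parse_peers` and `current_term` are hypothetical helper names.

```python
import re
from typing import List, NamedTuple, Optional


class Peer(NamedTuple):
    uuid: str
    member_type: str
    host: str
    port: int


# Matches one `peers { ... }` block laid out as in the dump above.
PEER_RE = re.compile(
    r'peers\s*\{\s*'
    r'permanent_uuid:\s*"(?P<uuid>[0-9a-f]+)"\s*'
    r'member_type:\s*(?P<member_type>\w+)\s*'
    r'last_known_addr\s*\{\s*'
    r'host:\s*"(?P<host>[^"]+)"\s*'
    r'port:\s*(?P<port>\d+)\s*\}'
)


def parse_peers(dump_text: str) -> List[Peer]:
    """Extract the replica peers from kudu-pbc-dump output for a consensus-meta file."""
    return [
        Peer(m["uuid"], m["member_type"], m["host"], int(m["port"]))
        for m in PEER_RE.finditer(dump_text)
    ]


def current_term(dump_text: str) -> Optional[int]:
    """Return the current_term value from the dump, or None if absent."""
    m = re.search(r"current_term:\s*(\d+)", dump_text)
    return int(m.group(1)) if m else None
```

The dump text itself can be captured with `subprocess.run(["kudu-pbc-dump", path], capture_output=True, text=True).stdout`, assuming the tool is on the PATH and the process runs as a user allowed to read the file (Todd's example uses `sudo -u kudu`).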
>> 3. Using impala-shell or Kudu properties, can we find out the size of
>> specific tables directly?
>>
>>
> Unfortunately not yet. This would be useful for things like optimizing
> join order, but not currently supported.
>
> -Todd
>



-- 
Thanks & Regards,

*Amit Adhau* | Data Architect

*GLOBANT* | IND:+91 9821518132



