hadoop-user mailing list archives

From Wellington Chevreuil <wellington.chevre...@gmail.com>
Subject Re: How to use webhdfs CONCAT?
Date Thu, 27 Jul 2017 09:12:09 GMT
Yes, all the files passed must pre-exist. In this case, you would need to run something like the following:

curl -i -X POST "http://HOST/webhdfs/v1/PATH_TO_YOUR_HDFS_FOLDER/part-01-000000-000?user.name=hadoop&op=CONCAT&sources=PATH_TO_YOUR_HDFS_FOLDER/part-02-000000-000,PATH_TO_YOUR_HDFS_FOLDER/part-04-000000-000"

These three files would then be concatenated into the PATH_TO_YOUR_HDFS_FOLDER/part-01-000000-000
file. Note that this only works if the file sizes are exact multiples of "dfs.block.size";
if not, you may get a different error.
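
To fold the remaining parts in as well, the sources parameter can simply list every other part file. A rough bash sketch along the same lines (HOST and the folder path are placeholders; every part must still satisfy the block-size restriction above):

HOST=namenode.example.com:50070    # placeholder NameNode HTTP address
DIR=/PATH_TO_YOUR_HDFS_FOLDER      # placeholder HDFS folder
# Every part except the target, comma-separated (names taken from your listing).
SOURCES=$DIR/part-02-000000-000,$DIR/part-04-000000-000,$DIR/part-05-000000-000,$DIR/part-06-000000-000,$DIR/part-07-000000-000
# CONCAT appends the sources onto part-01 and removes the source files from HDFS.
curl -i -X POST "http://$HOST/webhdfs/v1$DIR/part-01-000000-000?user.name=hadoop&op=CONCAT&sources=$SOURCES"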

> On 27 Jul 2017, at 10:06, Cinyoung Hur <cinyoung.hur@gmail.com> wrote:
> 
> Hi, Wellington
> 
> All the source parts are (columns: permission, owner, group, size, replication, block size, name):
> -rw-r--r--	hadoop	supergroup	2.43 KB	2	32 MB	part-01-000000-000
> -rw-r--r--	hadoop	supergroup	21.14 MB	2	32 MB	part-02-000000-000
> -rw-r--r--	hadoop	supergroup	22.1 MB	2	32 MB	part-04-000000-000
> -rw-r--r--	hadoop	supergroup	22.29 MB	2	32 MB	part-05-000000-000
> -rw-r--r--	hadoop	supergroup	22.29 MB	2	32 MB	part-06-000000-000
> -rw-r--r--	hadoop	supergroup	22.56 MB	2	32 MB	part-07-000000-000
> 
> 
> I got this exception. It seems like I have to create the target file before concatenation.
> 
> curl -i -X POST "http://HOST/webhdfs/v1/tajo/warehouse/hira_analysis/material_usage_concat?user.name=hadoop&op=CONCAT&sources=/tajo/warehouse/hira_analysis/material_usage"
> HTTP/1.1 404 Not Found
> Date: Thu, 27 Jul 2017 09:05:48 GMT
> Server: Jetty(6.1.26)
> Content-Type: application/json
> Cache-Control: no-cache
> Expires: Thu, 27 Jul 2017 09:05:48 GMT
> Pragma: no-cache
> Expires: Thu, 27 Jul 2017 09:05:48 GMT
> Pragma: no-cache
> Set-Cookie: hadoop.auth="u=hadoop&p=hadoop&t=simple&e=1501182348739&s=o02nv4on4FXbhlijJ+R/KXvhooQ="; Path=/; Expires=Thu, 27-Jul-2017 19:05:48 GMT; HttpOnly
> Transfer-Encoding: chunked
> 
> {"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
does not exist: /tajo/warehouse/hira_analysis/material_usage_concat"}}%        
> 
> Thanks!
> 
> 2017-07-26 0:54 GMT+09:00 Wellington Chevreuil <wellington.chevreuil@gmail.com>:
> Hi Cinyoung, 
> 
> Concat has some restrictions, such as requiring each source file's last block size to be the same as the configured dfs.block.size. If all the conditions are met, the example command below should work (it concatenates /user/root/file-2 into /user/root/file-1):
> 
> curl -i -X POST "http:HTTPFS_HOST:14000/webhdfs/v1/user/root/file-1?user.name <http://user.name/>=root&op=CONCAT&sources=/user/root/file-2"
> 
> Is this similar to what you had tried? Can you share the resulting output you are getting?
> 
> 
> 
>> On 25 Jul 2017, at 09:00, Cinyoung Hur <cinyoung.hur@gmail.com> wrote:
>> 
>> https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Concat_Files
>> 
>> I tried to concatenate multiple parts into a single target file through webhdfs,
>> but I couldn't get it to work.
>> Could you give me an example of concatenating parts?
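>> For reference, the general form documented on that page is:
>> 
>> curl -i -X POST "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CONCAT&sources=<PATHS>"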
> 
> 

