This is where the magic happened - let’s look at that class again.

    static class KeyBasedMultipleTextOutputFormat extends MultipleTextOutputFormat<Text, Text> {
        @Override
        protected String generateFileNameForKeyValue(Text key, Text value, String name) {
            return key.toString() + "/" + name;
        }
    }

You are working with text, which is why you extended MultipleTextOutputFormat, a class that in turn extends MultipleOutputFormat. MultipleTextOutputFormat is a simple class that instructs MultipleOutputFormat to use TextOutputFormat as the underlying output format for writing out the records. If you were to use MultipleTextOutputFormat as-is, it would behave as if you were using the regular TextOutputFormat, which is to say that it would only write to a single output file per task. To write data to multiple files you have to extend it, as in the example above.

The generateFileNameForKeyValue method lets you return the output path for an input record. The third argument, name, is the original FileOutputFormat-created filename, which takes the form “part-NNNNN”, where “NNNNN” is the task index, so the name is unique across tasks. To avoid file collisions, it’s a good idea to make sure your generated output paths are unique, and building on the original output filename is a good way of doing this. In our example we’re using the key as the directory name, and then writing to the original FileOutputFormat filename within that directory.
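
To make the path logic concrete, here is a minimal sketch that runs without Hadoop on the classpath. It copies the body of generateFileNameForKeyValue but uses plain String in place of Text. The class name OutputPathSketch, the partName helper, and the sample records are hypothetical and exist only for illustration; in a real job the “part-NNNNN” name is supplied by the framework, not built by your code.

```java
public class OutputPathSketch {

    // Same logic as the overridden method above, with String standing in
    // for Hadoop's Text type (an assumption made to keep this self-contained).
    public static String generateFileNameForKeyValue(String key, String value, String name) {
        return key + "/" + name;
    }

    // Hypothetical helper that imitates the "part-NNNNN" naming pattern
    // described in the text; Hadoop generates this name itself.
    public static String partName(int taskIndex) {
        return String.format("part-%05d", taskIndex);
    }

    public static void main(String[] args) {
        // Made-up (key, value) records, processed by two imaginary tasks.
        String[][] records = { {"fruit", "apple"}, {"veg", "carrot"}, {"fruit", "pear"} };
        for (int task = 0; task < 2; task++) {
            for (String[] record : records) {
                String path = generateFileNameForKeyValue(record[0], record[1], partName(task));
                System.out.println(record[1] + " -> " + path);
            }
        }
    }
}
```

Records that share a key within one task map to the same path, so they end up in the same file, while the task-specific “part-NNNNN” suffix keeps two tasks that see the same key from writing to the same file.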