I have a large file where each line is a mix of Chinese and English words (in other words, the keys are Chinese). I want to globally sort it, but running the hadoop-examples TeraSort produces the error below. Does sorting Chinese keys require modifying the Trie implementation?
14/03/06 16:13:46 INFO terasort.TeraSort: starting
14/03/06 16:13:47 INFO mapred.FileInputFormat: Total input paths to process : 1
14/03/06 16:13:47 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/03/06 16:13:47 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Making 2 from 13 records
Step size is 6.5
14/03/06 16:13:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/06 16:13:47 INFO mapred.JobClient: Running job: job_201403061609_0004
14/03/06 16:13:48 INFO mapred.JobClient: map 0% reduce 0%
14/03/06 16:13:55 INFO mapred.JobClient: map 50% reduce 0%
14/03/06 16:13:57 INFO mapred.JobClient: Task Id : attempt_201403061609_0004_m_000000_0, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: -23
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner$InnerTrieNode.findPartition(TeraSort.java:91)
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.getPartition(TeraSort.java:221)
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.getPartition(TeraSort.java:57)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:526)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child
14/03/06 16:14:04 INFO mapred.JobClient: Task Id : attempt_201403061609_0004_m_000000_1, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: -23
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner$InnerTrieNode.findPartition(TeraSort.java:91)
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.getPartition(TeraSort.java:221)
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.getPartition(TeraSort.java:57)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:526)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child
14/03/06 16:14:09 INFO mapred.JobClient: Task Id : attempt_201403061609_0004_m_000000_2, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: -23
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner$InnerTrieNode.findPartition(TeraSort.java:91)
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.getPartition(TeraSort.java:221)
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.getPartition(TeraSort.java:57)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:526)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child
14/03/06 16:14:15 INFO mapred.JobClient: Job complete: job_201403061609_0004
14/03/06 16:14:15 INFO mapred.JobClient: Counters: 29
14/03/06 16:14:15 INFO mapred.JobClient: File System Counters
14/03/06 16:14:15 INFO mapred.JobClient: FILE: Number of bytes read=156
14/03/06 16:14:15 INFO mapred.JobClient: FILE: Number of bytes written=171585
14/03/06 16:14:15 INFO mapred.JobClient: FILE: Number of read operations=0
14/03/06 16:14:15 INFO mapred.JobClient: FILE: Number of large read operations=0
14/03/06 16:14:15 INFO mapred.JobClient: FILE: Number of write operations=0
14/03/06 16:14:15 INFO mapred.JobClient: HDFS: Number of bytes read=266
14/03/06 16:14:15 INFO mapred.JobClient: HDFS: Number of bytes written=0
14/03/06 16:14:15 INFO mapred.JobClient: HDFS: Number of read operations=2
14/03/06 16:14:15 INFO mapred.JobClient: HDFS: Number of large read operations=0
14/03/06 16:14:15 INFO mapred.JobClient: HDFS: Number of write operations=0
14/03/06 16:14:15 INFO mapred.JobClient: Job Counters
14/03/06 16:14:15 INFO mapred.JobClient: Failed map tasks=1
14/03/06 16:14:15 INFO mapred.JobClient: Launched map tasks=5
14/03/06 16:14:15 INFO mapred.JobClient: Data-local map tasks=5
14/03/06 16:14:15 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=26032
14/03/06 16:14:15 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/03/06 16:14:15 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/03/06 16:14:15 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/06 16:14:15 INFO mapred.JobClient: Map-Reduce Framework
14/03/06 16:14:15 INFO mapred.JobClient: Map input records=7
14/03/06 16:14:15 INFO mapred.JobClient: Map output records=7
14/03/06 16:14:15 INFO mapred.JobClient: Map output bytes=95
14/03/06 16:14:15 INFO mapred.JobClient: Input split bytes=105
14/03/06 16:14:15 INFO mapred.JobClient: Combine input records=0
14/03/06 16:14:15 INFO mapred.JobClient: Combine output records=0
14/03/06 16:14:15 INFO mapred.JobClient: Spilled Records=7
14/03/06 16:14:15 INFO mapred.JobClient: CPU time spent (ms)=410
14/03/06 16:14:15 INFO mapred.JobClient: Physical memory (bytes) snapshot=376647680
14/03/06 16:14:15 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1845735424
14/03/06 16:14:15 INFO mapred.JobClient: Total committed heap usage (bytes)=331022336
14/03/06 16:14:15 INFO mapred.JobClient: org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
14/03/06 16:14:15 INFO mapred.JobClient: BYTES_READ=88
14/03/06 16:14:15 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1388)
at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:248)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:257)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Solution »
The exception is thrown in findPartition. The root cause is that the Chinese keys contain bytes outside the ASCII range: every byte of a UTF-8 encoded Chinese character is >= 0x80, and since Java bytes are signed, such a byte widens to a negative int (0xE9 becomes -23, exactly the index in the stack trace) when it is used directly to index the trie's child array.
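The failure mode can be reproduced outside Hadoop with a small sketch. The class and method names below are illustrative, not the actual TeraSort source; the point is how a signed key byte becomes a negative array index, and how masking with `& 0xFF` fixes it:

```java
// Sketch of the signed-byte indexing bug behind
// "ArrayIndexOutOfBoundsException: -23". The trie-based partitioner
// uses a key byte directly as an index into a child array. Java
// bytes are signed, so any byte >= 0x80 widens to a negative int.
import java.nio.charset.StandardCharsets;

public class TrieIndexDemo {
    // Buggy form: the signed byte widens to a negative int.
    static int buggyIndex(byte b) {
        return b;            // (byte) 0xE9 -> -23
    }

    // Fixed form: mask to the unsigned 0..255 range before indexing
    // (the trie's child arrays must then hold 256 entries).
    static int fixedIndex(byte b) {
        return b & 0xFF;     // (byte) 0xE9 -> 233
    }

    public static void main(String[] args) {
        // "长" encodes to the UTF-8 bytes E9 95 BF; as signed Java
        // bytes that is {-23, -107, -65}.
        byte[] key = "长".getBytes(StandardCharsets.UTF_8);
        System.out.println(buggyIndex(key[0]));  // -23, the failing index
        System.out.println(fixedIndex(key[0]));  // 233, a valid child slot
    }
}
```

So yes: to partition non-ASCII keys, every place the trie (findPartition and the trie-building code) uses a raw key byte as an index needs to treat it as unsigned via `b & 0xFF`, and the child arrays need to cover all 256 byte values.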