I wrote the following code based on the official docs:

// Read the data and turn it into an RDD
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])

val count = hBaseRDD.count()
println(count)

hBaseRDD.foreach { case (_, result) =>
  // Get the row key
  val key = Bytes.toString(result.getRow)
  // Get cell values by column family and column qualifier
  val name = Bytes.toString(result.getValue("A".getBytes, "姓名".getBytes))
  val age = Bytes.toInt(result.getValue("A".getBytes, "年龄".getBytes))
  println("Row key: " + key + " Name: " + name + " Age: " + age)
}

It runs successfully, but the per-row details are never printed. Log:

18/05/14 16:26:50 INFO DAGScheduler: ResultStage 0 (count at test.scala:64) finished in 2.515 s
18/05/14 16:26:50 INFO DAGScheduler: Job 0 finished: count at test.scala:64, took 2.642359 s
38
18/05/14 16:26:50 INFO SparkContext: Starting job: foreach at test.scala:71
18/05/14 16:26:50 INFO DAGScheduler: Got job 1 (foreach at test.scala:71) with 1 output partitions
18/05/14 16:26:50 INFO DAGScheduler: Final stage: ResultStage 1 (foreach at test.scala:71)
18/05/14 16:26:50 INFO DAGScheduler: Parents of final stage: List()
18/05/14 16:26:50 INFO DAGScheduler: Missing parents: List()
18/05/14 16:26:50 INFO DAGScheduler: Submitting ResultStage 1 (NewHadoopRDD[0] at newAPIHadoopRDD at test.scala:60), which has no missing parents
18/05/14 16:26:50 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.1 KB, free 897.2 MB)
18/05/14 16:26:50 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1334.0 B, free 897.2 MB)
18/05/14 16:26:50 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.251.6.153:56001 (size: 1334.0 B, free: 897.6 MB)
18/05/14 16:26:50 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/05/14 16:26:50 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (NewHadoopRDD[0] at newAPIHadoopRDD at test.scala:60) (first 15 tasks are for partitions Vector(0))
18/05/14 16:26:50 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
18/05/14 16:26:50 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, 10.124.130.14, executor 2, partition 0, ANY, 4919 bytes)
18/05/14 16:26:50 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.124.130.14:54410 (size: 1334.0 B, free: 366.3 MB)
18/05/14 16:26:51 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.124.130.14:54410 (size: 29.8 KB, free: 366.3 MB)
18/05/14 16:26:52 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 2453 ms on 10.124.130.14 (executor 2) (1/1)
18/05/14 16:26:52 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
18/05/14 16:26:52 INFO DAGScheduler: ResultStage 1 (foreach at test.scala:71) finished in 2.454 s
18/05/14 16:26:52 INFO DAGScheduler: Job 1 finished: foreach at test.scala:71, took 2.469872 s
The number highlighted in red (the 38 above) is the total row count of the HBase table, so the data is clearly there, but the foreach below it prints nothing. I've been debugging this for days without finding the cause.
Any help would be much appreciated.
Replies »
What? There's probably nothing in your result at all — how do you expect to read values out of it?
I suspected that too, but it doesn't seem to be the case:
1. The count() above returns the correct row count, so the RDD does contain data.
2. The examples online all do it this way: https://blog.csdn.net/u013468917/article/details/52822074
What else could be the cause?
hBaseRDD.collect().foreach { case (_, result) => ... } should do it. foreach is executed on the executors, so its println output goes to the executor stdout logs, not to the driver console you're watching; collect() first pulls the rows back to the driver.
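A minimal sketch of that suggestion, reusing the same conf, table, and column qualifiers as the original post (illustrative only — it assumes the cells hold a UTF-8 string and a 4-byte int, and it needs a live Spark + HBase environment to actually run):

```scala
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])

// collect() ships all rows back to the driver, so println writes to the
// driver's console instead of each executor's stdout.
hBaseRDD.collect().foreach { case (_, result) =>
  val key  = Bytes.toString(result.getRow)
  val name = Bytes.toString(result.getValue("A".getBytes, "姓名".getBytes))
  val age  = Bytes.toInt(result.getValue("A".getBytes, "年龄".getBytes))
  println(s"Row key: $key Name: $name Age: $age")
}
```

Note that collect() materializes the whole table in driver memory, so it is only safe for small tables like this 38-row one; for larger tables, prefer take(n) for spot checks, or do the real work inside foreachPartition and write to an external sink instead of printing.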