The code is below. I am working in IDEA with Python, and accessing HBase from Python fails with the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.io.ImmutableBytesWritable
import os
from pyspark import SparkContext, SparkConf

# Raw string so the backslashes in the Windows path are taken literally.
os.environ['JAVA_HOME'] = r'D:\Java\jdk1.8.0_92'

spark_conf = SparkConf().setMaster("local").setAppName("spark_hbase_test")
sc = SparkContext(conf=spark_conf)

host = 'devhadoop3.reachauto.com,devhadoop2.reachauto.com,devhadoop1.reachauto.com'
table = '2:IndexMessage'
# HBase settings handed to TableInputFormat (kept separate from the SparkConf above).
hbase_conf = {"hbase.zookeeper.quorum": host, "hbase.mapreduce.inputtable": table}
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=hbase_conf)
# Count the rows; printing the RDD object itself only shows its description.
count = hbase_rdd.count()
print(count)
If that works, using pyspark after that should be fine.
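A ClassNotFoundException for org.apache.hadoop.hbase.io.ImmutableBytesWritable usually means the HBase jars are not on the classpath of the JVM that Spark starts. Below is a minimal sketch of one way to put them there, not a confirmed fix for this setup. The helper names collect_jars and build_context and the directory D:/hbase/lib are hypothetical. The sketch also assumes a Spark version where spark.jars set on the SparkConf is honored when the context is created from Python; on older versions the jars may have to be passed through spark-submit with --jars instead.

```python
import glob
import os


def collect_jars(*lib_dirs):
    """Return a sorted, comma-separated list of every .jar directly inside the given directories."""
    jars = []
    for lib_dir in lib_dirs:
        jars.extend(glob.glob(os.path.join(lib_dir, "*.jar")))
    return ",".join(sorted(jars))


def build_context(jars):
    """Create a local SparkContext with the given jars added via spark.jars (assumption: honored here)."""
    # Imported lazily so collect_jars can be used without pyspark installed.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("local")
            .setAppName("spark_hbase_test")
            .set("spark.jars", jars))
    return SparkContext(conf=conf)


# Hypothetical usage; replace the path with the real HBase lib directory:
# sc = build_context(collect_jars("D:/hbase/lib"))
```

The keyConv and valueConv classes in the question come from the Spark examples package, so the examples jar of the installed Spark release would likely need to be in the same list, if that release ships those converters.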