As the title says: how can I use Spark to remove duplicate columns from a dataset?
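For example, in the 5-row, 5-column dataset below, the 3rd and 4th columns are duplicates (identical in every row), so the 4th column should be removed.
How can I implement this with Spark in Java? Any help would be greatly appreciated.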
{"ab","12","12","12","12"},
{"12","10","10","10","11"},
{"11","09","08","08","10"},
{"10","09","08","08","10"},
{"10","09","08","08","10"}
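A minimal sketch of the core logic in plain Java (no Spark API is used here). It treats each column as the list of its values from top to bottom and keeps a column only if no earlier column has exactly the same list, so for the sample data it keeps columns 1, 2, 3 and 5 and drops column 4. The class and method names (`DedupColumns`, `columnsToKeep`, `dropDuplicateColumns`) are hypothetical, and the sketch assumes the data fits in memory (for example after a `collect()` on a small dataset).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class: removes columns that duplicate an earlier column.
class DedupColumns {

    // The 5x5 sample dataset from the question.
    static final String[][] SAMPLE = {
        {"ab", "12", "12", "12", "12"},
        {"12", "10", "10", "10", "11"},
        {"11", "09", "08", "08", "10"},
        {"10", "09", "08", "08", "10"},
        {"10", "09", "08", "08", "10"}
    };

    // Returns the 0-based indices of the columns to keep: a column is kept
    // only if no earlier column has exactly the same value in every row.
    static List<Integer> columnsToKeep(String[][] rows) {
        int numCols = rows.length == 0 ? 0 : rows[0].length;
        Set<List<String>> seen = new HashSet<>();
        List<Integer> keep = new ArrayList<>();
        for (int c = 0; c < numCols; c++) {
            // Gather column c top to bottom; ArrayList equality is element-wise.
            List<String> column = new ArrayList<>(rows.length);
            for (String[] row : rows) {
                column.add(row[c]);
            }
            // Set.add returns false when an identical column was already seen.
            if (seen.add(column)) {
                keep.add(c);
            }
        }
        return keep;
    }

    // Builds a new dataset that contains only the kept columns, in original order.
    static String[][] dropDuplicateColumns(String[][] rows) {
        List<Integer> keep = columnsToKeep(rows);
        String[][] out = new String[rows.length][keep.size()];
        for (int r = 0; r < rows.length; r++) {
            for (int k = 0; k < keep.size(); k++) {
                out[r][k] = rows[r][keep.get(k)];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] row : dropDuplicateColumns(SAMPLE)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Comparing columns this way depends on row order, which Spark does not guarantee across partitions unless rows are explicitly indexed (e.g. with `JavaRDD.zipWithIndex()`). If the number of columns is small, an order-independent alternative in Spark SQL is to check, for each pair of columns, whether any row has differing values (for example using `Column.eqNullSafe`) and drop the later column of each pair with no differences; that approach is an assumption, not something confirmed in the original thread.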