As the title says: how can I use Spark to remove duplicate columns from a dataset?
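For example, in the 5-row, 5-column dataset below, columns 3 and 4 are duplicates (they hold the same value in every row),
so column 4 should be removed.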
How can this be done with Spark in Java? Any help would be greatly appreciated.
{"ab","12","12","12","12"},
{"12","10","10","10","11"},
{"11","09","08","08","10"},
{"10","09","08","08","10"},
{"10","09","08","08","10"}
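Here is a minimal plain-Java sketch of the comparison itself. It assumes the rows fit in memory as a rectangular `String[][]` and treats a column as a duplicate when it matches an earlier column in every row. It does not use Spark. The class and method names (`ColumnDedup`, `duplicateColumnIndices`, `dropColumns`) are hypothetical. Column indices are 0-based, so the fourth column is index 3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnDedup {

    // Returns the 0-based indices of columns that exactly match some earlier
    // column in every row; the first occurrence is kept, later copies are reported.
    static int[] duplicateColumnIndices(String[][] rows) {
        if (rows.length == 0) {
            return new int[0];
        }
        int numCols = rows[0].length; // assumes every row has the same length
        List<Integer> dups = new ArrayList<>();
        for (int j = 1; j < numCols; j++) {
            for (int k = 0; k < j; k++) {
                if (sameColumn(rows, k, j)) {
                    dups.add(j);
                    break; // one earlier match is enough
                }
            }
        }
        return dups.stream().mapToInt(Integer::intValue).toArray();
    }

    // True if columns a and b hold equal strings in every row.
    static boolean sameColumn(String[][] rows, int a, int b) {
        for (String[] row : rows) {
            if (!row[a].equals(row[b])) {
                return false;
            }
        }
        return true;
    }

    // Builds a copy of the data without the given column indices.
    static String[][] dropColumns(String[][] rows, int[] drop) {
        boolean[] skip = new boolean[rows.length == 0 ? 0 : rows[0].length];
        for (int d : drop) {
            skip[d] = true;
        }
        String[][] out = new String[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            List<String> kept = new ArrayList<>();
            for (int j = 0; j < rows[i].length; j++) {
                if (!skip[j]) {
                    kept.add(rows[i][j]);
                }
            }
            out[i] = kept.toArray(new String[0]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] data = {
            {"ab", "12", "12", "12", "12"},
            {"12", "10", "10", "10", "11"},
            {"11", "09", "08", "08", "10"},
            {"10", "09", "08", "08", "10"},
            {"10", "09", "08", "08", "10"}
        };
        int[] dups = duplicateColumnIndices(data);
        System.out.println(Arrays.toString(dups)); // prints [3]
        for (String[] row : dropColumns(data, dups)) {
            System.out.println(Arrays.toString(row)); // first line: [ab, 12, 12, 12]
        }
    }
}
```

In a Spark job, the same check could be distributed. One option is to key each cell by its column index and group by column. Once the duplicate indices are known, the columns can be removed with a `map` over the rows. That Spark-specific part is not shown here.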