100 points: need help with an SQL problem

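A MySQL database has two tables. The first is a flow table (flow_table) whose rows are the 5-tuple (source IP, destination IP, protocol, source port, destination port).
The second is the ip_port table, whose rows are the 3-tuple (ip, port, number of connected IPs). It is generated from the first table:
it lists every distinct {ip, port} in the first table and, for each {ip, port}, how many distinct IPs it connects to. Both tables are large, each with several million rows. With the SQL statement below the run takes about 3 hours, which is unacceptable.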
insert into ip_port
select a.ip, a.port, count(*) from
(select sip as ip, sport as port, dip from flow_table
union
select dip as ip, dport as port, sip from flow_table) a
group by a.ip, a.port;
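My analysis shows that the UNION is the most expensive part: it has to compare two result sets of several million rows each and remove the duplicate rows.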
I also tried building the intermediate table a in the following way, but the result was just as bad:
create table temp (ip int unsigned, port smallint unsigned, pairip int unsigned);
insert into temp select sip, sport, dip from flow_table;
insert into temp select dip, dport, sip from flow_table;
create table a (ip int unsigned, port smallint unsigned, pairip int unsigned);
insert into a select distinct * from temp;
Does anyone know a better way to optimize this and cut the execution time? Thanks!

Solutions

If SQL alone cannot solve this, I could also write a program instead, but with the MySQL C API it is hard to handle variables inside SQL statements.
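Below is a minimal sketch of what a program-side version of the same computation might look like. It assumes the flow_table rows have already been fetched into an array in memory (for example with mysql_use_result/mysql_fetch_row, not shown). The struct and function names (struct flow, struct ip_port_row, build_ip_port) are hypothetical. The sketch does not let UNION deduplicate two multi-million-row result sets. It writes both directions of every flow into one array, sorts that array once with qsort, and then makes a single pass that skips adjacent duplicates (the UNION step) and counts the distinct peers per (ip, port) (the GROUP BY ... count(*) step). Memory use is two triples per flow row, so the whole table has to fit in RAM.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One row of flow_table (protocol omitted: the query does not use it). */
struct flow {
    uint32_t sip, dip;
    uint16_t sport, dport;
};

/* One (ip, port, peer ip) triple: a row of the derived table "a". */
struct triple {
    uint32_t ip;
    uint16_t port;
    uint32_t peer;
};

/* One row of ip_port. */
struct ip_port_row {
    uint32_t ip;
    uint16_t port;
    uint32_t count;
};

/* Orders triples by (ip, port, peer) so duplicates and groups are adjacent. */
static int cmp_triple(const void *pa, const void *pb) {
    const struct triple *a = pa, *b = pb;
    if (a->ip != b->ip) return a->ip < b->ip ? -1 : 1;
    if (a->port != b->port) return a->port < b->port ? -1 : 1;
    if (a->peer != b->peer) return a->peer < b->peer ? -1 : 1;
    return 0;
}

/* Computes the ip_port rows for n flows into out (capacity >= 2 * n).
   Returns the number of rows written (0 if n == 0 or malloc fails). */
size_t build_ip_port(const struct flow *flows, size_t n, struct ip_port_row *out) {
    assert(n == 0 || (flows != NULL && out != NULL));
    if (n == 0) return 0;
    struct triple *t = malloc(2 * n * sizeof *t);
    if (t == NULL) return 0;

    /* Both directions of every flow, like the two SELECTs of the UNION. */
    for (size_t i = 0; i < n; i++) {
        t[2 * i] = (struct triple){flows[i].sip, flows[i].sport, flows[i].dip};
        t[2 * i + 1] = (struct triple){flows[i].dip, flows[i].dport, flows[i].sip};
    }
    qsort(t, 2 * n, sizeof *t, cmp_triple);

    size_t rows = 0;
    for (size_t i = 0; i < 2 * n; i++) {
        /* Skip exact duplicates: the deduplication UNION performs. */
        if (i > 0 && cmp_triple(&t[i], &t[i - 1]) == 0) continue;
        /* A new (ip, port) starts a new output row: the GROUP BY. */
        if (rows == 0 || out[rows - 1].ip != t[i].ip || out[rows - 1].port != t[i].port)
            out[rows++] = (struct ip_port_row){t[i].ip, t[i].port, 0};
        out[rows - 1].count++; /* count(*) over distinct peers */
    }
    free(t);
    return rows;
}
```

The output comes out ordered by (ip, port). It could then be written back to ip_port in batches. Whether this actually beats the server-side query depends on the data and the machine. Treat it as an option to measure, not a guaranteed win.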
You can put variables into an SQL statement with sprintf:
int sprintf(char *str, const char *format, ...);
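Here is a hedged illustration of the sprintf suggestion. It uses snprintf, the bounded variant, so a long statement cannot overflow the buffer. It also appends rows into one multi-row INSERT rather than sending one statement per row. The function name append_ip_port_row and the rollback-on-overflow behaviour are assumptions made for this sketch. Only integers are formatted, so no quoting is needed. String values would have to be escaped first (for example with mysql_real_escape_string).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Appends one "(ip, port, count)" tuple to a multi-row INSERT held in buf.
   buf must already contain a NUL-terminated string; start with "".
   Returns 0 on success, or -1 if the row does not fit (buf left unchanged). */
int append_ip_port_row(char *buf, size_t size,
                       uint32_t ip, uint16_t port, uint32_t count) {
    assert(buf != NULL && size > 0);
    size_t used = strlen(buf);
    if (used >= size) return -1;
    const char *prefix = (used == 0) ? "insert into ip_port values " : ", ";
    int n = snprintf(buf + used, size - used, "%s(%" PRIu32 ", %u, %" PRIu32 ")",
                     prefix, ip, (unsigned)port, count);
    if (n < 0 || (size_t)n >= size - used) {
        buf[used] = '\0'; /* roll back the truncated write */
        return -1;
    }
    return 0;
}
```

When append_ip_port_row returns -1, the buffered statement could be sent with mysql_query(conn, buf) and the buffer reset to "" before the row is retried. The buffer size should stay below the server's max_allowed_packet.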
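Of course, the best option is still to optimize the SQL statement itself. Check whether the indexes on table 1 (flow_table) can be improved.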