目前有三个表(其中一个是临时表),结构如下:
create table T_MH_MSISDN_HOSTIDS
(
LOG_DATE NUMBER(10),
MSISDN VARCHAR2 (20) NOT NULL,
HOSTID VARCHAR2 (30) NOT NULL,
PROC_FLAG NUMBER(1) default 0,
IN_TIME DATE default SYSDATE
);
create global temporary table TMP_MSISDN_HOSTIDS
(
MSISDN VARCHAR2 (20) NOT NULL,
HOSTIDS VARCHAR2 (4000) NOT NULL
)
on commit delete rows;
create table MSISDN_HOSTIDS
(
MSISDN VARCHAR2(20) not null,
HOSTIDS VARCHAR2(4000) not null
)
PRIMARY KEY ( MSISDN )
目前的需求是这样:
把T_MH_MSISDN_HOSTIDS中的所有记录(字段MSISDN和HOSTID)合并到表MSISDN_HOSTIDS(字段MSISDN和HOSTIDS)中,要求是:如果表MSISDN_HOSTIDS中对应的手机号码不存在,则直接插入,如果存在则更新字段HOSTIDS(多个HOSTID用','分割后存入HOSTIDS 如111,222,333最后没有,符号)
目前当一个小时的数据量达到400万的时候性能很差,不知道那位高手能搞个优化的算法,谢谢!
我写的代码如下:
PROCEDURE main(logdate IN NUMBER, username IN VARCHAR2)
IS
CURSOR cur_mh IS
SELECT msisdn, hostid
FROM t_mh_msisdn_hostids
WHERE log_date = logdate
AND proc_flag = 0;
tmp_sql VARCHAR2(1024);
BEGIN
/*put values into temporary table*/
FOR rec_mh IN cur_mh LOOP
MERGE INTO tmp_msisdn_hostids dest
USING (SELECT rec_mh.msisdn AS msisdn, rec_mh.hostid AS hostid
FROM dual) src
ON (src.msisdn = dest.msisdn)
WHEN MATCHED THEN
UPDATE SET dest.hostids = dest.hostids | | ',' | | src.hostid
WHEN NOT MATCHED THEN
INSERT VALUES (src.msisdn, src.hostid);
END LOOP;
/*merge into msisdn_hostids*/
tmp_sql := 'MERGE INTO ' | | username | |
'.msisdn_hostids dest
USING (SELECT msisdn, hostids FROM tmp_msisdn_hostids) src
ON (src.msisdn = dest.msisdn)
WHEN MATCHED THEN
UPDATE SET dest.hostids = dest.hostids | | '','' | | src.hostids
WHEN NOT MATCHED THEN
INSERT VALUES (src.msisdn, src.hostids)';
EXECUTE IMMEDIATE tmp_sql;
/*update proc_flag*/
UPDATE t_mh_msisdn_hostids SET proc_flag = 1 WHERE log_date = logdate;
COMMIT;
EXCEPTION
WHEN OTHERS THEN
BEGIN
ROLLBACK;
END;
END main;
create table T_MH_MSISDN_HOSTIDS
(
LOG_DATE NUMBER(10),
MSISDN VARCHAR2 (20) NOT NULL,
HOSTID VARCHAR2 (30) NOT NULL,
PROC_FLAG NUMBER(1) default 0,
IN_TIME DATE default SYSDATE
);
create global temporary table TMP_MSISDN_HOSTIDS
(
MSISDN VARCHAR2 (20) NOT NULL,
HOSTIDS VARCHAR2 (4000) NOT NULL
)
on commit delete rows;
create table MSISDN_HOSTIDS
(
MSISDN VARCHAR2(20) not null,
HOSTIDS VARCHAR2(4000) not null
)
PRIMARY KEY ( MSISDN )
目前的需求是这样:
把T_MH_MSISDN_HOSTIDS中的所有记录(字段MSISDN和HOSTID)合并到表MSISDN_HOSTIDS(字段MSISDN和HOSTIDS)中,要求是:如果表MSISDN_HOSTIDS中对应的手机号码不存在,则直接插入,如果存在则更新字段HOSTIDS(多个HOSTID用','分割后存入HOSTIDS 如111,222,333最后没有,符号)
目前当一个小时的数据量达到400万的时候性能很差,不知道那位高手能搞个优化的算法,谢谢!
我写的代码如下:
PROCEDURE main(logdate IN NUMBER, username IN VARCHAR2)
IS
CURSOR cur_mh IS
SELECT msisdn, hostid
FROM t_mh_msisdn_hostids
WHERE log_date = logdate
AND proc_flag = 0;
tmp_sql VARCHAR2(1024);
BEGIN
/*put values into temporary table*/
FOR rec_mh IN cur_mh LOOP
MERGE INTO tmp_msisdn_hostids dest
USING (SELECT rec_mh.msisdn AS msisdn, rec_mh.hostid AS hostid
FROM dual) src
ON (src.msisdn = dest.msisdn)
WHEN MATCHED THEN
UPDATE SET dest.hostids = dest.hostids | | ',' | | src.hostid
WHEN NOT MATCHED THEN
INSERT VALUES (src.msisdn, src.hostid);
END LOOP;
/*merge into msisdn_hostids*/
tmp_sql := 'MERGE INTO ' | | username | |
'.msisdn_hostids dest
USING (SELECT msisdn, hostids FROM tmp_msisdn_hostids) src
ON (src.msisdn = dest.msisdn)
WHEN MATCHED THEN
UPDATE SET dest.hostids = dest.hostids | | '','' | | src.hostids
WHEN NOT MATCHED THEN
INSERT VALUES (src.msisdn, src.hostids)';
EXECUTE IMMEDIATE tmp_sql;
/*update proc_flag*/
UPDATE t_mh_msisdn_hostids SET proc_flag = 1 WHERE log_date = logdate;
COMMIT;
EXCEPTION
WHEN OTHERS THEN
BEGIN
ROLLBACK;
END;
END main;
我觉得你这个问题可以主要考虑如下两点:
1.一定要处理增量的变化数据,比如,只记录那些新增的,删除的,修改的数据的id
2.如果增量数据仍然有很多,可以先把这些增量数据做一次merge,即把id 相同的多行处理成一行,然后再把这部分数据合并到MSISDN_HOSTIDS 中去;另外,上面提到的forall 也是个不错的选择