chrom相等,strand='+',如果【start,stop】与区间【txstart-length,txstart+len】相交,则该记录就是我们需要的。
chrom相等,strand='-',如果【start,stop】与区间【txend-len,txend+length】相交,则该记录就是我们需要的。确定就是现在的条件,不会再变了。。
谢谢。
chrom相等,strand='-',如果【start,stop】与区间【txend-len,txend+length】相交,则该记录就是我们需要的。确定就是现在的条件,不会再变了。。
谢谢。
chrom相等,strand='-',如果【start,stop】与区间【txend-len,txend+length】相交,则该记录就是我们需要的。===
这个len刚才没有吧,是不是应该是length ?
from refgene,g20100806_wjh_h_
where chrom=chromosome
and strand='+' and start<=txstart+len and stop>=txstart-length
union all
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend
from refgene,g20100806_wjh_h_
where chrom=chromosome
and strand='-' and start<=txend+length and stop>txend-len
from refgene,g20100806_wjh_h_
where chrom=chromosome
and strand='+' and start-len<=txstart and stop+length>=txstart
union all
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend
from refgene,g20100806_wjh_h_
where chrom=chromosome
and strand='-' and start-length<=txend and stop+len>txend
另外可以试这个的速度
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend
from refgene,g20100806_wjh_h_
where chrom=chromosome
and (
(strand='+' and start-len<=txstart and stop+length>=txstart)
or
(strand='-' and start-length<=txend and stop+len>txend)
)
而且,如果条件是你上面的写法,索引为
refGene(chrom,strand,txstart)
refGene(chrom,strand,txend)
那么g20100806_wjh_h_表是用不上索引的。(这是最关键的,记录多,效率慢)我将所有情况列举出来供你参考:
当strand为‘+’:
start<=txstart-length and stop>=txstart-length or (相交的第一种形式)
start>=txstart-length and stop<=txstart+len or ((相交的第二种形式))
start<=txstart+len and stop>=txstart+len((相交的第三种形式))当strand为‘-’:
start<=txend-len and stop>=txend-len or (相交的第一种形式)
start>=txend-len and stop<=txend+length or (相交的第二种形式)
start<=txend+length and stop>=txend+length (相交的第三种形式)
如果我将refgene表按照strand的‘+/-’将表分为 两张表,然后再和g20100806_wjh_h_表进行连接,得到我要的数据,如果是这样你看可以利用上g20100806_wjh_h_表的索引么?
从你描述的逻辑上来说,你还少了一种相交模式!
start<=txstart-length and stop>=txstart+len这也是我为什么不愿意从你的代码中去分析你的业务逻辑的原因。 很显然你的代码和你的说明
chrom相等,strand='+',如果【start,stop】与区间【txstart-length,txstart+len】相交,则该记录就是我们需要的。 在逻辑上不不匹配的。现在我不知道以哪个为准了。
当strand为‘+’:
start<=txstart-length and stop>=txstart-length or (相交的第一种形式)
start>=txstart-length and stop<=txstart+len or ((相交的第二种形式))
start<=txstart+len and stop>=txstart+len((相交的第三种形式))当strand为‘-’:
start<=txend-len and stop>=txend-len or (相交的第一种形式)
start>=txend-len and stop<=txend+length or (相交的第二种形式)
start<=txend+length and stop>=txend+length (相交的第三种形式)
如果我将refgene表按照strand的‘+/-’将表分为 两张表(refgene_positive,refgene_negative),然后再和g20100806_wjh_h_表进行连接,得到我要的数据,如果是这样你看可以利用上g20100806_wjh_h_表的索引么?索引如下:
refgene_positive(chrom,txstart,txend);
refgene_negative(chrom,txstart,txend);
g20100806_wjh_h_ (chromosome,start,stop);sql语句:
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_positive,g20100806_wjh_h_
where refGene_positive.chrom=g20100806_wjh_h_.chromosome and start<=txstart-length and stop>=txstart-length
union
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_positive,g20100806_wjh_h_
where refGene_positive.chrom=g20100806_wjh_h_.chromosome and start>=txstart-length and stop<=txstart+len
union
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_positive,g20100806_wjh_h_
where refGene_positive.chrom=g20100806_wjh_h_.chromosome and start<=txstart+len and stop>=txstart+len
into outfile 'd:/temp_positive.txt';
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_negative,g20100806_wjh_h_
where refGene_negative.chrom=g20100806_wjh_h_.chromosome and start<=txend-len and stop>=txend-len
union
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_negative,g20100806_wjh_h_
where refGene_negative.chrom=g20100806_wjh_h_.chromosome and start>=txend-len and stop<=txend+length
union
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_negative,g20100806_wjh_h_
where refGene_negative.chrom=g20100806_wjh_h_.chromosome and start<=txend+length and stop>=txend+length
into outfile 'd:/temp_negative.txt';
不知道这样能否利用上大表的索引,速度是否会快很多?
你有什么其他的方法么???
strand='+' and start<=txstart+len and stop>=txstart-length这样,会方便很多。
from refgene,g20100806_wjh_h_
where chrom=chromosome
and strand='+' and start<=txstart+len and stop>=txstart-length
union all
select STRAIGHT_JOIN chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend
from refgene,g20100806_wjh_h_
where chrom=chromosome
and strand='-' and start<=txend+length and stop>txend-len
以前利用小表的索引*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: g20100806_wjh_h_
type: ALL
possible_keys: ix_all,ix_chr
key: NULL
key_len: NULL
ref: NULL
rows: 385019
Extra:
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: refgene
type: ref
possible_keys: ix_start,ix_end
key: ix_start
key_len: 258
ref: tss.g20100806_wjh_h_.chromosome,const
rows: 434
Extra: Using where
耗时为:1hour 40minute现在利用大表的索引为:*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: refgene_positive
type: ALL
possible_keys: ix_txstart,ix_txend
key: NULL
key_len: NULL
ref: NULL
rows: 18353
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: g20100806_wjh_h_
type: ref
possible_keys: ix_all,ix_chr,ix_start,ix_stop
key: ix_all
key_len: 258
ref: tss.refgene_positive.chrom
rows: 8954
Extra: Using where
2 rows in set (0.00 sec)耗时为:1 hour 20 minute这样的速度无法接受啊......
请问有没有什么好的方法啊?其中的条件len和length都是变量,只能用表达式啊,这样的话也只能利用上其中的一个索引,不管怎么优化耗时也要1个小时,真慢。。
请指教?
另外以同样的方式贴 EXPLAIN, 不要用\G
root@localhost : tss Mon Sep 27 16:11:36 2010>show index from refgene_positive;
+------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| refgene_positive | 1 | ix_txstart | 1 | chrom | A | 41 | NULL | NULL | | BTREE | |
| refgene_positive | 1 | ix_txstart | 2 | txStart | A | 18353 | NULL | NULL | | BTREE | |
| refgene_positive | 1 | ix_txend | 1 | chrom | A | 41 | NULL | NULL | | BTREE | |
| refgene_positive | 1 | ix_txend | 2 | txEnd | A | 18353 | NULL | NULL | | BTREE | |
+------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
4 rows in set (0.00 sec)root@localhost : tss Mon Sep 27 16:12:24 2010>show index from refgene_negative;
+------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| refgene_negative | 1 | ix_txstart | 1 | chrom | A | 41 | NULL | NULL | | BTREE | |
| refgene_negative | 1 | ix_txstart | 2 | txStart | A | 17687 | NULL | NULL | | BTREE | |
| refgene_negative | 1 | ix_txend | 1 | chrom | A | 41 | NULL | NULL | | BTREE | |
| refgene_negative | 1 | ix_txend | 2 | txEnd | A | 17687 | NULL | NULL | | BTREE | |
+------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
4 rows in set (0.00 sec)root@localhost : tss Mon Sep 27 16:12:36 2010>show index from g20160806_wjh_h_;
ERROR 1146 (42S02): Table 'tss.g20160806_wjh_h_' doesn't exist
root@localhost : tss Mon Sep 27 16:13:05 2010>show index from g20100806_wjh_h_;
+------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| g20100806_wjh_h_ | 1 | ix_all | 1 | chromosome | A | 42 | NULL | NULL | YES | BTREE | |
| g20100806_wjh_h_ | 1 | ix_all | 2 | start | A | 385019 | NULL | NULL | YES | BTREE | |
| g20100806_wjh_h_ | 1 | ix_all | 3 | stop | A | 385019 | NULL | NULL | YES | BTREE | |
| g20100806_wjh_h_ | 1 | ix_chr | 1 | chromosome | A | 42 | NULL | NULL | YES | BTREE | |
| g20100806_wjh_h_ | 1 | ix_start | 1 | start | A | 385019 | NULL | NULL | YES | BTREE | |
| g20100806_wjh_h_ | 1 | ix_stop | 1 | stop | A | 385019 | NULL | NULL | YES | BTREE | |
+------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.00 sec)
root@localhost : tss Mon Sep 27 16:13:31 2010>explain select STRAIGHT_JOIN chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_positive,g20100806_wjh_h_
-> where refGene_positive.chrom=g20100806_wjh_h_.chromosome and start<=txstart+20 and stop>=txstart-20;
+----+-------------+------------------+------+--------------------------------+--------+---------+----------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+--------------------------------+--------+---------+----------------------------+-------+-------------+
| 1 | SIMPLE | refgene_positive | ALL | ix_txstart,ix_txend | NULL | NULL | NULL | 18353 | |
| 1 | SIMPLE | g20100806_wjh_h_ | ref | ix_all,ix_chr,ix_start,ix_stop | ix_all | 258 | tss.refgene_positive.chrom | 8954 | Using where |
+----+-------------+------------------+------+--------------------------------+--------+---------+----------------------------+-------+-------------+
2 rows in set (0.00 sec)利用小表索引的explain语句我写不出,因为他不会利用小表的索引。上一楼说的小表索引是以前测试用的结果。
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_positive,g20100806_wjh_h_
where refGene_positive.chrom=g20100806_wjh_h_.chromosome and start<=txstart+len and stop>=txstart-length
into outfile 'f:/temp_positive.txt';
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_negative,g20100806_wjh_h_
where refGene_negative.chrom=g20100806_wjh_h_.chromosome and start<=txend+length and stop>=txend-len
into outfile 'f:/temp_negative.txt';
from refgene_positive,g20100806_wjh_h_
where refGene_positive.chrom=g20100806_wjh_h_.chromosome
and start<=txstart+len
and stop>=txstart-length
strand='+' 哪去了? 不需要了?
这样主要是为了能够利用上大表的ix_all索引啊如果加上strand字段的话,大表的索引就用不上。
from refgene_positive,g20100806_wjh_h_
where refGene_positive.chrom=g20100806_wjh_h_.chromosome
and start<=txstart+20
and stop>=txstart-20; +----+-------------+------------------+------+--------------------------------+--------+---------+----------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+--------------------------------+--------+---------+----------------------------+-------+-------------+
| 1 | SIMPLE | refgene_positive | ALL | ix_txstart,ix_txend | NULL | NULL | NULL | 18353 | |
| 1 | SIMPLE | g20100806_wjh_h_ | ref | ix_all,ix_chr,ix_start,ix_stop | ix_all | 258 | tss.refgene_positive.chrom | 8954 | Using where |
+----+-------------+------------------+------+--------------------------------+--------+---------+----------------------------+-------+-------------+
这已经是正确的方法了。 在这个业务处理中,不可避免的需要对某个表进行全面扫描。 所以从SQL语句和索引方面来说,已经无法进一步优化了。
我可不可以创建这样的一个索引呢????
create index ix_diff on refgene_positive(chrom,(txstart-length),(txstart_len))呢??
请指教?
原则就是:以最小结果集驱动最大结果集是吧?