sql优化-续

chrom相等，strand='+',如果【start,stop】与区间【txstart-length,txstart+len】相交，则该记录就是我们需要的。
chrom相等，strand='-',如果【start,stop】与区间【txend-len,txend+length】相交，则该记录就是我们需要的。＝＝＝
这个len刚才没有吧，是不是应该是length ？

索引不变，还是那两个复合索引。语句可以试一下。select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend
from refgene,g20100806_wjh_h_
where chrom=chromosome
and strand='+' and start<=txstart+len and stop>=txstart-length
union all
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend
from refgene,g20100806_wjh_h_
where chrom=chromosome
and strand='-' and start<=txend+length and stop>txend-len

改一下select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend
from refgene,g20100806_wjh_h_
where chrom=chromosome
and strand='+' and start-len<=txstart and stop+length>=txstart
union all
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend
from refgene,g20100806_wjh_h_
where chrom=chromosome
and strand='-' and start-length<=txend and stop+len>txend
另外可以试这个的速度
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend
from refgene,g20100806_wjh_h_
where chrom=chromosome
and (
(strand='+' and start-len<=txstart and stop+length>=txstart)
or
(strand='-' and start-length<=txend and stop+len>txend)
)

刚才的len不是length，从你的where条件中可以看出，两个区间的相交，你只考虑了一种情况。
而且，如果条件是你上面的写法，索引为
refGene(chrom,strand,txstart)
refGene(chrom,strand,txend)
那么g20100806_wjh_h_表是用不上索引的。（这是最关键的，记录多，效率慢）我将所有情况列举出来供你参考：
当strand为‘+’：
start<=txstart-length and stop>=txstart-length  or    （相交的第一种形式）
start>=txstart-length and stop<=txstart+len    or      （（相交的第二种形式））
start<=txstart+len and stop>=txstart+len（（相交的第三种形式））当strand为‘-’：
start<=txend-len and stop>=txend-len   or    （相交的第一种形式）
start>=txend-len and stop<=txend+length    or  （相交的第二种形式）
start<=txend+length and stop>=txend+length   （相交的第三种形式）
如果我将refgene表按照strand的‘+/-’将表分为两张表，然后再和g20100806_wjh_h_表进行连接，得到我要的数据，如果是这样你看可以利用上g20100806_wjh_h_表的索引么？

从你描述的逻辑上来说，你还少了一种相交模式！
start<=txstart-length and stop>=txstart+len这也是我为什么不愿意从你的代码中去分析你的业务逻辑的原因。很显然你的代码和你的说明
chrom相等，strand='+',如果【start,stop】与区间【txstart-length,txstart+len】相交，则该记录就是我们需要的。在逻辑上不不匹配的。现在我不知道以哪个为准了。

以现在的为准，我确定了，不会再改了，你说的哪种相交情况在我的业务逻辑中不会出现，所以我没有列出来。
当strand为‘+’：
start<=txstart-length and stop>=txstart-length or （相交的第一种形式）
start>=txstart-length and stop<=txstart+len or （（相交的第二种形式））
start<=txstart+len and stop>=txstart+len（（相交的第三种形式））当strand为‘-’：
start<=txend-len and stop>=txend-len or （相交的第一种形式）
start>=txend-len and stop<=txend+length or （相交的第二种形式）
start<=txend+length and stop>=txend+length （相交的第三种形式）
如果我将refgene表按照strand的‘+/-’将表分为两张表（refgene_positive,refgene_negative），然后再和g20100806_wjh_h_表进行连接，得到我要的数据，如果是这样你看可以利用上g20100806_wjh_h_表的索引么？索引如下：
refgene_positive(chrom,txstart,txend);
refgene_negative(chrom,txstart,txend);
g20100806_wjh_h_ (chromosome,start,stop);sql语句：
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_positive,g20100806_wjh_h_
  where refGene_positive.chrom=g20100806_wjh_h_.chromosome  and start<=txstart-length and stop>=txstart-length
  union
  select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_positive,g20100806_wjh_h_
  where refGene_positive.chrom=g20100806_wjh_h_.chromosome  and start>=txstart-length and stop<=txstart+len
  union
  select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_positive,g20100806_wjh_h_
  where refGene_positive.chrom=g20100806_wjh_h_.chromosome  and start<=txstart+len and stop>=txstart+len
  into outfile 'd:/temp_positive.txt';


  select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_negative,g20100806_wjh_h_
  where refGene_negative.chrom=g20100806_wjh_h_.chromosome  and start<=txend-len and stop>=txend-len
  union
  select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_negative,g20100806_wjh_h_
  where refGene_negative.chrom=g20100806_wjh_h_.chromosome  and start>=txend-len and stop<=txend+length
  union
  select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_negative,g20100806_wjh_h_
  where refGene_negative.chrom=g20100806_wjh_h_.chromosome  and start<=txend+length and stop>=txend+length
  into outfile 'd:/temp_negative.txt';
不知道这样能否利用上大表的索引，速度是否会快很多？
你有什么其他的方法么？？？

出不出现和逻辑没有关系，如果你能加上那条，那么你的条件就可以简化为
strand='+' and start<=txstart+len and stop>=txstart-length这样，会方便很多。

加上这个试试。从refgene开始。这样它应该会利用 ix_all 索引。select STRAIGHT_JOIN chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend
from refgene,g20100806_wjh_h_
where chrom=chromosome
and strand='+' and start<=txstart+len and stop>=txstart-length
union all
select STRAIGHT_JOIN chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend
from refgene,g20100806_wjh_h_
where chrom=chromosome
and strand='-' and start<=txend+length and stop>txend-len

问题又出来了，这次ix_all的索引用上了，不过小表的索引用不上，结构速度和以前差不多。
以前利用小表的索引*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: g20100806_wjh_h_
         type: ALL
possible_keys: ix_all,ix_chr
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 385019
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: refgene
         type: ref
possible_keys: ix_start,ix_end
          key: ix_start
      key_len: 258
          ref: tss.g20100806_wjh_h_.chromosome,const
         rows: 434
        Extra: Using where
耗时为：1hour 40minute现在利用大表的索引为：*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: refgene_positive
         type: ALL
possible_keys: ix_txstart,ix_txend
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 18353
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: g20100806_wjh_h_
         type: ref
possible_keys: ix_all,ix_chr,ix_start,ix_stop
          key: ix_all
      key_len: 258
          ref: tss.refgene_positive.chrom
         rows: 8954
        Extra: Using where
2 rows in set (0.00 sec)耗时为：1 hour 20 minute这样的速度无法接受啊......
请问有没有什么好的方法啊？其中的条件len和length都是变量，只能用表达式啊，这样的话也只能利用上其中的一个索引，不管怎么优化耗时也要1个小时，真慢。。
请指教？

贴出你的show index
另外以同样的方式贴 EXPLAIN，不要用\G

我的sql语句为：
select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_positive,g20100806_wjh_h_
  where refGene_positive.chrom=g20100806_wjh_h_.chromosome  and start<=txstart+len and stop>=txstart-length
  into outfile 'f:/temp_positive.txt';
  select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend from refgene_negative,g20100806_wjh_h_
  where refGene_negative.chrom=g20100806_wjh_h_.chromosome  and start<=txend+length and stop>=txend-len
  into outfile 'f:/temp_negative.txt';

select chrom,start,stop,name,name2,txstart as tss,strand,seq_id,probeids,txstart,txend,cdsstart,cdsend
from refgene_positive,g20100806_wjh_h_
where refGene_positive.chrom=g20100806_wjh_h_.chromosome
and start<=txstart+len
and stop>=txstart-length
strand='+'  哪去了？不需要了？

对啊，前面我不是说了吗？我将refgene表按照strand的‘+/-’将表分为两张表（refgene_positive,refgene_negative），然后再和g20100806_wjh_h_表进行连接
这样主要是为了能够利用上大表的ix_all索引啊如果加上strand字段的话，大表的索引就用不上。

我有一个想法，不知道可不可以，就是利用函数索引。。
我可不可以创建这样的一个索引呢？？？？
create index ix_diff on refgene_positive(chrom,(txstart-length),(txstart_len))呢？？

没有这种索引，因为你的LEN，LENGTH都是变量。另外即使这种索引也无用。因为对其中的一张表来说，一定是全表扫描。基于这个需要全表扫描的表然后逐记录到另一个表中去搜索匹配记录。现在的索引主要是用于第二步（到另一个表中去搜索匹配记录）

说的没错，我们现在主要做的就是要选择用哪张表来作为驱动表？我不知道怎么通过explain来查看哪个表位驱动表，哪个表位被驱动表？
请指教？
原则就是：以最小结果集驱动最大结果集是吧？

这与大小驱动没有关系，关键看数据分布。其实你可以从你的业务逻辑上找一些条件出来。比如同一表中，start－stop的差距最大是多少？

解决方案 »