Asking for help: the JPEG file format!

A Simple JPEG Document V2.0
------------------------------
        Last revised 2000.3.4
        Author: 云风
        Email: [email protected]
        Homepage: http://member.netease.com/~cloudwu

Foreword
--------
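1. Why write this document?
    云风 wanted to study JPEG/MPEG systematically but struggled to find good
    material, and his English is not very good. So while learning, he wrote
    down what he had understood, to look things up later while writing code.
    Besides, the official JPEG documentation is very complex. Printed out, it
    makes a thick book, and even readers with good English get a headache
    reading it. A condensed version benefits everyone. I also hope there will
    be more and more Chinese-language material on the Internet.

2. What should you get out of reading this document?
    An intuitive understanding of JPEG image compression, without needing to
    master the mathematics behind it. Enough to start writing your own
    encoder/decoder, or to understand existing code. A better understanding
    of lossy image compression. The ability to improve JPEG yourself, for
    example by adding transparency support or speeding up JPEG decoding.

3. Why plain text rather than HTML?
    Personal preference: I don't like formatted electronic documents. Plain
    text can be used more widely and needs no HTML browser.

4. Do readers have to pay anything for this document?
    You may use it freely. Because you use it free of charge, the author takes
    no responsibility for any errors or problems in it. You are welcome to
    discuss related questions by email, but my time is limited and replies are
    not guaranteed. If you are dissatisfied with anything here, 云风 does not
    accept any criticism.

5. May I redistribute this document?
    You are welcome to redistribute it freely, but not for profit, and please
    keep its content intact. If you make a version in another format such as
    HTML, you must also keep a plain-text copy alongside it.

JPEG compression overview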
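-------------

1. Color model
    JPEG images use the YCbCr color model rather than RGB, the model most
commonly used on computers. I won't go into color models here. It is enough to
say that YCbCr is better suited to image compression, because the human eye is
far more sensitive to changes in luminance Y than to changes in chrominance C.
We can store an 8-bit luminance value for every pixel but only one Cr/Cb pair
for each 2x2 block of pixels, and the image will look almost the same.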
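The conversion formulas given below in this section, together with the 2x2 chroma averaging just described, can be sketched in C. This is a minimal sketch under stated assumptions: results are rounded to the nearest integer and clamped to 0..255, the chroma plane has even width and height, and the names `rgb_to_ycbcr`, `ycbcr_to_rgb` and `subsample_2x2` are hypothetical.

```c
/* Round to nearest (half away from zero) and clamp to an 8-bit sample. */
static unsigned char clamp_u8(double v)
{
    long r = (long) (v < 0 ? v - 0.5 : v + 0.5);
    if (r < 0) return 0;
    if (r > 255) return 255;
    return (unsigned char) r;
}

/* RGB -> YCbCr with the coefficients used in this document. */
void rgb_to_ycbcr(unsigned char r, unsigned char g, unsigned char b,
                  unsigned char *y, unsigned char *cb, unsigned char *cr)
{
    *y  = clamp_u8( 0.299  * r + 0.587  * g + 0.114  * b);
    *cb = clamp_u8(-0.1687 * r - 0.3313 * g + 0.5    * b + 128);
    *cr = clamp_u8( 0.5    * r - 0.4187 * g - 0.0813 * b + 128);
}

/* YCbCr -> RGB, the inverse formulas. */
void ycbcr_to_rgb(unsigned char y, unsigned char cb, unsigned char cr,
                  unsigned char *r, unsigned char *g, unsigned char *b)
{
    *r = clamp_u8(y                        + 1.402   * (cr - 128));
    *g = clamp_u8(y - 0.34414 * (cb - 128) - 0.71414 * (cr - 128));
    *b = clamp_u8(y + 1.772   * (cb - 128));
}

/* Average every 2x2 block of a w x h chroma plane into one sample.
   w and h are assumed even; out must hold (w/2)*(h/2) bytes. */
void subsample_2x2(const unsigned char *in, int w, int h, unsigned char *out)
{
    for (int y = 0; y < h; y += 2)
        for (int x = 0; x < w; x += 2) {
            int sum = in[y * w + x] + in[y * w + x + 1]
                    + in[(y + 1) * w + x] + in[(y + 1) * w + x + 1];
            out[(y / 2) * (w / 2) + x / 2] = (unsigned char) ((sum + 2) / 4);
        }
}
```

A color that is converted to YCbCr and back comes out within about one unit of the original. The only loss is rounding, because the chroma averaging is kept in a separate step.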
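    With the RGB model, 4 pixels need 4x3 = 12 bytes. With this scheme they
need only 4+2 = 6 bytes, an average of 12 bits per pixel. JPEG does allow a C
value to be recorded for every pixel, but MPEG always stores 12 bits per pixel.
We abbreviate this as YUV12.

[R G B] -> [Y Cb Cr] conversion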
-------------------------
(R, G, B are 8-bit unsigned values)

$$
\begin{pmatrix} Y \\ Cb \\ Cr \end{pmatrix} =
\begin{pmatrix}
 0.299  &  0.587  &  0.114 \\
-0.1687 & -0.3313 &  0.5 \\
 0.5    & -0.4187 & -0.0813
\end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix} +
\begin{pmatrix} 0 \\ 128 \\ 128 \end{pmatrix}
$$

Y  =  0.299 *R + 0.587 *G + 0.114 *B          (luminance)
Cb = -0.1687*R - 0.3313*G + 0.5   *B + 128
Cr =  0.5   *R - 0.4187*G - 0.0813*B + 128

[Y,Cb,Cr] -> [R,G,B] conversion
-------------------------
R = Y                    + 1.402  *(Cr-128)
G = Y - 0.34414*(Cb-128) - 0.71414*(Cr-128)
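B = Y + 1.772  *(Cb-128)

    Normally the C values (Cb and Cr) are signed numbers. Here they have been
offset by adding 128, so that all data in JPEG are unsigned 8-bit values.

2. DCT (Discrete Cosine Transform)
    To compress the data, JPEG first applies a DCT. The theory behind the DCT
involves mathematics we need not dig into here. It is similar to the Fourier
transform, which anyone who has studied college calculus will know. The
transform brings out the regularities between neighboring pixels, which makes
the data easier to compress.
    JPEG processes the image in units of 8x8 pixels. If the width or height of
the original image is not a multiple of 8, it must first be padded to a
multiple of 8 so it can be processed block by block. Also, remember that Cr
and Cb are recorded once per 2x2 pixels, so in most cases the image must be
padded to a whole number of 16x16 blocks. Blocks are ordered left to right,
top to bottom, the order in which we write. JPEG applies the DCT separately to
Y, Cr and Cb. The Y, Cr and Cb values fed into the DCT all lie in the range
-128..127, because 128 is subtracted from each value.
    The encoder uses the Forward DCT (FDCT) and the decoder uses the Inverse
DCT (IDCT).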
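The formulas are:

FDCT:

$$
F(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)
\cos\!\left(\frac{(2x+1)u\pi}{16}\right)\cos\!\left(\frac{(2y+1)v\pi}{16}\right),
\qquad u,v = 0,1,\dots,7
$$

$$
\alpha(u) = \begin{cases} 1/\sqrt{8} & u = 0 \\ 1/2 & u \neq 0 \end{cases}
$$

IDCT:

$$
f(x,y) = \sum_{u=0}^{7}\sum_{v=0}^{7} \alpha(u)\,\alpha(v)\,F(u,v)
\cos\!\left(\frac{(2x+1)u\pi}{16}\right)\cos\!\left(\frac{(2y+1)v\pi}{16}\right),
\qquad x,y = 0,1,\dots,7
$$

    This step is time-consuming. A faster algorithm called AA&N exists, and
you can look it up on the Internet.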
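The formulas above can be checked with a direct C sketch. It is deliberately slow: four nested loops per block, and it is not the AA&N algorithm. Assumptions:
- `fdct8x8`, `idct8x8`, `dct_alpha` and `cos16` are hypothetical names.
- Arrays are indexed `[x][y]` like f(x,y) and `[u][v]` like F(u,v).
- The values of cos(k*PI/16) are hardcoded, so the sketch does not need `<math.h>`.

```c
/* cos(k*PI/16) for k = 0..8, hardcoded so the sketch needs no libm. */
static const double COS16[9] = {
    1.0,                 0.98078528040323044, 0.92387953251128676,
    0.83146961230254524, 0.70710678118654752, 0.55557023301960218,
    0.38268343236508977, 0.19509032201612827, 0.0
};

/* cos(m*PI/16) for any integer m >= 0, via the symmetries of cosine. */
static double cos16(int m)
{
    m %= 32;                            /* period 2*PI */
    if (m > 16) m = 32 - m;             /* cos(2PI - t) = cos(t) */
    if (m > 8) return -COS16[16 - m];   /* cos(PI - t) = -cos(t) */
    return COS16[m];
}

/* alpha(u): 1/sqrt(8) for u == 0, 1/2 otherwise. */
static double dct_alpha(int u)
{
    return u == 0 ? COS16[4] / 2.0 : 0.5;   /* cos(PI/4)/2 = 1/sqrt(8) */
}

/* Forward DCT of one 8x8 block, straight from the definition. */
void fdct8x8(double f[8][8], double F[8][8])
{
    for (int u = 0; u < 8; u++)
        for (int v = 0; v < 8; v++) {
            double s = 0.0;
            for (int x = 0; x < 8; x++)
                for (int y = 0; y < 8; y++)
                    s += f[x][y] * cos16((2 * x + 1) * u) * cos16((2 * y + 1) * v);
            F[u][v] = dct_alpha(u) * dct_alpha(v) * s;
        }
}

/* Inverse DCT of one 8x8 block. */
void idct8x8(double F[8][8], double f[8][8])
{
    for (int x = 0; x < 8; x++)
        for (int y = 0; y < 8; y++) {
            double s = 0.0;
            for (int u = 0; u < 8; u++)
                for (int v = 0; v < 8; v++)
                    s += dct_alpha(u) * dct_alpha(v) * F[u][v]
                         * cos16((2 * x + 1) * u) * cos16((2 * y + 1) * v);
            f[x][y] = s;
        }
}
```

Running `idct8x8` on the output of `fdct8x8` gives back the input, up to floating-point rounding. A flat block of 100s transforms to F(0,0) = 800 with every other coefficient zero.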
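MMX-optimized code for the AA&N IDCT is available on the Intel website. (The
Intel code expects its input as 12.4 fixed-point numbers, and the input matrix
must be transposed.)

3. Reordering the DCT results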
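     The DCT turns one 8x8 array into another 8x8 array. But data in memory is
stored linearly. If we stored the 64 numbers row by row, the value at the end
of each row would have little to do with the value at the start of the next
row. So JPEG specifies that the 64 numbers be arranged in the following order.
Each entry gives the position of that coefficient in the reordered sequence.

                  0, 1, 5, 6,14,15,27,28,
                  2, 4, 7,13,16,26,29,42,
                  3, 8,12,17,25,30,41,43,
                  9,11,18,24,31,40,44,53,
                 10,19,23,32,39,45,52,54,
                 20,22,33,38,46,51,55,60,
                 21,34,37,47,50,56,59,61,
                 35,36,48,49,57,58,62,63

    This way, neighboring values in the sequence are also neighbors in the
8x8 block, and the sequence runs from the lowest to the highest spatial
frequency.

4. Quantization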
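     We now quantize the 64 spatial-frequency amplitudes obtained above. Each
one is divided by the corresponding entry of the quantization table and rounded
to the nearest integer:

    for (i = 0; i <= 63; i++)   /* floor() from <math.h>; rounds negatives correctly too */
        vector[i] = (int) floor((double) vector[i] / quantization_table[i] + 0.5);

    Below is the example luminance (Y) quantization table from the JPEG
standard. It is shown as an 8x8 block in natural row-by-row order. To get the
64-element vector, it must be reordered in the zig-zag order above.

    16 11 10 16 24  40  51  61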
    12 12 14 19 26  58  60  55
    14 13 16 24 40  57  69  56
    14 17 22 29 51  87  80  62
    18 22 37 56 68  109 103 77
    24 35 55 64 81  104 113 92
    49 64 78 87 103 121 120 101
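    72 92 95 98 112 100 103 99

    This table is based on psychovisual thresholds and works well for images
with 8-bit luminance and chrominance. Any quantization table can be used,
though. In a JPEG file, quantization tables are defined after the DQT marker.
Usually one table is defined for Y and one for C.
    The quantization table is the key to controlling JPEG's compression ratio.
This step removes some high-frequency components and loses fine detail.
However, the human eye is far less sensitive to high spatial frequencies than
to low ones, so the visible loss is small. Another important reason is that
colors in an image usually change gradually from one pixel to the next, so
most of the image information is contained in the low spatial frequencies.
After quantization, long runs of zeros appear in the high spatial frequencies.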
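The zig-zag reordering from section 3 and the quantization step above can be sketched together in C. Assumptions:
- `block_to_zigzag`, `zigzag_to_block`, `quantize` and `dequantize` are hypothetical names.
- The quantization table is already in zig-zag order. This matches how DQT stores it in the file; the luminance table printed above would first need reordering.
- Rounding is done in integer arithmetic and rounds halves away from zero.

```c
/* ZIGZAG[r][c] = position of coefficient (row r, column c)
   in the reordered 64-element vector (the table in section 3). */
static const int ZIGZAG[8][8] = {
    { 0, 1, 5, 6,14,15,27,28},
    { 2, 4, 7,13,16,26,29,42},
    { 3, 8,12,17,25,30,41,43},
    { 9,11,18,24,31,40,44,53},
    {10,19,23,32,39,45,52,54},
    {20,22,33,38,46,51,55,60},
    {21,34,37,47,50,56,59,61},
    {35,36,48,49,57,58,62,63}
};

/* Reorder an 8x8 block into a 64-element zig-zag vector. */
void block_to_zigzag(int block[8][8], int vec[64])
{
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            vec[ZIGZAG[r][c]] = block[r][c];
}

/* Inverse: scatter a zig-zag vector back into an 8x8 block. */
void zigzag_to_block(const int vec[64], int block[8][8])
{
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            block[r][c] = vec[ZIGZAG[r][c]];
}

/* Divide each value by its table entry, rounding to the nearest integer. */
void quantize(int vec[64], const int qt[64])
{
    for (int i = 0; i < 64; i++) {
        int q = qt[i], v = vec[i];
        vec[i] = v >= 0 ? (v + q / 2) / q : -((-v + q / 2) / q);
    }
}

/* Decoder side: multiply back by the table entry. */
void dequantize(int vec[64], const int qt[64])
{
    for (int i = 0; i < 64; i++)
        vec[i] *= qt[i];
}
```

Dequantizing does not recover the discarded remainder. With a divisor of 16, 100 becomes 6 and then 96, which is the "loss" in lossy compression.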
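    Note: the quantized data may exceed the range of a 2-byte signed integer.

5. Zero run-length coding (RLC)
    The vector now contains many consecutive zeros, which we can compress with
run-length coding (RLC). Here we skip the first value of the vector, because
it is encoded differently; the reason is explained later. Suppose a set of
values (the last 63 of the 64) is
    57,45,0,0,0,0,23,0,-30,-16,0,0,1,0,0,0, 0 , 0 ,0 , 0,..,0
After RLC compression it becomes
    (0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-16) ; (2,1) ; EOB
Each pair gives the number of zeros before a nonzero value, followed by that
value. EOB is an end marker meaning that everything after it is 0; in practice
EOB is written as (0,0). If the sequence does not end with a 0, no EOB is
needed.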
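    The Huffman coding that follows requires the zero count in each pair to
fit in 4 bits, so it can only be 0..15. Suppose, for example, that somewhere in
the quantized vector we have
    57, eighteen zeros, 3, 0,0,0,0, 2, thirty-three zeros, 895, EOB
It is then actually encoded as:
    (0,57) ; (15,0) (2,3) ; (4,2) ; (15,0) (15,0) (1,895) , (0,0)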
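The rules above can be sketched as a small C encoder for the 63 AC values. It uses (15,0) to stand for 16 zeros, as the next note explains. `RlcPair` and `rlc_encode_ac` are hypothetical names, and the input is assumed to be an already quantized, zig-zag-ordered vector whose element 0 (the DC) is ignored.

```c
/* One RLC pair: (number of preceding zeros, value).
   (0,0) is EOB; (15,0) stands for 16 zeros. */
typedef struct { int run; int value; } RlcPair;

/* Run-length code vec[1..63]. Returns the number of pairs written to out. */
int rlc_encode_ac(const int vec[64], RlcPair out[64])
{
    int n = 0, run = 0;
    int last = 63;                       /* index of the last nonzero AC */
    while (last > 0 && vec[last] == 0)
        last--;
    for (int k = 1; k <= last; k++) {
        if (vec[k] == 0) { run++; continue; }
        while (run > 15) {               /* run too long for 4 bits */
            out[n++] = (RlcPair){15, 0}; /* 16 zeros */
            run -= 16;
        }
        out[n++] = (RlcPair){run, vec[k]};
        run = 0;
    }
    if (last < 63)                       /* trailing zeros: end with EOB */
        out[n++] = (RlcPair){0, 0};
    return n;
}
```

Trailing zeros never produce (15,0) pairs, because the loop stops at the last nonzero value and EOB covers the rest.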
Note that (15,0) stands for 16 consecutive zeros, not 15.

6. Huffman coding
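    To store data more efficiently, JPEG does not store a value directly.
Instead it stores the value's category (the number of bits needed to represent
it), one of 16 categories, followed by the value's bits:

               Values               Category          Bits stored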
                0                   0                   -
              -1,1                  1                  0,1
           -3,-2,2,3                2              00,01,10,11
     -7,-6,-5,-4,4,5,6,7            3    000,001,010,011,100,101,110,111
       -15,..,-8,8,..,15            4       0000,..,0111,1000,..,1111
      -31,..,-16,16,..,31           5     00000,..,01111,10000,..,11111
      -63,..,-32,32,..,63           6                   .
     -127,..,-64,64,..,127          7                   .
    -255,..,-128,128,..,255         8                   .
    -511,..,-256,256,..,511         9                   .
   -1023,..,-512,512,..,1023       10                   .
  -2047,..,-1024,1024,..,2047      11                   .
  -4095,..,-2048,2048,..,4095      12                   .
  -8191,..,-4096,4096,..,8191      13                   .
-16383,..,-8192,8192,..,16383     14                   .
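-32767,..,-16384,16384,..,32767    15                   .

A positive value is stored as its plain binary digits. A negative value -n is
stored as the bitwise complement of n in the same number of bits. For example,
-30 is in category 5 and is stored as 00001.

Let's return to the previous example, with -16 changed to -8 for illustration:
    (0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-8) ; (2,1) ; (0,0)
Process only the right-hand value of each pair, except special pairs such as
(0,0) and (15,0):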
    57 is in category 6 and is stored as 111001, so it is encoded as (6,111001)
    45, by the same procedure, is encoded as (6,101101)
    23  ->  (5,10111)
   -30  ->  (5,00001)
    -8  ->  (4,0111)
     1  ->  (1,1)

The earlier sequence thus becomes:
   (0,6), 111001 ; (0,6), 101101 ; (4,5), 10111; (1,5), 00001; (0,4) , 0111 ;
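       (2,1), 1 ; (0,0)

The two numbers in each pair of parentheses fit exactly into one byte, because
each is at most 15. The values being encoded lie in the range -32767..32767.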
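The category table, the complement rule for negative values and the byte packing can be sketched in C. `category`, `value_bits` and `rs_byte` are hypothetical names. The same category and bits are also used for the DC difference described later, which is Huffman coded without a zero count.

```c
/* Category (bit size): the smallest n with |v| < 2^n; 0 for v == 0. */
int category(int v)
{
    unsigned a = (unsigned) (v < 0 ? -v : v);
    int n = 0;
    while (a) { n++; a >>= 1; }
    return n;
}

/* Bits stored after the Huffman code: v itself if positive; for negative v,
   the low `category` bits of (v - 1), i.e. the bitwise complement of |v|. */
unsigned value_bits(int v)
{
    int n = category(v);
    if (v >= 0)
        return (unsigned) v;
    return (unsigned) (v - 1) & ((1u << n) - 1);
}

/* Pack (zeros before the value, category of the value) into one byte:
   high nibble = run, low nibble = category. */
unsigned char rs_byte(int run, int v)
{
    return (unsigned char) ((run << 4) | category(v));
}
```

For the example above, `rs_byte(4, 23)` is 69 and `value_bits(-30)` is binary 00001, which match the bytes and bit strings used in the text.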
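In this byte, the high 4 bits are the number of preceding zeros, and the low 4
bits give the bit size (category) of the value that follows. It is this byte
that gets Huffman-coded. Continuing the example, suppose (purely for
illustration) that the Huffman code for 06 is 111000, and that
             69 = (4,5)    --- 1111111110011001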
             21 = (1,5)    ---  11111110110
             4  = (0,4)    ---  1011
             33 = (2,1)    ---  11011
              0 = EOB = (0,0) ---  1010

Then the 63 coefficients of the earlier example (remember that we skipped the
first one?) are written to the JPG file as this bit stream:
111000 111001  111000 101101  1111111110011001 10111   11111110110 00001
1011 0111   11011 1   1010

DC coding
---------
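Remember that we skipped the first of every 64 values? That number is called
the DC; the other 63 are called AC. Putting u = v = 0 into the FDCT formula
above gives

$$
DC = F(0,0) = \alpha(0)^2 \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,\cos 0\,\cos 0
   = \frac{1}{8}\sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y),
\qquad \alpha(0)^2 = \frac{1}{8}
$$

This is 8 times the average of the block's 64 samples. In other words, the DC
carries much of the energy of the original 8x8 block, and it is usually a
large number.

    The authors of JPEG observed that the DC values of consecutive blocks are
closely related. They therefore decided to encode the difference between the
DC values of consecutive 8x8 blocks. Y, Cb and Cr each have their own DC
sequence.

    Diff = DC(i) - DC(i-1)

so the DC of the current block is

    DC(i) = DC(i-1) + Diff

A decoder starts from 0: the DC value before the first block is taken to be
0. Each decoded Diff is added to the previous value to get the current one.

    Back to the example. Remember that the stored DC is the difference Diff
from the previous block's DC. If, say, Diff is -511, it is encoded as

                    (9, 000000000)

Suppose the Huffman code for 9 is 1111110. (A JPG file generally has separate
Huffman tables for DC and AC.) The DC is then stored in the JPG file as

               1111110 000000000

It is placed before the 63 AC coefficients, so the final bit stream for the
example above is:

1111110 000000000 111000 111001  111000 101101  1111111110011001 10111
11111110110 00001 1011 0111   11011 1   1010

Decoding one data unit of the Y image, in brief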
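-----------------------------------------------
At the start of decoding the whole image, initialize the DC value to 0.

1) Decode the DC:
         a) Read one Huffman code (using the Huffman DC table)
         b) Huffman-decode it to get N, the number of bits that follow
         c) Read N bits and compute the Diff value
         d) DC += Diff
         e) Store the DC value:      "vector[0] = DC"

2) Decode the 63 ACs, looping until EOB or until all 63 ACs are done:
       a) Read one Huffman code (using the Huffman AC table)
       b) Huffman-decode it to get (number of preceding zeros, category)
          [remember: (0,0) means EOB]
       c) Read N bits (N = the category) and compute the AC value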
       d) Write the corresponding number of zeros (for (15,0), write 16 zeros
          and no value)
       e) Then write the AC value
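The two loops above can be sketched in C. To keep the sketch short, it assumes Huffman decoding is already done. Each symbol arrives as the decoded byte plus the raw bits that were read after it. Assumptions:
- `DecodedSymbol`, `extend` and `decode_du` are hypothetical names.
- `extend` plays the role of the sign-extension step (called EXTEND in the JPEG standard) that turns the raw bits back into a signed value.

```c
/* One symbol after Huffman decoding (hypothetical input format). */
typedef struct {
    unsigned char rs;   /* DC: category; AC: (run << 4) | category */
    unsigned bits;      /* the `category` raw bits that followed the code */
} DecodedSymbol;

/* Turn `size` raw bits back into a signed value (undo the complement rule). */
int extend(unsigned bits, int size)
{
    if (size == 0)
        return 0;
    if (bits < (1u << (size - 1)))          /* leading bit 0: negative value */
        return (int) bits - (1 << size) + 1;
    return (int) bits;
}

/* Decode one data unit into vec[64] (zig-zag order).
   sym[0] is the DC symbol, the rest are AC symbols.
   *dc_pred is this component's running DC (start it at 0).
   Returns the number of symbols consumed, or -1 on malformed input. */
int decode_du(const DecodedSymbol *sym, int nsym, int *dc_pred, int vec[64])
{
    int i = 0;
    for (int k = 0; k < 64; k++)
        vec[k] = 0;
    if (nsym < 1)
        return -1;
    *dc_pred += extend(sym[0].bits, sym[0].rs);  /* DC += Diff */
    vec[0] = *dc_pred;
    i = 1;
    for (int k = 1; k < 64; ) {
        if (i >= nsym)
            return -1;
        int run = sym[i].rs >> 4;
        int size = sym[i].rs & 15;
        unsigned bits = sym[i].bits;
        i++;
        if (size == 0) {
            if (run == 15) { k += 16; continue; }  /* (15,0): 16 zeros */
            if (run == 0) break;                   /* (0,0): EOB */
            return -1;
        }
        k += run;                                  /* skip the zeros */
        if (k > 63)
            return -1;
        vec[k++] = extend(bits, size);
    }
    return i;
}
```

Fed with the symbols of the running example (DC category 9 with bits 000000000, then (0,6) 111001, (0,6) 101101, and so on), the sketch rebuilds DC = -511 followed by 57, 45, four zeros, 23, 0, -30, -8, 0, 0, 1.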
-----------------

The next decoding steps
------------
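The previous step gives us a 64-element vector. Some decoding work remains:
1) Dequantize the 64 values: "for (i=0;i<=63;i++) vector[i]*=quant[i]" (beware of overflow)
2) Rearrange the 64 values into an 8x8 block (undo the zig-zag order)
3) Apply the IDCT to the 8x8 block
Repeat the above [Huffman decoding, steps 1), 2), 3)] for the 8x8 blocks of (Y,Cb,Cr). Then:
4) Add 128 to every value (clamping the result to 0..255)
5) Convert YCbCr to RGB

How a JPG file organizes image information (at the byte level)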
-----------------------------------
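Note that the JPEG/JFIF file format uses Motorola byte order, not Intel order:
a 16-bit word is stored high byte first, low byte second.

A JPG file is made up of segments. Each segment is at most 65535 bytes long
and begins with a marker. A marker always starts with the byte 0xFF, followed
by a byte that is neither 0 nor 0xFF, for example 'FFDA', 'FFC4', 'FFC0'. Each
marker has a specific meaning, given by its second byte. For example, SOS
(Start Of Scan = 'FFDA') tells you to start decoding. Another marker, DQT
(Define Quantization Table = 0xFFDB), says that a 64-byte quantization table
follows.

When processing a JPG file, you may meet a 0xFF whose following byte is not 0
and has no meaning as a marker. Such a 0xFF byte must be ignored; some JPGs
use 0xFF bytes for padding. If Huffman encoding happens to produce a 0xFF
byte, write 0xFF 0x00 instead. In other words, when decoding the image data,
treat FF00 as FF. Also, when the Huffman-coded data ends with a few unused
bits, fill them with 1 bits.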
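The two rules for writing Huffman-coded data can be sketched as a minimal most-significant-bit-first bit writer in C:
- a data byte equal to 0xFF is followed by 0x00;
- the last partial byte is filled with 1 bits.

`BitWriter`, `put_bits` and `flush_bits` are hypothetical names, and the caller is assumed to supply a large enough buffer, since there is no bounds checking.

```c
#include <stddef.h>

typedef struct {
    unsigned char *buf;  /* output buffer */
    size_t len;          /* bytes written so far */
    unsigned acc;        /* pending bits, right-aligned */
    int nbits;           /* number of pending bits (0..7 between calls) */
} BitWriter;

/* Emit one byte, stuffing 0x00 after any 0xFF so it cannot look like a marker. */
static void put_byte(BitWriter *w, unsigned char b)
{
    w->buf[w->len++] = b;
    if (b == 0xFF)
        w->buf[w->len++] = 0x00;
}

/* Append the low n bits of code, most significant bit first. */
void put_bits(BitWriter *w, unsigned code, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        w->acc = (w->acc << 1) | ((code >> i) & 1u);
        if (++w->nbits == 8) {
            put_byte(w, (unsigned char) w->acc);
            w->acc = 0;
            w->nbits = 0;
        }
    }
}

/* At the end of the entropy-coded data, pad the last byte with 1 bits. */
void flush_bits(BitWriter *w)
{
    while (w->nbits != 0)
        put_bits(w, 1, 1);
}
```

For example, writing 111000 and then 111001 produces the bytes E3 9F: the second byte is 1001 followed by four padding 1s.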
The padded data is then followed by FF, the start of the next marker.

Some important markers
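--------------------
SOI = Start Of Image = 'FFD8'
    This marker appears only once, at the start of the file.
EOI = End Of Image = 'FFD9'
    Every JPG file ends with FFD9.
RSTi = FFDi (i = 0..7)  [RST0 = FFD0, ..., RST7 = FFD7]
     = restart markers
    These are usually interspersed in the data stream, presumably so that a
    decoder can resynchronize if decoding goes wrong. They should be used
    together with DRI. Many JPGs don't use them, though.
    (SOS --- RST0 --- RST1 -- RST2 --...
    ...-- RST6 --- RST7 -- RST0 --...)

----
Markers
----
The following markers must be handled:
SOF0 = Start Of Frame 0 = FFC0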
SOS  = Start Of Scan    = FFDA
APP0 = the marker used to identify a JPG file that uses the JFIF
    specification       = FFE0
COM  = Comment          = FFFE
DNL  = Define Number of Lines    = FFDC
DRI  = Define Restart Interval   = FFDD
DQT  = Define Quantization Table = FFDB
DHT  = Define Huffman Table      = FFC4

How Huffman tables are stored in a JPG file
---------------------------
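JPEG defines a table that describes the Huffman tree, placed after the DHT
marker. Note: Huffman codes are at most 16 bits long.

A JPG file usually has 2 kinds of Huffman table, one for DC and one for AC.
In practice there are 4 tables: DC and AC for luminance, and DC and AC for
chrominance. A table is stored like this:
1) 16 bytes:
   byte i gives the number of Huffman codes of length i bits (i = 1 to 16)

2) the symbol table, whose length (in bytes) = the sum of these 16 numbers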
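The 16 counts are enough to rebuild the codes themselves. The rule is:
- Codes of the same length are consecutive binary numbers.
- Moving to the next length appends a 0 bit.

The k-th code belongs to the k-th symbol byte of the table. A minimal C sketch of that assignment follows; `build_huffman_codes` is a hypothetical name.

```c
/* counts[i] = number of codes of length i+1 (the 16 DHT bytes).
   Fills code[k]/len[k] for the k-th symbol of the table, in order.
   Returns the number of codes, or -1 if there are more than 256. */
int build_huffman_codes(const unsigned char counts[16],
                        unsigned short code[256], unsigned char len[256])
{
    int k = 0;
    unsigned c = 0;                  /* next code value to hand out */
    for (int l = 1; l <= 16; l++) {
        for (int j = 0; j < counts[l - 1]; j++) {
            if (k >= 256)
                return -1;
            code[k] = (unsigned short) c;
            len[k] = (unsigned char) l;
            k++;
            c++;                     /* same length: next binary number */
        }
        c <<= 1;                     /* longer codes: append a 0 bit */
    }
    return k;
}
```

Applied to the header of the example that follows (0,2,3,1,1,1,0,1,0,...), this yields 00, 01, 100, 101, 110, 1110, 11110, 111110 and 11111100.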
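Can you now picture how the table is stored? Each symbol byte is the value that
the corresponding Huffman code stands for. I won't explain further, because
you need to know the Huffman algorithm first. Here is just one example.
Suppose the table header is 0,2,3,1,1,1,0,1,0,0,0,0,0,0,0,0. That means:
there are no codes of length 1
the codes of length 2 are 00
                          01
the codes of length 3 are 100
                          101
                          110
the code of length 4 is   1110
the code of length 5 is   11110
the code of length 6 is   111110
there are no codes of length 7 (if there were one, it would be 1111110)
the code of length 8 is   11111100
         .....
and there are no longer codes.

If the data following the header is
    45 57 29 17 23 25 34 28
then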
    45 = 00
    57 = 01
    29 = 100
    17 = 101
    23 = 110
and so on. If you understand Huffman coding, none of this is hard.

Sampling factors
--------
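The following covers decoding of true-color JPGs. Grayscale JPGs are simple to
decode, since they contain only luminance. A color image consists of
(Y, Cr, Cb). As mentioned earlier, Y is usually sampled once per pixel, while
Cr and Cb are usually sampled once per 2x2 pixels. Some JPGs sample every pixel
instead, or once per two pixels (two horizontally, one vertically). Sampling
factors are all defined relative to the highest sampling factor. In the usual
case (Y sampled at every pixel, Cr and Cb once per 2x2 pixels), Y has the
highest sampling rate:
- Y: horizontal sampling factor HY=2, vertical sampling factor VY=2.
- Cb: horizontal factor HCb=1, vertical factor VCb=1.
- Cr: likewise HCr=1, VCr=1.

In JPEG, the data stream produced from 8x8 original samples by RLC and Huffman
coding is called a Data Unit (DU). In a JPG, DUs are coded in this order:

     1)      for (counter_y=1; counter_y<=VY; counter_y++)
                  for (counter_x=1; counter_x<=HY; counter_x++)
                     { encode a Y data unit }
     2)      for (counter_y=1; counter_y<=VCb; counter_y++)
                  for (counter_x=1; counter_x<=HCb; counter_x++)
                     { encode a Cb data unit }
     3)      for (counter_y=1; counter_y<=VCr; counter_y++)
                  for (counter_x=1; counter_x<=HCr; counter_x++)
                     { encode a Cr data unit }

For my example above (HY=2, VY=2; HCb=VCb=1; HCr=VCr=1), the order is
    YDU,YDU,YDU,YDU,CbDU,CrDU
Together these describe a 16x16 region of the image: 16x16 = (Hmax*8 x Vmax*8),
where Hmax=HY=2 and Vmax=VY=2. A block of (Hmax*8, Vmax*8) is called an MCU
(Minimum Coded Unit). In the example above,
    MCU = YDU,YDU,YDU,YDU,CbDU,CrDU
If  HY =1, VY=1
    HCb=1, VCb=1
    HCr=1, VCr=1
then (Hmax=1, Vmax=1) the MCU is only 8x8, and MCU = YDU,CbDU,CrDU.
For a grayscale JPG, the MCU has only one DU (MCU = YDU).

In a JPG file, the sampling factors of each image component are defined after
the SOF0 (FFC0) marker.

Decoding a JPG file, in brief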
-------------------------
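The decoder first reads the sampling factors from the JPG file, which gives the
MCU size. From that it computes how many MCUs the whole image has. It then
decodes the MCUs one by one until it finds the EOI marker. For each MCU it
decodes each DU in the proper order, combines them, converts the result to
(R,G,B), and that's it.

Appendix: JPEG file format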
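~~~~~~~~~~~~~~~~
  - header (2 bytes):  $ff, $d8 (SOI) (identifies a JPEG file)
  - any number of segments, see below
  - end of file (2 bytes): $ff, $d9 (EOI)

Segment format:
~~~~~~~~~
  - header (4 bytes):
       $ff     segment marker
        n      segment type (1 byte)
       sh, sl  segment length, including these two bytes but not the
               preceding $ff and n.
               Note: the length is not in Intel order but in Motorola order,
               high byte first, low byte second!
  - segment contents, at most 65533 bytes

Notes:
  - Some segments have no parameters (those marked with an asterisk below).
    They have no length field and no contents, only $ff and the type byte.
  - Any number of $ff bytes between segments is legal and must be skipped.

Segment types:
~~~~~~~~~
   *TEM   = $01   can be ignored

    SOF0  = $c0   start of frame (baseline JPEG), details below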
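    SOF1  = $c1   ditto
    SOF2  = $c2   usually unsupported
    SOF3  = $c3   usually unsupported

    SOF5  = $c5   usually unsupported
    SOF6  = $c6   usually unsupported
    SOF7  = $c7   usually unsupported

    SOF9  = $c9   arithmetic coding (an alternative to Huffman coding), usually unsupported
    SOF10 = $ca   usually unsupported
    SOF11 = $cb   usually unsupported

    SOF13 = $cd   usually unsupported
    SOF14 = $ce   usually unsupported
    SOF15 = $cf   usually unsupported

    DHT   = $c4   define Huffman table, details below
    JPG   = $c8   undefined/reserved (causes decoding errors)
    DAC   = $cc   define arithmetic table, usually unsupported

   *RST0  = $d0   RSTn are used for resync, usually ignored
   *RST1  = $d1
   *RST2  = $d2
   *RST3  = $d3
   *RST4  = $d4
   *RST5  = $d5
   *RST6  = $d6
   *RST7  = $d7

    SOI   = $d8   start of image
    EOI   = $d9   end of image
    SOS   = $da   start of scan, details below
    DQT   = $db   define quantization table, details below
    DNL   = $dc   usually unsupported, ignore
    DRI   = $dd   define restart interval, details below
    DHP   = $de   ignore (skip)
    EXP   = $df   ignore (skip)

    APP0  = $e0   JFIF APP0 segment marker (details omitted)
    APP15 = $ef   ignore

    JPG0  = $f0   ignore (skip)
    JPG13 = $fd   ignore (skip)
    COM   = $fe   comment, details below

All other segment types are reserved and must be skipped.

SOF0: Start Of Frame 0: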
~~~~~~~~~~~~~~~~~~~~~~~
  - $ff, $c0 (SOF0)
  - length (high byte, low byte), 8+components*3
  - data precision (1 byte): bits per sample, usually 8 (most software does
    not support 12 and 16)
  - image height (high byte, low byte), must be >0 if DNL is not supported
  - image width (high byte, low byte), must be >0 if DNL is not supported
  - number of components (1 byte): 1 for grayscale, 3 for YCbCr/YIQ color,
    4 for CMYK color
  - for each component: 3 bytes
     - component id (1 = Y, 2 = Cb, 3 = Cr, 4 = I, 5 = Q)
     - sampling factors (bit 0-3 vert., 4-7 hor.)
     - quantization table number

DRI: Define Restart Interval:
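~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  - $ff, $dd (DRI)
  - length (high byte, low byte), must be 4
  - restart interval in units of MCU blocks (high byte, low byte),
    meaning that there is an RSTn marker every n MCU blocks.
    The first marker is RST0, then RST1 and so on. After RST7 the sequence
    repeats from RST0.

DQT: Define Quantization Table: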
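~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  - $ff, $db (DQT)
  - length (high byte, low byte)
  - QT information (1 byte):
     bit 0..3: QT number (0..3, otherwise error)
     bit 4..7: QT precision, 0 = 8 bit, otherwise 16 bit
  - n bytes of QT, n = 64*(precision+1), stored in zig-zag order

Remarks:
  - A single DQT segment may contain several QTs, each with its own
    information byte.
  - When precision = 1 (16 bit), each word is stored high byte first, low
    byte second.

DAC: Define Arithmetic Table: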
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For legal (patent) reasons, current software does not support arithmetic
coding and cannot produce JPEG files that use it.

DHT: Define Huffman Table:
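~~~~~~~~~~~~~~~~~~~~~~~~~~
  - $ff, $c4 (DHT)
  - length (high byte, low byte)
  - HT information (1 byte):
     bit 0..3: HT number (0..3, otherwise error)
     bit 4   : HT type, 0 = DC table, 1 = AC table
     bit 5..7: must be 0
  - 16 bytes: the number of symbols with codes of length 1..16; these 16
    numbers must add up to <= 256
  - n bytes: the table of symbols, in order of increasing code length
    (n = total number of codes)

Remarks:
  - A single DHT segment may contain several HTs, each with its own
    information byte.

COM: Comment: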
~~~~~~~~~~
  - $ff, $fe (COM)
  - comment length (high byte, low byte) = L+2
  - the comment, a stream of L characters

SOS: Start Of Scan:
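~~~~~~~~~~~~~~~~~~~
  - $ff, $da (SOS)
  - length (high byte, low byte), must be 6+2*(number of components in scan)
  - number of components in scan (1 byte), must be >= 1 and <= 4 (otherwise
    error), usually 3
  - for each component: 2 bytes
     - component id (1 = Y, 2 = Cb, 3 = Cr, 4 = I, 5 = Q), see SOF0
     - Huffman tables used:
        - bit 0..3: AC table (0..3)
        - bit 4..7: DC table (0..3)
  - 3 bytes to skip (for baseline JPEG: spectral selection start 0, end 63,
    and successive approximation 0)

Remarks:
  - The image data (the scans) immediately follows the SOS segment.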
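The segment layout in this appendix can be sketched as a C routine. It walks an in-memory JPEG file from SOI up to SOS and reports where the entropy-coded data begins. This is a simplified sketch under stated assumptions:
- `walk_segments` and `SegmentVisitor` are hypothetical names.
- It stops at the first SOS and does not parse segment contents.
- Only TEM, RST0..RST7, SOI and EOI are treated as parameterless.

```c
#include <stddef.h>

/* Called once per segment; body excludes the marker and length bytes. */
typedef void (*SegmentVisitor)(unsigned char type, const unsigned char *body,
                               size_t body_len, void *ctx);

/* Returns the offset just past the SOS segment header (start of the
   scan data), or -1 if the file is malformed or has no SOS. */
long walk_segments(const unsigned char *p, size_t n,
                   SegmentVisitor visit, void *ctx)
{
    if (n < 2 || p[0] != 0xFF || p[1] != 0xD8)    /* must start with SOI */
        return -1;
    size_t i = 2;
    while (i < n) {
        if (p[i] != 0xFF)
            return -1;
        while (i < n && p[i] == 0xFF)             /* any number of $ff */
            i++;
        if (i >= n)
            return -1;
        unsigned char type = p[i++];
        if (type == 0x01 || (type >= 0xD0 && type <= 0xD9)) {
            if (visit) visit(type, NULL, 0, ctx); /* no length, no body */
            if (type == 0xD9)                     /* EOI before any SOS */
                return -1;
            continue;
        }
        if (i + 2 > n)
            return -1;
        size_t len = ((size_t) p[i] << 8) | p[i + 1];  /* big-endian */
        if (len < 2 || i + len > n)               /* len counts itself */
            return -1;
        if (visit) visit(type, p + i + 2, len - 2, ctx);
        i += len;
        if (type == 0xDA)                         /* SOS: scan data follows */
            return (long) i;
    }
    return -1;
}
```

A real decoder would use the visitor to record the SOF0, DQT, DHT and DRI contents described above before it starts decoding the scan.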
CRYX's note about the JPEG decoding algorithm.
Copyright 1999 Cristi Cuturicu.

DISCLAIMER
..........
You get this file for free, so you cannot have any legal requests from me.
If you don't agree, read no more.
No warranty is provided with this doc. There might be bugs or errors in it,
although I've tried to avoid them, so use the information contained in this
file at your own risk.
This is NOT official documentation; for further information please refer to
the JPEG ISO standard.
All product names mentioned in this file are trademarks or registered
trademarks of their respective owners.
Not for reproduction (electronic or hardcopy) except for personal use.

THE JPEG COMPRESSION and THE JPG FILE FORMAT
............................................
Long ago, I was looking on the net for a good doc that could explain JPEG
compression to me, and particularly the JPG file format.
Recently I found the ISO-ITU JPEG standard in a file called itu-1150.ps
(JPEG standard = ISO standard 10918-1 or CCITT standard recommendation T.81:
"Information Technology - Digital compression and coding of continuous-tone
still images - Requirements and guidelines").
Though this standard is quite complete, much of its 186 pages is of little
interest. I had to dig into it, and then write my own JPG viewer, to extract
from it the main thing a programmer needs:
    The Baseline Sequential DCT JPG compression.

First a note: most JPG files are Baseline Sequential JPGs. So this doc covers
only Baseline Sequential JPG compression, and in particular its JFIF
implementation.
It DOES NOT cover Progressive or Hierarchical JPG compression.
(For more details about these, read the itu-1150 standard.
It can be found at www.wotsit.org or somewhere at www.jpeg.org/jpeg.)

I thought it would be easier for the reader to understand JPG compression if
I explained the steps of the JPG encoder. (The decoder steps are the inverses
of the encoder's steps, in reverse order, of course.)

THE JPEG ENCODER STEPS
----------------------

1) The affine transformation in colour space:  [R G B] -> [Y Cb Cr]
---------------------------------------------------------------------
(It is defined in CCIR Recommendation 601.)
(R,G,B are 8-bit unsigned values.)

$$
\begin{pmatrix} Y \\ Cb \\ Cr \end{pmatrix} =
\begin{pmatrix}
 0.299  &  0.587  &  0.114 \\
-0.1687 & -0.3313 &  0.5 \\
 0.5    & -0.4187 & -0.0813
\end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix} +
\begin{pmatrix} 0 \\ 128 \\ 128 \end{pmatrix}
$$

The new value Y = 0.299*R + 0.587*G + 0.114*B is called the luminance.
It is the value used by monochrome monitors to represent an RGB colour.
Physiologically, it represents the intensity of an RGB colour as perceived by
the eye.
The formula for Y acts like a weighted filter with a different weight for each
spectral component. The eye is most sensitive to the Green component, then to
the Red component, and least to the Blue component.

The values
    Cb = -0.1687*R - 0.3313*G + 0.5   *B + 128
    Cr =  0.5   *R - 0.4187*G - 0.0813*B + 128
are called the chrominance values. They are 2 coordinates in a system that
measures the hue and saturation of the colour. Approximately, these values
indicate how much blue and how much red is in that colour.
These 2 coordinates are called, for short, the chrominance.

[Y,Cb,Cr] to [R,G,B] Conversion (the inverse of the previous transform)
--------------------------------
RGB can be computed directly from YCbCr (8-bit unsigned values) as follows:
R = Y                    + 1.402  *(Cr-128)
G = Y - 0.34414*(Cb-128) - 0.71414*(Cr-128)
B = Y + 1.772  *(Cb-128)

A note relating Y,Cb,Cr to the human visual system
---------------------------------------------------
The eye, particularly the retina, has
two kinds of cells as visual analyzers:
- Cells for night vision perceive only shades of gray, from intense white to
  the darkest black. Given an RGB colour, they detect a gray level similar to
  the one given by the luminance value.
- Cells for day vision perceive colour hue. They detect a value related to
  the chrominance.

2) Sampling
-----------
The JPEG standard takes into account the fact that the eye seems to be more
sensitive to the luminance of a colour than to its hue (the black-and-white
vision cells have more influence than the day-vision cells).
So in most JPGs, luminance is taken at every pixel, while chrominance is taken
as an average value for each 2x2 block of pixels.
Chrominance need not be averaged over a 2x2 block; it could be taken at every
pixel. But averaging gives good compression, with almost no visible loss in
the newly sampled image.
A note: the JPEG standard specifies that every image component (for example Y)
must have 2 sampling coefficients, one for horizontal sampling and one for
vertical sampling.
These sampling coefficients are defined in the JPG file relative to the
maximum sampling coefficient (more on this later).

3) Level shift
--------------
All 8-bit unsigned values (Y,Cb,Cr) in the image are "level shifted": 128 is
subtracted from each, converting it to an 8-bit signed representation.

4) The 8x8 Discrete Cosine Transform (DCT)
------------------------------------------
The image is broken into 8x8 blocks of pixels, and the DCT is applied to each
8x8 block.
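Cutting one 8x8 block out of a component plane can be sketched in C. The sketch combines the edge replication described in the next paragraph with the level shift from step 3. Assumptions:
- `extract_block` is a hypothetical name.
- The plane is stored row by row, `w` samples per row.
- The output is indexed `[row][column]`.

```c
/* Copy block (bx, by), whose top-left pixel is (bx*8, by*8), out of a
   w x h plane of 8-bit samples. Past the right or bottom edge, the
   right-most column / bottom-most line is repeated. Each sample is then
   level-shifted by subtracting 128. */
void extract_block(const unsigned char *plane, int w, int h,
                   int bx, int by, int out[8][8])
{
    for (int y = 0; y < 8; y++) {
        int sy = by * 8 + y;
        if (sy > h - 1)
            sy = h - 1;                      /* repeat the bottom line */
        for (int x = 0; x < 8; x++) {
            int sx = bx * 8 + x;
            if (sx > w - 1)
                sx = w - 1;                  /* repeat the right column */
            out[y][x] = plane[sy * w + sx] - 128;
        }
    }
}
```

For a 10x9 image, the block at (1,1) holds only the 2x1 real pixels at the bottom-right corner. The rest of that block is filled with copies of those edge pixels.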
If the X dimension of the original image is not divisible by 8, the encoder
should make it divisible. It fills the remaining right columns, until X is a
multiple of 8, with copies of the right-most column of the original image.
Similarly, if the Y dimension is not divisible by 8, the encoder should fill
the remaining lines with copies of the bottom-most line of the original image.
The 8x8 blocks are processed from left to right and from top to bottom.
A note: a pixel in the 8x8 block has 3 components (Y,Cb,Cr), so the DCT is
applied separately to three 8x8 blocks:
  The first 8x8 block contains the luminance of the pixels in the original
   8x8 block.
  The second 8x8 block contains the Cb values of the pixels in the original
   8x8 block.
  And, similarly, the third 8x8 block contains the Cr values.
The purpose of the DCT is that you work with the spatial frequencies present in
the original image instead of the original samples.
These spatial frequencies are closely related to the level of detail present
in an image.
High spatial frequencies correspond to high levels of detail, while lower
frequencies correspond to lower levels of detail.
The DCT is very similar to the 2D Fourier transform. It shifts from the spatial
domain (the original 8x8 block) to the frequency domain: the new 8x8 = 64
coefficients, which represent the amplitudes of the spatial frequencies
analyzed.
The mathematical definition of the Forward DCT (FDCT) and Inverse DCT (IDCT) is:

FDCT:
$$
F(u,v) = \frac{C(u)\,C(v)}{4} \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)
\cos\!\left(\frac{(2x+1)u\pi}{16}\right)\cos\!\left(\frac{(2y+1)v\pi}{16}\right),
\qquad u,v = 0,1,\dots,7
$$
$$
C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & \text{otherwise} \end{cases}
$$

IDCT:
$$
f(x,y) = \frac{1}{4} \sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\,C(v)\,F(u,v)
\cos\!\left(\frac{(2x+1)u\pi}{16}\right)\cos\!\left(\frac{(2y+1)v\pi}{16}\right),
\qquad x,y = 0,1,\dots,7
$$

Applying these formulas directly is computationally expensive, and faster
algorithms exist for both the forward and inverse DCT. A notable one, called
AA&N, leaves only 5 multiplies and 29 adds to be done in the DCT itself.
More info and an implementation can be found in the free JPEG encoder/decoder
software from the Independent JPEG Group (IJG). Their C source can be found at
www.ijg.org.

5) The zig-zag reordering of the 64 DCT coefficients
-----------------------------------------------------
After we have performed the DCT on a block of 8x8 values, we have a new 8x8
block.
This 8x8 block is then traversed in zig-zag order as shown below. The numbers
give the order in which we traverse the two-dimensional 8x8 matrix.

   0, 1, 5, 6,14,15,27,28,
   2, 4, 7,13,16,26,29,42,
   3, 8,12,17,25,30,41,43,
   9,11,18,24,31,40,44,53,
  10,19,23,32,39,45,52,54,
  20,22,33,38,46,51,55,60,
  21,34,37,47,50,56,59,61,
  35,36,48,49,57,58,62,63

As you see, the traversal starts at the upper-left corner (0,0). It then
visits (0,1), (1,0), (2,0), (1,1), (0,2), (0,3), (1,2), (2,1), (3,0), and so
on, with positions given as (row, column).
After traversing the 8x8 matrix in zig-zag order, we have a vector of 64
coefficients (0..63).
The zig-zag traversal visits the 8x8 DCT coefficients in order of increasing
spatial frequency, so the vector is sorted by spatial frequency:
- The first value in the vector (index 0) corresponds to the lowest spatial
  frequency present in the image. It is called the DC term.
- As the index increases, the values correspond to higher frequencies. The
  value at index 63 is the amplitude of the highest spatial frequency present
  in the 8x8 block.
- The rest of the DCT coefficients are called AC terms.

6) Quantization
---------------
At this stage we have a sorted vector of 64 values. They are the amplitudes of
the 64 spatial frequencies present in the 8x8 block.
These 64 values are quantized. Each value is divided by a divisor taken from a
vector of 64 values (the quantization table), then rounded to the nearest
integer.
    for (i = 0; i <= 63; i++)   /* floor() from <math.h>; rounds negatives correctly too */
        vector[i] = (int) floor((double) vector[i] / quantization_table[i] + 0.5);

Here is the example quantization table for luminance (Y) given in an annex of
the JPEG standard. It is given as an 8x8 block; to obtain a 64-element vector
it must be zig-zag reordered.

 16 11 10 16 24  40  51  61
 12 12 14 19 26  58  60  55
 14 13 16 24 40  57  69  56
 14 17 22 29 51  87  80  62
 18 22 37 56 68  109 103 77
 24 35 55 64 81  104 113 92
 49 64 78 87 103 121 120 101
 72 92 95 98 112 100 103 99

This table is based on "psychovisual thresholding", and it has "been used with
good results on 8-bit per sample luminance and chrominance images".
Most existing encoders use simple multiples of this example. The values are
not claimed to be optimal, however, and an encoder can use ANY OTHER
quantization table.
The table is specified in the JPEG file with the DQT (Define Quantization
Table) marker. Most commonly there is one table for Y and another one for the
chrominance (Cb and Cr).
The quantization process plays the key role in JPEG compression.
It removes the high frequencies present in the original image, and with them
the fine detail.
We do this because the eye is much more sensitive to lower spatial frequencies
than to higher ones, so higher frequencies can be removed with very little
visible loss.
This is done by dividing the values at high indexes in the vector (the
amplitudes of higher frequencies) by larger values than those used for the
amplitudes of lower frequencies.
The bigger the values in the quantization table, the bigger the error
introduced by this lossy process, and so the bigger the visible error and the
lower the visual quality.
Another important fact is that in most images the colour varies slowly from one
pixel to the next. So most images contain little fine detail, which means small
amplitudes at the high spatial frequencies. They do have a lot of image
information contained in the low
spatial frequencies.
Consequently, the new quantized vector will have a lot of consecutive zeroes
at the high spatial frequencies.

7) The Zero Run Length Coding (RLC)
-----------------------------------
Now we have the quantized vector with a lot of consecutive zeroes. We can
exploit this by run-length coding the consecutive zeroes.
IMPORTANT: here we skip the first coefficient of the vector (the DC
coefficient), because it is coded a bit differently. You'll see why later, and
I'll present its coding later in this doc.
Let's treat the original 64-element vector as a 63-element vector (the 64
vector without its first coefficient).
Say that we have 57,45,0,0,0,0,23,0,-30,-16,0,0,1,0,0,0, 0 , 0 ,0 , only 0,..,0
Here is how JPEG's RLC compression is done for this example:
    (0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-16) ; (2,1) ; EOB
As you see, for each value other than 0 we encode the number of consecutive
zeroes PRECEDING that value, then we add the value.
Another note: EOB is short for End of Block; it's a specially coded value (a
marker).
If we reach a position in the vector after which there are only zeroes, we
mark that position with EOB and finish the RLC compression of the quantized
vector.
[Note that if the quantized vector doesn't end with zeroes (its last element is
not 0), there will be no EOB marker.]
ACTUALLY, EOB is equivalent to (0,0) and will later be Huffman coded like
(0,0), so we'll encode:
    (0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-16) ; (2,1) ; (0,0)
Another MAJOR thing: say that somewhere in the quantized vector we have:
    57, eighteen zeroes, 3, 0,0,0,0, 2, thirty-three zeroes, 895, EOB
JPG Huffman coding requires the number of preceding 0's to be coded as a 4-bit
value, so it can't exceed 15 (0xF). You'll see why later.
So the previous example would be coded as:
    (0,57) ; (15,0) (2,3) ; (4,2) ; (15,0) (15,0) (1,895) , (0,0)
(15,0) is a specially coded value which indicates that 16 consecutive zeroes
follow. Note: 16 zeroes, not 15 zeroes.

8) The final step === Huffman coding
-------------------------------------
First an IMPORTANT note: the JPEG standard does not store the actual value.
Instead it stores the minimum size in bits in which the value can be kept,
called the category of that value. That is followed by a bit-coded
representation of the value, like this:

              Values                Category        Bits for the value
                 0                     0                   -
               -1,1                    1                  0,1
            -3,-2,2,3                  2              00,01,10,11
       -7,-6,-5,-4,4,5,6,7             3    000,001,010,011,100,101,110,111
        -15,..,-8,8,..,15              4       0000,..,0111,1000,..,1111
       -31,..,-16,16,..,31             5     00000,..,01111,10000,..,11111
       -63,..,-32,32,..,63             6                   .
      -127,..,-64,64,..,127            7                   .
     -255,..,-128,128,..,255           8                   .
     -511,..,-256,256,..,511           9                   .
    -1023,..,-512,512,..,1023          10                  .
   -2047,..,-1024,1024,..,2047         11                  .
   -4095,..,-2048,2048,..,4095         12                  .
   -8191,..,-4096,4096,..,8191         13                  .
  -16383,..,-8192,8192,..,16383        14                  .
 -32767,..,-16384,16384,..,32767       15                  .

Consequently, for the previous example (with -16 replaced by -8 for
illustration):
    (0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-8) ; (2,1) ; (0,0)
let's encode ONLY the right-hand value of each pair. The exception is pairs
that are special markers, like (0,0) or (if we had one) (15,0).
    57 is in category 6 and is bit-coded as 111001, so we'll encode it
       as (6,111001)
    45, similarly, will be coded as (6,101101)
    23  ->  (5,10111)
   -30  ->  (5,00001)
    -8  ->  (4,0111)
     1  ->  (1,1)
And now we'll write the string of pairs again:
   (0,6), 111001 ; (0,6), 101101 ; (4,5), 10111; (1,5), 00001; (0,4) , 0111 ;
       (2,1), 1 ; (0,0)
Each pair of values in parentheses can be represented in one byte, because each
of the 2 values fits in a nibble. The count of preceding zeroes is never
greater than 15. Neither is the category, since numbers encoded in a JPG file
are in the range -32767..32767.
In this byte, the high nibble is the number of preceding 0s, and the low
nibble is the category of the new nonzero value.
The FINAL step of the encoding is to Huffman encode this byte. The Huffman code
of the byte is then written to the JPG file as a stream of bits, followed by
the bit representation of the number.
For example, let's say that byte 6 (the equivalent of (0,6)) has the Huffman
code 111000, and (for example)
    for byte 69 = (4,5)    we have  1111111110011001
             21 = (1,5)    ---  11111110110
             4  = (0,4)    ---  1011
             33 = (2,1)    ---  11011
              0 = EOB = (0,0) ---  1010
The final stream of bits written to the JPG file on disk for the previous
example of 63
coefficients (remember that we've skipped the first coefficient) is

      111000 111001  111000 101101  1111111110011001 10111   11111110110 00001
      1011 0111   11011 1   1010

The encoding of the DC coefficient
-----------------------------------
DC is the coefficient in the quantized vector corresponding to the lowest
frequency in the image (the 0 frequency), and (before quantization) it is
mathematically equal to (the sum of the 8x8 image samples) / 8, i.e. 8 times
the average value of that block of image samples.
It is said to contain a lot of the energy present in the original 8x8 image
block. (It usually gets large values.)

The authors of the JPEG standard noticed that there's a very close connection
between the DC coefficients of consecutive blocks, so they decided to encode
in the JPG file the difference between the DCs of consecutive 8x8 blocks
(Note: consecutive 8x8 blocks of the SAME image component, like consecutive
8x8 blocks for Y, or consecutive blocks for Cb, or for Cr):

    Diff = DC_i - DC_{i-1}

So the DC of the current block (DC_i) will be equal to:

    DC_i = DC_{i-1} + Diff

And in JPG decoding you will start from 0 -- you consider that the DC
coefficient preceding the first block is 0:

    DC_0 = 0

and then you add to the current value the value decoded from the JPG (the
Diff value).
SO, in the JPG file, the first coefficient (the DC coefficient) is actually
the difference, and it is Huffman encoded DIFFERENTLY from the AC coefficients.

Here is how it's done (remember that we now code the Diff value):
Diff corresponds, as you've seen before, to a representation made of a
category and its bit-coded representation. In the JPG file only the category
value is Huffman encoded, like this:

    Diff = (category, bit-coded representation)

Then Diff will be coded as (Huffman_code(category), bit-coded representation).
For example, if Diff is equal to -511, then Diff corresponds to
  (9, 000000000)

Say that 9 has the Huffman code 1111110.
(In the JPG file, there are 2 Huffman tables for an image component: one for DC
and one for AC.)
In the JPG file, the bits corresponding to the DC coefficient will be:

        1111110 000000000

And, applying this example of DC to the previous example of ACs, for this
vector with 64 coefficients, THE FINAL STREAM OF BITS written in the JPG file
will be:

   1111110 000000000 111000 111001  111000 101101  1111111110011001 10111
   11111110110 00001 1011 0111   11011 1   1010

(In the JPG file, first the DC is encoded, then the ACs.)

THE HUFFMAN DECODER (A brief summary) for the 64 coefficients (A Data Unit)
of an image component (For example Y)
-------------------------------------------------------------
So when you decode a stream of bits from the image in the JPG file, you'll do:

Init DC with 0.

1) First, decode the DC coefficient:
     a) Fetch a valid Huffman code (check whether it exists in the Huffman DC
        table)
     b) See which category this Huffman code corresponds to
     c) Fetch N = category bits, and determine what value is represented
        by (category, the N bits fetched) = Diff
     d) DC += Diff
     e) write DC in the 64 vector: "vector[0]=DC"

2) Then decode the 63 AC coefficients (start with AC_counter = 1, and with
   the rest of the vector filled with zeroes):

------- FOR every AC coefficient UNTIL (EOB_encountered OR AC_counter=64)
     a) Fetch a valid Huffman code (check in the AC Huffman table)
     b) Decode that Huffman code: the Huffman code corresponds to
        (nr_of_previous_0,category)
        [Remember: EOB_encountered = TRUE if (nr_of_previous_0,category) = (0,0)]
     c) Fetch N = category bits, and determine what value is represented by
        (category, the N bits fetched) = AC_coefficient
     d) Write in the 64 vector a number of zeroes = nr_of_previous_0
     e) increment AC_counter by nr_of_previous_0
     f) Write AC_coefficient in the vector: "vector[AC_counter]=AC_coefficient",
        then increment AC_counter by 1

Next Steps
-----------
So, now we have a 64 element vector. We'll do the reverse of the steps
presented in this doc:

1) Dequantize the 64 vector: "for (i=0;i<=63;i++) vector[i]*=quant[i]"
2) Re-order the 64 vector from zig-zag into an 8x8 block
3) Apply the Inverse DCT transform to the 8x8 block

Repeat the above process [Huffman decoder, steps 1), 2) and 3)] for every
8x8 block of every image component (Y,Cb,Cr).

4) Up-sample if needed
5) Level shift the samples (add 128 to all the 8-bit signed values in the 8x8
   blocks resulting from the IDCT transform)
6) Transform YCbCr to RGB
7) And VOILA ... the JPG image

The JPEG markers and how the image information is organized in the JPG file
(the byte level)
--------------------------------------------------------------------------------
NOTE: The JPEG/JFIF file format uses Motorola format for words, NOT Intel
format, i.e.: high byte first, low byte last (ex: the word FFA0 will be written
in the JPEG file in the order: FF at the lower offset, A0 at the higher offset).

The JPG standard specifies that the JPEG file is composed mostly of pieces
called "segments". A segment is a stream of bytes with length <= 65535. The
beginning of a segment is specified with a marker.
A marker = 2 bytes beginning with 0xFF (the C hexadecimal notation for 255)
and ending with a byte different from 0 and 0xFF.
Ex: 'FFDA', 'FFC4', 'FFC0'.
Each marker has a meaning: the second byte (different from 0 and 0xFF)
specifies what that marker does.
For example, there is a marker which specifies that you should start the
decoding process; it is called (in the JPG standard's terminology):

        SOS = Start Of Scan = 'FFDA'

Another marker, called DQT = Define Quantization Table = 0xFFDB, does what its
name says: it specifies that in the JPG file, after the marker (and after 3
bytes, more on this later), there follow 64 bytes = the coefficients of the
quantization table.

If, during the processing of the JPG file, you encounter a 0xFF followed by
a byte different from 0
(I've told you that the second byte of a marker is not 0) and this byte has no
marker meaning (you cannot find a marker corresponding to that byte), then the
0xFF byte you encountered must be ignored and skipped.
(In some JPGs, sequences of consecutive 0xFF are used for filling purposes and
must be skipped.)
You see that whenever you encounter 0xFF, you check the next byte and see if
that 0xFF has a marker meaning or must be skipped.

What happens if we actually need to encode the 0xFF byte in the JPG file as a
*usual* byte (not a marker, or a filling byte)?
(Say that we need to write a Huffman code which begins with 11111111 (8 bits
of 1) at a byte alignment.)
The standard says that we simply make the next byte 0, and write the sequence
'FF00' in the JPG file.
So when your JPG decoder meets the 2-byte 'FF00' sequence, it should treat it
as just one byte: 0xFF as a usual byte.

Another thing: you realise that these markers are byte aligned in the JPG
file. What happens if, while Huffman encoding and inserting bits into the JPG
file's bytes, you have not finished filling a byte, but you need to write a
marker, which is byte aligned?
For the byte alignment of the markers, you SET THE REMAINING BITS UNTIL THE
BEGINNING OF THE NEXT BYTE TO 1, then you write the marker at the next byte.

A short explanation of some important markers found in a JPG file.
-------------------------------------------------------------------
SOI = Start Of Image = 'FFD8'
  This marker must be present in any JPG file *once*, at the beginning of the
  file. (Any JPG file starts with the sequence FFD8.)
EOI = End Of Image = 'FFD9'
  Similar to SOI: any JPG file ends with FFD9.
RSTi = FFDi (where i is in the range 0..7) [RST0 = FFD0, RST7 = FFD7]
     = Restart Markers
  These restart markers are used for resync. They appear at regular intervals
  in the JPG stream of bytes, during the decoding process (after SOS).
  (They appear in the order: RST0 -- interval -- RST1 -- interval -- RST2 --...
...-- RST6 -- interval -- RST7 -- interval -- RST0 --...)
  (Obs: A lot of JPGs don't have restart markers.)
  The problem with these markers is that they interrupt the normal bit order in
  the JPG's Huffman encoded bitstream. Remember that for the byte alignment of
  the markers the remaining bits are set to 1, so your decoder has to skip, at
  regular intervals, the useless filling bits (those set to 1) and the RST
  markers. After each RST marker the DC predictors of all components are also
  reset to 0.

-------
Markers...
-------
At the end of this doc, I've included a very well written technical explanation
of the JPEG/JFIF file format, written by Oliver Fromme, the author of the QPEG
viewer. There you'll find a pretty good and complete definition of the markers.
But, anyway, here is a list of markers you should check:

SOF0 = Start Of Frame 0          = FFC0
SOS  = Start Of Scan             = FFDA
APP0 = the marker used to identify a JPG file which uses the JFIF
       specification             = FFE0
COM  = Comment                   = FFFE
DNL  = Define Number of Lines    = FFDC
DRI  = Define Restart Interval   = FFDD
DQT  = Define Quantization Table = FFDB
DHT  = Define Huffman Table      = FFC4

The Huffman table stored in a JPG file
---------------------------------------
Here is how JPEG implements the Huffman tree: instead of a tree, it defines a
table in the JPG file after the DHT (Define Huffman Table) marker.
NOTE: The length of the Huffman codes is restricted to 16 bits.

Basically there are 2 types of Huffman tables in a JPG file: one for DC and
one for AC (usually there are actually 4 Huffman tables: 2 for DC,AC of
luminance and 2 for DC,AC of chrominance).
They are all stored in the JPG file in the same format, which consists of:

1) 16 bytes: byte i contains the number of Huffman codes of length i (length
   in bits), i ranges from 1 to 16
2) A table with the length (in bytes) = \sum_{i=1}^{16} nr_codes_of_length_i,
   which contains at location [k][j] (k in 1..16, j in 0..(nr_codes_of_length_k - 1))
   the BYTE value associated with the j-th Huffman code of length k.
   (For a
   fixed length k, the values are stored sorted by the value of the Huffman
   code.)

From this table you can find the actual Huffman code associated with a
particular byte.
Here is an example of how the actual code values are generated:

Ex: (Note: the numbers of codes for each length are chosen just for this
    example; they can have any other values)

SAY that,
     For length 1 we have nr_codes[1]=0, we skip this length
     For length 2 we have 2 codes  00
                                   01
     For length 3 we have 3 codes  100
                                   101
                                   110
     For length 4 we have 1 code   1110
     For length 5 we have 1 code   11110
     For length 6 we have 1 code   111110
     For length 7 we have 0 codes  -- skip (if we had 1 code for length 7,
                                   we would have 1111110)
     For length 8 we have 1 code   11111100 (You see that the code is still
                                   shifted left even though we skipped the
                                   code value for 7)
     .....
     For length 16, .... (the same thing)

I've told you that the Huffman table in the JPG file stores the BYTE values
for a given code.
For this particular example of Huffman codes:
Say that in the Huffman table in the JPG file on disk we have (after those 16
bytes which contain the nr of Huffman codes of each length):

    45 57 29 17 23 25 34 28

These values correspond, given the particular lengths I gave you before, to
the Huffman codes like this:

    there's no value for codes of length 1
    for codes of length 2: we have 45 57
    for codes of length 3: 3 values (ex: 29,17,23)
    for codes of length 4: only 1 value (ex: 25)
    for codes of length 5: 1 value (ex: 34)
    ..  (length 6 also has 1 code; its value is left out of this example list)
    for codes of length 7: again no value, skip to codes of length 8
    for codes of length 8: 1 value, 28

IMPORTANT note:
  For codes of length 2:
      the value 45 corresponds to code 00
                57       ----||---      01
  For codes of length 3:
      the value 29 corresponds to code 100
                17       ----||---      101
                23       ----||---      110
  ETC...
(I've told you that for a given length the byte values are stored in order of
increasing value of the Huffman code.)

Four Huffman tables, corresponding to the DC and AC tables of the luminance
and the DC and AC tables of the chrominance, are given in an annex of the JPEG
standard (Annex K) as a suggestion for the encoder.
The standard says that these tables have been tested with good compression
results on a lot of images and recommends them, but the encoder can use any
other Huffman table. A lot of JPG encoders use these tables. Some of them
offer you an option, entropy optimization: if it's enabled they'll use Huffman
tables optimized for that particular image.

The JFIF (JPEG File Interchange Format) file
---------------------------------------------
The JPEG standard (the one in the itu-1150.ps file) is very general; the JFIF
implementation is a particular case of this standard (and it is, of course,
compatible with the standard).
The JPEG standard specifies some markers reserved for applications (by
applications I mean particular cases of implementing the standard).
Those markers are called APPn, where n ranges from 0 to 0xF; APPn = FFEn.
The JFIF specification uses the APP0 marker (FFE0) to identify a JPG file which
uses this specification.
You'll see in the JPEG standard that it refers to "image components". These
image components can be (Y,Cb,Cr) or (YIQ) or whatever.
The JFIF implementations use only (Y,Cb,Cr) for a truecolor JPG, or only Y
for a monochrome JPG.
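To make the APP0 identification above concrete, here is a minimal C sketch that checks whether a memory buffer starts with SOI followed by a JFIF APP0 segment (marker FFE0, a big-endian length of at least 16, then the bytes 'JFIF' and 0, as in Oliver Fromme's APP0 description at the end of this doc). The function name `is_jfif` is made up for this example. It only looks at the segment right after SOI, which is where JFIF puts APP0; files that start with another APPn segment (for example EXIF files, which use APP1) are reported as not JFIF.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf (len bytes) starts with SOI
   followed by a JFIF APP0 segment, 0 otherwise. */
int is_jfif(const unsigned char *buf, size_t len)
{
    size_t seglen;

    /* SOI = FF D8, APP0 = FF E0, then the 2-byte segment length */
    if (len < 6 || buf[0] != 0xFF || buf[1] != 0xD8 ||
        buf[2] != 0xFF || buf[3] != 0xE0)
        return 0;

    /* Length: high byte first (Motorola order), counts its own 2 bytes */
    seglen = ((size_t)buf[4] << 8) | buf[5];
    if (seglen < 16 || 4 + seglen > len)
        return 0;

    /* Identifier 'J','F','I','F',0 follows the length bytes
       (the string literal "JFIF" is 5 bytes including its NUL) */
    return memcmp(buf + 6, "JFIF", 5) == 0;
}
```

A real reader would go on to parse the version, density and thumbnail fields that follow the identifier.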
You can get the JFIF specification from www.jpeg.org

The sampling factors
--------------------
Note: The following explanation covers the encoding of truecolor (3 component)
JPGs; for gray-scaled JPGs there is one component (Y) which is usually not
down-sampled at all, and does not require any inverse transformation like the
inverse (Y,Cb,Cr) -> (R,G,B). In consequence, gray-scaled JPGs are the
simplest and easiest to decode: for every 8x8 block in the image you do the
Huffman decoding of the RLC coded vector, dequantize the 64 vector, reorder it
from zig-zag into an 8x8 block, and finally apply the inverse DCT to it and
add 128 (level shift) to the new 8x8 values.

I've told you that image components are sampled. Usually Y is taken for every
pixel, and Cb, Cr are taken for a block of 2x2 pixels.
But there are some JPGs in which Cb, Cr are taken for every pixel, or some
JPGs where Cb, Cr are taken every 2 pixels (a horizontal sampling at 2 pixels,
and a vertical sampling at every pixel).
The sampling factors for an image component in a JPG file are defined with
respect (relative) to the highest sampling factor.

Here are the sampling factors for the most usual example:
   Y is taken for every pixel, and Cb,Cr are taken for a block of 2x2 pixels.
(The JFIF specification gives a formula for sampling factors which I think
works only when the maximum sampling factor for each dimension, X or Y, is
<= 2. The JPEG standard does not specify the sampling factors; it's more
general.)
You see that Y will have the highest sampling rate:

     For Y :   Horizontal sampling factor = 2  = HY
               Vertical sampling factor   = 2  = VY
     For Cb:   Horizontal sampling factor = 1  = HCb
               Vertical sampling factor   = 1  = VCb
     For Cr:   Horizontal sampling factor = 1  = HCr
               Vertical sampling factor   = 1  = VCr

Actually this form of defining the sampling factors is quite useful.
The vector of 64 coefficients for an image component, Huffman encoded, is
called
    DU = Data Unit (JPEG's
standard terminology)

In the JPG file, the order of encoding Data Units is:

     1) encode Data Units for the first image component:
             for (counter_y=1;counter_y<=VY;counter_y++)
                  for (counter_x=1;counter_x<=HY;counter_x++)
                     {  encode Data Unit for Y }
     2) encode Data Units for the second image component:
             for (counter_y=1;counter_y<=VCb;counter_y++)
                  for (counter_x=1;counter_x<=HCb;counter_x++)
                     {  encode Data Unit for Cb }
     3) finally, for the third component, similarly:
             for (counter_y=1;counter_y<=VCr;counter_y++)
                  for (counter_x=1;counter_x<=HCr;counter_x++)
                     {  encode Data Unit for Cr }

For the example I gave you (HY=2, VY=2; HCb=VCb=1, HCr=VCr=1), here is a
figure (I think it will clear things up for you):

       YDU YDU    CbDU   CrDU
       YDU YDU

(YDU is a Data Unit for Y, and similarly CbDU is a DU for Cb, CrDU a DU for Cr.)
This usual combination of sampling factors is referred to as 2:1:1 for both
the vertical and the horizontal sampling factors (today it is usually called
4:2:0).
And, of course, in the JPG file the encoding order will be:

      YDU,YDU,YDU,YDU,CbDU,CrDU

You know that a DU (64 coefficients) defines a block of 8x8 values, so here we
specified the encoding order for a block of 16x16 image pixels (an image
pixel = a (Y,Cb,Cr) pixel [my notation]):
  four 8x8 blocks of Y values (4 YDUs), one 8x8 block of Cb values (1 CbDU)
  and one 8x8 block of Cr values (1 CrDU).
(Hmax = the maximum horizontal sampling factor, Vmax = the maximum vertical
sampling factor)
In consequence, for this example of sampling factors (Hmax = 2, Vmax = 2), the
encoder should process SEPARATELY every 16x16 = (Hmax*8 x Vmax*8) image pixel
block, in the order mentioned.
This block of image pixels with the dimensions (Hmax*8,Vmax*8) is called, in
the JPG standard's terminology, an MCU = Minimum Coded Unit.
For the previous example: MCU = YDU,YDU,YDU,YDU,CbDU,CrDU

Another example of sampling
factors:
      HY =1, VY =1
      HCb=1, VCb=1
      HCr=1, VCr=1
Figure/order: YDU CbDU CrDU
You see that here an 8x8 image pixel block (MCU) is defined with 3 8x8 blocks:
one for Y, one for Cb and one for Cr (there's no down-sampling at all).
Here (Hmax=1, Vmax=1) the MCU has the dimensions (8,8), and
MCU = YDU,CbDU,CrDU.

For gray-scaled JPGs you don't have to worry about the order of encoding data
units in an MCU. For these JPGs, an MCU = 1 Data Unit (MCU = YDU).

In the JPG file, the sampling factors for every image component are defined
after the marker SOF0 = Start Of Frame 0 = FFC0.

A brief scheme of decoding a JPG file
--------------------------------------
The decoder reads the sampling factors from the JPG file and finds out the
dimensions of an MCU (Hmax*8,Vmax*8) => how many MCUs are in the whole image.
Then it decodes every MCU present in the original image (a loop over all these
blocks, or until the EOI marker is found [it should be found when the loop
finishes, otherwise you'll get an incomplete image]): it decodes an MCU by
decoding every Data Unit in the MCU in the order mentioned before, and
finally writes the decoded (Hmax*8 x Vmax*8) truecolor pixel block into the
(R,G,B) image buffer.

MPEG-1 video and JPEG
----------------------
The interesting part of the MPEG-1 specification (and probably MPEG-2) is
that it relies heavily on the JPEG specification. It uses a lot of concepts
presented here.
The reason is that, typically every 15 frames or whenever it's needed, there's
an independent frame called an I-frame (Intra frame), which is coded much like
a JPEG image.
(By the way, that 16x16 image pixel block example I gave you is called, in the
MPEG standard's terminology, a macroblock.)
Except for the algorithms for motion compensation, MPEG-1 video relies a lot
on the JPG specification (the DCT transform, quantization, etc.).

Hope you're ready now to start coding your JPG viewer or encoder.

About the author of this doc
----------------------------
The author of this doc is Cristi Cuturicu, a student at the University
Politehnica of Bucharest (UPB), Department of Computer Science.
You can contact him by e-mail:
               [email protected]
               [email protected]
If you are a software company needing a programmer, then get in touch.

A technical explanation of the JPEG/JFIF file format,
written by Oliver Fromme, the author of the QPEG viewer
-------------------------------------------------------
Legal NOTE: The legal rules mentioned in the Disclaimer at the top of this
file also apply to the following information, so neither Oliver Fromme nor I
can be held responsible for errors or bugs in it.

The author of the following information is:

   Oliver Fromme
   Leibnizstr. 18-61
   38678 Clausthal
   GERMANY

JPEG/JFIF file format:
~~~~~~~~~~~~~~~~~~~~~~
  - header (2 bytes):  $ff, $d8 (SOI) (these two identify a JPEG/JFIF file)
  - for JFIF files, an APP0 segment immediately follows the SOI marker,
    see below
  - any number of "segments" (similar to IFF chunks), see below
  - trailer (2 bytes): $ff, $d9 (EOI)

Segment format:
~~~~~~~~~~~~~~~
  - header (4 bytes):
       $ff     identifies segment
        n      type of segment (one byte)
       sh, sl  size of the segment, including these two bytes, but not
               including the $ff and the type byte. Note, not Intel order:
               high byte first, low byte last!
  - contents of the segment, max. 65533 bytes.
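The segment layout above, together with the notes that follow it (parameterless markers, $ff fill bytes), is enough to walk a file marker by marker. The sketch below is one possible C version; the name `list_segments`, its output format and its return convention are assumptions made for illustration. It treats TEM, RST0..RST7, SOI and EOI as parameterless, and stops at SOS because entropy-coded data (with its own FF00 stuffing) follows that segment.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: walks the marker segments of a JPEG file held in
   memory and prints each marker. Returns the number of markers seen,
   including the final SOS or EOI, or -1 if the data looks malformed. */
int list_segments(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    int count = 0;

    while (pos < len) {
        unsigned char type;
        size_t seglen;

        if (buf[pos] != 0xFF)
            return -1;                      /* a marker was expected */
        while (pos < len && buf[pos] == 0xFF)
            pos++;                          /* skip $ff fill bytes */
        if (pos >= len)
            return -1;
        type = buf[pos++];
        count++;

        if (type == 0x00)
            return -1;                      /* FF00 is not a marker */
        if (type == 0xD9) {                 /* EOI: end of image */
            printf("FFD9 (EOI)\n");
            return count;
        }
        if (type == 0x01 || (type >= 0xD0 && type <= 0xD8)) {
            printf("FF%02X (no length)\n", type);   /* TEM, RSTn, SOI */
            continue;
        }
        if (pos + 2 > len)
            return -1;
        /* big-endian length: counts its own 2 bytes, not $ff and type */
        seglen = ((size_t)buf[pos] << 8) | buf[pos + 1];
        if (seglen < 2 || pos + seglen > len)
            return -1;
        printf("FF%02X length %u\n", type, (unsigned)seglen);
        if (type == 0xDA)                   /* SOS: scan data follows */
            return count;
        pos += seglen;
    }
    return count;
}
```

Stopping at SOS keeps the sketch simple; a full decoder would continue by reading the scan data and looking for RSTn or EOI markers inside it.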
 Notes:
  - There are parameterless segments (denoted with a '*' below) that DON'T
    have a size specification (and no contents), just $ff and the type byte.
  - Any number of $ff bytes between segments is legal and must be skipped.

Segment types:
~~~~~~~~~~~~~~
   *TEM   = $01   usually causes a decoding error, may be ignored
    SOF0  = $c0   Start Of Frame (baseline JPEG), for details see below
    SOF1  = $c1   ditto
    SOF2  = $c2   usually unsupported
    SOF3  = $c3   usually unsupported
    SOF5  = $c5   usually unsupported
    SOF6  = $c6   usually unsupported
    SOF7  = $c7   usually unsupported
    SOF9  = $c9   for arithmetic coding, usually unsupported
    SOF10 = $ca   usually unsupported
    SOF11 = $cb   usually unsupported
    SOF13 = $cd   usually unsupported
    SOF14 = $ce   usually unsupported
    SOF15 = $cf   usually unsupported
    DHT   = $c4   Define Huffman Table, for details see below
    JPG   = $c8   undefined/reserved (causes decoding error)
    DAC   = $cc   Define Arithmetic Table, usually unsupported
   *RST0  = $d0   RSTn are used for resync, may be ignored
   *RST1  = $d1
   *RST2  = $d2
   *RST3  = $d3
   *RST4  = $d4
   *RST5  = $d5
   *RST6  = $d6
   *RST7  = $d7
   *SOI   = $d8   Start Of Image
   *EOI   = $d9   End Of Image
    SOS   = $da   Start Of Scan, for details see below
    DQT   = $db   Define Quantization Table, for details see below
    DNL   = $dc   usually unsupported, ignore
    DRI   = $dd   Define Restart Interval, for details see below
    DHP   = $de   ignore (skip)
    EXP   = $df   ignore (skip)
    APP0  = $e0   JFIF APP0 segment marker, for details see below
    APP15 = $ef   ignore
    JPG0  = $f0   ignore (skip)
    JPG13 = $fd   ignore (skip)
    COM   = $fe   Comment, for details see below

 All other segment types are reserved and should be ignored (skipped).

SOF0: Start Of Frame 0:
~~~~~~~~~~~~~~~~~~~~~~~
  - $ff, $c0 (SOF0)
  - length (high byte, low byte), 8+components*3
  - data precision (1 byte) in bits/sample, usually 8 (12 and 16 not
    supported by most software)
  - image height (2 bytes, Hi-Lo), must be >0 if DNL not supported
  - image width (2 bytes, Hi-Lo), must be >0 if DNL not supported
  - number of components (1 byte), usually 1 = grey scaled, 3 = color YCbCr
    or YIQ, 4 = color CMYK
  - for each component: 3 bytes
     - component id (1 = Y, 2 = Cb, 3 = Cr, 4 = I, 5 = Q)
     - sampling factors (bit 0-3 vert., 4-7 hor.)
     - quantization table number
 Remarks:
  - JFIF uses either 1 component (Y, greyscaled) or 3 components (YCbCr,
    sometimes called YUV, colour).

APP0: JFIF segment marker:
~~~~~~~~~~~~~~~~~~~~~~~~~~
  - $ff, $e0 (APP0)
  - length (high byte, low byte), must be >= 16
  - 'JFIF'#0 ($4a, $46, $49, $46, $00), identifies JFIF
  - major revision number, should be 1 (otherwise error)
  - minor revision number, should be 0..2 (otherwise try to decode anyway)
  - units for x/y densities:
     0 = no units, x/y-density specify the aspect ratio instead
     1 = x/y-density are dots/inch
     2 = x/y-density are dots/cm
  - x-density (high byte, low byte), should be <> 0
  - y-density (high byte, low byte), should be <> 0
  - thumbnail width (1 byte)
  - thumbnail height (1 byte)
  - n bytes for thumbnail (RGB 24 bit), n = width*height*3
 Remarks:
  - If there's no 'JFIF'#0, or the length is < 16, then it is probably not
    a JFIF segment and should be ignored.
  - Normally units=0, x-dens=1, y-dens=1, meaning that the aspect ratio is
    1:1 (evenly scaled).
  - JFIF files including thumbnails are very rare; the thumbnail can usually
    be ignored. If there's no thumbnail, then width=0 and height=0.
  - If the length doesn't match the thumbnail size, a warning may be
    printed, then continue decoding.

DRI: Define Restart Interval:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  - $ff, $dd (DRI)
  - length (high byte, low byte), must be = 4
  - restart interval (high byte, low byte) in units of MCU blocks,
    meaning that every n MCU blocks an RSTn marker can be found.
    The first marker will be RST0, then RST1 etc., after RST7
    repeating from RST0.

DQT: Define Quantization Table:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  - $ff, $db (DQT)
  - length (high byte, low byte)
  - QT information (1 byte):
     bit 0..3: number of QT (0..3, otherwise error)
     bit 4..7: precision of QT, 0 = 8 bit, otherwise 16 bit
  - n bytes QT, n = 64*(precision+1)
 Remarks:
  - A single DQT segment may contain multiple QTs, each with its own
    information byte.
  - For precision=1 (16 bit), the order is high-low for each of the 64 words.
  - The 64 values are stored in zig-zag order.

DAC: Define Arithmetic Table:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Current software does not support arithmetic coding for legal reasons.
 JPEG files using arithmetic coding cannot be processed.

DHT: Define Huffman Table:
~~~~~~~~~~~~~~~~~~~~~~~~~~
  - $ff, $c4 (DHT)
  - length (high byte, low byte)
  - HT information (1 byte):
     bit 0..3: number of HT (0..3, otherwise error)
     bit 4   : type of HT, 0 = DC table, 1 = AC table
     bit 5..7: not used, must be 0
  - 16 bytes: number of symbols with codes of length 1..16; the sum of these
    bytes is the total number of codes, which must be <= 256
  - n bytes: table containing the symbols in order of increasing code length
    (n = total number of codes)
 Remarks:
  - A single DHT segment may contain multiple HTs, each with its own
    information byte.

COM: Comment:
~~~~~~~~~~~~~
  - $ff, $fe (COM)
  - length (high byte, low byte) of the comment = L+2
  - The comment = a stream of bytes with the length = L

SOS: Start Of Scan:
~~~~~~~~~~~~~~~~~~~
  - $ff, $da (SOS)
  - length (high byte, low byte), must be 6+2*(number of components in scan)
  - number of components in scan (1 byte), must be >= 1 and <= 4 (otherwise
    error), usually 1 or 3
  - for each component: 2 bytes
     - component id (1 = Y, 2 = Cb, 3 = Cr, 4 = I, 5 = Q), see SOF0
     - Huffman table to use:
        - bit 0..3: AC table (0..3)
        - bit 4..7: DC table (0..3)
  - 3 bytes: start of spectral selection, end of spectral selection, and
    successive approximation bits; for baseline JPEG these are 0, 63 and 0,
    and can be ignored
 Remarks:
  - The image data (scans) immediately follows the SOS segment.
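The DHT layout above stores only the 16 counts and the symbols; the codes themselves are derived with the rule worked through in "The Huffman table stored in a JPG file" (codes of one length are consecutive; moving to the next length adds 1 and shifts left by one bit). A C sketch of that derivation follows, similar in spirit to the size/code table generation in Annex C of the JPEG standard; `build_codes` and its array layout are hypothetical.

```c
/* Hypothetical helper: rebuilds the Huffman codes of one DHT table.
   counts[i] holds the number of codes of length i+1 (the 16 bytes that
   follow the HT information byte). For the n-th stored symbol it writes
   the code value to code[n] and its length in bits to size[n].
   Returns the number of codes, or -1 if the counts are inconsistent. */
int build_codes(const unsigned char counts[16],
                unsigned short code[256], unsigned char size[256])
{
    unsigned int next = 0;   /* next code value to hand out */
    int n = 0;               /* symbols processed so far */
    int len, i;

    for (len = 1; len <= 16; len++) {
        for (i = 0; i < counts[len - 1]; i++) {
            if (n >= 256 || next >= (1u << len))
                return -1;   /* too many symbols, or codes overflow */
            code[n] = (unsigned short)next;
            size[n] = (unsigned char)len;
            next++;
            n++;
        }
        next <<= 1;          /* one bit longer for the next length */
    }
    return n;
}
```

With the counts from that worked example (0,2,3,1,1,1,0,1, then zeroes), the nine codes come out as 00, 01, 100, 101, 110, 1110, 11110, 111110 and 11111100, the same list that was worked out by hand; the n-th symbol byte of the DHT table belongs to code[n].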
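The JPEG standard is a compression coding and decoding standard for still images. It was the first international standard for compressing digital still images, and it can be used for both grayscale and color images. To meet the differing requirements of various applications, it includes two basic compression algorithms: a lossy compression algorithm based on the DCT (discrete cosine transform), and a lossless compression algorithm based on prediction.

The JPEG standard defines four specific coding modes: sequential encoding, progressive encoding, lossless encoding and hierarchical encoding. The DCT-based sequential mode is further divided into two broad classes, Baseline sequential and extended sequential, giving five variants in all:
  1. Baseline sequential
  2. Extended sequential: Huffman coding, 8-bit sample precision
  3. Extended sequential: arithmetic coding, 8-bit sample precision
  4. Extended sequential: Huffman coding, 12-bit sample precision
  5. Extended sequential: arithmetic coding, 12-bit sample precision

Baseline is the most basic and simplest of these sequential modes; it restricts DCT-based sequential coding to a specific set of constraints. The Baseline system is restricted as follows:
  (1) only 8-bit samples are used for each image component
  (2) only Huffman coding is used
  (3) at most two DC Huffman tables and two AC Huffman tables are used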
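In a file, the coding process is announced by which SOFn marker introduces the frame (see the SOF0..SOF15 entries in Oliver Fromme's segment table above). A small C sketch of that mapping, following the SOFn assignments in the JPEG standard (the "differential" variants belong to the hierarchical mode); the function `sof_process` and its description strings are illustrative, not part of any library.

```c
#include <stddef.h>

/* Hypothetical helper: maps the second byte of an SOFn marker
   (FFC0..FFCF) to the coding process it announces. Returns NULL for
   bytes that are not SOF markers (C4 = DHT, C8 = JPG, CC = DAC, ...). */
const char *sof_process(unsigned char type)
{
    switch (type) {
    case 0xC0: return "baseline sequential DCT, Huffman";
    case 0xC1: return "extended sequential DCT, Huffman";
    case 0xC2: return "progressive DCT, Huffman";
    case 0xC3: return "lossless, Huffman";
    case 0xC5: return "differential sequential DCT, Huffman";
    case 0xC6: return "differential progressive DCT, Huffman";
    case 0xC7: return "differential lossless, Huffman";
    case 0xC9: return "extended sequential DCT, arithmetic";
    case 0xCA: return "progressive DCT, arithmetic";
    case 0xCB: return "lossless, arithmetic";
    case 0xCD: return "differential sequential DCT, arithmetic";
    case 0xCE: return "differential progressive DCT, arithmetic";
    case 0xCF: return "differential lossless, arithmetic";
    default:   return NULL;
    }
}
```

A simple baseline decoder could accept only 0xC0 (and possibly 0xC1 with 8-bit precision) and reject the rest.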

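Returning to the AC and DC encoding sections earlier in this document: the category of a value and its bit-coded representation can be computed with a few lines of C. In this sketch (the names `category`, `value_bits` and `extend_value` are hypothetical), a positive value is written as its plain binary digits and a negative value as the one's complement of its absolute value in `category` bits, which reproduces the examples 57 -> 111001, -30 -> 00001 and -511 -> 000000000. `extend_value` is the decoder-side inverse needed in steps 1c) and 2c) of the Huffman decoder summary.

```c
/* Category = number of bits needed for |v| (0 for v == 0). */
int category(int v)
{
    int a = v < 0 ? -v : v;
    int c = 0;
    while (a) {
        c++;
        a >>= 1;
    }
    return c;
}

/* Bit-coded representation: v itself if v >= 0; for negative v, the
   one's complement of |v| within category(v) bits. */
unsigned value_bits(int v)
{
    int c = category(v);
    if (v >= 0)
        return (unsigned)v;
    return (unsigned)(v + (1 << c) - 1);   /* equals ~|v| masked to c bits */
}

/* Decoder side: turn c fetched bits back into the signed value.
   A leading 0 bit means the value is negative. */
int extend_value(unsigned bits, int c)
{
    if (c == 0)
        return 0;
    if (bits < (1u << (c - 1)))
        return (int)bits - (1 << c) + 1;
    return (int)bits;
}
```

For example, the decoder reads category 5 and the bits 00001, and extend_value(1, 5) gives back -30.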