DirectShow audio and video buffering question

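When DirectShow renders audio and video, does it have a buffering mechanism? If so, how does it work for each stream, and how are the two streams kept in sync? If audio packets pile up, so that delayed packets arrive together with new ones, how are they handled? Is there a mechanism for dropping packets?
Are there any articles that cover this? Thanks.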

Solutions

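I don't know the internal mechanism in detail. I only know that there is a quality control mechanism that regulates the video frame rate.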
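(1) An introduction to Microsoft's DirectShow technology
Multimedia applications have long faced a set of challenges: moving large volumes of data, processing it quickly, keeping audio and video streams in sync, converting between media formats, and so on. DirectShow is the application architecture Microsoft designed to address these problems. Its design goal is to isolate low-level concerns such as data transport, hardware compatibility, and stream synchronization, so that developers can build multimedia applications on Windows with little effort.
To process data efficiently, DirectShow builds on DirectDraw and DirectSound. It is also built on COM: its components are called filters. Filters fall roughly into three kinds: source filters, transform filters, and rendering filters. The ports through which a filter passes data are called pins, and there are two kinds, input pins and output pins. In an application you connect filters together, that is, you connect the output pin of the upstream filter to the input pin of the downstream filter, to form a complete filter graph. When the filter graph runs, multimedia data flows from the source filters through the transform filters to the rendering filters. DirectX ships a very handy tool, GraphEdit.exe, for building and debugging filter graphs visually.
In the example application described later, we wrote our own filter (a source filter) that asynchronously reads MPEG-1 data arriving from the network. It is followed by transform filters that split the MPEG-1 stream into audio and video (MPEG-1 Stream Splitter) and decode them (MPEG Audio Decoder and MPEG Video Decoder). Last come the rendering filters for each stream: Video Renderer displays the video, and Default DirectSound Device plays the audio. Note: our source filter is adapted from the sample code in the DirectX 8.0 SDK, at \samples\Multimedia\DirectShow\Filters\Async\Memfile.
(2) An introduction to Windows Socket network transport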
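The data we need to process is not on the local machine; it is sent by a separate video server. So how do we obtain this data and then process it with our DirectShow filter? This is where Windows Sockets come in.
With Windows Sockets, two computers can establish a connection over the network and, by defining some higher-level protocol, transfer large amounts of data and exchange control messages. Microsoft's MFC also provides two classes, CAsyncSocket and CSocket, to make sockets easier to use. CAsyncSocket is a fairly low-level wrapper around the Windows Socket API and uses an internal window to provide an asynchronous mechanism suited to Windows applications. CSocket derives from CAsyncSocket and simplifies socket use further. Both classes have shortcomings, however, especially when a socket is used across threads.
In the example application described later, we use multiple threads because the data volume is large. On the server, a dedicated thread sends the data; on the client, a dedicated thread receives the data and puts it into a buffer queue for the DirectShow filter to read. We therefore wrapped sockets in our own classes: CListenSocket (used on the server to listen for client connections) and CWorkerSocket (the base class for sockets that transfer data). Two classes derive from CWorkerSocket: CMediaSocketServer (sends data on the server) and CMediaSocketClient (receives data on the client).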
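We split the continuous MPEG-1 data into small payload packets (one pack each, 2324 bytes), prepend a header, and send them between the two ends of the network. Note: when data is sent over the network as packets, packets can be lost. To avoid this, you can add another filter that transfers the data as a stream.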
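The packet framing described above can be sketched roughly as follows. This is only an illustration: the post does not say what the header contains, so the PacketHeader layout, the sequence field, and the helper names Packetize, SequenceOf, and CountLost are all assumptions. The header is written in host byte order, and the loss counter assumes packets arrive in order without sequence wraparound.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Payload size taken from the post: one MPEG-1 pack of 2324 bytes.
constexpr std::size_t kPayloadSize = 2324;

// Hypothetical header; the post only says "a header" is prepended.
struct PacketHeader {
    std::uint32_t sequence;    // increases by 1 for every packet sent
    std::uint32_t payloadLen;  // number of valid payload bytes that follow
};

// Split a byte stream into packets of header + up to kPayloadSize bytes.
std::vector<std::vector<std::uint8_t>> Packetize(
        const std::vector<std::uint8_t>& stream, std::uint32_t firstSeq = 0) {
    std::vector<std::vector<std::uint8_t>> packets;
    std::uint32_t seq = firstSeq;
    for (std::size_t off = 0; off < stream.size(); off += kPayloadSize) {
        const std::size_t len = std::min(kPayloadSize, stream.size() - off);
        const PacketHeader h{seq++, static_cast<std::uint32_t>(len)};
        std::vector<std::uint8_t> pkt(sizeof h + len);
        std::memcpy(pkt.data(), &h, sizeof h);                         // header
        std::memcpy(pkt.data() + sizeof h, stream.data() + off, len);  // payload
        packets.push_back(std::move(pkt));
    }
    return packets;
}

// Read the sequence number back out of a received packet.
std::uint32_t SequenceOf(const std::vector<std::uint8_t>& pkt) {
    PacketHeader h;
    std::memcpy(&h, pkt.data(), sizeof h);
    return h.sequence;
}

// Count packets missing from a list of received sequence numbers.
// Assumes in-order arrival and no wraparound.
std::uint32_t CountLost(const std::vector<std::uint32_t>& receivedSeqs) {
    std::uint32_t lost = 0;
    for (std::size_t i = 1; i < receivedSeqs.size(); ++i) {
        if (receivedSeqs[i] > receivedSeqs[i - 1] + 1) {
            lost += receivedSeqs[i] - receivedSeqs[i - 1] - 1;
        }
    }
    return lost;
}
```

With a sequence number in the header, the client can at least detect a gap and decide whether to skip ahead or request the data again. A real protocol would also fix the byte order of the header fields.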
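The MPEG-1 data received by the network client must be buffered up to a certain amount before it is handed to DirectShow for decoding. After that, the process is dynamic: the client keeps receiving data from the network while newly arrived data is decoded and played back.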
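One way to picture the buffering step is a queue that refuses to hand out data until an initial amount has accumulated. The sketch below is an assumption about how such a queue might look, not the code from the application described here: PrebufferQueue, its threshold parameter, and the non-blocking TryPop are all hypothetical names. The receive thread would call Push and the source filter would call TryPop.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

using Packet = std::vector<std::uint8_t>;

// Thread-safe queue that releases data only after an initial prebuffer fills.
class PrebufferQueue {
public:
    explicit PrebufferQueue(std::size_t prebufferPackets)
        : threshold_(prebufferPackets) {}

    // Called by the network receive thread for every packet that arrives.
    void Push(Packet p) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push_back(std::move(p));
        if (queue_.size() >= threshold_) primed_ = true;  // prebuffer reached
    }

    // Called by the reader. Returns false while the initial prebuffer is
    // still filling, or when the queue is momentarily empty.
    bool TryPop(Packet& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!primed_ || queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    std::size_t Size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    mutable std::mutex mutex_;
    std::deque<Packet> queue_;
    std::size_t threshold_;
    bool primed_ = false;  // stays true once the prebuffer has filled
};
```

In this sketch the queue stays primed after the first fill. A player that wants to pause and rebuffer after an underrun could reset the flag when the queue runs empty, and a blocking read could be added with a condition variable.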
DirectShow uses a modular architecture, where each stage of processing is done by a COM object called a filter. DirectShow provides a set of standard filters for applications to use, and developers can write their own custom filters that extend the functionality of DirectShow. To illustrate, here are the steps needed to play an AVI video file, along with the filters that perform each step:
1. Read the raw data from the file as a byte stream (File Source filter).
2. Examine the AVI headers, and parse the byte stream into separate video frames and audio samples (AVI Splitter filter).
3. Decode the video frames (various decoder filters, depending on the compression format).
4. Draw the video frames (Video Renderer filter).
5. Send the audio samples to the sound card (Default DirectSound Device filter).
These filters are shown in the following diagram:
[Diagram: File Source -> AVI Splitter -> video decoder -> Video Renderer, and AVI Splitter -> Default DirectSound Device]
As the diagram shows, each filter is connected to one or more other filters. The connection points are also COM objects, called pins. Filters use pins to move data from one filter to the next. The arrows in the diagram show the direction in which the data travels. In DirectShow, a set of filters is called a filter graph.
Filters have three possible states: running, stopped, and paused. When a filter is running, it processes media data. When it is stopped, it stops processing data. The paused state is used to cue data before running; the section Data Flow in the Filter Graph describes this concept in more detail. With very rare exceptions, state changes are coordinated throughout the entire filter graph; all the filters in the graph switch states in unison. Thus, the entire filter graph is also said to be running, stopped, or paused.
Filters can be grouped into several broad categories:
A source filter introduces data into the graph. The data might come from a file, a network, a camera, or anywhere else. Each source filter handles a different type of data source.
A transform filter takes an input stream, processes the data, and creates an output stream. Encoders and decoders are examples of transform filters.
Renderer filters sit at the end of the chain. They receive data and present it to the user. For example, a video renderer draws video frames on the display; an audio renderer sends audio data to the sound card; and a file-writer filter writes data to a file.
A splitter filter splits an input stream into two or more outputs, typically parsing the input stream along the way. For example, the AVI Splitter parses a byte stream into separate video and audio streams.
A mux filter takes multiple inputs and combines them into a single stream. For example, the AVI Mux performs the inverse operation of the AVI Splitter. It takes audio and video streams and produces an AVI-formatted byte stream.
The distinctions between these categories are not absolute. For example, the ASF Reader filter acts as both a source filter and a splitter filter.
All DirectShow filters expose the IBaseFilter interface, and all pins expose the IPin interface. DirectShow also defines many other interfaces that support more specific functionality.
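To make the idea of graph-wide state changes concrete, here is a toy model in portable C++. It is not the DirectShow API: ToyFilter and ToyFilterGraph are hypothetical stand-ins for COM filter objects and the filter graph manager, and the renderer-first visiting order is an assumption about how the graph manager walks the filters.

```cpp
#include <string>
#include <vector>

enum class FilterState { Stopped, Paused, Running };

// Toy stand-in for a filter; a real filter is a COM object exposing IBaseFilter.
struct ToyFilter {
    std::string name;
    FilterState state;
};

class ToyFilterGraph {
public:
    // Add filters in upstream-to-downstream order (source first, renderer last).
    void Add(const std::string& name) {
        filters_.push_back(ToyFilter{name, state_});
    }

    // Switch every filter to the new state, renderer first and source last,
    // and return the order in which the filters were visited.
    std::vector<std::string> SetState(FilterState next) {
        std::vector<std::string> order;
        for (auto it = filters_.rbegin(); it != filters_.rend(); ++it) {
            it->state = next;
            order.push_back(it->name);
        }
        state_ = next;
        return order;
    }

    FilterState State() const { return state_; }

    // True when every filter in the graph is in state s.
    bool AllIn(FilterState s) const {
        for (const ToyFilter& f : filters_) {
            if (f.state != s) return false;
        }
        return true;
    }

private:
    std::vector<ToyFilter> filters_;
    FilterState state_ = FilterState::Stopped;
};
```

The point of the model is only that an application changes the state of the graph as a whole and the graph propagates the change to every filter, rather than the application driving each filter individually.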
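Now let's look at the time stamps on a sample. Note that each sample can carry two kinds of time stamp, set with IMediaSample::SetTime and IMediaSample::SetMediaTime. When we talk about time stamps we usually mean the former, also called the presentation time; the renderer uses it to control playback. The latter is optional for a filter, and whether the media time is useful depends on your implementation. For example, if you stamp each outgoing sample with an increasing sequence number, a downstream filter can tell on receipt whether any samples were lost in transit.
Look at the parameters of IMediaSample::SetTime: both are pointers to REFERENCE_TIME values, but do not take that to mean the times are reference times. They are stream times. Also, not every sample has to carry a time stamp. For some compressed data, time stamps are hard to assign and not very meaningful (although by the time compressed data has passed through the decoder and reaches the renderer, it has usually been time stamped).
A time stamp consists of two times, a start time and an end time. When the renderer receives a sample, it generally compares the sample's start time with the current stream time. If the sample is late or has no time stamp, the renderer plays it immediately. If the sample is early, the renderer calls IReferenceClock::AdviseTime on the reference clock and waits until the start time arrives before playing the sample. Time stamps are generally set by the source filter or the parser filter, in one of the following ways: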
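1. File playback: the time stamp of the first sample starts at 0, and the time stamps of later samples are derived from the length of the valid data in each sample and the playback rate.
2. Video and audio capture: in principle, the start time of every captured sample is stamped with the stream time at the moment of capture. For video, samples from the preview pin are an exception: if they were stamped this way, every sample would already be late by the time it had travelled down the filter chain to the Video Renderer; the Video Renderer would report this to the source filter through quality control, and the source filter would drop frames. Samples from the preview pin therefore carry no time stamps. For audio capture, note that the Audio Capture filter and the sound card driver each use their own buffer; captured data is copied periodically from the driver's buffer to the filter's buffer, and this copy takes some time.
3. Mux filters: depending on the type of data the mux outputs, samples may or may not be time stamped.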
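For the file playback case, the arithmetic might look like the following sketch for uncompressed audio. It is only an illustration under assumptions: RefTime is a stand-in for REFERENCE_TIME (which counts 100-nanosecond units), "playback rate" is read here as the stream's data rate in bytes per second, and the AudioStamper class is hypothetical.

```cpp
#include <cstdint>

using RefTime = std::int64_t;  // stand-in for REFERENCE_TIME, 100 ns units
constexpr std::uint64_t kUnitsPerSecond = 10000000;

struct TimeStamp {
    RefTime start;
    RefTime stop;
};

// Assigns start/stop times to consecutive audio samples, starting at 0.
class AudioStamper {
public:
    explicit AudioStamper(std::uint32_t bytesPerSecond)
        : bytesPerSecond_(bytesPerSecond) {}

    // Returns the time stamp for the next sample holding sampleBytes of data.
    TimeStamp Next(std::uint32_t sampleBytes) {
        const RefTime start = BytesToTime(totalBytes_);
        totalBytes_ += sampleBytes;
        return TimeStamp{start, BytesToTime(totalBytes_)};
    }

private:
    // Convert a running byte count to a time. Working from the running total
    // keeps rounding errors from accumulating across samples.
    RefTime BytesToTime(std::uint64_t bytes) const {
        return static_cast<RefTime>(bytes * kUnitsPerSecond / bytesPerSecond_);
    }

    std::uint32_t bytesPerSecond_;
    std::uint64_t totalBytes_ = 0;
};
```

For example, with 44.1 kHz 16-bit stereo audio (176400 bytes per second), a sample holding 17640 bytes spans 0.1 second, so consecutive samples start at 0, 1000000, 2000000, and so on. In a real filter these two values would be the ones passed to IMediaSample::SetTime.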
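As you can see, sample time stamps are essential for keeping audio and video in sync. As the final enforcers of audio/video synchronization, the Video Renderer and the Audio Renderer have a lot of work to do. We may need to develop many other kinds of filter, but these two generally do not need to be rewritten: first, because renderer filters are inherently complex; second, because Microsoft keeps upgrading these two filters to integrate the latest technology from other DirectX modules (such as DirectSound, DirectDraw, and Direct3D).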
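The renderer's timing decision described in this post can be summarized in a few lines. The sketch below is a simplified model and not the actual renderer code: RefTime stands in for REFERENCE_TIME, and SampleTimes, Decision, StreamTime, and Schedule are hypothetical names. It models only the rule stated above (late or unstamped samples play at once, early samples wait) and leaves out quality control and frame dropping.

```cpp
#include <cstdint>

using RefTime = std::int64_t;  // stand-in for REFERENCE_TIME, 100 ns units

enum class RenderAction { RenderNow, WaitUntilStart };

struct SampleTimes {
    bool hasTime;   // false when the sample carries no time stamp
    RefTime start;  // presentation start, in stream time
    RefTime stop;   // presentation end, in stream time
};

struct Decision {
    RenderAction action;
    RefTime wait;  // how long to wait before rendering (0 for RenderNow)
};

// Stream time is measured from the moment the graph started running,
// so it is the reference clock time minus the graph's start time.
RefTime StreamTime(RefTime referenceNow, RefTime graphStart) {
    return referenceNow - graphStart;
}

// Decide what to do with a sample given the current stream time.
Decision Schedule(const SampleTimes& s, RefTime streamNow) {
    if (!s.hasTime || s.start <= streamNow) {
        return Decision{RenderAction::RenderNow, 0};  // late or unstamped
    }
    return Decision{RenderAction::WaitUntilStart, s.start - streamNow};  // early
}
```

Because both renderers compare against the same stream time, audio and video samples stamped with the same start time are presented together, which is what keeps the two streams in sync. In a real renderer the wait would be implemented with IReferenceClock::AdviseTime rather than a computed delay.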