FFmpeg Hardware Acceleration


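This article is based on the ffmpeg 4.4.1 source code and uses the command below to analyze how hardware acceleration is implemented in ffmpeg.c.

The command is: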

ffmpeg.exe -hwaccel cuvid -vcodec h264_cuvid -i juren_10s.mp4 -vcodec h264_nvenc -acodec copy juren_h264_nvenc_10s.mp4 -y

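The command above decodes the MP4 with the h264_cuvid hardware decoder and then re-encodes it to MP4 with the h264_nvenc hardware encoder. juren_10s.mp4 download: Baidu Netdisk, extraction code: 3khn

For how to set up a Qt Creator environment for debugging ffmpeg hardware acceleration, see the following articles.

《window10_ffmpeg-with-nvidia-gpu编译》 (building ffmpeg with NVIDIA GPU support on Windows 10)
《ffmpeg-qt-msvc移植调试》 (porting ffmpeg to Qt with MSVC and debugging it)

The CUDA hardware acceleration code does not appear to be ABI-compatible across compilers, so the DLLs can only be built with MSVC. Qt Creator must then build and debug with MSVC as well; MinGW fails with errors.

Full project download: Baidu Netdisk, extraction code: 9yeu. In Qt Creator, select the MSVC 2019 64-bit kit. The debugging environment is shown in the figure: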

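The hardware acceleration code in the ffmpeg.c project is spread across three places: decoding, filtering, and encoding. This article covers them one at a time.

The flowchart for hardware-accelerated decoding is as follows:

First, when add_input_streams() in ffmpeg_opt.c adds an input stream, it initializes the variables related to hardware decoding: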

ffmpeg_opt.c

    if (hwaccel) {
        // The NVDEC hwaccels use a CUDA device, so remap the name here.
        if (!strcmp(hwaccel, "nvdec") || !strcmp(hwaccel, "cuvid"))
            hwaccel = "cuda";

        if (!strcmp(hwaccel, "none"))
            ist->hwaccel_id = HWACCEL_NONE;
        else if (!strcmp(hwaccel, "auto"))
            ist->hwaccel_id = HWACCEL_AUTO;
        else {
            enum AVHWDeviceType type;
            int i;
            for (i = 0; hwaccels[i].name; i++) {
                if (!strcmp(hwaccels[i].name, hwaccel)) {
                    ist->hwaccel_id = hwaccels[i].id;
                    break;
                }
            }
            if (!ist->hwaccel_id) {
                type = av_hwdevice_find_type_by_name(hwaccel);
                if (type != AV_HWDEVICE_TYPE_NONE) {
                    ist->hwaccel_id = HWACCEL_GENERIC;
                    ist->hwaccel_device_type = type;
                }
            }

            if (!ist->hwaccel_id) {
                av_log(NULL, AV_LOG_FATAL, "Unrecognized hwaccel: %s.\n",
                       hwaccel);
                av_log(NULL, AV_LOG_FATAL, "Supported hwaccels: ");
                type = AV_HWDEVICE_TYPE_NONE;
                while ((type = av_hwdevice_iterate_types(type)) !=
                       AV_HWDEVICE_TYPE_NONE)
                    av_log(NULL, AV_LOG_FATAL, "%s ",
                           av_hwdevice_get_type_name(type));
                av_log(NULL, AV_LOG_FATAL, "\n");
                exit_program(1);
            }
        }
    }

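Key points of the code above:

The command-line option -hwaccel cuvid is parsed into the hwaccel variable, so hwaccel equals cuvid in the figure above and is then remapped to cuda.
ist->hwaccel_id is set; in this environment it becomes HWACCEL_GENERIC.
ist->hwaccel_device_type is set; in this environment it becomes AV_HWDEVICE_TYPE_CUDA.
If -hwaccel cuvid is not given on the command line, ist->hwaccel_id is never set, which changes the logic in get_format().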
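The name resolution described above can be sketched as a small standalone C model. This is only an illustration under stated assumptions: the enums, struct StreamHw, find_type_by_name() and resolve_hwaccel() are hypothetical stand-ins rather than FFmpeg API, and the lookup in the hwaccels[] table that the real code performs first is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the ffmpeg.c enums; not the real FFmpeg headers. */
enum HwaccelId  { HWACCEL_NONE = 0, HWACCEL_AUTO, HWACCEL_GENERIC };
enum DeviceType { DEVICE_TYPE_NONE = 0, DEVICE_TYPE_CUDA, DEVICE_TYPE_DXVA2 };

struct StreamHw {
    enum HwaccelId  hwaccel_id;
    enum DeviceType device_type;
};

/* Hypothetical replacement for av_hwdevice_find_type_by_name(). */
static enum DeviceType find_type_by_name(const char *name)
{
    if (!strcmp(name, "cuda"))
        return DEVICE_TYPE_CUDA;
    if (!strcmp(name, "dxva2"))
        return DEVICE_TYPE_DXVA2;
    return DEVICE_TYPE_NONE;
}

/* Returns 0 on success and -1 for an unrecognized name.
 * A NULL name models a command line without -hwaccel. */
static int resolve_hwaccel(const char *name, struct StreamHw *out)
{
    assert(out != NULL);
    out->hwaccel_id  = HWACCEL_NONE;
    out->device_type = DEVICE_TYPE_NONE;
    if (!name)
        return 0;

    /* The NVDEC names are remapped to the CUDA device type. */
    if (!strcmp(name, "nvdec") || !strcmp(name, "cuvid"))
        name = "cuda";

    if (!strcmp(name, "none"))
        return 0;
    if (!strcmp(name, "auto")) {
        out->hwaccel_id = HWACCEL_AUTO;
        return 0;
    }

    /* Any other name must match a known device type. */
    out->device_type = find_type_by_name(name);
    if (out->device_type == DEVICE_TYPE_NONE)
        return -1;
    out->hwaccel_id = HWACCEL_GENERIC;
    return 0;
}
```

Calling resolve_hwaccel("cuvid", &s) in this model leaves s.hwaccel_id at HWACCEL_GENERIC and s.device_type at DEVICE_TYPE_CUDA, while a NULL name leaves both fields at their zero values.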

Next, init_input_stream() in ffmpeg.c also contains some hardware decoding code that runs when the input stream is initialized:

ffmpeg.c

    static int init_input_stream(int ist_index, char *error, int error_len)
    {
        // code omitted...
        if (ist->decoding_needed) {
            ist->dec_ctx->opaque      = ist;
            // note get_format
            ist->dec_ctx->get_format  = get_format;
            ist->dec_ctx->get_buffer2 = get_buffer;
            // code omitted...
        }

        ret = hw_device_setup_for_decode(ist);
        if (ret < 0) {
            snprintf(error, error_len, "Device setup failed for "
                     "decoder on input stream #%d:%d : %s",
                     ist->file_index, ist->st->index, av_err2str(ret));
            return ret;
        }

        if ((ret = avcodec_open2(ist->dec_ctx, codec, &ist->decoder_opts)) < 0) {
            // code omitted...
        }
        return 0;
    }

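The code above has two key points.

1. hw_device_setup_for_decode() initializes the hardware decoding device.

2. get_format() is a callback. In this setup it is invoked while avcodec_open2() opens the decoder, and its return value decides which pixel format the decoder outputs. A decoder usually supports only a few output pixel formats; h264_cuvid, for example, supports only NV12 and CUDA.

Start with hw_device_setup_for_decode(). Its main code is: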

    int hw_device_setup_for_decode(InputStream *ist)
    {
        const AVCodecHWConfig *config;
        enum AVHWDeviceType type;
        HWDevice *dev = NULL;
        int err, auto_device = 0;

        if (ist->hwaccel_device) {
            // code omitted...
            // -hwaccel_device was not given on the command line, so this branch does not run.
        } else {
            if (ist->hwaccel_id == HWACCEL_AUTO) {
                auto_device = 1;
            } else if (ist->hwaccel_id == HWACCEL_GENERIC) {
                type = ist->hwaccel_device_type;
                dev = hw_device_get_by_type(type);
                if (!dev) {
                    // key line
                    err = hw_device_init_from_type(type, NULL, &dev);
                }
            } else {
                // code omitted; this branch does not run
            }
        }

        if (auto_device) {
            // code omitted; this branch does not run
        }

        if (!dev) {
            av_log(ist->dec_ctx, AV_LOG_ERROR, "No device available "
                   "for decoder: device type %s needed for codec %s.\n",
                   av_hwdevice_get_type_name(type), ist->dec->name);
            return err;
        }

        // key line
        ist->dec_ctx->hw_device_ctx = av_buffer_ref(dev->device_ref);
        if (!ist->dec_ctx->hw_device_ctx)
            return AVERROR(ENOMEM);

        return 0;
    }

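Because the command line does not use -hwaccel_device to pick a hardware device, the if (ist->hwaccel_device) {xxx} branch is not entered.

The code above is abridged. Its key points are:

1. hw_device_init_from_type(type, NULL, &dev); is called to initialize the dev variable.

2. ist->dec_ctx->hw_device_ctx is initialized with av_buffer_ref(). AVBuffer is a general-purpose reference-counted buffer in ffmpeg, and many fields are AVBuffer references. C emulates generics with an untyped void * block of memory: the pointer is cast, and the memory is then interpreted as the corresponding type (struct).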
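To make the reference-counting and void * idea concrete, here is a minimal sketch of a reference-counted buffer with an untyped payload. It is not FFmpeg's implementation: struct RefBuf, struct Device and the refbuf_* functions are hypothetical names, and unlike av_buffer_ref(), which returns a new reference object, this simplified version hands back the same struct with its counter incremented.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of a reference-counted buffer. */
struct RefBuf {
    void *data;                     /* untyped payload, cast by the user */
    int   refcount;                 /* number of live references */
    void (*free_data)(void *data);  /* called when the last reference goes */
};

/* Example payload type that the void * is cast back to. */
struct Device {
    int index;
};

/* Wrap a payload in a buffer that starts with one reference. */
static struct RefBuf *refbuf_create(void *data, void (*free_data)(void *))
{
    struct RefBuf *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    b->data      = data;
    b->refcount  = 1;
    b->free_data = free_data;
    return b;
}

/* Take another reference: only the counter changes, nothing is copied. */
static struct RefBuf *refbuf_ref(struct RefBuf *b)
{
    assert(b && b->refcount > 0);
    b->refcount++;
    return b;
}

/* Drop one reference and clear the caller's pointer.
 * The payload is freed with the last reference.
 * Returns the number of references that remain. */
static int refbuf_unref(struct RefBuf **pb)
{
    struct RefBuf *b = *pb;
    int left;

    assert(b && b->refcount > 0);
    *pb  = NULL;
    left = --b->refcount;
    if (left == 0) {
        if (b->free_data)
            b->free_data(b->data);
        free(b);
    }
    return left;
}
```

A holder reads the payload with a cast such as ((struct Device *)buf->data)->index, which is the same pattern ffmpeg uses when it interprets the data pointer of a buffer reference as a device or frames context.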

Next comes get_format. It lets the calling layer decide which pixel format the decoder outputs. get_format() is declared as follows:

    /**
     * callback to negotiate the pixelFormat
     * @param fmt is the list of formats which are supported by the codec,
     * it is terminated by -1 as 0 is a valid format, the formats are ordered by quality.
     * The first is always the native one.
     * @note The callback may be called again immediately if initialization for
     * the selected (hardware-accelerated) pixel format failed.
     * @warning Behavior is undefined if the callback returns a value not
     * in the fmt list of formats.
     * @return the chosen format
     * - encoding: unused
     * - decoding: Set by user, if not set the native format will be chosen.
     */
    enum AVPixelFormat (*get_format)(struct AVCodecContext *s, const enum AVPixelFormat * fmt);

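The second parameter, const enum AVPixelFormat * fmt, is the list of pixel formats the decoder supports. The decoder used by this command is h264_cuvid, which supports only two pixel formats, NV12 and CUDA.

get_format is implemented in ffmpeg.c: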

    static enum AVPixelFormat get_format(AVCodecContext *s, const enum AVPixelFormat *pix_fmts)
    {
        InputStream *ist = s->opaque;
        const enum AVPixelFormat *p;
        int ret;

        // code omitted...

        return *p;
    }

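The key points are:

1. For a non-hardware pixel format (NV12 is not a hardware format), the first such format in the list is taken as the decoder output. The code breaks out of the loop right away: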

    if (!(desc->flags & AV_PIX_FMT_FLAG_HWACCEL))
        break;

2. For a hardware pixel format (CUDA is a hardware format), execution continues and avcodec_get_hw_config() is used to find a config that supports AV_CODEC_HW_CONFIG_METHOD_HW_DEVICE_CTX.

    if (ist->hwaccel_id == HWACCEL_GENERIC ||
        ist->hwaccel_id == HWACCEL_AUTO) {
        for (i = 0;; i++) {
            config = avcodec_get_hw_config(s->codec, i);
            if (!config)
                break;
            if (!(config->methods &
                  AV_CODEC_HW_CONFIG_METHOD_HW_DEVICE_CTX))
                continue;
            if (config->pix_fmt == *p)
                break;
        }
    }

3. It tries to initialize the hardware decoder.

    ret = hwaccel_decode_init(s);
    if (ret < 0) {
        if (ist->hwaccel_id == HWACCEL_GENERIC) {
            av_log(NULL, AV_LOG_FATAL,
                   "%s hwaccel requested for input stream #%d:%d, "
                   "but cannot be initialized.\n",
                   av_hwdevice_get_type_name(config->device_type),
                   ist->file_index, ist->st->index);
            return AV_PIX_FMT_NONE;
        }
        continue;
    }

4. It records CUDA as the hardware decoding output format, breaks out of the loop, and returns.

    ist->hwaccel_pix_fmt = *p;
    break;

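That is how get_format() in ffmpeg.c treats ordinary decoding and hardware decoding differently. In short:

1. Ordinary decoding returns the first non-hardware pixel format the decoder supports.

2. Hardware decoding performs extra checks and initializes additional variables.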
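The selection loop summarized above can be modelled without FFmpeg as follows. This is a simplified sketch, not the real get_format(): enum PixFmt, is_hw_format() and choose_format() are hypothetical, and the device_ok parameter is an assumption that stands in for whether the config lookup and hwaccel_decode_init() would succeed.

```c
#include <assert.h>

/* Stand-in pixel formats; the list terminator is -1, as in the real API. */
enum PixFmt { PIX_NONE = -1, PIX_NV12 = 0, PIX_CUDA = 1 };
enum HwaccelId { HWACCEL_NONE = 0, HWACCEL_AUTO, HWACCEL_GENERIC };

/* In this model only CUDA carries the "hardware format" flag. */
static int is_hw_format(enum PixFmt f)
{
    return f == PIX_CUDA;
}

/* Walk the decoder's format list and pick the output format.
 * device_ok models whether hardware decoding can be initialized. */
static enum PixFmt choose_format(const enum PixFmt *fmts,
                                 enum HwaccelId id, int device_ok)
{
    const enum PixFmt *p;

    assert(fmts);
    for (p = fmts; *p != PIX_NONE; p++) {
        if (!is_hw_format(*p))
            break;                /* first software format wins */
        if (id != HWACCEL_GENERIC && id != HWACCEL_AUTO)
            continue;             /* no hwaccel requested: skip it */
        if (!device_ok) {
            if (id == HWACCEL_GENERIC)
                return PIX_NONE;  /* explicitly requested but unusable */
            continue;             /* auto: try the next format */
        }
        break;                    /* hardware format accepted */
    }
    return *p;
}
```

With the list { PIX_CUDA, PIX_NV12, PIX_NONE }, the model picks PIX_CUDA when the id is HWACCEL_GENERIC and the device works, and falls back to PIX_NV12 when the id is HWACCEL_NONE.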

Hardware decoding involves one more function, get_buffer(), also in ffmpeg.c:

    static int get_buffer(AVCodecContext *s, AVFrame *frame, int flags)
    {
        InputStream *ist = s->opaque;

        if (ist->hwaccel_get_buffer && frame->format == ist->hwaccel_pix_fmt)
            return ist->hwaccel_get_buffer(s, frame, flags);

        return avcodec_default_get_buffer2(s, frame, flags);
    }

This is really a special case for QSV hardware decoding: ist->hwaccel_get_buffer is only assigned in qsv_init().

With cuda, the call goes straight to the default get_buffer function, avcodec_default_get_buffer2().
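The dispatch in get_buffer() is an optional function-pointer hook with a default fallback. A minimal sketch of that pattern, with hypothetical types (struct Stream, struct Frame) and allocator tags that exist only to make the chosen path observable:

```c
#include <assert.h>
#include <stddef.h>

struct Stream;

/* allocator records which path filled the frame (illustration only). */
struct Frame {
    int format;
    int allocator;
};

typedef int (*GetBufferFn)(struct Stream *s, struct Frame *f);

struct Stream {
    int         hwaccel_pix_fmt;     /* format the hook is responsible for */
    GetBufferFn hwaccel_get_buffer;  /* NULL unless a hwaccel installs one */
};

enum { ALLOC_DEFAULT = 1, ALLOC_HWACCEL = 2 };

/* Stands in for avcodec_default_get_buffer2(). */
static int default_get_buffer(struct Stream *s, struct Frame *f)
{
    (void)s;
    f->allocator = ALLOC_DEFAULT;
    return 0;
}

/* Stands in for a hwaccel-specific allocator such as the QSV one. */
static int hook_get_buffer(struct Stream *s, struct Frame *f)
{
    (void)s;
    f->allocator = ALLOC_HWACCEL;
    return 0;
}

/* Use the hook only if it is installed and the frame has its format. */
static int get_buffer(struct Stream *s, struct Frame *f)
{
    assert(s && f);
    if (s->hwaccel_get_buffer && f->format == s->hwaccel_pix_fmt)
        return s->hwaccel_get_buffer(s, f);
    return default_get_buffer(s, f);
}
```

A stream whose hook is NULL, which is the cuda case described above, always ends up in default_get_buffer().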

That completes the analysis of hardware decoding in ffmpeg.

Hardware-accelerated filters are handled as follows:

ffmpeg_filter.c, line 1037:

    ret = hw_device_setup_for_filter(fg);

    int hw_device_setup_for_filter(FilterGraph *fg)
    {
        HWDevice *dev;
        int i;

        // If the user has supplied exactly one hardware device then just
        // give it straight to every filter for convenience.  If more than
        // one device is available then the user needs to pick one explicitly
        // with the filter_hw_device option.
        if (filter_hw_device)
            dev = filter_hw_device;
        else if (nb_hw_devices == 1)
            dev = hw_devices[0];
        else
            dev = NULL;

        if (dev) {
            for (i = 0; i < fg->graph->nb_filters; i++) {
                fg->graph->filters[i]->hw_device_ctx =
                    av_buffer_ref(dev->device_ref);
                if (!fg->graph->filters[i]->hw_device_ctx)
                    return AVERROR(ENOMEM);
            }
        }

        return 0;
    }

The main job of hw_device_setup_for_filter() is to set the hw_device_ctx variable on each filter, presumably for filters that work on hardware pixel formats.
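The device choice at the top of hw_device_setup_for_filter() follows a simple rule: an explicit device wins, a single registered device is used automatically, and anything else yields no device. A standalone sketch of that rule, where struct Device and pick_filter_device() are hypothetical names used for illustration:

```c
#include <assert.h>
#include <stddef.h>

struct Device {
    const char *name;
};

/* explicit_dev models filter_hw_device; devices and nb_devices model
 * the global hw_devices array and nb_hw_devices counter. */
static const struct Device *
pick_filter_device(const struct Device *explicit_dev,
                   const struct Device *const *devices, int nb_devices)
{
    assert(nb_devices >= 0);
    if (explicit_dev)
        return explicit_dev;   /* the user picked one explicitly */
    if (nb_devices == 1)
        return devices[0];     /* exactly one device: use it */
    return NULL;               /* none, or ambiguous */
}
```

With zero registered devices the function returns NULL, and in that case the real function never enters the loop that assigns hw_device_ctx to the filters.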

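The flowchart for hardware-accelerated encoding is as follows:

The code of hw_device_setup_for_encode() is not pasted here because it is fairly easy to follow. For the command in this article, it mainly sets one variable, ost->enc_ctx->hw_frames_ctx: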

hw_device_setup_for_encode()

    ost->enc_ctx->hw_frames_ctx = av_buffer_ref(frames_ref);

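One thing on the command line looks odd: -hwaccel cuvid. I was unsure what this option does. Hardware decoding and encoding should only need the codec names, so why specify -hwaccel cuvid as well? With that question in mind, the next step is to see what changes when -hwaccel cuvid is left out. The command becomes: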

ffmpeg.exe -vcodec h264_cuvid -i juren_10s.mp4 -vcodec h264_nvenc -acodec copy juren_h264_nvenc_10s.mp4 -y

Leaving out -hwaccel cuvid causes the following changes:

1. The following logic in add_input_streams() does not run, so ist->hwaccel_id is not set.

add_input_streams()

    if (hwaccel) {
        // set ist->hwaccel_id
        // set ist->hwaccel_device_type
    }

2. With ist->hwaccel_id unset, the AVPixelFormat returned by get_format() is NV12 instead of CUDA. NV12 does not carry the AV_PIX_FMT_FLAG_HWACCEL flag, while CUDA does. As a result, the AVFrame output by the h264_cuvid decoder is in NV12 format rather than CUDA format. h264_cuvid is still a hardware decoder, however.

3. The path through hw_device_setup_for_decode() changes, so ist->dec_ctx->hw_device_ctx is not set.

4. The path through hw_device_setup_for_decode() changes, so hw_device_init_from_type() is not called and nb_hw_devices stays at 0, which means no hardware device has been registered.

5. With nb_hw_devices equal to 0, the logic in hw_device_setup_for_filter() changes and fg->graph->filters[i]->hw_device_ctx is not assigned. The code of hw_device_setup_for_filter() is shown earlier and is not repeated here.

6. With no hardware device on the filters and software NV12 frames entering the filter graph, av_buffersink_get_hw_frames_ctx() in hw_device_setup_for_encode() returns nothing, so ost->enc_ctx->hw_frames_ctx is not set. The code is:

hw_device_setup_for_encode()

    frames_ref = av_buffersink_get_hw_frames_ctx(ost->filter->filter);
    ost->enc_ctx->hw_frames_ctx = av_buffer_ref(frames_ref); // not executed
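The point about nb_hw_devices relies on devices being registered lazily: the counter only grows when some code path asks for a device type that is not registered yet. A simplified get-or-create registry shows the effect. struct Registry and the registry_* functions are hypothetical, and details of the real lookup, such as its handling of several devices of the same type, are left out.

```c
#include <assert.h>
#include <stddef.h>

enum DeviceType { DEVICE_TYPE_NONE = 0, DEVICE_TYPE_CUDA };

struct Device {
    enum DeviceType type;
};

#define MAX_DEVICES 4

/* Models the global hw_devices array and nb_hw_devices counter. */
struct Registry {
    struct Device devices[MAX_DEVICES];
    int           nb_devices;
};

/* Lookup only: never creates a device. */
static struct Device *registry_get_by_type(struct Registry *r,
                                           enum DeviceType type)
{
    int i;

    for (i = 0; i < r->nb_devices; i++)
        if (r->devices[i].type == type)
            return &r->devices[i];
    return NULL;
}

/* Lookup first, create and register only when nothing is found. */
static struct Device *registry_get_or_create(struct Registry *r,
                                             enum DeviceType type)
{
    struct Device *dev;

    assert(r && type != DEVICE_TYPE_NONE);
    dev = registry_get_by_type(r, type);
    if (dev)
        return dev;
    if (r->nb_devices == MAX_DEVICES)
        return NULL;              /* registry full */
    dev = &r->devices[r->nb_devices++];
    dev->type = type;
    return dev;
}
```

If registry_get_or_create() is never called, which corresponds to running without -hwaccel, nb_devices stays at 0 and every later lookup returns NULL.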

To summarize, without -hwaccel cuvid:

ist->hwaccel_id is not set
ist->hwaccel_device_type is not set
ist->dec_ctx->hw_device_ctx is not set
nb_hw_devices equals 0
fg->graph->filters[i]->hw_device_ctx is not set
ost->enc_ctx->hw_frames_ctx is not set

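Key point: decoding uses dec_ctx->hw_device_ctx, while encoding sets enc_ctx->hw_frames_ctx. hw_device_ctx and hw_frames_ctx appear to be two different things. That question is left open here and will be covered later.

Although leaving out -hwaccel cuvid leaves all these variables unset, my GPU was clearly running at full load, which puzzled me. See the figure below:

From the analysis above, -hwaccel cuvid does not seem to affect whether the GPU is used for decoding and encoding. A likely explanation is that h264_cuvid and h264_nvenc each drive the GPU themselves; what the option changes is whether decoded frames stay in GPU memory as CUDA frames or are handed on as NV12.

Additional discussion:

CUDA and CUVID are two ways ffmpeg implements hardware acceleration. The main differences are how frames are decoded and how the frame data is passed along in memory.

Reference: HWAccelIntro – FFmpeg

One last piece of analysis. The h264_cuvid decoder produces AVFrames in CUDA format, but some encoders only accept NV12. To convert the frames to NV12 before passing them to the encoder, specify -hwaccel_output_format nv12. The command is: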

ffmpeg.exe -hwaccel cuvid  -hwaccel_output_format nv12 -vcodec h264_cuvid -i juren_10s.mp4 -vcodec h264_nvenc -acodec copy juren_h264_nvenc_10s.mp4 -y

This feature is implemented by hwaccel_retrieve_data(). Inside hwaccel_retrieve_data(), if ist->hwaccel_pix_fmt differs from ist->hwaccel_output_format, the hardware frame is converted to the requested output format.
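The decision made in hwaccel_retrieve_data() can be sketched as a compare-then-transfer step. The model below is an assumption-laden simplification: struct Frame, its in_gpu_memory field and retrieve_data() are hypothetical, and the real function performs the transfer by calling av_hwframe_transfer_data() on actual AVFrames.

```c
#include <assert.h>

enum PixFmt { PIX_NV12, PIX_CUDA };

/* in_gpu_memory is an illustrative flag, not a real AVFrame field. */
struct Frame {
    enum PixFmt format;
    int         in_gpu_memory;
};

/* Returns 1 if the frame was converted, 0 if it was left untouched. */
static int retrieve_data(struct Frame *frame, enum PixFmt output_format)
{
    assert(frame);
    if (frame->format == output_format)
        return 0;                 /* nothing to do */

    /* Stands in for the hardware-to-system-memory transfer. */
    frame->format        = output_format;
    frame->in_gpu_memory = 0;
    return 1;
}
```

A CUDA frame passed in with PIX_NV12 as the requested output comes back as an NV12 frame outside GPU memory, while the same frame with PIX_CUDA requested is returned unchanged.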

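The pixel format conversion here differs from the one in 《ffmpeg命令分析-pix_fmt》. -pix_fmt is implemented through the format filter and applies to non-hardware pixel formats. If the input of the format filter is the cuda pixel format and the output is a non-hardware pixel format such as nv12, the format filter reports an error.

Summary:

1. -pix_fmt is implemented through the format filter and converts between non-hardware pixel formats.

2. -hwaccel_output_format is implemented through hwaccel_retrieve_data() and converts hardware pixel formats.

That completes the analysis of ffmpeg cuda hardware acceleration.

My knowledge is limited and this was written alongside development work, so the article may contain errors or inaccuracies; corrections are welcome. If you have suggestions or would like to discuss audio and video technology, you can add me on WeChat: Loken1.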
