How does the CPU dispatcher work?

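The NumPy dispatcher is based on multi-source compiling: a given source is compiled multiple times with different compiler flags and with different **C** definitions that affect the code paths. Each compiled object can therefore use a specific instruction set, depending on the required optimizations, and the resulting objects are finally linked together.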

[Figure: ../../_images/opt-infra.png]

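This mechanism should work with all compilers and does not require any compiler-specific extension. It does, however, add a few steps to normal compilation, which are explained below.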
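As a rough illustration of the idea, the following sketch shows a single source whose code path and exported symbol depend on a C definition supplied at compile time. The macro `DEMO_TARGET_AVX2`, the helper `DEMO_SYMBOL`, the function `demo_target_name`, and the compiler commands in the comment are hypothetical names chosen for this example; they are not part of NumPy's build.

```c
/* kernel.c: one source that is meant to be compiled more than once, e.g.
 *   cc -c kernel.c -o kernel_baseline.o
 *   cc -mavx2 -DDEMO_TARGET_AVX2 -c kernel.c -o kernel_avx2.o
 * (hypothetical commands; NumPy's build system generates its own).
 */
#if defined(DEMO_TARGET_AVX2)
  /* Suffix the exported symbol so both objects can be linked together. */
  #define DEMO_SYMBOL(NAME) NAME##_AVX2
  #define DEMO_TARGET_STR "avx2"
#else
  /* No target definition was passed: this is the plain build. */
  #define DEMO_SYMBOL(NAME) NAME
  #define DEMO_TARGET_STR "baseline"
#endif

/* Reports which variant of this translation unit was built. */
const char *DEMO_SYMBOL(demo_target_name)(void)
{
    return DEMO_TARGET_STR;
}
```

Compiled without any extra definition, the file exports `demo_target_name`; compiled with `-DDEMO_TARGET_AVX2`, the same file exports `demo_target_name_AVX2`, so both objects can coexist in one binary.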

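1- Configuration

Before the source files are built, the user configures the required optimizations through the two command arguments described earlier:

  • --cpu-baseline: the minimal set of required optimizations.

  • --cpu-dispatch: the dispatched set of additional optimizations.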

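2- Discovering the environment

In this step, the infrastructure checks the compiler and the platform architecture, and caches some of the intermediate results to speed up rebuilding.

3- Validating the requested optimizations

The requested optimizations are tested against the compiler to determine which of them it can actually support.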

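4- Generating the main configuration header

The generated header _cpu_dispatch.h contains all the definitions and headers of the instruction sets for the required optimizations that were validated in the previous step.

It also contains extra C definitions that are used to define NumPy's Python-level module attributes __cpu_baseline__ and __cpu_dispatch__.

What is in this header?

The example header below was dynamically generated by gcc on an X86 machine. The compiler supports --cpu-baseline="sse sse2 sse3" and --cpu-dispatch="ssse3 sse41", and the result is as follows.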

// The header should be located at numpy/numpy/_core/src/common/_cpu_dispatch.h
/**NOTE
 ** C definitions prefixed with "NPY_HAVE_" represent
 ** the required optimizations.
 **
 ** C definitions prefixed with 'NPY__CPU_TARGET_' are protected and
 ** shouldn't be used by any NumPy C sources.
 */
/******* baseline features *******/
/** SSE **/
#define NPY_HAVE_SSE 1
#include <xmmintrin.h>
/** SSE2 **/
#define NPY_HAVE_SSE2 1
#include <emmintrin.h>
/** SSE3 **/
#define NPY_HAVE_SSE3 1
#include <pmmintrin.h>

/******* dispatch-able features *******/
#ifdef NPY__CPU_TARGET_SSSE3
  /** SSSE3 **/
  #define NPY_HAVE_SSSE3 1
  #include <tmmintrin.h>
#endif
#ifdef NPY__CPU_TARGET_SSE41
  /** SSE41 **/
  #define NPY_HAVE_SSE41 1
  #include <smmintrin.h>
#endif

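Baseline features are the minimal set of required optimizations configured via --cpu-baseline. They have no preprocessor guards and are always enabled, which means they can be used in any source.

Does this mean that NumPy's infrastructure passes the compiler flags of the baseline features to all sources?

Yes, it does. Dispatch-able sources, however, are treated differently.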
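Because baseline definitions are unguarded, an ordinary source can branch on them at compile time without any runtime check. The sketch below illustrates this under one assumption: since the snippet has to compile on its own, a local macro `DEMO_HAVE_SSE2` plays the role that `NPY_HAVE_SSE2` would play after including the generated header, and it is derived from the compiler's predefined `__SSE2__` macro. The function `demo_add_f32` is likewise a hypothetical name.

```c
#include <stddef.h>

/* Stand-in for the generated header (assumption): inside NumPy, the
 * definition and the <emmintrin.h> include come from _cpu_dispatch.h. */
#if defined(__SSE2__)
  #define DEMO_HAVE_SSE2 1
  #include <emmintrin.h>
#endif

/* Hypothetical helper: adds two float arrays element-wise. */
void demo_add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#ifdef DEMO_HAVE_SSE2
    /* Baseline path: compiled in unconditionally, four floats per step. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when the feature is unavailable. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

The vector loop carries no runtime CPU check, which is exactly why a baseline feature must be supported by every machine the build is expected to run on.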

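What happens if the user specifies certain **baseline features** during the build, but the machine does not support them at runtime? Could the compiled code be called through one of these definitions, or could the compiler itself have auto-generated or auto-vectorized some piece of code based on the compiler flags provided on the command line?

While the NumPy module is being loaded, a validation step detects this situation and raises a Python runtime error to inform the user. This prevents the CPU from hitting an illegal instruction, which would otherwise cause a segfault.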
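A load-time check of this kind could look roughly like the sketch below. It is not NumPy's implementation: it assumes the example baseline "sse sse2 sse3", relies on the GCC/Clang built-ins `__builtin_cpu_init` and `__builtin_cpu_supports` (available on x86 only), and uses the hypothetical function name `demo_missing_baseline_feature`.

```c
#include <stddef.h>

/* Returns NULL when every baseline feature is available on the running
 * CPU, otherwise the name of the first missing feature. */
const char *demo_missing_baseline_feature(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* Populate the compiler's CPU model data before querying it. */
    __builtin_cpu_init();
    if (!__builtin_cpu_supports("sse"))  return "SSE";
    if (!__builtin_cpu_supports("sse2")) return "SSE2";
    if (!__builtin_cpu_supports("sse3")) return "SSE3";
#endif
    /* Other compilers or architectures: nothing is checked in this sketch. */
    return NULL;
}
```

A module initializer would call such a function once and, on a non-NULL result, report an error that names the missing feature instead of continuing to code that may execute an unsupported instruction.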

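Dispatch-able features are the dispatched set of additional optimizations configured via --cpu-dispatch. They are not activated by default and are always guarded by other C definitions prefixed with NPY__CPU_TARGET_. The NPY__CPU_TARGET_ definitions are only enabled within **dispatch-able sources**.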

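5- Dispatch-able sources and configuration statements

Dispatch-able sources are special **C** files that can be compiled multiple times with different compiler flags and with different **C** definitions. These definitions affect the code paths so that a specific instruction set is enabled for each compiled object, according to the "**configuration statements**". The statements must be declared inside a **C** comment (/**/) at the top of each dispatch-able source and must start with the special mark **@targets**. If optimization is disabled through the command argument --disable-optimization, dispatch-able sources are treated as normal **C** sources.

What are configuration statements?

Configuration statements are keywords combined together to determine the required optimizations for a dispatch-able source.

Example: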

/*@targets avx2 avx512f vsx2 vsx3 asimd asimdhp */
// C code

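The keywords mainly represent the additional optimizations configured through --cpu-dispatch, but they can also represent other options, such as:

  • Target groups: pre-configured configuration statements used to manage the required optimizations from outside the dispatch-able source.

  • Policies: collections of options used to change the default behaviors or to force the compiler to perform certain things.

  • "baseline": a unique keyword that represents the minimal optimizations configured through --cpu-baseline.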
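To show what reading a configuration statement involves, here is a minimal sketch that extracts the keywords following the **@targets** mark. NumPy's build system performs this step itself; the function `demo_parse_targets` and the constant `DEMO_MAX_NAME` are hypothetical, and the sketch neither validates the keywords nor distinguishes features from target groups and policies.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_NAME 32

/* Copies up to `max` whitespace-separated keywords found between
 * "@targets" and the closing comment mark into `out`, and returns how
 * many were copied. Returns 0 when no complete statement is present. */
size_t demo_parse_targets(const char *src, char out[][DEMO_MAX_NAME], size_t max)
{
    const char *p = strstr(src, "@targets");
    if (p == NULL) {
        return 0;
    }
    p += strlen("@targets");
    const char *end = strstr(p, "*/");
    if (end == NULL) {
        return 0; /* the statement must be closed by the C comment */
    }
    size_t count = 0;
    while (p < end && count < max) {
        while (p < end && isspace((unsigned char)*p)) {
            p++;
        }
        const char *start = p;
        while (p < end && !isspace((unsigned char)*p)) {
            p++;
        }
        size_t len = (size_t)(p - start);
        if (len == 0) {
            break; /* only trailing whitespace was left */
        }
        if (len >= DEMO_MAX_NAME) {
            len = DEMO_MAX_NAME - 1; /* truncate overly long keywords */
        }
        memcpy(out[count], start, len);
        out[count][len] = '\0';
        count++;
    }
    return count;
}
```

Applied to the example statement above, the function yields the six keywords in their written order; deciding which of them the compiler and platform actually support is a separate validation step.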

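NumPy's infrastructure handles dispatch-able sources in four steps:

  • (A) Recognition: just like source templates and F2PY, dispatch-able sources require a special extension, *.dispatch.c, to mark C dispatch-able source files, and *.dispatch.cpp or *.dispatch.cxx for C++. **NOTE**: C++ is not supported yet.

  • (B) Parsing and validating: in this step, the dispatch-able sources filtered by the previous step are parsed one by one, and the configuration statements of each are validated to determine its required optimizations.

  • (C) Wrapping: this is the approach taken by NumPy's infrastructure, and it has proved flexible enough to compile a single source multiple times with different **C** definitions and flags that affect the code paths. For each required optimization that belongs to the additional optimizations, the infrastructure creates a temporary **C** source that declares the **C** definitions and then pulls in the dispatch-able source through the **C** directive **#include**. For clarification, take a look at the following code for AVX512F: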

    /*
     * this definition is used by NumPy utilities as suffixes for the
     * exported symbols
     */
    #define NPY__CPU_TARGET_CURRENT AVX512F
    /*
     * The following definitions enable
     * definitions of the dispatch-able features that are defined within the main
     * configuration header. These are definitions for the implied features.
     */
    #define NPY__CPU_TARGET_SSE
    #define NPY__CPU_TARGET_SSE2
    #define NPY__CPU_TARGET_SSE3
    #define NPY__CPU_TARGET_SSSE3
    #define NPY__CPU_TARGET_SSE41
    #define NPY__CPU_TARGET_POPCNT
    #define NPY__CPU_TARGET_SSE42
    #define NPY__CPU_TARGET_AVX
    #define NPY__CPU_TARGET_F16C
    #define NPY__CPU_TARGET_FMA3
    #define NPY__CPU_TARGET_AVX2
    #define NPY__CPU_TARGET_AVX512F
    // our dispatch-able source
    #include "/the/absolute/path/of/hello.dispatch.c"
    
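  • (D) Dispatch-able configuration header: the infrastructure generates a config header for each dispatch-able source. The header mainly contains two abstract **C** macros that identify the generated objects, so that any **C** source can use them to dispatch certain symbols from the generated objects at runtime. It is also used for forward declarations.

    The generated header takes the name of the dispatch-able source, with the extension removed and replaced by .h. For example, assume we have a dispatch-able source called hello.dispatch.c with the following contents: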

    // hello.dispatch.c
    /*@targets baseline sse41 avx512f */
    #include <stdio.h>
    #include "numpy/utils.h" // NPY_CAT, NPY_TOSTR
    
    #ifndef NPY__CPU_TARGET_CURRENT
      // Wrapping the dispatch-able source only happens for the additional
      // optimizations. If the keyword 'baseline' is provided within the
      // configuration statements, the infrastructure adds an extra compilation
      // of the dispatch-able source by passing it as-is to the compiler.
      #define CURRENT_TARGET(X) X
      #define NPY__CPU_TARGET_CURRENT baseline // for printing only
    #else
      // Reaching this point means we're dealing with one of the
      // additional optimizations, so it could be SSE41 or AVX512F.
      #define CURRENT_TARGET(X) NPY_CAT(NPY_CAT(X, _), NPY__CPU_TARGET_CURRENT)
    #endif
    // The macro 'CURRENT_TARGET' adds the current target as a suffix to the
    // exported symbols to avoid duplicate symbols at link time. NumPy already
    // has a similar macro called 'NPY_CPU_DISPATCH_CURFX', located at
    // numpy/numpy/_core/src/common/npy_cpu_dispatch.h
    // NOTE: we tend not to add suffixes to the baseline exported symbols
    void CURRENT_TARGET(simd_whoami)(const char *extra_info)
    {
        printf("I'm " NPY_TOSTR(NPY__CPU_TARGET_CURRENT) ", %s\n", extra_info);
    }
    

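    Now assume you attached **hello.dispatch.c** to the source tree. The infrastructure should then generate a temporary config header called **hello.dispatch.h** that can be reached by any source in the source tree, and it should contain the following code: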

    #ifndef NPY__CPU_DISPATCH_EXPAND_
      // To expand the macro calls in this header
      #define NPY__CPU_DISPATCH_EXPAND_(X) X
    #endif
    // Undefine the following macros, since config headers may be included
    // multiple times within the same source and each config header represents
    // different required optimizations, according to the configuration
    // statements of the dispatch-able source it was derived from.
    #undef NPY__CPU_DISPATCH_BASELINE_CALL
    #undef NPY__CPU_DISPATCH_CALL
    // Nothing strange here, just a normal preprocessor callback,
    // enabled only if 'baseline' is specified within the configuration statements.
    #define NPY__CPU_DISPATCH_BASELINE_CALL(CB, ...) \
      NPY__CPU_DISPATCH_EXPAND_(CB(__VA_ARGS__))
    // 'NPY__CPU_DISPATCH_CALL' is an abstract macro used for dispatching
    // the required optimizations specified within the configuration statements.
    //
    // @param CHK, a macro that can be used to detect CPU features at runtime.
    // It takes a CPU feature name without string quotes and returns the
    // test result as a boolean value.
    // NumPy already has a macro called "NPY_CPU_HAVE", which fits this requirement.
    //
    // @param CB, a callback macro that is expected to be called multiple times,
    // depending on the required optimizations. The callback receives these arguments:
    //  1- The pending calls of @param CHK, filled with the required CPU features
    //     that need to be tested at runtime before executing the call that
    //     belongs to the compiled object.
    //  2- The required optimization name, same as in 'NPY__CPU_TARGET_CURRENT'
    //  3- The extra arguments passed to the macro itself
    //
    // By default, the callback calls are sorted from the highest interest down,
    // unless the policy "$keep_sort" is in place within the configuration statements.
    // See "Dive into the CPU dispatcher" for more clarification.
    #define NPY__CPU_DISPATCH_CALL(CHK, CB, ...) \
      NPY__CPU_DISPATCH_EXPAND_(CB((CHK(AVX512F)), AVX512F, __VA_ARGS__)) \
      NPY__CPU_DISPATCH_EXPAND_(CB((CHK(SSE)&&CHK(SSE2)&&CHK(SSE3)&&CHK(SSSE3)&&CHK(SSE41)), SSE41, __VA_ARGS__))
    

    An example of using the config header in light of the above:

    // NOTE: The following macros are defined for demonstration purposes only.
    // NumPy already has a collection of macros, located at
    // numpy/numpy/_core/src/common/npy_cpu_dispatch.h, that covers all
    // dispatching and declaration scenarios.
    
    #include "numpy/npy_cpu_features.h" // NPY_CPU_HAVE
    #include "numpy/utils.h" // NPY_CAT, NPY_EXPAND
    
    // An example for setting a macro that calls all the exported symbols at once
    // after checking if they're supported by the running machine.
    #define DISPATCH_CALL_ALL(FN, ARGS) \
        NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, DISPATCH_CALL_ALL_CB, FN, ARGS) \
        NPY__CPU_DISPATCH_BASELINE_CALL(DISPATCH_CALL_BASELINE_ALL_CB, FN, ARGS)
    // The preprocessor callbacks.
    // The same suffixes as we define it in the dispatch-able source.
    #define DISPATCH_CALL_ALL_CB(CHECK, TARGET_NAME, FN, ARGS) \
      if (CHECK) { NPY_CAT(NPY_CAT(FN, _), TARGET_NAME) ARGS; }
    #define DISPATCH_CALL_BASELINE_ALL_CB(FN, ARGS) \
      FN NPY_EXPAND(ARGS);
    
    // An example for setting a macro that calls the exported symbols of highest
    // interest optimization, after checking if they're supported by the running machine.
    #define DISPATCH_CALL_HIGH(FN, ARGS) \
      if (0) {} \
        NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, DISPATCH_CALL_HIGH_CB, FN, ARGS) \
        NPY__CPU_DISPATCH_BASELINE_CALL(DISPATCH_CALL_BASELINE_HIGH_CB, FN, ARGS)
    // The preprocessor callbacks
    // The same suffixes as we define it in the dispatch-able source.
    #define DISPATCH_CALL_HIGH_CB(CHECK, TARGET_NAME, FN, ARGS) \
      else if (CHECK) { NPY_CAT(NPY_CAT(FN, _), TARGET_NAME) ARGS; }
    #define DISPATCH_CALL_BASELINE_HIGH_CB(FN, ARGS) \
      else { FN NPY_EXPAND(ARGS); }
    
    // NumPy has a macro called 'NPY_CPU_DISPATCH_DECLARE' that can be used
    // to forward declare any kind of prototype based on
    // 'NPY__CPU_DISPATCH_CALL' and 'NPY__CPU_DISPATCH_BASELINE_CALL'.
    // In this example, however, we handle the declarations manually.
    void simd_whoami(const char *extra_info);
    void simd_whoami_AVX512F(const char *extra_info);
    void simd_whoami_SSE41(const char *extra_info);
    
    void trigger_me(void)
    {
        // Bring in the auto-generated config header, which contains the
        // config macros 'NPY__CPU_DISPATCH_CALL' and
        // 'NPY__CPU_DISPATCH_BASELINE_CALL'.
        // It is highly recommended to include the config header right before
        // executing the dispatching macros, in case another config header is
        // already in scope.
        #include "hello.dispatch.h"
        DISPATCH_CALL_ALL(simd_whoami, ("all"))
        DISPATCH_CALL_HIGH(simd_whoami, ("the highest interest"))
        // An example of including multiple config headers in the same source
        // #include "hello2.dispatch.h"
        // DISPATCH_CALL_HIGH(another_function, ("the highest interest"))
    }
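The usage example above depends on NumPy's internal headers, so it cannot be compiled in isolation. The following standalone sketch mimics the same "highest interest first" dispatch chain with mock pieces. Every `DEMO_`/`demo_` name is a hypothetical stand-in: the runtime checks are plain flags rather than real CPU detection, and the check for SSE41 is reduced to a single flag instead of the full list of implied features.

```c
#include <stddef.h>

/* Hypothetical stand-ins for NPY_CAT and NPY_EXPAND. */
#define DEMO_CAT_(A, B) A##B
#define DEMO_CAT(A, B) DEMO_CAT_(A, B)
#define DEMO_EXPAND(X) X

/* Mock runtime feature flags; a real build would query the CPU instead. */
static int demo_have_AVX512F = 0;
static int demo_have_SSE41 = 1;
#define DEMO_CPU_HAVE(FEATURE) DEMO_CAT(demo_have_, FEATURE)

/* Same shape as the generated config header, with simplified checks. */
#define DEMO_DISPATCH_CALL(CHK, CB, ...) \
    DEMO_EXPAND(CB((CHK(AVX512F)), AVX512F, __VA_ARGS__)) \
    DEMO_EXPAND(CB((CHK(SSE41)), SSE41, __VA_ARGS__))
#define DEMO_DISPATCH_BASELINE_CALL(CB, ...) \
    DEMO_EXPAND(CB(__VA_ARGS__))

/* One symbol per compiled object; the baseline symbol has no suffix. */
const char *demo_whoami(void)         { return "baseline"; }
const char *demo_whoami_SSE41(void)   { return "SSE41"; }
const char *demo_whoami_AVX512F(void) { return "AVX512F"; }

/* Callbacks that extend an if/else-if chain, one link per target. */
#define DEMO_HIGH_CB(CHECK, TARGET, FN, OUT) \
    else if (CHECK) { OUT = DEMO_CAT(DEMO_CAT(FN, _), TARGET)(); }
#define DEMO_BASELINE_CB(FN, OUT) \
    else { OUT = FN(); }

/* Calls the variant of highest interest that the mock flags allow. */
const char *demo_pick_highest(void)
{
    const char *name = NULL;
    if (0) {}
    DEMO_DISPATCH_CALL(DEMO_CPU_HAVE, DEMO_HIGH_CB, demo_whoami, name)
    DEMO_DISPATCH_BASELINE_CALL(DEMO_BASELINE_CB, demo_whoami, name)
    return name;
}
```

With the initial flags, `demo_pick_highest()` selects the SSE41 variant; setting `demo_have_AVX512F` to 1 moves the choice to AVX512F, and clearing both flags falls back to the unsuffixed baseline symbol.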