How does the CPU dispatcher work?

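The NumPy dispatcher is based on multi-source compilation: a given source file is compiled multiple times, each time with different compiler flags and different **C** definitions that affect the code paths. This enables a specific set of instructions for each compiled object, depending on the required optimizations, and the resulting objects are finally linked together.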

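This mechanism should work with all compilers and does not require any compiler-specific extension. It does, however, add a few steps to the normal compilation process, which are described below.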
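
The idea of multi-source compilation can be illustrated with a minimal sketch. The file name `kernel.c`, the macro `DEMO_TARGET_AVX2`, and the function `sum_f32` are assumptions made for this illustration only; they are not part of NumPy, and the compiler commands in the comment are just one plausible way to build the two objects.

```c
/* kernel.c: a hypothetical file that a build compiles twice, e.g.
 *   cc -c kernel.c -o kernel.baseline.o
 *   cc -c -mavx2 -DDEMO_TARGET_AVX2 kernel.c -o kernel.avx2.o
 * DEMO_TARGET_AVX2 and sum_f32 are illustrative names, not NumPy's. */
#include <stddef.h>

#ifdef DEMO_TARGET_AVX2
  #define KERNEL_NAME sum_f32_AVX2  /* symbol exported by the AVX2 object */
#else
  #define KERNEL_NAME sum_f32       /* symbol exported by the baseline object */
#endif

/* The body is written once. With -mavx2 the compiler may emit AVX2
 * instructions for it; without the flag it is limited to the baseline. */
float KERNEL_NAME(const float *src, size_t len)
{
    float acc = 0.0f;
    for (size_t i = 0; i < len; ++i) {
        acc += src[i];
    }
    return acc;
}
```

Because each object exports a differently named symbol, both can be linked into the same binary, and the choice between them can be deferred to runtime.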

1- Configuration

Before the source files are built, the user configures the required optimizations through two command arguments:

--cpu-baseline: the minimal set of required optimizations.
--cpu-dispatch: the dispatched set of additional optimizations.

2- Discovering the environment

In this step, the compiler and the platform architecture are checked, and some intermediate results are cached to speed up rebuilding.

3- Validating the requested optimizations

The requested optimizations are tested against the compiler to determine which of them the compiler can actually support.

4- Generating the main configuration header

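The generated header _cpu_dispatch.h contains all the definitions and instruction-set headers for the required optimizations that were validated in the previous step.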

It also contains extra C definitions that are used to define NumPy's Python-level module attributes __cpu_baseline__ and __cpu_dispatch__.

What is in this header?

The example header below was generated dynamically by gcc on an X86 machine. The compiler supports --cpu-baseline="sse sse2 sse3" and --cpu-dispatch="ssse3 sse41", and the result is as follows.

// The header should be located at numpy/numpy/_core/src/common/_cpu_dispatch.h
/**NOTE
 ** C definitions prefixed with "NPY_HAVE_" represent
 ** the required optimizations.
 **
 ** C definitions prefixed with 'NPY__CPU_TARGET_' are protected and
 ** shouldn't be used by any NumPy C sources.
 */
/******* baseline features *******/
/** SSE **/
#define NPY_HAVE_SSE 1
#include <xmmintrin.h>
/** SSE2 **/
#define NPY_HAVE_SSE2 1
#include <emmintrin.h>
/** SSE3 **/
#define NPY_HAVE_SSE3 1
#include <pmmintrin.h>

/******* dispatch-able features *******/
#ifdef NPY__CPU_TARGET_SSSE3
  /** SSSE3 **/
  #define NPY_HAVE_SSSE3 1
  #include <tmmintrin.h>
#endif
#ifdef NPY__CPU_TARGET_SSE41
  /** SSE41 **/
  #define NPY_HAVE_SSE41 1
  #include <smmintrin.h>
#endif

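**Baseline features** are the minimal set of required optimizations configured via --cpu-baseline. They have no preprocessor guards and are always enabled, which means they can be used in any source.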
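
As a rough illustration of what "can be used in any source" means in practice, the sketch below guards an SSE2 code path with the `NPY_HAVE_SSE2` definition from the header above and falls back to scalar code otherwise. The function `add_f64` is a hypothetical example, not a NumPy routine. Outside a NumPy build `NPY_HAVE_SSE2` is undefined, so only the scalar loop is compiled.

```c
#include <stddef.h>

#ifdef NPY_HAVE_SSE2
  #include <emmintrin.h>  /* normally pulled in by _cpu_dispatch.h */
#endif

/* Hypothetical kernel: out[i] = a[i] + b[i] for i in [0, len). */
void add_f64(const double *a, const double *b, double *out, size_t len)
{
    size_t i = 0;
#ifdef NPY_HAVE_SSE2
    /* Baseline path: process two doubles per iteration. */
    for (; i + 2 <= len; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    /* Scalar tail, or the whole array when SSE2 is not in the baseline. */
    for (; i < len; ++i) {
        out[i] = a[i] + b[i];
    }
}
```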

Does this mean NumPy's infrastructure passes the compiler flags of the baseline features to all sources?

Yes, definitely. Dispatch-able sources, however, are treated differently.

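What if the user specifies certain **baseline features** during the build, but the machine does not support them at runtime? Will code compiled under one of these definitions be called, or might the compiler itself have auto-generated or auto-vectorized some code based on the compiler flags provided on the command line?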

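While the NumPy module is being loaded, a validation step detects this situation and raises a Python runtime error to inform the user. This prevents the CPU from reaching an illegal instruction, which would otherwise cause a segfault.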
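
The following is a minimal sketch of such a load-time check, not NumPy's actual implementation. It assumes a GCC or Clang compiler on x86 and uses the compiler builtins `__builtin_cpu_init` and `__builtin_cpu_supports`; the function name `check_baseline_supported` and the hard-coded feature list (the baseline from the example header) are assumptions. On other compilers or architectures the sketch performs no check.

```c
#include <stdio.h>

/* Hypothetical load-time check: returns 0 when every baseline feature is
 * available on the running CPU, and -1 otherwise. A real extension module
 * would raise a Python exception instead of printing a message. */
int check_baseline_supported(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (!__builtin_cpu_supports("sse") ||
        !__builtin_cpu_supports("sse2") ||
        !__builtin_cpu_supports("sse3")) {
        fprintf(stderr, "this build requires SSE, SSE2 and SSE3\n");
        return -1;
    }
#endif
    return 0;
}
```

Running the check before any optimized kernel is called is what turns a potential crash into a reportable error.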

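**Dispatch-able features** are the dispatched set of additional optimizations configured via --cpu-dispatch. They are not enabled by default and are always guarded by other C definitions prefixed with NPY__CPU_TARGET_. The NPY__CPU_TARGET_ definitions are only enabled within **dispatch-able sources**.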

5- Dispatch-able sources and configuration statements

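Dispatch-able sources are special **C** files that can be compiled multiple times with different compiler flags and different **C** definitions. These affect the code paths so that certain instruction sets are enabled for each compiled object, according to the "**configuration statements**". The configuration statements must be declared inside a **C** comment (/**/), start with the special mark **@targets**, and appear at the top of each dispatch-able source. If optimization is disabled through the command argument --disable-optimization, dispatch-able sources are treated as normal **C** sources.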

What are configuration statements?

Configuration statements are keywords combined together to determine the required optimizations for a dispatch-able source.

Example

/*@targets avx2 avx512f vsx2 vsx3 asimd asimdhp */
// C code

The keywords mainly represent the additional optimizations configured via --cpu-dispatch, but they can also represent other options, such as:

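Target groups: pre-configured configuration statements used for managing the required optimizations from outside the dispatch-able source.
Policies: collections of options used for changing the default behavior or forcing the compiler to do certain things.
"baseline": a unique keyword that represents the minimal optimizations configured via --cpu-baseline.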
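
To make the notion of a configuration statement more concrete, here is a small sketch that extracts the keywords following `@targets` from the first C comment of a source text. NumPy's build infrastructure performs this step itself and its real parser is not shown in this document, so the function `parse_targets`, the `MAX_NAME` limit, and the upper-casing of keywords are assumptions. The sketch does not distinguish between features, target groups, policies, and "baseline".

```c
#include <ctype.h>
#include <string.h>

#define MAX_NAME 32  /* illustrative limit on keyword length */

/* Hypothetical helper: copies up to max_targets upper-cased keywords that
 * follow "@targets" inside the first C comment of src. Returns the number
 * of keywords found, or -1 if no configuration statement is present. */
int parse_targets(const char *src, char out[][MAX_NAME], int max_targets)
{
    const char *open = strstr(src, "/*");
    if (open == NULL) return -1;
    const char *end = strstr(open + 2, "*/");
    if (end == NULL) return -1;
    const char *tag = strstr(open + 2, "@targets");
    if (tag == NULL || tag > end) return -1;

    const char *p = tag + strlen("@targets");
    int count = 0;
    while (p < end && count < max_targets) {
        while (p < end && isspace((unsigned char)*p)) ++p;  /* skip blanks */
        if (p >= end) break;
        size_t len = 0;
        while (p < end && !isspace((unsigned char)*p)) {
            if (len < MAX_NAME - 1) {
                out[count][len++] = (char)toupper((unsigned char)*p);
            }
            ++p;
        }
        out[count][len] = '\0';
        ++count;
    }
    return count;
}
```

For the example statement above, this sketch would report six keywords, starting with AVX2 and ending with ASIMDHP.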

NumPy's infrastructure handles dispatch-able sources in four steps:

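**(A) Recognition:** Just like source templates and F2PY, dispatch-able sources require a special extension: *.dispatch.c marks C dispatch-able source files, and *.dispatch.cpp or *.dispatch.cxx marks C++ ones. **NOTE:** C++ is not supported yet.

**(B) Parsing and validating:** In this step, the configuration statements of the dispatch-able sources filtered by the previous step are parsed and validated one by one to determine the required optimizations.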

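**(C) Wrapping:** This is the approach taken by NumPy's infrastructure. It has proved flexible enough to compile a single source multiple times with different **C** definitions and flags that affect the code paths. For each required optimization that belongs to the additional optimizations, the infrastructure creates a temporary **C** source that declares the relevant **C** definitions and then includes the actual source through the **C** directive **#include**. For clarification, take a look at the following code for AVX512F: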

/*
 * this definition is used by NumPy utilities as suffixes for the
 * exported symbols
 */
#define NPY__CPU_TARGET_CURRENT AVX512F
/*
 * The following definitions enable
 * definitions of the dispatch-able features that are defined within the main
 * configuration header. These are definitions for the implied features.
 */
#define NPY__CPU_TARGET_SSE
#define NPY__CPU_TARGET_SSE2
#define NPY__CPU_TARGET_SSE3
#define NPY__CPU_TARGET_SSSE3
#define NPY__CPU_TARGET_SSE41
#define NPY__CPU_TARGET_POPCNT
#define NPY__CPU_TARGET_SSE42
#define NPY__CPU_TARGET_AVX
#define NPY__CPU_TARGET_F16C
#define NPY__CPU_TARGET_FMA3
#define NPY__CPU_TARGET_AVX2
#define NPY__CPU_TARGET_AVX512F
// our dispatch-able source
#include "/the/absolute/path/of/hello.dispatch.c"
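
The suffixing that `NPY__CPU_TARGET_CURRENT` makes possible relies on two-level token pasting. The sketch below reproduces the technique with stand-in macros; all `DEMO_` names and the function `vector_bytes` are hypothetical and only mirror the roles that the NumPy macros play.

```c
/* Two-level concatenation: the outer macro expands its arguments first,
 * the inner macro pastes the already expanded tokens. */
#define DEMO_CAT_(A, B) A##B
#define DEMO_CAT(A, B) DEMO_CAT_(A, B)

/* In a wrapped build this definition would come from the temporary source. */
#define DEMO_TARGET_CURRENT AVX512F

/* Appends "_<current target>" to a symbol name. */
#define DEMO_CURFX(NAME) DEMO_CAT(DEMO_CAT(NAME, _), DEMO_TARGET_CURRENT)

/* Defines vector_bytes_AVX512F; a 512-bit register holds 64 bytes. */
int DEMO_CURFX(vector_bytes)(void)
{
    return 64;
}
```

Pasting with `##` directly would produce `vector_bytes_DEMO_TARGET_CURRENT`, because operands of `##` are not macro-expanded; the extra level of indirection is what lets the target name be substituted first.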

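**(D) Dispatch-able configuration header:** The infrastructure generates a config header for each dispatch-able source. This header mainly contains two abstract **C** macros that identify the generated objects, so that any **C** source can use them to dispatch certain symbols from the generated objects at runtime. It is also used for forward declarations.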

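The generated header takes the name of the dispatch-able source, with the extension removed and replaced by .h. For example, assume we have a dispatch-able source called hello.dispatch.c with the following content: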

// hello.dispatch.c
/*@targets baseline sse41 avx512f */
#include <stdio.h>
#include "numpy/utils.h" // NPY_CAT, NPY_TOSTR

#ifndef NPY__CPU_TARGET_CURRENT
  // Wrapping only happens for the additional optimizations. If the keyword
  // 'baseline' is present in the configuration statements, the infrastructure
  // also compiles the dispatch-able source once more, passing it as-is to the
  // compiler without any changes.
  #define CURRENT_TARGET(X) X
  #define NPY__CPU_TARGET_CURRENT baseline // for printing only
#else
  // Reaching this point means we are dealing with one of the additional
  // optimizations, so the target is either SSE41 or AVX512F.
  #define CURRENT_TARGET(X) NPY_CAT(NPY_CAT(X, _), NPY__CPU_TARGET_CURRENT)
#endif
// The macro 'CURRENT_TARGET' appends the current target as a suffix to the
// exported symbols to avoid duplicate symbols at link time. NumPy already has
// a similar macro called 'NPY_CPU_DISPATCH_CURFX', located at
// numpy/numpy/_core/src/common/npy_cpu_dispatch.h
// NOTE: by convention, no suffix is added to the baseline exported symbols.
void CURRENT_TARGET(simd_whoami)(const char *extra_info)
{
    printf("I'm " NPY_TOSTR(NPY__CPU_TARGET_CURRENT) ", %s\n", extra_info);
}

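Now assume you attach **hello.dispatch.c** to the source tree. The infrastructure should then generate a temporary config header called **hello.dispatch.h**, which can be reached by any source in the source tree and should contain the following code: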

#ifndef NPY__CPU_DISPATCH_EXPAND_
  // To expand the macro calls in this header
  #define NPY__CPU_DISPATCH_EXPAND_(X) X
#endif
// The following macros are undefined first because config headers may be
// included multiple times within the same source, and each config header
// represents different required optimizations, according to the configuration
// statements of the dispatch-able source it was derived from.
#undef NPY__CPU_DISPATCH_BASELINE_CALL
#undef NPY__CPU_DISPATCH_CALL
// Nothing strange here, just a normal preprocessor callback,
// enabled only if 'baseline' is specified within the configuration statements
#define NPY__CPU_DISPATCH_BASELINE_CALL(CB, ...) \
  NPY__CPU_DISPATCH_EXPAND_(CB(__VA_ARGS__))
// 'NPY__CPU_DISPATCH_CALL' is an abstract macro used for dispatching
// the required optimizations specified within the configuration statements.
//
// @param CHK, a macro that detects CPU features at runtime. It takes a CPU
// feature name without string quotes and returns the test result as a
// boolean value.
// NumPy already has a macro called "NPY_CPU_HAVE", which fits this requirement.
//
// @param CB, a callback macro that is expected to be called multiple times,
// depending on the required optimizations. The callback receives the
// following arguments:
//  1- The pending calls of @param CHK, filled with the required CPU features
//     that must be tested at runtime before executing a call that belongs to
//     the compiled object.
//  2- The required optimization name, same as in 'NPY__CPU_TARGET_CURRENT'
//  3- Extra arguments passed to the macro itself
//
// By default the callback calls are sorted from the highest interest downwards,
// unless the policy "$keep_sort" is in place within the configuration statements.
// See "Dive into the CPU dispatcher" for more clarification.
#define NPY__CPU_DISPATCH_CALL(CHK, CB, ...) \
  NPY__CPU_DISPATCH_EXPAND_(CB((CHK(AVX512F)), AVX512F, __VA_ARGS__)) \
  NPY__CPU_DISPATCH_EXPAND_(CB((CHK(SSE)&&CHK(SSE2)&&CHK(SSE3)&&CHK(SSSE3)&&CHK(SSE41)), SSE41, __VA_ARGS__))
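
To see how the callback mechanism expands, the sketch below rebuilds a simplified stand-in for the generated macro and uses it to collect the target names in the order in which the callback is invoked. All `DEMO_` names and `list_targets` are hypothetical. Unlike the generated header, the stand-in checks a single feature per target instead of the full chain of implied features.

```c
/* DEMO_* names are hypothetical stand-ins for the generated macros. */
#define DEMO_EXPAND_(X) X
#define DEMO_DISPATCH_CALL(CHK, CB, ...) \
    DEMO_EXPAND_(CB((CHK(AVX512F)), AVX512F, __VA_ARGS__)) \
    DEMO_EXPAND_(CB((CHK(SSE41)), SSE41, __VA_ARGS__))

/* A check macro that accepts every feature, so every callback call runs. */
#define DEMO_ALWAYS(FEATURE) 1

/* Callback: record the target name when its check passes. */
#define DEMO_LIST_CB(CHECK, TARGET, OUT, COUNT) \
    if (CHECK) { (OUT)[(COUNT)++] = #TARGET; }

/* Fills out with the target names and returns how many were recorded.
 * The caller must provide room for at least two entries. */
int list_targets(const char **out)
{
    int count = 0;
    DEMO_DISPATCH_CALL(DEMO_ALWAYS, DEMO_LIST_CB, out, count)
    return count;
}
```

The callback is expanded once per target, with the highest-interest target first, which is why dispatching macros built on top of it can stop at the first check that passes.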

An example of using the config header in light of the above:

// NOTE: The following macros are defined for demonstration purposes only.
// NumPy already has a collection of macros, located at
// numpy/numpy/_core/src/common/npy_cpu_dispatch.h, that covers all dispatching
// and declaration scenarios.

#include "numpy/npy_cpu_features.h" // NPY_CPU_HAVE
#include "numpy/utils.h" // NPY_CAT, NPY_EXPAND

// An example for setting a macro that calls all the exported symbols at once
// after checking if they're supported by the running machine.
#define DISPATCH_CALL_ALL(FN, ARGS) \
    NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, DISPATCH_CALL_ALL_CB, FN, ARGS) \
    NPY__CPU_DISPATCH_BASELINE_CALL(DISPATCH_CALL_BASELINE_ALL_CB, FN, ARGS)
// The preprocessor callbacks.
// They use the same suffixes as defined in the dispatch-able source.
#define DISPATCH_CALL_ALL_CB(CHECK, TARGET_NAME, FN, ARGS) \
  if (CHECK) { NPY_CAT(NPY_CAT(FN, _), TARGET_NAME) ARGS; }
#define DISPATCH_CALL_BASELINE_ALL_CB(FN, ARGS) \
  FN NPY_EXPAND(ARGS);

// An example for setting a macro that calls the exported symbols of highest
// interest optimization, after checking if they're supported by the running machine.
#define DISPATCH_CALL_HIGH(FN, ARGS) \
  if (0) {} \
    NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, DISPATCH_CALL_HIGH_CB, FN, ARGS) \
    NPY__CPU_DISPATCH_BASELINE_CALL(DISPATCH_CALL_BASELINE_HIGH_CB, FN, ARGS)
// The preprocessor callbacks.
// They use the same suffixes as defined in the dispatch-able source.
#define DISPATCH_CALL_HIGH_CB(CHECK, TARGET_NAME, FN, ARGS) \
  else if (CHECK) { NPY_CAT(NPY_CAT(FN, _), TARGET_NAME) ARGS; }
#define DISPATCH_CALL_BASELINE_HIGH_CB(FN, ARGS) \
  else { FN NPY_EXPAND(ARGS); }

// NumPy has a macro called 'NPY_CPU_DISPATCH_DECLARE' that can be used for
// forward declarations of any kind of prototype, based on
// 'NPY__CPU_DISPATCH_CALL' and 'NPY__CPU_DISPATCH_BASELINE_CALL'.
// In this example, however, we handle the declarations manually.
void simd_whoami(const char *extra_info);
void simd_whoami_AVX512F(const char *extra_info);
void simd_whoami_SSE41(const char *extra_info);

void trigger_me(void)
{
    // Bring in the auto-generated config header, which contains the config
    // macros 'NPY__CPU_DISPATCH_CALL' and 'NPY__CPU_DISPATCH_BASELINE_CALL'.
    // It is highly recommended to include the config header right before
    // executing the dispatching macros, in case another header is in scope.
    #include "hello.dispatch.h"
    DISPATCH_CALL_ALL(simd_whoami, ("all"))
    DISPATCH_CALL_HIGH(simd_whoami, ("the highest interest"))
    // An example of including multiple config headers in the same source
    // #include "hello2.dispatch.h"
    // DISPATCH_CALL_HIGH(another_function, ("the highest interest"))
}
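
The example above depends on NumPy's internal headers. For experimenting outside the NumPy source tree, the same pattern can be reproduced in a self-contained sketch. Everything here is an assumption made for illustration: the `DEMO_` macros stand in for the generated ones, the `demo_have_*` flags replace real runtime CPU detection, and instead of calling the function directly the sketch returns a function pointer so that the selection could be done once and cached.

```c
typedef const char *(*whoami_fn)(void);

/* One implementation per target; in a real build each would come from a
 * separately compiled object of the same dispatch-able source. */
const char *whoami(void)         { return "baseline"; }
const char *whoami_SSE41(void)   { return "SSE41"; }
const char *whoami_AVX512F(void) { return "AVX512F"; }

/* Stand-in for runtime CPU detection (hypothetical flags). */
int demo_have_AVX512F = 0;
int demo_have_SSE41 = 1;
#define DEMO_CPU_HAVE(FEATURE) demo_have_##FEATURE

/* Simplified stand-ins for the macros of a generated config header. */
#define DEMO_DISPATCH_CALL(CHK, CB, ...) \
    CB((CHK(AVX512F)), AVX512F, __VA_ARGS__) \
    CB((CHK(SSE41)), SSE41, __VA_ARGS__)
#define DEMO_DISPATCH_BASELINE_CALL(CB, ...) CB(__VA_ARGS__)

/* Callbacks: build an if/else-if chain that returns the first match. */
#define DEMO_SELECT_CB(CHECK, TARGET, FN) \
    else if (CHECK) { return FN##_##TARGET; }
#define DEMO_SELECT_BASELINE_CB(FN) \
    else { return FN; }

/* Returns the highest-interest implementation the "CPU" supports. */
whoami_fn select_whoami(void)
{
    if (0) {}
    DEMO_DISPATCH_CALL(DEMO_CPU_HAVE, DEMO_SELECT_CB, whoami)
    DEMO_DISPATCH_BASELINE_CALL(DEMO_SELECT_BASELINE_CB, whoami)
}
```

Flipping the `demo_have_*` flags changes which implementation `select_whoami()` returns, which mimics running the same binary on machines with different CPU features.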