How does the CPU dispatcher work?#
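The NumPy dispatcher is based on multi-source compiling: a given source is compiled multiple times with different compiler flags and different **C** definitions that affect the code paths. This lets each compiled object use specific instruction sets depending on the required optimizations, and the resulting objects are finally linked together.
This mechanism should support all compilers and does not require any compiler-specific extension, but it adds a few steps to normal compilation, which are explained below.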
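1- Configuration#
Before the source files are built, the user configures the required optimizations through the two command arguments described earlier:
--cpu-baseline
: the minimal set of required optimizations.
--cpu-dispatch
: the dispatched set of additional optimizations.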
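2- Discovering the environment#
In this step, we check the compiler and the platform architecture, and cache some intermediate results to speed up rebuilding.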
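3- Validating the requested optimizations#
Each requested optimization is tested against the compiler to determine which of the requested optimizations the compiler can actually support.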
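A minimal sketch of what such a compiler test could look like, assuming one tiny probe source per feature. The function name `probe_sse2` and the macro `PROBE_HAS_SSE2` are hypothetical and not taken from NumPy. The preprocessor guard exists only so the sketch builds on any machine; a real probe would presumably omit it, so that a failed compilation with the feature's flags signals that the compiler does not support the feature.

```c
/* Hypothetical probe for SSE2; names are illustrative, not NumPy's. */
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    #include <emmintrin.h>
    #define PROBE_HAS_SSE2 1
#else
    #define PROBE_HAS_SSE2 0
#endif

/* Returns 1 when built with SSE2 enabled and the intrinsics behave as
 * expected, and 0 when this translation unit was built without SSE2. */
int probe_sse2(void)
{
#if PROBE_HAS_SSE2
    int out[4];
    __m128i a = _mm_set1_epi32(20);
    __m128i b = _mm_set1_epi32(22);
    /* 20 + 22 in each of the four 32-bit lanes */
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(a, b));
    return out[0] == 42 && out[3] == 42;
#else
    return 0;
#endif
}
```

Building this file once with and once without the SSE2 flags of the compiler in use shows the two outcomes the validation step has to distinguish.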
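4- Generating the main configuration header#
The generated header _cpu_dispatch.h contains all the definitions and headers of the instruction sets for the required optimizations that were validated in the previous step.
It also contains extra C definitions that are used to define NumPy's Python-level module attributes __cpu_baseline__ and __cpu_dispatch__.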
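What is in this header?
The example header below was dynamically generated by gcc on an X86 machine. The compiler supports --cpu-baseline="sse sse2 sse3" and --cpu-dispatch="ssse3 sse41", and the result is as follows.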
// The header should be located at numpy/numpy/_core/src/common/_cpu_dispatch.h
/**NOTE
** C definitions prefixed with "NPY_HAVE_" represent
** the required optimizations.
**
** C definitions prefixed with 'NPY__CPU_TARGET_' are protected and
** shouldn't be used by any NumPy C sources.
*/
/******* baseline features *******/
/** SSE **/
#define NPY_HAVE_SSE 1
#include <xmmintrin.h>
/** SSE2 **/
#define NPY_HAVE_SSE2 1
#include <emmintrin.h>
/** SSE3 **/
#define NPY_HAVE_SSE3 1
#include <pmmintrin.h>
/******* dispatch-able features *******/
#ifdef NPY__CPU_TARGET_SSSE3
/** SSSE3 **/
#define NPY_HAVE_SSSE3 1
#include <tmmintrin.h>
#endif
#ifdef NPY__CPU_TARGET_SSE41
/** SSE41 **/
#define NPY_HAVE_SSE41 1
#include <smmintrin.h>
#endif
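Baseline features are the minimal set of required optimizations configured via --cpu-baseline. They have no preprocessor guards and are always on, which means they can be used in any source.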
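As a rough illustration of "can be used in any source", the sketch below branches on a baseline definition and falls back to scalar code otherwise. The first few lines are only a stand-in for the generated header (an assumption so the example builds on its own), and `add_f32` is a hypothetical function, not part of NumPy.

```c
#include <stddef.h>

/* Stand-in for the generated _cpu_dispatch.h: in a real build the header
 * defines NPY_HAVE_SSE unconditionally when SSE is a baseline feature. */
#if defined(__SSE__) || defined(_M_X64)
    #define NPY_HAVE_SSE 1
    #include <xmmintrin.h>
#endif

/* Hypothetical kernel: out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#ifdef NPY_HAVE_SSE
    /* Baseline path: four floats per iteration, no runtime check needed. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when SSE is not in the baseline. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

No runtime test appears here because baseline features are assumed to be present on every machine that runs the build.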
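Does this mean NumPy's infrastructure passes the compiler flags of the baseline features to all sources?
Definitely, yes. However, dispatch-able sources are treated differently.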
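What if the user specifies certain **baseline features** during the build, but the machine does not support them at runtime? Will the compiled code be called through one of these definitions, or might the compiler itself have auto-generated or auto-vectorized some piece of code based on the provided command-line compiler flags?
While the NumPy module is being loaded, a validation step detects this situation and raises a Python runtime error to inform the user. This prevents the CPU from hitting an illegal-instruction error that would cause a segfault.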
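The idea behind such a load-time check can be sketched as follows. This is not NumPy's implementation: `first_missing_baseline`, the `cpu_have_fn` callback type, and `fake_cpu_have` are all hypothetical, and the detector is faked so the example behaves the same on any machine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns nonzero if the running CPU has `feature`. */
typedef int (*cpu_have_fn)(const char *feature);

/* Returns NULL when every baseline feature is available, otherwise the name
 * of the first missing one, which a caller could put into an error message. */
const char *first_missing_baseline(const char *const *baseline, size_t n,
                                   cpu_have_fn have)
{
    for (size_t i = 0; i < n; ++i) {
        if (!have(baseline[i])) {
            return baseline[i];
        }
    }
    return NULL;
}

/* Fake detector for illustration: pretends the machine has SSE and SSE2 only. */
int fake_cpu_have(const char *feature)
{
    return strcmp(feature, "SSE") == 0 || strcmp(feature, "SSE2") == 0;
}
```

A module initializer could call the check once and refuse to load if the result is not NULL, so that no baseline instruction is ever executed on an unsupported CPU.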
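Dispatch-able features are the dispatched set of additional optimizations configured via --cpu-dispatch. They are not activated by default and are always guarded by other C definitions prefixed with NPY__CPU_TARGET_. The NPY__CPU_TARGET_ C definitions are only enabled within **dispatch-able sources**.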
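The guard mechanism can be tried in isolation with the sketch below. It pretends to be a translation unit compiled for the SSSE3 target and mimics a fragment of the main configuration header with the intrinsics includes left out. The enum and `enabled_mask` are hypothetical helpers used only to make the result observable.

```c
/* Pretend this translation unit is being compiled for the SSSE3 target.
 * In NumPy this definition is set by the infrastructure, never by hand. */
#define NPY__CPU_TARGET_SSSE3

/* Fragment mimicking the main configuration header (includes omitted). */
#define NPY_HAVE_SSE2 1            /* baseline feature: no guard */
#ifdef NPY__CPU_TARGET_SSSE3
    #define NPY_HAVE_SSSE3 1       /* dispatch-able: guarded */
#endif
#ifdef NPY__CPU_TARGET_SSE41
    #define NPY_HAVE_SSE41 1       /* dispatch-able: guarded */
#endif

/* Hypothetical flags, used only to report what ended up enabled. */
enum { F_SSE2 = 1, F_SSSE3 = 2, F_SSE41 = 4 };

int enabled_mask(void)
{
    int mask = 0;
#ifdef NPY_HAVE_SSE2
    mask |= F_SSE2;
#endif
#ifdef NPY_HAVE_SSSE3
    mask |= F_SSSE3;
#endif
#ifdef NPY_HAVE_SSE41
    mask |= F_SSE41;
#endif
    return mask;
}
```

Here only SSE2 and SSSE3 end up enabled. Removing the first definition leaves SSE2 alone, which corresponds to how an ordinary source sees the header.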
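5- Dispatch-able sources and configuration statements#
Dispatch-able sources are special **C** files that can be compiled multiple times with different compiler flags and different **C** definitions. These definitions affect the code paths so that specific instruction sets are enabled for each compiled object according to the "**configuration statements**", which must be declared inside a **C** comment (/**/) and start with the special mark **@targets** at the top of each dispatch-able source. At the same time, dispatch-able sources are treated as normal **C** sources if optimization is disabled via the command argument --disable-optimization.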
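What are configuration statements?
Configuration statements are keywords combined together to determine the optimizations required by a dispatch-able source.
Example: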
/*@targets avx2 avx512f vsx2 vsx3 asimd asimdhp */
// C code
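The keywords mainly represent the additional optimizations configured via --cpu-dispatch, but they can also represent other options, such as:
Target groups: pre-configured configuration statements used to manage the required optimizations from outside the dispatch-able source.
Policies: collections of options used to change the default behavior or to force the compiler to do certain things.
"baseline": a unique keyword that represents the minimal optimizations configured via --cpu-baseline.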
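To make the shape of a configuration statement concrete, here is a minimal tokenizer sketch. It only splits the text between the @targets mark and the closing comment delimiter on whitespace; it does not validate keywords or interpret target groups and policies. `parse_targets` and `MAX_TOKEN` are hypothetical names, and NumPy's own parser is part of its build infrastructure and is not shown in this section.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN 32

/* Copies the whitespace-separated keywords that follow "@targets" (up to the
 * closing comment delimiter) into `tokens`. Returns the number of keywords
 * found, or -1 if the mark or the end of the comment is missing. */
int parse_targets(const char *src, char tokens[][MAX_TOKEN], int max_tokens)
{
    const char *p = strstr(src, "@targets");
    if (p == NULL) {
        return -1;
    }
    p += strlen("@targets");
    const char *end = strstr(p, "*/");
    if (end == NULL) {
        return -1;
    }
    int count = 0;
    while (p < end && count < max_tokens) {
        while (p < end && isspace((unsigned char)*p)) {
            ++p;                        /* skip separators */
        }
        if (p == end) {
            break;
        }
        size_t len = 0;
        while (p < end && !isspace((unsigned char)*p)) {
            if (len < MAX_TOKEN - 1) {
                tokens[count][len++] = *p;   /* overlong keywords are cut */
            }
            ++p;
        }
        tokens[count][len] = '\0';
        ++count;
    }
    return count;
}
```

Applied to the example statement above, this yields the six keywords avx2, avx512f, vsx2, vsx3, asimd and asimdhp in order.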
NumPy's infrastructure handles dispatch-able sources in four steps:
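(A) Recognition: Just like source templates and F2PY, dispatch-able sources require a special extension, *.dispatch.c, to mark C dispatch-able source files, and *.dispatch.cpp or *.dispatch.cxx for C++. **NOTE**: C++ is not supported yet.
(B) Parsing and validating: In this step, the dispatch-able sources filtered out by the previous step are parsed and validated one by one, using their configuration statements, to determine the optimizations each of them requires.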
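(C) Wrapping: This is the approach taken by NumPy's infrastructure. It has proved flexible enough to compile a single source multiple times with different **C** definitions and flags that affect the code paths. For each required optimization that belongs to the additional optimizations, a temporary **C** source is created that declares the **C** definitions and then pulls in the actual source via the **C** directive **#include**. For a clearer picture, take a look at the following code for AVX512F: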
/*
 * this definition is used by NumPy utilities as suffixes for the
 * exported symbols
 */
#define NPY__CPU_TARGET_CURRENT AVX512F
/*
 * The following definitions enable
 * definitions of the dispatch-able features that are defined within the main
 * configuration header. These are definitions for the implied features.
 */
#define NPY__CPU_TARGET_SSE
#define NPY__CPU_TARGET_SSE2
#define NPY__CPU_TARGET_SSE3
#define NPY__CPU_TARGET_SSSE3
#define NPY__CPU_TARGET_SSE41
#define NPY__CPU_TARGET_POPCNT
#define NPY__CPU_TARGET_SSE42
#define NPY__CPU_TARGET_AVX
#define NPY__CPU_TARGET_F16C
#define NPY__CPU_TARGET_FMA3
#define NPY__CPU_TARGET_AVX2
#define NPY__CPU_TARGET_AVX512F
// our dispatch-able source
#include "/the/absolute/path/of/hello.dispatch.c"
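The wrapping step is essentially text generation, which the following sketch imitates with plain string appends. `make_wrapper` and `append` are hypothetical helpers, and the real infrastructure is not written this way. The sketch only reproduces the three parts visible in the AVX512F wrapper: the current-target definition, one definition per enabled feature, and the final #include.

```c
#include <stddef.h>
#include <string.h>

/* Appends `text` to `buf`; returns -1 if it does not fit in `size` bytes. */
static int append(char *buf, size_t size, size_t *used, const char *text)
{
    size_t len = strlen(text);
    if (*used + len + 1 > size) {
        return -1;                      /* keep room for the terminator */
    }
    memcpy(buf + *used, text, len + 1);
    *used += len;
    return 0;
}

/* Hypothetical generator. `features` lists the target itself plus every
 * feature it implies. Returns 0 on success, -1 if `buf` is too small. */
int make_wrapper(char *buf, size_t size, const char *target,
                 const char *const *features, size_t n_features,
                 const char *source_path)
{
    size_t used = 0;
    if (append(buf, size, &used, "#define NPY__CPU_TARGET_CURRENT ") ||
        append(buf, size, &used, target) ||
        append(buf, size, &used, "\n")) {
        return -1;
    }
    for (size_t i = 0; i < n_features; ++i) {
        if (append(buf, size, &used, "#define NPY__CPU_TARGET_") ||
            append(buf, size, &used, features[i]) ||
            append(buf, size, &used, "\n")) {
            return -1;
        }
    }
    if (append(buf, size, &used, "#include \"") ||
        append(buf, size, &used, source_path) ||
        append(buf, size, &used, "\"\n")) {
        return -1;
    }
    return 0;
}
```

Each generated wrapper would then be compiled with the flags of its target, which is what produces one object per additional optimization from a single dispatch-able source.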
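(D) Dispatch-able configuration header: The infrastructure generates a config header for each dispatch-able source. This header mainly contains two abstract **C** macros that identify the generated objects, so that any **C** source can use them to dispatch certain symbols from the generated objects at runtime. It is also used for forward declarations.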
The generated header takes the name of the dispatch-able source with its extension removed and replaced by .h. For example, assume we have a dispatch-able source called hello.dispatch.c with the following content:
// hello.dispatch.c
/*@targets baseline sse41 avx512f */
#include <stdio.h>
#include "numpy/utils.h" // NPY_CAT, NPY_TOSTR

#ifndef NPY__CPU_TARGET_CURRENT
    // wrapping the dispatch-able source only happens for the additional optimizations,
    // but if the keyword 'baseline' is provided within the configuration statements,
    // the infrastructure adds an extra compilation of the dispatch-able source by
    // passing it as-is to the compiler without any changes.
    #define CURRENT_TARGET(X) X
    #define NPY__CPU_TARGET_CURRENT baseline // for printing only
#else
    // since we reached this point, we are dealing with
    // the additional optimizations, so it could be SSE41 or AVX512F
    #define CURRENT_TARGET(X) NPY_CAT(NPY_CAT(X, _), NPY__CPU_TARGET_CURRENT)
#endif
// Macro 'CURRENT_TARGET' adds the current target as a suffix to the exported symbols
// to avoid linking duplications. NumPy already has a similar macro called
// 'NPY_CPU_DISPATCH_CURFX', located at
// numpy/numpy/_core/src/common/npy_cpu_dispatch.h
// NOTE: we tend not to add suffixes to the baseline exported symbols
void CURRENT_TARGET(simd_whoami)(const char *extra_info)
{
    printf("I'm " NPY_TOSTR(NPY__CPU_TARGET_CURRENT) ", %s\n", extra_info);
}
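Now assume you attached **hello.dispatch.c** to the source tree. The infrastructure should then generate a temporary config header called **hello.dispatch.h** that can be reached by any source in the source tree, and it should contain the following code: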
#ifndef NPY__CPU_DISPATCH_EXPAND_
    // To expand the macro calls in this header
    #define NPY__CPU_DISPATCH_EXPAND_(X) X
#endif
// Undefining the following macros, due to the possibility of including config headers
// multiple times within the same source and since each config header represents
// different required optimizations according to the specified configuration
// statements in the dispatch-able source that it is derived from.
#undef NPY__CPU_DISPATCH_BASELINE_CALL
#undef NPY__CPU_DISPATCH_CALL
// nothing strange here, just a normal preprocessor callback
// enabled only if 'baseline' is specified within the configuration statements
#define NPY__CPU_DISPATCH_BASELINE_CALL(CB, ...) \
    NPY__CPU_DISPATCH_EXPAND_(CB(__VA_ARGS__))
// 'NPY__CPU_DISPATCH_CALL' is an abstract macro used for dispatching
// the required optimizations that are specified within the configuration statements.
//
// @param CHK, Expected a macro that can be used to detect CPU features
// at runtime, which takes a CPU feature name without string quotes and
// returns the testing result as a boolean value.
// NumPy already has a macro called "NPY_CPU_HAVE", which fits this requirement.
//
// @param CB, a callback macro that is expected to be called multiple times depending
// on the required optimizations; the callback should receive the following arguments:
//  1- The pending calls of @param CHK filled up with the required CPU features,
//     that need to be tested first at runtime before executing the call that belongs to
//     the compiled object.
//  2- The required optimization name, same as in 'NPY__CPU_TARGET_CURRENT'
//  3- Extra arguments in the macro itself
//
// By default the callback calls are sorted depending on the highest interest
// unless the policy "$keep_sort" was in place within the configuration statements
// see "Dive into the CPU dispatcher" for more clarification.
#define NPY__CPU_DISPATCH_CALL(CHK, CB, ...) \
    NPY__CPU_DISPATCH_EXPAND_(CB((CHK(AVX512F)), AVX512F, __VA_ARGS__)) \
    NPY__CPU_DISPATCH_EXPAND_(CB((CHK(SSE)&&CHK(SSE2)&&CHK(SSE3)&&CHK(SSSE3)&&CHK(SSE41)), SSE41, __VA_ARGS__))
An example of using the config header in light of the above:
// NOTE: The following macros are only defined for demonstration purposes only.
// NumPy already has a collections of macros located at
// numpy/numpy/_core/src/common/npy_cpu_dispatch.h, that covers all dispatching
// and declarations scenarios.

#include "numpy/npy_cpu_features.h" // NPY_CPU_HAVE
#include "numpy/utils.h" // NPY_CAT, NPY_EXPAND

// An example for setting a macro that calls all the exported symbols at once
// after checking if they're supported by the running machine.
#define DISPATCH_CALL_ALL(FN, ARGS) \
    NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, DISPATCH_CALL_ALL_CB, FN, ARGS) \
    NPY__CPU_DISPATCH_BASELINE_CALL(DISPATCH_CALL_BASELINE_ALL_CB, FN, ARGS)
// The preprocessor callbacks.
// The same suffixes as we define it in the dispatch-able source.
#define DISPATCH_CALL_ALL_CB(CHECK, TARGET_NAME, FN, ARGS) \
    if (CHECK) { NPY_CAT(NPY_CAT(FN, _), TARGET_NAME) ARGS; }
#define DISPATCH_CALL_BASELINE_ALL_CB(FN, ARGS) \
    FN NPY_EXPAND(ARGS);

// An example for setting a macro that calls the exported symbols of highest
// interest optimization, after checking if they're supported by the running machine.
#define DISPATCH_CALL_HIGH(FN, ARGS) \
    if (0) {} \
    NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, DISPATCH_CALL_HIGH_CB, FN, ARGS) \
    NPY__CPU_DISPATCH_BASELINE_CALL(DISPATCH_CALL_BASELINE_HIGH_CB, FN, ARGS)
// The preprocessor callbacks
// The same suffixes as we define it in the dispatch-able source.
#define DISPATCH_CALL_HIGH_CB(CHECK, TARGET_NAME, FN, ARGS) \
    else if (CHECK) { NPY_CAT(NPY_CAT(FN, _), TARGET_NAME) ARGS; }
#define DISPATCH_CALL_BASELINE_HIGH_CB(FN, ARGS) \
    else { FN NPY_EXPAND(ARGS); }

// NumPy has a macro called 'NPY_CPU_DISPATCH_DECLARE' can be used
// for forward declarations any kind of prototypes based on
// 'NPY__CPU_DISPATCH_CALL' and 'NPY__CPU_DISPATCH_BASELINE_CALL'.
// However in this example, we just handle it manually.
void simd_whoami(const char *extra_info);
void simd_whoami_AVX512F(const char *extra_info);
void simd_whoami_SSE41(const char *extra_info);

void trigger_me(void)
{
    // bring the auto-generated config header
    // which contains config macros 'NPY__CPU_DISPATCH_CALL' and
    // 'NPY__CPU_DISPATCH_BASELINE_CALL'.
    // it is highly recommended to include the config header before executing
    // the dispatching macros in case if there's another header in the scope.
    #include "hello.dispatch.h"
    DISPATCH_CALL_ALL(simd_whoami, ("all"))
    DISPATCH_CALL_HIGH(simd_whoami, ("the highest interest"))
    // An example of including multiple config headers in the same source
    // #include "hello2.dispatch.h"
    // DISPATCH_CALL_HIGH(another_function, ("the highest interest"))
}
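Because the example above depends on NumPy's headers and on a generated file, it cannot be compiled on its own. The sketch below reproduces the "highest interest" dispatch in a single file under loud assumptions: every `DEMO_` macro is a simplified stand-in for its NumPy counterpart, CPU detection is faked through two global flags, and the three `whoami` functions, which a real build would take from three separately compiled objects, are written by hand.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for NPY_CAT and NPY_EXPAND (assumed to behave like this). */
#define DEMO_CAT_(a, b) a##b
#define DEMO_CAT(a, b) DEMO_CAT_(a, b)
#define DEMO_EXPAND(X) X

/* Fake runtime CPU state; a real build would query the CPU instead. */
int fake_have_sse41 = 0;
int fake_have_avx512f = 0;

int fake_cpu_have(const char *name)
{
    if (strcmp(name, "SSE41") == 0) return fake_have_sse41;
    if (strcmp(name, "AVX512F") == 0) return fake_have_avx512f;
    return 0;
}
/* Takes the feature name without quotes, like the CHK parameter expects. */
#define FAKE_CPU_HAVE(FEATURE) fake_cpu_have(#FEATURE)

/* In a real build these symbols come from three separate objects. */
const char *whoami(void)         { return "baseline"; }
const char *whoami_SSE41(void)   { return "SSE41"; }
const char *whoami_AVX512F(void) { return "AVX512F"; }

/* Simplified version of what a generated config header could define. */
#define DEMO_DISPATCH_CALL(CHK, CB, ...) \
    DEMO_EXPAND(CB((CHK(AVX512F)), AVX512F, __VA_ARGS__)) \
    DEMO_EXPAND(CB((CHK(SSE41)), SSE41, __VA_ARGS__))
#define DEMO_BASELINE_CALL(CB, ...) DEMO_EXPAND(CB(__VA_ARGS__))

/* Callbacks: build an if / else-if chain ordered by highest interest. */
#define HIGH_CB(CHECK, TARGET, FN, ARGS) \
    else if (CHECK) { result = DEMO_CAT(DEMO_CAT(FN, _), TARGET) ARGS; }
#define HIGH_BASELINE_CB(FN, ARGS) \
    else { result = FN ARGS; }

/* Returns the name reported by the best implementation the CPU supports. */
const char *dispatch_highest(void)
{
    const char *result = NULL;
    if (0) {}
    DEMO_DISPATCH_CALL(FAKE_CPU_HAVE, HIGH_CB, whoami, ())
    DEMO_BASELINE_CALL(HIGH_BASELINE_CB, whoami, ())
    return result;
}
```

Flipping `fake_have_sse41` and `fake_have_avx512f` before calling `dispatch_highest()` shows the fallback order: AVX512F first, then SSE41, then the unsuffixed baseline symbol.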