Array iterator API#

Array iterator#

The array iterator encapsulates many of the key features in ufuncs, allowing user code to support features like output parameters, preservation of memory layouts, and buffering of data with the wrong alignment or type, without requiring difficult coding.

This page documents the API for the iterator. The iterator is named NpyIter and functions are named NpyIter_*.

There is an introductory guide to array iteration which may be of interest for those using this C API. In many instances, testing out ideas by creating the iterator in Python is a good idea before writing the C iteration code.

Iteration example#

The best way to become familiar with the iterator is to look at its usage within the NumPy codebase itself. For example, here is a slightly tweaked version of the code for PyArray_CountNonzero, which counts the number of non-zero elements in an array.

npy_intp PyArray_CountNonzero(PyArrayObject* self)
{
    /* Nonzero boolean function */
    PyArray_NonzeroFunc* nonzero = PyArray_DESCR(self)->f->nonzero;

    NpyIter* iter;
    NpyIter_IterNextFunc *iternext;
    char** dataptr;
    npy_intp nonzero_count;
    npy_intp* strideptr,* innersizeptr;

    /* Handle zero-sized arrays specially */
    if (PyArray_SIZE(self) == 0) {
        return 0;
    }

    /*
     * Create and use an iterator to count the nonzeros.
     *   flag NPY_ITER_READONLY
     *     - The array is never written to.
     *   flag NPY_ITER_EXTERNAL_LOOP
     *     - Inner loop is done outside the iterator for efficiency.
     *   flag NPY_ITER_REFS_OK
     *     - Reference types are acceptable.
     *   order NPY_KEEPORDER
     *     - Visit elements in memory order, regardless of strides.
     *       This is good for performance when the specific order
     *       elements are visited is unimportant.
     *   casting NPY_NO_CASTING
     *     - No casting is required for this operation.
     */
    iter = NpyIter_New(self, NPY_ITER_READONLY|
                             NPY_ITER_EXTERNAL_LOOP|
                             NPY_ITER_REFS_OK,
                        NPY_KEEPORDER, NPY_NO_CASTING,
                        NULL);
    if (iter == NULL) {
        return -1;
    }

    /*
     * The iternext function gets stored in a local variable
     * so it can be called repeatedly in an efficient manner.
     */
    iternext = NpyIter_GetIterNext(iter, NULL);
    if (iternext == NULL) {
        NpyIter_Deallocate(iter);
        return -1;
    }
    /* The location of the data pointer which the iterator may update */
    dataptr = NpyIter_GetDataPtrArray(iter);
    /* The location of the stride which the iterator may update */
    strideptr = NpyIter_GetInnerStrideArray(iter);
    /* The location of the inner loop size which the iterator may update */
    innersizeptr = NpyIter_GetInnerLoopSizePtr(iter);

    nonzero_count = 0;
    do {
        /* Get the inner loop data/stride/count values */
        char* data = *dataptr;
        npy_intp stride = *strideptr;
        npy_intp count = *innersizeptr;

        /* This is a typical inner loop for NPY_ITER_EXTERNAL_LOOP */
        while (count--) {
            if (nonzero(data, self)) {
                ++nonzero_count;
            }
            data += stride;
        }

        /* Increment the iterator to the next inner loop */
    } while(iternext(iter));

    NpyIter_Deallocate(iter);

    return nonzero_count;
}

Multi-iteration example#

Here is a copy function using the iterator. The order parameter controls the memory layout of the allocated result; typically NPY_KEEPORDER is desired.

PyObject *CopyArray(PyObject *arr, NPY_ORDER order)
{
    NpyIter *iter;
    NpyIter_IterNextFunc *iternext;
    PyObject *op[2], *ret;
    npy_uint32 flags;
    npy_uint32 op_flags[2];
    npy_intp itemsize, *innersizeptr, innerstride;
    char **dataptrarray;

    /*
     * No inner iteration - inner loop is handled by CopyArray code
     */
    flags = NPY_ITER_EXTERNAL_LOOP;
    /*
     * Tell the constructor to automatically allocate the output.
     * The data type of the output will match that of the input.
     */
    op[0] = arr;
    op[1] = NULL;
    op_flags[0] = NPY_ITER_READONLY;
    op_flags[1] = NPY_ITER_WRITEONLY | NPY_ITER_ALLOCATE;

    /* Construct the iterator */
    iter = NpyIter_MultiNew(2, op, flags, order, NPY_NO_CASTING,
                            op_flags, NULL);
    if (iter == NULL) {
        return NULL;
    }

    /*
     * Make a copy of the iternext function pointer and
     * a few other variables the inner loop needs.
     */
    iternext = NpyIter_GetIterNext(iter, NULL);
    innerstride = NpyIter_GetInnerStrideArray(iter)[0];
    itemsize = NpyIter_GetDescrArray(iter)[0]->elsize;
    /*
     * The inner loop size and data pointers may change during the
     * loop, so just cache the addresses.
     */
    innersizeptr = NpyIter_GetInnerLoopSizePtr(iter);
    dataptrarray = NpyIter_GetDataPtrArray(iter);

    /*
     * Note that because the iterator allocated the output,
     * it matches the iteration order and is packed tightly,
     * so we don't need to check it like the input.
     */
    if (innerstride == itemsize) {
        do {
            memcpy(dataptrarray[1], dataptrarray[0],
                                    itemsize * (*innersizeptr));
        } while (iternext(iter));
    } else {
        /* For efficiency, should specialize this based on item size... */
        npy_intp i;
        do {
            npy_intp size = *innersizeptr;
            char *src = dataptrarray[0], *dst = dataptrarray[1];
            for(i = 0; i < size; i++, src += innerstride, dst += itemsize) {
                memcpy(dst, src, itemsize);
            }
        } while (iternext(iter));
    }

    /* Get the result from the iterator object array */
    ret = NpyIter_GetOperandArray(iter)[1];
    Py_INCREF(ret);

    if (NpyIter_Deallocate(iter) != NPY_SUCCEED) {
        Py_DECREF(ret);
        return NULL;
    }

    return ret;
}

Multi index tracking example#

This example shows you how to work with the NPY_ITER_MULTI_INDEX flag. For simplicity, we assume the argument is a two-dimensional array.

int PrintMultiIndex(PyArrayObject *arr) {
    NpyIter *iter;
    NpyIter_IterNextFunc *iternext;
    npy_intp multi_index[2];

    iter = NpyIter_New(
        arr, NPY_ITER_READONLY | NPY_ITER_MULTI_INDEX | NPY_ITER_REFS_OK,
        NPY_KEEPORDER, NPY_NO_CASTING, NULL);
    if (iter == NULL) {
        return -1;
    }
    if (NpyIter_GetNDim(iter) != 2) {
        NpyIter_Deallocate(iter);
        PyErr_SetString(PyExc_ValueError, "Array must be 2-D");
        return -1;
    }
    if (NpyIter_GetIterSize(iter) != 0) {
        iternext = NpyIter_GetIterNext(iter, NULL);
        if (iternext == NULL) {
            NpyIter_Deallocate(iter);
            return -1;
        }
        NpyIter_GetMultiIndexFunc *get_multi_index =
            NpyIter_GetGetMultiIndex(iter, NULL);
        if (get_multi_index == NULL) {
            NpyIter_Deallocate(iter);
            return -1;
        }

        do {
            get_multi_index(iter, multi_index);
            printf("multi_index is [%" NPY_INTP_FMT ", %" NPY_INTP_FMT "]\n",
                   multi_index[0], multi_index[1]);
        } while (iternext(iter));
    }
    if (!NpyIter_Deallocate(iter)) {
        return -1;
    }
    return 0;
}

When called with a 2x3 array, the above example prints

multi_index is [0, 0]
multi_index is [0, 1]
multi_index is [0, 2]
multi_index is [1, 0]
multi_index is [1, 1]
multi_index is [1, 2]

Iterator data types#

The iterator layout is an internal detail, and user code only sees an incomplete struct.

type NpyIter#

This is an opaque pointer type for the iterator. Access to its contents can only be done through the iterator API.

type NpyIter_Type#

This is the type which exposes the iterator to Python. Currently, no API is exposed which provides access to the values of a Python-created iterator. If an iterator is created in Python, it must be used in Python and vice versa. Such an API will likely be created in a future version.

type NpyIter_IterNextFunc#

This is a function pointer for the iteration loop, returned by NpyIter_GetIterNext.

type NpyIter_GetMultiIndexFunc#

This is a function pointer for getting the current iterator multi-index, returned by NpyIter_GetGetMultiIndex.

Construction and destruction#

NpyIter *NpyIter_New(PyArrayObject *op, npy_uint32 flags, NPY_ORDER order, NPY_CASTING casting, PyArray_Descr *dtype)#

Creates an iterator for the given numpy array object op.

Flags that may be passed in flags are any combination of the global and per-operand flags documented in NpyIter_MultiNew, except for NPY_ITER_ALLOCATE.

Any of the NPY_ORDER enum values may be passed to order. For efficient iteration, NPY_KEEPORDER is the best option, and the other orders enforce the particular iteration pattern.

Any of the NPY_CASTING enum values may be passed to casting. The values include NPY_NO_CASTING, NPY_EQUIV_CASTING, NPY_SAFE_CASTING, NPY_SAME_KIND_CASTING, and NPY_UNSAFE_CASTING. To allow the casts to occur, copying or buffering must also be enabled.

If dtype isn’t NULL, then it requires that data type. If copying is allowed, it will make a temporary copy if the data is castable. If NPY_ITER_UPDATEIFCOPY is enabled, it will also copy the data back with another cast upon iterator destruction.

Returns NULL if there is an error, otherwise returns the allocated iterator.

To make an iterator similar to the old iterator, this should work.

iter = NpyIter_New(op, NPY_ITER_READWRITE,
                    NPY_CORDER, NPY_NO_CASTING, NULL);

If you want to edit an array with aligned double code, but the order doesn’t matter, you would use this.

dtype = PyArray_DescrFromType(NPY_DOUBLE);
iter = NpyIter_New(op, NPY_ITER_READWRITE|
                    NPY_ITER_BUFFERED|
                    NPY_ITER_NBO|
                    NPY_ITER_ALIGNED,
                    NPY_KEEPORDER,
                    NPY_SAME_KIND_CASTING,
                    dtype);
Py_DECREF(dtype);

NpyIter *NpyIter_MultiNew(npy_intp nop, PyArrayObject **op, npy_uint32 flags, NPY_ORDER order, NPY_CASTING casting, npy_uint32 *op_flags, PyArray_Descr **op_dtypes)#

Creates an iterator for broadcasting the nop array objects provided in op, using regular NumPy broadcasting rules.

Any of the NPY_ORDER enum values may be passed to order. For efficient iteration, NPY_KEEPORDER is the best option, and the other orders enforce the particular iteration pattern. When using NPY_KEEPORDER, if you also want to ensure that the iteration is not reversed along an axis, you should pass the flag NPY_ITER_DONT_NEGATE_STRIDES.

Any of the NPY_CASTING enum values may be passed to casting. The values include NPY_NO_CASTING, NPY_EQUIV_CASTING, NPY_SAFE_CASTING, NPY_SAME_KIND_CASTING, and NPY_UNSAFE_CASTING. To allow the casts to occur, copying or buffering must also be enabled.

If op_dtypes isn’t NULL, it specifies a data type or NULL for each op[i].

Returns NULL if there is an error, otherwise returns the allocated iterator.

Flags that may be passed in flags, applying to the whole iterator, are

NPY_ITER_C_INDEX#

Causes the iterator to track a raveled flat index matching C order. This option cannot be used with NPY_ITER_F_INDEX.

NPY_ITER_F_INDEX#

Causes the iterator to track a raveled flat index matching Fortran order. This option cannot be used with NPY_ITER_C_INDEX.
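The difference between the two raveled indices can be illustrated with a small standalone sketch. It does not use the iterator; the names `intp_t`, `flat_index_c`, and `flat_index_f` are hypothetical helpers introduced here, with `intp_t` standing in for npy_intp.

```c
#include <stddef.h>

/* Stand-in for npy_intp in this standalone sketch (an assumption). */
typedef ptrdiff_t intp_t;

/* Raveled index with the last axis varying fastest (C order). */
static intp_t flat_index_c(int ndim, const intp_t *shape,
                           const intp_t *multi_index)
{
    intp_t index = 0;
    for (int i = 0; i < ndim; ++i) {
        index = index * shape[i] + multi_index[i];
    }
    return index;
}

/* Raveled index with the first axis varying fastest (Fortran order). */
static intp_t flat_index_f(int ndim, const intp_t *shape,
                           const intp_t *multi_index)
{
    intp_t index = 0;
    for (int i = ndim - 1; i >= 0; --i) {
        index = index * shape[i] + multi_index[i];
    }
    return index;
}
```

For a 2x3 array, the element at multi-index [1, 0] has C index 3 and Fortran index 1, while [0, 2] has C index 2 and Fortran index 4.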

NPY_ITER_MULTI_INDEX#

Causes the iterator to track a multi-index. This prevents the iterator from coalescing axes to produce bigger inner loops. If the loop is also not buffered and no index is being tracked (so that NpyIter_RemoveAxis can be called), then the iterator size can be -1 to indicate that the iterator is too large. This can happen due to complex broadcasting and will result in errors when setting the iterator range, removing the multi-index, or getting the next function. However, it is possible to remove axes again and use the iterator normally if the size is small enough after removal.

NPY_ITER_EXTERNAL_LOOP#

Causes the iterator to skip iteration of the innermost loop, requiring the user of the iterator to handle it.

This flag is incompatible with NPY_ITER_C_INDEX, NPY_ITER_F_INDEX, and NPY_ITER_MULTI_INDEX.

NPY_ITER_DONT_NEGATE_STRIDES#

This only affects the iterator when NPY_KEEPORDER is specified for the order parameter. By default with NPY_KEEPORDER, the iterator reverses axes which have negative strides, so that memory is traversed in a forward direction. This disables this step. Use this flag if you want to use the underlying memory-ordering of the axes, but don’t want an axis reversed. This is the behavior of numpy.ravel(a, order='K'), for instance.
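The axis reversal described above can be modelled in a few lines of plain C. This is only a sketch of the idea, not the iterator's internal code; `intp_t` and `flip_negative_strides` are hypothetical names introduced for illustration.

```c
#include <stddef.h>

typedef ptrdiff_t intp_t;  /* stand-in for npy_intp (an assumption) */

/*
 * For every axis with a negative stride, move the start pointer to the
 * element with the lowest address on that axis and negate the stride,
 * so that iterating the axis walks forward through memory.
 * Returns the adjusted start pointer; strides are updated in place.
 */
static char *flip_negative_strides(char *data, int ndim,
                                   const intp_t *shape, intp_t *strides)
{
    for (int i = 0; i < ndim; ++i) {
        if (strides[i] < 0 && shape[i] > 0) {
            data += (shape[i] - 1) * strides[i];
            strides[i] = -strides[i];
        }
    }
    return data;
}
```

With NPY_ITER_DONT_NEGATE_STRIDES the equivalent of this step is skipped, so a reversed view is visited in its own logical order rather than in increasing address order.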

NPY_ITER_COMMON_DTYPE#

Causes the iterator to convert all the operands to a common data type, calculated based on the ufunc type promotion rules. Copying or buffering must be enabled.

If the common data type is known ahead of time, don’t use this flag. Instead, set the requested dtype for all the operands.

NPY_ITER_REFS_OK#

Indicates that arrays with reference types (object arrays or structured arrays containing an object type) may be accepted and used in the iterator. If this flag is enabled, the caller must be sure to check whether NpyIter_IterationNeedsAPI(iter) is true, in which case it may not release the GIL during iteration. If you are working with known dtypes, NpyIter_GetTransferFlags is a faster and more precise way to check whether the iterator needs the API due to buffering.

NPY_ITER_ZEROSIZE_OK#

Indicates that arrays with a size of zero should be permitted. Since the typical iteration loop does not naturally work with zero-sized arrays, you must check that the IterSize is larger than zero before entering the iteration loop. Currently only the operands are checked, not a forced shape.

NPY_ITER_REDUCE_OK#

Permits writeable operands with a dimension with zero stride and size greater than one. Note that such operands must be read/write.

When buffering is enabled, this also switches to a special buffering mode which reduces the loop length as necessary to not trample on values being reduced.

Note that if you want to do a reduction on an automatically allocated output, you must use NpyIter_GetOperandArray to get its reference, then set every value to the reduction unit before doing the iteration loop. In the case of a buffered reduction, this means you must also specify the flag NPY_ITER_DELAY_BUFALLOC, then reset the iterator after initializing the allocated operand to prepare the buffers.
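To see why a reduction operand appears as a zero-stride dimension, and why it must be set to the reduction unit first, consider this standalone sketch. It is a plain C model rather than iterator code; `intp_t`, `add_inner_loop`, and `sum_axis1` are hypothetical names.

```c
#include <stddef.h>

typedef ptrdiff_t intp_t;  /* stand-in for npy_intp (an assumption) */

/*
 * Add `count` doubles read through `in` into the location(s) reached
 * through `out`. With out_stride == 0 every step accumulates into the
 * same element, which is how a reduction operand looks to an inner loop.
 */
static void add_inner_loop(const char *in, intp_t in_stride,
                           char *out, intp_t out_stride, intp_t count)
{
    while (count--) {
        *(double *)out += *(const double *)in;
        in += in_stride;
        out += out_stride;
    }
}

/* Reduce a C-contiguous rows x cols array along axis 1 into out[rows]. */
static void sum_axis1(const double *a, intp_t rows, intp_t cols, double *out)
{
    for (intp_t r = 0; r < rows; ++r) {
        out[r] = 0.0;  /* reduction unit for addition, set before looping */
        add_inner_loop((const char *)(a + r * cols), sizeof(double),
                       (char *)&out[r], 0, cols);
    }
}
```

Because the output is both read and written at every step, the operand has to be read/write, which matches the requirement stated above.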

NPY_ITER_RANGED#

Enables support for iteration of sub-ranges of the full iterindex range [0, NpyIter_GetIterSize(iter)). Use the function NpyIter_ResetToIterIndexRange to specify a range for iteration.

This flag can only be used with NPY_ITER_EXTERNAL_LOOP when NPY_ITER_BUFFERED is enabled. This is because without buffering, the inner loop is always the size of the innermost iteration dimension, and allowing it to get cut up would require special handling, effectively making it more like the buffered version.

NPY_ITER_BUFFERED#

Causes the iterator to store buffering data, and use buffering to satisfy data type, alignment, and byte-order requirements. To buffer an operand, do not specify the NPY_ITER_COPY or NPY_ITER_UPDATEIFCOPY flags, because they will override buffering. Buffering is especially useful for Python code using the iterator, allowing for larger chunks of data at once to amortize the Python interpreter overhead.

If used with NPY_ITER_EXTERNAL_LOOP, the inner loop for the caller may get larger chunks than would be possible without buffering, because of how the strides are laid out.

Note that if an operand is given the flag NPY_ITER_COPY or NPY_ITER_UPDATEIFCOPY, a copy will be made in preference to buffering. Buffering will still occur when the array was broadcast so elements need to be duplicated to get a constant stride.

In normal buffering, the size of each inner loop is equal to the buffer size, or possibly larger if NPY_ITER_GROWINNER is specified. If NPY_ITER_REDUCE_OK is enabled and a reduction occurs, the inner loops may become smaller depending on the structure of the reduction.

NPY_ITER_GROWINNER#

When buffering is enabled, this allows the size of the inner loop to grow when buffering isn’t necessary. This option is best used if you’re doing a straight pass through all the data, rather than anything with small cache-friendly arrays of temporary values for each inner loop.

NPY_ITER_DELAY_BUFALLOC#

When buffering is enabled, this delays allocation of the buffers until NpyIter_Reset or another reset function is called. This flag exists to avoid wasteful copying of buffer data when making multiple copies of a buffered iterator for multi-threaded iteration.

Another use of this flag is for setting up reduction operations. After the iterator is created, and a reduction output is allocated automatically by the iterator (be sure to use READWRITE access), its value may be initialized to the reduction unit. Use NpyIter_GetOperandArray to get the object. Then, call NpyIter_Reset to allocate and fill the buffers with their initial values.

NPY_ITER_COPY_IF_OVERLAP#

If any write operand has overlap with any read operand, eliminate all overlap by making temporary copies (enabling UPDATEIFCOPY for write operands, if necessary). A pair of operands has overlap if there is a memory address that contains data common to both arrays.

Because exact overlap detection has exponential runtime in the number of dimensions, the decision is made based on heuristics, which has false positives (needless copies in unusual cases) but has no false negatives.

If any read/write overlap exists, this flag ensures the result of the operation is the same as if all operands were copied. In cases where copies would need to be made, the result of the computation may be undefined without this flag!
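A conservative overlap test of the kind described (false positives allowed, no false negatives) can be sketched by comparing the byte ranges that two strided arrays span. This is an illustrative model only and is not claimed to be the heuristic NumPy uses; `intp_t`, `byte_bounds`, and `ranges_may_overlap` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef ptrdiff_t intp_t;  /* stand-in for npy_intp (an assumption) */

/* Compute the half-open byte range [*lo, *hi) spanned by a strided array. */
static void byte_bounds(const char *data, int ndim, const intp_t *shape,
                        const intp_t *strides, intp_t itemsize,
                        uintptr_t *lo, uintptr_t *hi)
{
    uintptr_t low = (uintptr_t)data, high = (uintptr_t)data;
    for (int i = 0; i < ndim; ++i) {
        if (shape[i] == 0) {          /* empty array touches no memory */
            *lo = *hi = (uintptr_t)data;
            return;
        }
        intp_t span = (shape[i] - 1) * strides[i];
        if (span < 0) {
            low -= (uintptr_t)(-span);
        } else {
            high += (uintptr_t)span;
        }
    }
    *lo = low;
    *hi = high + (uintptr_t)itemsize;
}

/* Conservative test: 1 if both ranges are non-empty and intersect. */
static int ranges_may_overlap(uintptr_t lo_a, uintptr_t hi_a,
                              uintptr_t lo_b, uintptr_t hi_b)
{
    return lo_a < hi_a && lo_b < hi_b && lo_a < hi_b && lo_b < hi_a;
}
```

Two interleaved views (for example the even and the odd elements of one buffer) share no bytes, yet their ranges intersect, so this test reports a possible overlap. That is the kind of needless copy in unusual cases mentioned above.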

Flags that may be passed in op_flags[i], where 0 <= i < nop

NPY_ITER_READWRITE#
NPY_ITER_READONLY#
NPY_ITER_WRITEONLY#

Indicate how the user of the iterator will read or write to op[i]. Exactly one of these flags must be specified per operand. Using NPY_ITER_READWRITE or NPY_ITER_WRITEONLY for a user-provided operand may trigger WRITEBACKIFCOPY semantics. The data will be written back to the original array when NpyIter_Deallocate is called.

NPY_ITER_COPY#

Allow a copy of op[i] to be made if it does not meet the data type or alignment requirements as specified by the constructor flags and parameters.

NPY_ITER_UPDATEIFCOPY#

Triggers NPY_ITER_COPY, and when an array operand is flagged for writing and is copied, causes the data in a copy to be copied back to op[i] when NpyIter_Deallocate is called.

If the operand is flagged as write-only and a copy is needed, an uninitialized temporary array will be created and then copied back to op[i] on calling NpyIter_Deallocate, instead of doing the unnecessary copy operation.

NPY_ITER_NBO#
NPY_ITER_ALIGNED#
NPY_ITER_CONTIG#

Causes the iterator to provide data for op[i] that is in native byte order, aligned according to the dtype requirements, contiguous, or any combination.

By default, the iterator produces pointers into the arrays provided, which may be aligned or unaligned, and with any byte order. If copying or buffering is not enabled and the operand data doesn’t satisfy the constraints, an error will be raised.

The contiguous constraint applies only to the inner loop; successive inner loops may have arbitrary pointer changes.

If the requested data type is in non-native byte order, the NBO flag overrides it and the requested data type is converted to be in native byte order.

NPY_ITER_ALLOCATE#

This is for output arrays, and requires that the flag NPY_ITER_WRITEONLY or NPY_ITER_READWRITE be set. If op[i] is NULL, creates a new array with the final broadcast dimensions, and a layout matching the iteration order of the iterator.

When op[i] is NULL, the requested data type op_dtypes[i] may be NULL as well, in which case it is automatically generated from the dtypes of the arrays which are flagged as readable. The rules for generating the dtype are the same as for UFuncs. Of special note is handling of byte order in the selected dtype. If there is exactly one input, the input’s dtype is used as is. Otherwise, if more than one input dtype is combined, the output will be in native byte order.

After being allocated with this flag, the caller may retrieve the new array by calling NpyIter_GetOperandArray and getting the i-th object in the returned C array. The caller must call Py_INCREF on it to claim a reference to the array.

NPY_ITER_NO_SUBTYPE#

For use with NPY_ITER_ALLOCATE, this flag disables allocating an array subtype for the output, forcing it to be a straight ndarray.

TODO: Maybe it would be better to introduce a function NpyIter_GetWrappedOutput and remove this flag?

NPY_ITER_NO_BROADCAST#

Ensures that the input or output matches the iteration dimensions exactly.

NPY_ITER_ARRAYMASK#

Indicates that this operand is the mask to use for selecting elements when writing to operands which have the NPY_ITER_WRITEMASKED flag applied to them. Only one operand may have NPY_ITER_ARRAYMASK flag applied to it.

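The data type of an operand with this flag should be either NPY_BOOL, NPY_MASK, or a struct dtype whose fields are all valid mask dtypes. In the latter case, it must match up with a struct operand being WRITEMASKED, as it is specifying a mask for each field of that array.

This flag only affects writing from the buffer back to the array. This means that if the operand is also NPY_ITER_READWRITE or NPY_ITER_WRITEONLY, code doing iteration can write to this operand to control which elements will be untouched and which ones will be modified. This is useful when the mask should be a combination of input masks.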

NPY_ITER_WRITEMASKED#

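This array is the mask for all writemasked operands. Code uses the writemasked flag to indicate that only elements where the chosen ARRAYMASK operand is True will be written to. In general, the iterator does not enforce this; it is up to the code doing the iteration to follow that promise.

When the writemasked flag is used and this operand is buffered, this changes how data is copied from the buffer into the array. A masked copying routine is used, which only copies the elements in the buffer for which the corresponding element of the ARRAYMASK operand is True.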
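The effect of a masked copy from a buffer back to an array can be sketched as follows. This is a standalone model assuming a one-byte boolean mask; it is not the routine the iterator uses, and `intp_t` and `masked_copy_back` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef ptrdiff_t intp_t;  /* stand-in for npy_intp (an assumption) */

/*
 * Copy `count` items of `itemsize` bytes from a packed buffer back into
 * a strided destination, but only where the one-byte boolean mask is
 * nonzero. Unmasked destination elements are left untouched.
 */
static void masked_copy_back(char *dst, intp_t dst_stride,
                             const char *buffer, intp_t itemsize,
                             const unsigned char *mask, intp_t mask_stride,
                             intp_t count)
{
    for (intp_t i = 0; i < count; ++i) {
        if (*mask) {
            memcpy(dst, buffer, (size_t)itemsize);
        }
        dst += dst_stride;
        buffer += itemsize;
        mask += mask_stride;
    }
}
```

Elements whose mask entry is zero keep whatever value the destination array already held, even if the loop wrote something into the buffer at that position.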

NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE#

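In memory overlap checks, assume that operands with NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE enabled are accessed only in the iterator order.

This enables the iterator to reason about data dependency, possibly avoiding unnecessary copies.

This flag has effect only if NPY_ITER_COPY_IF_OVERLAP is enabled on the iterator.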

NpyIter *NpyIter_AdvancedNew(npy_intp nop, PyArrayObject **op, npy_uint32 flags, NPY_ORDER order, NPY_CASTING casting, npy_uint32 *op_flags, PyArray_Descr **op_dtypes, int oa_ndim, int **op_axes, npy_intp const *itershape, npy_intp buffersize)#

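Extends NpyIter_MultiNew with several advanced options providing more control over broadcasting and buffering.

If -1/NULL values are passed to oa_ndim, op_axes, itershape, and buffersize, it is equivalent to NpyIter_MultiNew.

The parameter oa_ndim, when not zero or -1, specifies the number of dimensions that will be iterated with customized broadcasting. If it is provided, op_axes must be provided and itershape may also be provided. The op_axes parameter lets you control in detail how the axes of the operand arrays get matched together and iterated. In op_axes, you must provide an array of nop pointers to oa_ndim-sized arrays of type int. If an entry in op_axes is NULL, normal broadcasting rules will apply. Each op_axes[j][i] stores either a valid axis of op[j], or -1, which means newaxis. Within each op_axes[j] array, axes may not be repeated. The following example shows how normal broadcasting applies to a 3-D array, a 2-D array, a 1-D array and a scalar.

Note: Before NumPy 1.8, oa_ndim == 0 was used to signal that op_axes and itershape are unused. This is deprecated and should be replaced with -1. Better backward compatibility may be achieved by using NpyIter_MultiNew for this case.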

int oa_ndim = 3;               /* # iteration axes */
int op0_axes[] = {0, 1, 2};    /* 3-D operand */
int op1_axes[] = {-1, 0, 1};   /* 2-D operand */
int op2_axes[] = {-1, -1, 0};  /* 1-D operand */
int op3_axes[] = {-1, -1, -1};  /* 0-D (scalar) operand */
int* op_axes[] = {op0_axes, op1_axes, op2_axes, op3_axes};

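The itershape parameter allows you to force the iterator to have a specific iteration shape. It is an array of length oa_ndim. When an entry is negative, its value is determined from the operands. This parameter allows automatically allocated outputs to get additional dimensions which don’t match up with any dimension of an input.

If buffersize is zero, a default buffer size is used; otherwise it specifies how big a buffer to use. Buffers whose sizes are powers of 2, such as 4096 or 8192, are recommended.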

Returns NULL if there is an error, otherwise returns the allocated iterator.
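How op_axes maps operand axes onto iteration axes can be illustrated by computing the resulting iteration shape by hand. The sketch below is a standalone model of that rule and not the constructor's implementation; `intp_t` and `broadcast_shape` are hypothetical names.

```c
#include <stddef.h>

typedef ptrdiff_t intp_t;  /* stand-in for npy_intp (an assumption) */

/*
 * Compute the iteration shape for `nop` operands mapped through op_axes.
 * op_shapes[j] is the shape of operand j; op_axes[j][i] names the operand
 * axis used for iteration axis i, or -1 for a broadcast (newaxis) entry.
 * Returns 0 on success, -1 if two operands disagree on an axis length.
 */
static int broadcast_shape(int nop, int oa_ndim,
                           const intp_t *const *op_shapes,
                           const int *const *op_axes,
                           intp_t *out_shape)
{
    for (int i = 0; i < oa_ndim; ++i) {
        intp_t dim = 1;
        for (int j = 0; j < nop; ++j) {
            int axis = op_axes[j][i];
            if (axis < 0) {
                continue;               /* newaxis: contributes length 1 */
            }
            intp_t len = op_shapes[j][axis];
            if (len == 1) {
                continue;               /* length-1 axes broadcast freely */
            }
            if (dim != 1 && dim != len) {
                return -1;              /* incompatible lengths */
            }
            dim = len;
        }
        out_shape[i] = dim;
    }
    return 0;
}
```

With the four op_axes arrays from the example above and operand shapes (2, 3, 4), (3, 4), (4,) and a scalar, the computed iteration shape is (2, 3, 4).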

NpyIter *NpyIter_Copy(NpyIter *iter)#

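Makes a copy of the given iterator. This function is provided primarily to enable multi-threaded iteration of the data.

TODO: Move this to a section about multithreaded iteration.

The recommended approach to multithreaded iteration is to first create an iterator with the flags NPY_ITER_EXTERNAL_LOOP, NPY_ITER_RANGED, NPY_ITER_BUFFERED, NPY_ITER_DELAY_BUFALLOC, and possibly NPY_ITER_GROWINNER. Create a copy of this iterator for each thread (minus one for the first iterator). Then, take the iteration index range [0, NpyIter_GetIterSize(iter)) and split it up into tasks, for example using a TBB parallel_for loop. When a thread gets a task to execute, it uses its copy of the iterator by calling NpyIter_ResetToIterIndexRange and iterating over the full range.

When using the iterator in multi-threaded code or in code not holding the Python GIL, care must be taken to only call functions which are safe in that context. NpyIter_Copy cannot be safely called without the Python GIL, because it increments Python references. The Reset* and some other functions may be safely called by passing in the errmsg parameter as non-NULL, so that the functions will pass back errors through it instead of setting a Python exception.

NpyIter_Deallocate must be called for each copy.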
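The splitting step can be sketched without any threading library. The helper below only computes the half-open sub-range for one task; `intp_t` and `task_range` are hypothetical names, and even chunking is just one possible policy.

```c
#include <stddef.h>

typedef ptrdiff_t intp_t;  /* stand-in for npy_intp (an assumption) */

/*
 * Compute the half-open iterindex range [*istart, *iend) for task `task`
 * out of `ntasks`, covering [0, itersize) with chunk sizes that differ by
 * at most one element.
 */
static void task_range(intp_t itersize, int ntasks, int task,
                       intp_t *istart, intp_t *iend)
{
    intp_t base = itersize / ntasks;
    intp_t extra = itersize % ntasks;   /* first `extra` tasks get one more */
    *istart = task * base + (task < extra ? task : extra);
    *iend = *istart + base + (task < extra ? 1 : 0);
}
```

Each thread would then pass its istart and iend to NpyIter_ResetToIterIndexRange on its own iterator copy, with a non-NULL errmsg if the GIL is not held.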

int NpyIter_RemoveAxis(NpyIter *iter, int axis)#

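Removes an axis from iteration. This requires that NPY_ITER_MULTI_INDEX was set for iterator creation, and does not work if buffering is enabled or an index is being tracked. This function also resets the iterator to its initial state.

This is useful for setting up an accumulation loop, for example. The iterator can first be created with all the dimensions, including the accumulation axis, so that the output gets created correctly. Then, the accumulation axis can be removed, and the calculation done in a nested fashion.

WARNING: This function may change the internal memory layout of the iterator. Any cached functions or pointers from the iterator must be retrieved again! The iterator range will be reset as well.

Returns NPY_SUCCEED or NPY_FAIL.

int NpyIter_RemoveMultiIndex(NpyIter *iter)#

If the iterator is tracking a multi-index, this strips support for it, and does further iterator optimizations that are possible if multi-indices are not needed. This function also resets the iterator to its initial state.

WARNING: This function may change the internal memory layout of the iterator. Any cached functions or pointers from the iterator must be retrieved again!

After calling this function, NpyIter_HasMultiIndex(iter) will return false.

Returns NPY_SUCCEED or NPY_FAIL.

int NpyIter_EnableExternalLoop(NpyIter *iter)#

If NpyIter_RemoveMultiIndex was called, you may want to enable the flag NPY_ITER_EXTERNAL_LOOP. This flag is not permitted together with NPY_ITER_MULTI_INDEX, so this function is provided to enable the feature after NpyIter_RemoveMultiIndex is called. This function also resets the iterator to its initial state.

WARNING: This function changes the internal logic of the iterator. Any cached functions or pointers from the iterator must be retrieved again!

Returns NPY_SUCCEED or NPY_FAIL.

int NpyIter_Deallocate(NpyIter *iter)#

Deallocates the iterator object and resolves any needed writebacks.

Returns NPY_SUCCEED or NPY_FAIL.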

NPY_ARRAYMETHOD_FLAGS NpyIter_GetTransferFlags(NpyIter *iter)#

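New in version 2.3.

Fetches the NPY_METH_RUNTIME_FLAGS, which provide information on whether buffering needs the Python GIL (NPY_METH_REQUIRES_PYAPI) or whether floating point errors may be set (NPY_METH_NO_FLOATINGPOINT_ERRORS).

Prior to NumPy 2.3, the public function available was NpyIter_IterationNeedsAPI, which is still available and additionally checks for object (or similar) dtypes, not exclusively for the buffering/iteration needs themselves. In general, this function (NpyIter_GetTransferFlags) should be preferred.

int NpyIter_Reset(NpyIter *iter, char **errmsg)#

Resets the iterator back to its initial state, at the beginning of the iteration range.

Returns NPY_SUCCEED or NPY_FAIL. If errmsg is non-NULL, no Python exception is set when NPY_FAIL is returned. Instead, *errmsg is set to an error message. When errmsg is non-NULL, the function may be safely called without holding the Python GIL.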

int NpyIter_ResetToIterIndexRange(NpyIter *iter, npy_intp istart, npy_intp iend, char **errmsg)#

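Resets the iterator and restricts it to the iterindex range [istart, iend). See NpyIter_Copy for an explanation of how to use this for multi-threaded iteration. This requires that the flag NPY_ITER_RANGED was passed to the iterator constructor.

If you want to reset both the iterindex range and the base pointers at the same time, you can do the following to avoid extra buffer copying (be sure to add the return code error checks when you copy this code).

/* Set to a trivial empty range */
NpyIter_ResetToIterIndexRange(iter, 0, 0, NULL);
/* Set the base pointers */
NpyIter_ResetBasePointers(iter, baseptrs, NULL);
/* Set to the desired range */
NpyIter_ResetToIterIndexRange(iter, istart, iend, NULL);

Returns NPY_SUCCEED or NPY_FAIL. If errmsg is non-NULL, no Python exception is set when NPY_FAIL is returned. Instead, *errmsg is set to an error message. When errmsg is non-NULL, the function may be safely called without holding the Python GIL.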

int NpyIter_ResetBasePointers(NpyIter *iter, char **baseptrs, char **errmsg)#

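Resets the iterator back to its initial state, but using the values in baseptrs for the data instead of the pointers from the arrays being iterated. This function is intended to be used, together with the op_axes parameter, by nested iteration code with two or more iterators.

Returns NPY_SUCCEED or NPY_FAIL. If errmsg is non-NULL, no Python exception is set when NPY_FAIL is returned. Instead, *errmsg is set to an error message. When errmsg is non-NULL, the function may be safely called without holding the Python GIL.

TODO: Move the following into a special section on nested iterators.

Creating iterators for nested iteration requires some care. All the iterator operands must match exactly, or the calls to NpyIter_ResetBasePointers will be invalid. This means that automatic copies and output allocation should not be used haphazardly. It is possible to still use the automatic data conversion and casting features of the iterator by creating one of the iterators with all the conversion parameters enabled, then grabbing the allocated operands with the NpyIter_GetOperandArray function and passing them into the constructors for the rest of the iterators.

WARNING: When creating iterators for nested iteration, the code must not use a dimension more than once in the different iterators. If this is done, nested iteration will produce out-of-bounds pointers during iteration.

WARNING: When creating iterators for nested iteration, buffering can only be applied to the innermost iterator. If a buffered iterator is used as the source for baseptrs, it will point into a small buffer instead of the array and the inner iteration will be invalid.

The pattern for using nested iterators is as follows.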

NpyIter *iter1, *iter2;
NpyIter_IterNextFunc *iternext1, *iternext2;
char **dataptrs1;

/*
 * With the exact same operands, no copies allowed, and
 * no axis in op_axes used both in iter1 and iter2.
 * Buffering may be enabled for iter2, but not for iter1.
 */
iter1 = ...; iter2 = ...;

/* errmsg is passed as NULL, so failures set a Python exception */
iternext1 = NpyIter_GetIterNext(iter1, NULL);
iternext2 = NpyIter_GetIterNext(iter2, NULL);
dataptrs1 = NpyIter_GetDataPtrArray(iter1);

do {
    NpyIter_ResetBasePointers(iter2, dataptrs1, NULL);
    do {
        /* Use the iter2 values */
    } while (iternext2(iter2));
} while (iternext1(iter1));

int NpyIter_GotoMultiIndex(NpyIter *iter, npy_intp const *multi_index)#

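Adjusts the iterator to point to the ndim indices pointed to by multi_index. Returns an error if a multi-index is not being tracked, the indices are out of bounds, or inner loop iteration is disabled.

Returns NPY_SUCCEED or NPY_FAIL.

int NpyIter_GotoIndex(NpyIter *iter, npy_intp index)#

Adjusts the iterator to point to the index specified. If the iterator was constructed with the flag NPY_ITER_C_INDEX, index is the C-order index, and if the iterator was constructed with the flag NPY_ITER_F_INDEX, index is the Fortran-order index. Returns an error if there is no index being tracked, the index is out of bounds, or inner loop iteration is disabled.

Returns NPY_SUCCEED or NPY_FAIL.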

npy_intp NpyIter_GetIterSize(NpyIter *iter)#

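Returns the number of elements being iterated. This is the product of all the dimensions in the shape. When a multi-index is being tracked (and NpyIter_RemoveAxis may be called), the size may be -1 to indicate that the iterator is too large. Such an iterator is invalid, but may become valid after NpyIter_RemoveAxis is called. It is not necessary to check for this case.

npy_intp NpyIter_GetIterIndex(NpyIter *iter)#

Gets the iterindex of the iterator, which is an index matching the iteration order of the iterator.

void NpyIter_GetIterIndexRange(NpyIter *iter, npy_intp *istart, npy_intp *iend)#

Gets the iterindex sub-range that is being iterated. If NPY_ITER_RANGED was not specified, this always returns the range [0, NpyIter_GetIterSize(iter)).

int NpyIter_GotoIterIndex(NpyIter *iter, npy_intp iterindex)#

Adjusts the iterator to point to the iterindex specified. The iterindex is an index matching the iteration order of the iterator. Returns an error if the iterindex is out of bounds, buffering is enabled, or inner loop iteration is disabled.

Returns NPY_SUCCEED or NPY_FAIL.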

npy_bool NpyIter_HasDelayedBufAlloc(NpyIter *iter)#

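Returns 1 if the flag NPY_ITER_DELAY_BUFALLOC was passed to the iterator constructor and no call to one of the Reset functions has been made yet, 0 otherwise.

npy_bool NpyIter_HasExternalLoop(NpyIter *iter)#

Returns 1 if the caller needs to handle the inner-most 1-dimensional loop, or 0 if the iterator handles all looping. This is controlled by the constructor flag NPY_ITER_EXTERNAL_LOOP or by NpyIter_EnableExternalLoop.

npy_bool NpyIter_HasMultiIndex(NpyIter *iter)#

Returns 1 if the iterator was created with the NPY_ITER_MULTI_INDEX flag, 0 otherwise.

npy_bool NpyIter_HasIndex(NpyIter *iter)#

Returns 1 if the iterator was created with the NPY_ITER_C_INDEX or NPY_ITER_F_INDEX flag, 0 otherwise.

npy_bool NpyIter_RequiresBuffering(NpyIter *iter)#

Returns 1 if the iterator requires buffering, which occurs when an operand needs conversion or alignment and so cannot be used directly, 0 otherwise.

npy_bool NpyIter_IsBuffered(NpyIter *iter)#

Returns 1 if the iterator was created with the NPY_ITER_BUFFERED flag, 0 otherwise.

npy_bool NpyIter_IsGrowInner(NpyIter *iter)#

Returns 1 if the iterator was created with the NPY_ITER_GROWINNER flag, 0 otherwise.

npy_intp NpyIter_GetBufferSize(NpyIter *iter)#

If the iterator is buffered, returns the size of the buffer being used, otherwise returns 0.

int NpyIter_GetNDim(NpyIter *iter)#

Returns the number of dimensions being iterated. If a multi-index was not requested in the iterator constructor, this value may be smaller than the number of dimensions in the original objects.

int NpyIter_GetNOp(NpyIter *iter)#

Returns the number of operands in the iterator.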

npy_intp *NpyIter_GetAxisStrideArray(NpyIter *iter, int axis)#

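Gets the array of strides for the specified axis. Requires that the iterator be tracking a multi-index, and that buffering not be enabled.

This may be used when you want to match up operand axes in some fashion, then remove them with NpyIter_RemoveAxis to handle their processing manually. By calling this function before removing the axes, you can get the strides for the manual processing.

Returns NULL on error.

int NpyIter_GetShape(NpyIter *iter, npy_intp *outshape)#

Returns the broadcast shape of the iterator in outshape. This can only be called on an iterator which is tracking a multi-index.

Returns NPY_SUCCEED or NPY_FAIL.

PyArray_Descr **NpyIter_GetDescrArray(NpyIter *iter)#

This gives back a pointer to the nop data type Descrs for the objects being iterated. The result points into iter, so the caller does not gain any references to the Descrs.

This pointer may be cached before the iteration loop; calling iternext will not change it.

PyObject **NpyIter_GetOperandArray(NpyIter *iter)#

This gives back a pointer to the nop operand PyObjects that are being iterated. The result points into iter, so the caller does not gain any references to the PyObjects.

PyObject *NpyIter_GetIterView(NpyIter *iter, npy_intp i)#

This gives back a reference to a new ndarray view, which is a view into the i-th object in the array NpyIter_GetOperandArray, whose dimensions and strides match the internal optimized iteration pattern. A C-order iteration of this view is equivalent to the iteration order of the iterator.

For example, if an iterator was created with a single array as its input, and it was possible to rearrange all its axes and then collapse it into a single strided iteration, this would return a view that is a one-dimensional array.

void NpyIter_GetReadFlags(NpyIter *iter, char *outreadflags)#

Fills nop flags. Sets outreadflags[i] to 1 if op[i] can be read from, and to 0 if not.

void NpyIter_GetWriteFlags(NpyIter *iter, char *outwriteflags)#

Fills nop flags. Sets outwriteflags[i] to 1 if op[i] can be written to, and to 0 if not.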

int NpyIter_CreateCompatibleStrides(NpyIter *iter, npy_intp itemsize, npy_intp *outstrides)#

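Builds a set of strides which are the same as the strides of an output array created using the NPY_ITER_ALLOCATE flag, where NULL was passed for op_axes. This is for data packed contiguously, but not necessarily in C or Fortran order. This should be used together with NpyIter_GetShape and NpyIter_GetNDim, with the flag NPY_ITER_MULTI_INDEX passed into the constructor.

A use case for this function is to match the shape and layout of the iterator and tack on one or more dimensions. For example, in order to generate a vector per input value for a numerical gradient, you pass in ndim*itemsize for itemsize, then add another dimension to the end with size ndim and stride itemsize. To do the Hessian matrix, you do the same thing but add two dimensions, or take advantage of the symmetry and pack it into one dimension with a particular encoding.

This function may only be called if the iterator is tracking a multi-index and if NPY_ITER_DONT_NEGATE_STRIDES was used to prevent an axis from being iterated in reverse order.

If an array is created with this method, simply adding 'itemsize' for each iteration will traverse the new array matching the iterator.

Returns NPY_SUCCEED or NPY_FAIL.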

npy_bool NpyIter_IsFirstVisit(NpyIter *iter, int iop)#

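Checks to see whether this is the first time the elements of the specified reduction operand which the iterator points at are being seen. The function returns a reasonable answer for reduction operands and when buffering is disabled. The answer may be incorrect for buffered non-reduction operands.

This function is intended to be used in EXTERNAL_LOOP mode only, and will produce some wrong answers when that mode is not enabled.

If this function returns true, the caller should also check the inner loop stride of the operand, because if that stride is 0, then only the first element of the innermost external loop is being visited for the first time.

WARNING: For performance reasons, 'iop' is not bounds-checked, it is not confirmed that 'iop' is actually a reduction operand, and it is not confirmed that EXTERNAL_LOOP mode is enabled. These checks are the responsibility of the caller, and should be done outside of any inner loops.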
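The notion of a "first visit" can be modelled for the unbuffered case: an element of a reduction operand is being seen for the first time when the iterator sits at coordinate 0 on every axis where that operand has stride 0. The sketch below is a simplified standalone model under that assumption, not the library function; `intp_t` and `is_first_visit_model` are hypothetical names.

```c
#include <stddef.h>

typedef ptrdiff_t intp_t;  /* stand-in for npy_intp (an assumption) */

/*
 * Returns 1 if the operand element at `coords` has not been reached
 * before, 0 otherwise. An axis with stride 0 revisits the same element,
 * so any nonzero coordinate on such an axis means a repeat visit.
 */
static int is_first_visit_model(int ndim, const intp_t *coords,
                                const intp_t *op_strides)
{
    for (int i = 0; i < ndim; ++i) {
        if (op_strides[i] == 0 && coords[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

A typical use of such a check is to initialize an output element on its first visit instead of filling the whole output with the reduction unit beforehand.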

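Functions for iteration#

NpyIter_IterNextFunc *NpyIter_GetIterNext(NpyIter *iter, char **errmsg)#

Returns a function pointer for iteration. A specialized version of the function pointer may be calculated by this function instead of being stored in the iterator structure. Thus, to get good performance, it is required that the function pointer be saved in a variable rather than retrieved for each loop iteration.

Returns NULL if there is an error. If errmsg is non-NULL, no Python exception is set on failure. Instead, *errmsg is set to an error message. When errmsg is non-NULL, the function may be safely called without holding the Python GIL.

The typical looping construct is as follows.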

NpyIter_IterNextFunc *iternext = NpyIter_GetIterNext(iter, NULL);
char** dataptr = NpyIter_GetDataPtrArray(iter);

do {
    /* use the addresses dataptr[0], ... dataptr[nop-1] */
} while(iternext(iter));

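When NPY_ITER_EXTERNAL_LOOP is specified, the typical inner loop construct is as follows.

NpyIter_IterNextFunc *iternext = NpyIter_GetIterNext(iter, NULL);
char** dataptr = NpyIter_GetDataPtrArray(iter);
npy_intp* stride = NpyIter_GetInnerStrideArray(iter);
npy_intp* size_ptr = NpyIter_GetInnerLoopSizePtr(iter), size;
npy_intp iop, nop = NpyIter_GetNOp(iter);

do {
    size = *size_ptr;
    while (size--) {
        /* use the addresses dataptr[0], ... dataptr[nop-1] */
        for (iop = 0; iop < nop; ++iop) {
            dataptr[iop] += stride[iop];
        }
    }
} while (iternext(iter));

Observe that we are using the dataptr array inside the iterator, not copying the values to a local temporary. This is possible because when iternext() is called, these pointers will be overwritten with fresh values, not incrementally updated.

If a compile-time fixed buffer is being used (both flags NPY_ITER_BUFFERED and NPY_ITER_EXTERNAL_LOOP), the inner size may be used as a signal as well. The size is guaranteed to become zero when iternext() returns false, enabling the following loop construct. Note that if you use this construct, you should not pass NPY_ITER_GROWINNER as a flag, because it will cause larger sizes under some circumstances.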

/* The constructor should have buffersize passed as this value */
#define FIXED_BUFFER_SIZE 1024

NpyIter_IterNextFunc *iternext = NpyIter_GetIterNext(iter, NULL);
char **dataptr = NpyIter_GetDataPtrArray(iter);
npy_intp *stride = NpyIter_GetInnerStrideArray(iter);
npy_intp *size_ptr = NpyIter_GetInnerLoopSizePtr(iter), size;
npy_intp i, iop, nop = NpyIter_GetNOp(iter);

/* One loop with a fixed inner size */
size = *size_ptr;
while (size == FIXED_BUFFER_SIZE) {
    /*
     * This loop could be manually unrolled by a factor
     * which divides into FIXED_BUFFER_SIZE
     */
    for (i = 0; i < FIXED_BUFFER_SIZE; ++i) {
        /* use the addresses dataptr[0], ... dataptr[nop-1] */
        for (iop = 0; iop < nop; ++iop) {
            dataptr[iop] += stride[iop];
        }
    }
    iternext(iter);
    size = *size_ptr;
}

/* Finish-up loop with variable inner size */
if (size > 0) do {
    size = *size_ptr;
    while (size--) {
        /* use the addresses dataptr[0], ... dataptr[nop-1] */
        for (iop = 0; iop < nop; ++iop) {
            dataptr[iop] += stride[iop];
        }
    }
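} while (iternext(iter));

NpyIter_GetMultiIndexFunc *NpyIter_GetGetMultiIndex(NpyIter *iter, char **errmsg)#

Returns a function pointer for getting the current multi-index of the iterator. Returns NULL if the iterator is not tracking a multi-index. It is recommended that this function pointer be cached in a local variable before the iteration loop.

Returns NULL if there is an error. If errmsg is non-NULL, no Python exception is set on failure. Instead, *errmsg is set to an error message. When errmsg is non-NULL, the function may be safely called without holding the Python GIL.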

char **NpyIter_GetDataPtrArray(NpyIter *iter)#

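This gives back a pointer to the nop data pointers. If NPY_ITER_EXTERNAL_LOOP was not specified, each data pointer points to the current data item of the iterator. If it was specified (no inner iteration), each data pointer points to the first data item of the inner loop.

This pointer may be cached before the iteration loop; calling iternext will not change it. This function may be safely called without holding the Python GIL.

char **NpyIter_GetInitialDataPtrArray(NpyIter *iter)#

Gets the array of data pointers directly into the arrays (never into the buffers), corresponding to iteration index 0.

These pointers are different from the pointers accepted by NpyIter_ResetBasePointers, because the direction along some axes may have been reversed.

This function may be safely called without holding the Python GIL.

npy_intp *NpyIter_GetIndexPtr(NpyIter *iter)#

This gives back a pointer to the index being tracked, or NULL if no index is being tracked. It is only usable if one of the flags NPY_ITER_C_INDEX or NPY_ITER_F_INDEX was specified during construction.

When the flag NPY_ITER_EXTERNAL_LOOP is used, the code needs to know the parameters for doing the inner loop. These functions provide that information.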

npy_intp *NpyIter_GetInnerStrideArray(NpyIter *iter)#

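Returns a pointer to an array of the nop strides, one for each iterated object, to be used by the inner loop.

This pointer may be cached before the iteration loop; calling iternext will not change it. This function may be safely called without holding the Python GIL.

WARNING: While the pointer may be cached, its values may change if the iterator is buffered.

npy_intp *NpyIter_GetInnerLoopSizePtr(NpyIter *iter)#

Returns a pointer to the number of iterations the inner loop should execute.

This address may be cached before the iteration loop; calling iternext will not change it. The value itself may change during iteration, in particular if buffering is enabled. This function may be safely called without holding the Python GIL.

void NpyIter_GetInnerFixedStrideArray(NpyIter *iter, npy_intp *out_strides)#

Gets an array of strides which are fixed, or will not change during the entire iteration. For strides that may change, the value NPY_MAX_INTP is placed in the stride.

Once the iterator is prepared for iteration (after a reset if NPY_ITER_DELAY_BUFALLOC was used), call this to get the strides which may be used to select a fast inner loop function. For example, if the stride is 0, the inner loop can load its value into a variable once and then use the variable throughout the loop; if the stride equals the itemsize, a contiguous version for that operand may be used.

This function may be safely called without holding the Python GIL.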
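The selection of an inner loop from a fixed stride can be sketched as a small dispatch function. This is a standalone model: `intp_t`, `VARIABLE_STRIDE` (playing the role of NPY_MAX_INTP), `loop_kind`, and `choose_loop` are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef ptrdiff_t intp_t;            /* stand-in for npy_intp (an assumption) */
#define VARIABLE_STRIDE PTRDIFF_MAX  /* plays the role of NPY_MAX_INTP */

enum loop_kind { LOOP_SCALAR, LOOP_CONTIGUOUS, LOOP_STRIDED };

/* Pick an inner-loop variant for one operand from its fixed stride. */
static enum loop_kind choose_loop(intp_t fixed_stride, intp_t itemsize)
{
    if (fixed_stride == VARIABLE_STRIDE) {
        return LOOP_STRIDED;      /* stride may change: use general loop */
    }
    if (fixed_stride == 0) {
        return LOOP_SCALAR;       /* same value every step: load it once */
    }
    if (fixed_stride == itemsize) {
        return LOOP_CONTIGUOUS;   /* tightly packed: contiguous loop */
    }
    return LOOP_STRIDED;
}
```

The decision is made once before the iteration loop, so the per-element work inside the chosen loop carries no extra branching on the stride.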

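Converting from previous NumPy iterators#

The old iterator API includes functions like PyArrayIter_Check, PyArray_Iter* and PyArray_ITER_*. The multi-iterator array includes PyArray_MultiIter*, PyArray_Broadcast, and PyArray_RemoveSmallest. The new iterator design replaces all of this functionality with a single object and associated API. One goal of the new API is that all uses of the existing iterator should be replaceable with the new iterator without significant effort. In 1.6, the major exception to this is the neighborhood iterator, which does not have corresponding features in this iterator.

Here is a conversion table for which functions to use with the new iterator: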