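Modifying reduce ops
The max, min, sum, and similar operations in reduce_ops.py are changed so that they are exported directly as Functors.
Problem 1
ninja suddenly started failing with a compile error. Re-cloning the repository and switching back to the current branch still failed to compile, so in the end I deleted the branch and recreated it.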
/home/zhongshanshan/workspace/f/oneflow/user/kernels/eager_nccl_kernels.cu(275): warning: overloaded virtual function "oneflow::user_op::OpKernel::Compute" is only partially overridden in class "oneflow::EagerNcclS2SKernel<oneflow::float16>"
detected during:
instantiation of class "oneflow::EagerNcclS2SKernel<T> [with T=oneflow::float16]"
/home/zhongshanshan/workspace/f/oneflow/core/framework/op_kernel.h(332): here
instantiation of "oneflow::user_op::OpKernel *oneflow::user_op::NewOpKernel<T,Args...>(Args &&...) [with T=oneflow::EagerNcclS2SKernel<oneflow::float16>, Args=<>]"
/home/zhongshanshan/workspace/f/oneflow/core/framework/user_op_kernel_registry.h(84): here
instantiation of "oneflow::user_op::OpKernelRegistry &oneflow::user_op::OpKernelRegistry::SetCreateFn<T>() [with T=oneflow::EagerNcclS2SKernel<oneflow::float16>]"
(400): here
/home/zhongshanshan/workspace/f/oneflow/user/kernels/eager_nccl_kernels.cu(75): warning: overloaded virtual function "oneflow::user_op::OpKernel::Compute" is only partially overridden in class "oneflow::EagerNcclAllReduceKernel"
/home/zhongshanshan/workspace/f/oneflow/user/kernels/eager_nccl_kernels.cu(108): warning: overloaded virtual function "oneflow::user_op::OpKernel::Compute" is only partially overridden in class "oneflow::EagerNcclBroadcastKernel"
/home/zhongshanshan/workspace/f/oneflow/user/kernels/eager_nccl_kernels.cu(147): warning: overloaded virtual function "oneflow::user_op::OpKernel::Compute" is only partially overridden in class "oneflow::EagerNcclTouchKernel"
/home/zhongshanshan/workspace/f/oneflow/user/kernels/eager_nccl_kernels.cu(164): warning: overloaded virtual function "oneflow::user_op::OpKernel::Compute" is only partially overridden in class "oneflow::EagerNcclReduceKernel"
/home/zhongshanshan/workspace/f/oneflow/user/kernels/eager_nccl_kernels.cu(202): warning: overloaded virtual function "oneflow::user_op::OpKernel::Compute" is only partially overridden in class "oneflow::EagerNcclReduceScatterKernel"
/home/zhongshanshan/workspace/f/oneflow/user/kernels/eager_nccl_kernels.cu(244): warning: overloaded virtual function "oneflow::user_op::OpKernel::Compute" is only partially overridden in class "oneflow::EagerNcclAllGatherKernel"
[188/407] Building CUDA object CMakeFiles/oneflow.dir...core/ndarray/ndarray_apply_broadcast_binary_core.cu.o
ninja: build stopped: subcommand failed.
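Problem 2
How can a single interface accept both an int and a list?
Declaring the argument as Int32List[1] axis in the interface yaml file lets it accept both an int and a list.
How do you give a default value to an interface that accepts both an int and a list?
With Int32List[1] axis, the result cannot be computed directly when axis is omitted. I found two possible solutions:
- Int32List[1] axis=None
- Write two interfaces, for example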
signature: [
"TensorTuple (Tensor input, Int32 indices_or_sections) => HsplitInt",
"TensorTuple (Tensor input, Int32List indices_or_sections) => HsplitVec",
]
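With Int32List[1] axis=None, /oneflow/core/functional/impl/nn_functor.cpp fails to compile with an argument mismatch error; this file calls and initializes ReduceSum. Even after changing axis in nn_functor.cpp from the original {} to {NULL}, odd errors still show up in other files, although other interfaces do use this pattern. To avoid touching other files and causing further problems, I ended up writing two interfaces. That approach has a flaw of its own: the Python tensor sum interface cannot take axis=None and has to use axis=[] instead.
During review, a colleague suggested implementing operator twice with different parameters in the same class, but the functional interface documentation says that interfaces must be orthogonal.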
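The two-interface approach can be sketched in plain C++ as two separate functors, where the int version only wraps its argument and forwards to the list version. This is a minimal illustration, not OneFlow code: ReduceAxesVec and ReduceAxesInt are hypothetical names, and treating an empty list as "reduce over every axis" is an assumption based on the axis=[] convention mentioned above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical list-based functor: returns the axes a reduction would run over.
// Assumption: an empty list means "reduce over every axis".
struct ReduceAxesVec {
  std::vector<int32_t> operator()(int32_t ndim, const std::vector<int32_t>& axis) const {
    std::vector<int32_t> out;
    if (axis.empty()) {
      for (int32_t i = 0; i < ndim; ++i) { out.push_back(i); }
      return out;
    }
    // Map negative axes to their non-negative equivalents.
    for (int32_t a : axis) { out.push_back(a < 0 ? a + ndim : a); }
    return out;
  }
};

// Hypothetical int-based functor: a separate class that wraps the scalar in a
// one-element list and forwards to the list-based functor.
struct ReduceAxesInt {
  std::vector<int32_t> operator()(int32_t ndim, int32_t axis) const {
    return ReduceAxesVec()(ndim, std::vector<int32_t>{axis});
  }
};
```

Keeping the two functors separate means each signature maps to exactly one class, and all the real logic stays in the list version.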
Problem 3
Wrapping the check in an anonymous namespace:
namespace {
std::string exception_check(int32_t base, int32_t value, bool check_ge = true,
bool check_le = true) {
printf("%d, %d, %d, %d\n", base, value, check_ge, check_le);
if (check_ge) {
printf("ttt: %d, %d, %d, %d\n", base, value, check_ge, check_le);
CHECK_GE_OR_RETURN(base, value) << "Dimension out of range, expected to be in range of ["
<< -base << ", " << base - 1 << "], but got " << value;
}
if (check_le) {
CHECK_LE_OR_RETURN(-base, value) << "Dimension out of range, expected to be in range of ["
<< -base << ", " << base - 1 << "], but got " << value;
}
return "";
}
} // namespace
exception_check(naxis, axis[i]);
The error message was never displayed. The problem was resolved after reading up on the "Error handling in OneFlow: Maybe" material.
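The general idea behind result-style error handling can be sketched in self-contained C++ as follows. This is not OneFlow's Maybe and does not use its macros: Status, CheckAxisInRange, and NormalizeAxis are hypothetical names introduced only for illustration. The point it shows is that a check returning an error object has no visible effect unless the caller inspects the result and passes it upward.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for a Maybe-style result.
struct Status {
  bool ok;
  std::string message;
};

// Returns a failed Status when axis is outside [-ndim, ndim - 1].
Status CheckAxisInRange(int32_t ndim, int32_t axis) {
  if (axis < -ndim || axis >= ndim) {
    return Status{false, "Dimension out of range, expected to be in range of ["
                             + std::to_string(-ndim) + ", " + std::to_string(ndim - 1)
                             + "], but got " + std::to_string(axis)};
  }
  return Status{true, ""};
}

// The caller inspects the result and forwards the failure. Calling
// CheckAxisInRange(...) as a bare statement would silently drop the message.
Status NormalizeAxis(int32_t ndim, int32_t axis, int32_t* out) {
  Status s = CheckAxisInRange(ndim, axis);
  if (!s.ok) { return s; }
  *out = axis < 0 ? axis + ndim : axis;
  return Status{true, ""};
}
```

In this sketch the error only surfaces because every function along the call chain returns a Status and checks the one it receives.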