import re
from typing import Callable, List

import torch
from torch import Tensor

__all__: List[str] = []


class _CodeParser:
    def __init__(self, code_string: str):
        # Building blocks of the regular expression that splits a templated
        # C++ function into its parts.
        optional_ws = r"\s*"
        required_ws = r"\s+"
        template_params = r"(?P<template_params>\<.+\>)"
        return_type = r"(?P<return_type>\w+)"
        function_name = r"(?P<function_name>\w+)"
        function_params = r"(?P<function_params>\(.+\))"
        function_body = r"(?P<function_body>\{.+\})"

        pattern = (
            optional_ws
            + "template"
            + optional_ws + template_params
            + optional_ws + return_type
            + required_ws + function_name
            + optional_ws + function_params
            + optional_ws + function_body
            + optional_ws
        )

        # re.DOTALL lets "." match newlines, so multi-line code strings parse too.
        result = re.match(pattern, code_string, re.DOTALL)

        if result is None:
            raise Exception(f"Couldn't parse code, please check correctness:\n {code_string}")

        self.template_params = result["template_params"]
        self.return_type = result["return_type"]
        self.function_name = result["function_name"]
        self.function_params = result["function_params"]
        self.function_body = result["function_body"]


def _create_jit_fn(code_string: str, **kwargs) -> Callable:
    """
    Create a jiterator-generated CUDA kernel for an elementwise op.

    The code string has to be a valid CUDA function that describes the computation for a single element.
    It has to follow the C++ template pattern shown in the example below. The function is inlined into
    an elementwise kernel template and compiled on the fly. The compiled kernel is cached in memory as
    well as in a local temp dir.

    Jiterator-generated kernels accept noncontiguous tensors and support broadcasting and type promotion.

    Args:
        code_string (string): CUDA code string to be compiled by jiterator.
        kwargs (Dict, optional): Keyword arguments for the generated function.

    Example:
        >>> code_string = "template <typename T> T my_kernel(T x, T y, T alpha) { return -x + alpha * y; }"
        >>> jitted_fn = _create_jit_fn(code_string, alpha=1.0)
        >>> a = torch.rand(3, device='cuda')
        >>> b = torch.rand(3, device='cuda')
        >>> # invoke the jitted function like a regular Python function
        >>> result = jitted_fn(a, b, alpha=3.14)

    Jiterator can be used together with Python registration to override an operator's CUDA kernel.

    The following example overrides gelu's CUDA kernel with relu:
        >>> code_string = "template <typename T> T my_gelu(T a) { return a > 0 ? a : 0; }"
        >>> my_gelu = _create_jit_fn(code_string)
        >>> my_lib = torch.library.Library("aten", "IMPL")
        >>> my_lib.impl('aten::gelu', my_gelu, "CUDA")
        >>> # torch.nn.GELU and torch.nn.functional.gelu are now overridden
        >>> a = torch.rand(3, device='cuda')
        >>> torch.allclose(torch.nn.functional.gelu(a), torch.nn.functional.relu(a))

    .. warning::
        This API is in beta and may change in future releases.

    .. warning::
        Jiterator only supports up to 8 tensor inputs.

    .. warning::
        All input tensors must live on a CUDA device.

    """
    class JittedFunction:
        def __init__(self, code_string: str, **kwargs):
            self.code_string = code_string

            parsed_code = _CodeParser(code_string)
            self.kernel_name = parsed_code.function_name

            # Default values for the kernel's extra (non-tensor) arguments.
            self.kwargs_dict = kwargs
            self.is_cuda_available = torch.cuda.is_available()

        def __call__(self, *tensors: Tensor, **kwargs):
            # CUDA availability is recorded at construction but only enforced
            # here, when the kernel is actually invoked.
            assert self.is_cuda_available, "Jiterator is only supported on CUDA GPUs, no CUDA GPUs are available."

            assert len(tensors) <= 8, "jiterator only supports up to 8 tensor inputs."

            # Call-time keyword arguments override the defaults, but only for
            # names that were declared when the function was created.
            expanded_kwargs = self.kwargs_dict.copy()
            for key, value in kwargs.items():
                if key in self.kwargs_dict:
                    expanded_kwargs[key] = value
                else:
                    raise KeyError(f"{key} is not declared in function definition")

            return torch._C._cuda_jiterator_compile_and_launch_kernel(
                self.code_string,
                self.kernel_name,
                tensors,
                expanded_kwargs)

    return JittedFunction(code_string, **kwargs)