thunder

Compiling functions and modules

- Just-in-time compile a callable (function or model).

Querying information on compiled functions and modules

- Options can be dynamically registered; the currently registered ones are listed below.
- Obtains the compilation data from a JITed function.
- Obtains the compilation statistics from a JITed function.
- Obtains the list of computation traces that have been produced for the last run of the function.
- Obtains the list of backward traces that have been produced for the last run of the function and the selected prologue.
- Obtains the list of prologue traces that have been produced for the last run of the function and the selected prologue.
- Returns the cache options set when JITting the function.
- Returns the number of cache hits we found when running the function.
- Returns the number of cache misses we found when running the function.
- Returns the list of (explicit) transforms applied to the JITed function.
- Returns the list of instructions the interpreter encountered while tracing through the user program (on the last cache miss).
- Returns the list of instructions and other information the interpreter encountered while tracing through the user program (on the last cache miss).
- Prints how compiled options were used (or not).
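As a quick orientation, here is a minimal sketch of the typical flow: JIT-compile a callable and then query the traces produced for its last run. It assumes only the thunder.jit and thunder.last_traces entry points from the list above; foo and its inputs are placeholders.

import torch
import thunder

def foo(a, b):
    return a + b

jfoo = thunder.jit(foo)             # just-in-time compile the callable
a = torch.randn(2, 2)
b = torch.randn(2, 2)
out = jfoo(a, b)                    # the first call traces and compiles

traces = thunder.last_traces(jfoo)  # computation traces from the last run
print(traces[-1])                   # the final, executed trace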
JITed Model wrapper
- class thunder.ThunderModule(model, compiled_model_call)[source]
Bases: Module
An nn.Module subclass wrapping a jitted model. This wrapper is returned by thunder.jit; you would typically not instantiate it manually.
- get_buffer(name)[source]
Return the buffer given by target if it exists, otherwise throw an error.
See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.
- Parameters:
target – The fully-qualified string name of the buffer to look for. (See get_submodule for how to specify a fully-qualified string.)
- Returns:
The buffer referenced by target
- Return type:
torch.Tensor
- Raises:
AttributeError – If the target string references an invalid path or resolves to something that is not a buffer
- get_parameter(name)[source]
Return the parameter given by target if it exists, otherwise throw an error.
See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.
- Parameters:
target – The fully-qualified string name of the Parameter to look for. (See get_submodule for how to specify a fully-qualified string.)
- Returns:
The Parameter referenced by target
- Return type:
torch.nn.Parameter
- Raises:
AttributeError – If the target string references an invalid path or resolves to something that is not an nn.Parameter
- get_submodule(name)[source]
Return the submodule given by target if it exists, otherwise throw an error.
For example, let’s say you have an nn.Module A that looks like this:

A(
    (net_b): Module(
        (net_c): Module(
            (conv): Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
        )
        (linear): Linear(in_features=100, out_features=200, bias=True)
    )
)

(The diagram shows an nn.Module A. A has a nested submodule net_b, which itself has two submodules net_c and linear. net_c then has a submodule conv.)
To check whether or not we have the linear submodule, we would call get_submodule("net_b.linear"). To check whether we have the conv submodule, we would call get_submodule("net_b.net_c.conv").
The runtime of get_submodule is bounded by the degree of module nesting in target. A query against named_modules achieves the same result, but it is O(N) in the number of transitive modules. So, for a simple check to see if some submodule exists, get_submodule should always be used.
- Parameters:
target – The fully-qualified string name of the submodule to look for. (See above example for how to specify a fully-qualified string.)
- Returns:
The submodule referenced by target
- Return type:
torch.nn.Module
- Raises:
AttributeError – If the target string references an invalid path or resolves to something that is not an nn.Module
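For concreteness, here is a small sketch of the nesting described above using a plain nn.Module. The class A and its layers are illustrative (a BatchNorm is added so there is also a buffer to look up), and the fully-qualified names shown are what get_submodule, get_parameter and get_buffer expect.

import torch.nn as nn

# Illustrative module matching the nested structure shown above,
# with a BatchNorm added so there is also a buffer to look up.
class A(nn.Module):
    def __init__(self):
        super().__init__()
        self.net_b = nn.Module()
        self.net_b.net_c = nn.Module()
        self.net_b.net_c.conv = nn.Conv2d(16, 33, kernel_size=3, stride=2)
        self.net_b.net_c.bn = nn.BatchNorm2d(33)
        self.net_b.linear = nn.Linear(100, 200)

a = A()
conv = a.get_submodule("net_b.net_c.conv")                # the Conv2d module
weight = a.get_parameter("net_b.linear.weight")           # fully-qualified parameter name
running_var = a.get_buffer("net_b.net_c.bn.running_var")  # fully-qualified buffer name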
- load_state_dict(state_dict, strict=True, assign=False)[source]
Loads the state dict into a transformed module.
This is similar to, but much simpler than, the original load_state_dict (e.g. regarding hooks, customization etc.).
- named_buffers(prefix='', recurse=True, remove_duplicate=True, *, persistent=None)[source]
Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- Parameters:
prefix (str) – prefix to prepend to all buffer names.
recurse (bool, optional) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Defaults to True.
remove_duplicate (bool, optional) – whether to remove the duplicated buffers in the result. Defaults to True.
- Yields:
(str, torch.Tensor) – Tuple containing the name and buffer
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> for name, buf in self.named_buffers():
>>>     if name in ['running_var']:
>>>         print(buf.size())
- named_parameters(prefix='', recurse=True, remove_duplicate=True)[source]
Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- Parameters:
prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
remove_duplicate (bool, optional) – whether to remove the duplicated parameters in the result. Defaults to True.
- Yields:
(str, Parameter) – Tuple containing the name and parameter
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> for name, param in self.named_parameters():
>>>     if name in ['bias']:
>>>         print(param.size())
- no_sync()[source]
Context manager to disable gradient synchronization in data parallel mode.
This context manager is intended to be used in conjunction with torch.nn.parallel.DistributedDataParallel to disable gradient synchronization in the backward pass. It will not have any effect when used with other modules.

Note
This could lead to different accumulated gradients than with torch.nn.parallel.distributed.DistributedDataParallel.no_sync. PyTorch’s gradient synchronization is implemented by applying all-reduce to gradient buckets of torch.nn.Parameter.grad. Thus PyTorch’s no_sync context leads to AllReduce(g_1 + … + g_n), where n is the number of gradient accumulation steps. In contrast, this synchronizes the accumulated gradients when exiting the context, leading to AllReduce(g_1 + … + g_{n-1}) + AllReduce(g_n).

Warning
You must reuse this context manager in each group of gradient accumulation iterations since gradients will get synchronized on context manager exit.

with model.no_sync():
    for _ in range(len(gradient_accumulation_iters)):
        loss(model(x)).backward()  # uses no-sync-backward trace
loss(model(x)).backward()          # uses the regular backward trace
optimizer.step()
- original_state_dict(*, destination=None, prefix='', keep_vars=False)[source]
Returns the state dict of the transformed ThunderModule with the reverse transform applied.
For example, ThunderModule.state_dict() returns a state dict of sharded tensors if thunder.distributed.fsdp() is applied to a model, while ThunderModule.original_state_dict() returns a state dict of unsharded tensors.
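A hedged sketch of that distinction, assuming a distributed setup (e.g. launched via torchrun) with a process group already initialized; the model itself is a placeholder.

import torch.nn as nn
import thunder
import thunder.distributed

# Placeholder model; fsdp() shards its parameters across the ranks of the
# already-initialized process group, and thunder.jit wraps the sharded module.
model = nn.Sequential(nn.Linear(256, 256), nn.GELU(), nn.Linear(256, 8))
jmodel = thunder.jit(thunder.distributed.fsdp(model))

sharded = jmodel.state_dict()             # state dict of the transformed (sharded) module
unsharded = jmodel.original_state_dict()  # reverse transform applied: unsharded tensors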
- state_dict(*, destination=None, prefix='', keep_vars=False)[source]
Returns the state dict of the (transformed) Thunder module.
Note that this is similar to, but rather more rudimentary than, the original state_dict (e.g. no hook support yet).
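As a minimal illustration of state_dict and load_state_dict together, a hedged save/restore round trip; the model and file name are placeholders.

import torch
import torch.nn as nn
import thunder

model = nn.Linear(4, 4)                                # placeholder model
jmodel = thunder.jit(model)

torch.save(jmodel.state_dict(), "checkpoint.pt")       # state dict of the (transformed) module

restored = thunder.jit(nn.Linear(4, 4))
restored.load_state_dict(torch.load("checkpoint.pt"))  # load it back into a transformed module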