Hi,
I'm wondering about the reasoning behind this constraint in grads_and_grad_moms.
The given variables are indeed used in multiple operations in the loss computation graph, but that shouldn't prevent the gradient from being computed.
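To illustrate what I mean, here is a minimal sketch in plain Python. It is not taken from the library: the `Var` class and `backward` function are hypothetical names for a toy reverse-mode autodiff. It only shows that when a variable feeds several operations, its gradient is the sum of the contributions from each use.

```python
class Var:
    """Hypothetical scalar node for a toy reverse-mode autodiff sketch."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (parent_node, local_derivative_of_this_node_wrt_parent).
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(out):
    """Accumulate d(out)/d(node) into node.grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        # Depth-first traversal that yields a topological order.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(out)
    out.grad = 1.0
    # Walk from the output back to the inputs; a node that is used several
    # times simply receives several additive contributions.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local


w = Var(3.0)
x = Var(2.0)
loss = w * x + w * w  # w is used in two separate operations
backward(loss)
print(w.grad)  # d/dw (w*x + w^2) = x + 2w = 8.0
print(x.grad)  # d/dx (w*x + w^2) = w = 3.0
```

So, at least in this simplified picture, reuse of a variable across operations is not an obstacle to computing its gradient.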
Could you shed some light on this?