How can I customize the gradient computation at training time in Keras?

David Vander Mijnsbrugge 2020-11-29 01:26:56

TensorFlow 2.x can execute Python functions as graphs via tf.function [1], so decorating grad(dy) with @tf.function should work. You will then run into a new error, though: because MyOp takes nat_grad as an input, TensorFlow expects grad to return a gradient for that input as well [2]. Returning 0. for it satisfies this requirement:

import tensorflow as tf

@tf.custom_gradient
def MyOp(inputs, w, nat_grad=False):
    output = w*inputs
    @tf.function
    def grad(dy):
        # One gradient per input: inputs, w, and 0. for the nat_grad flag.
        if nat_grad:
            return dy, 1.0, 0.
        else:
            return -dy, -1.0, 0.
    return output, grad

To me this does not seem like the right way to do it. I would rather split the gradient op into two ops and select between them in call:

@tf.custom_gradient
def NatOp(inputs, w):
    output = w*inputs
    def grad(dy):
        return dy, 1.0     
    return output, grad

@tf.custom_gradient
def RegOp(inputs, w):
    output = w*inputs
    def grad(dy):
        return -dy, -1.0
    return output, grad

class MyKerasLayer(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()
        self.nat_grad = False
        
    def build(self, input_shape):
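        # Keyword arguments avoid relying on the positional order of add_weight; shape=() makes w a scalar.
        self.w = self.add_weight(name="w", shape=(), dtype=tf.float32, trainable=True, initializer=tf.random_normal_initializer())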
        super().build(input_shape)

    def call(self, inputs):
        return NatOp(inputs, self.w) if self.nat_grad else RegOp(inputs, self.w)

[1] https://www.tensorflow.org/api_docs/python/tf/function

[2] https://www.tensorflow.org/api_docs/python/tf/custom_gradient

Ziofil 2020-11-28 18:09:11

Thank you for this suggestion. I also thought of doubling the ops, but I actually have several of them, and the only things that differ are a few lines in the custom gradients. So I'd have a lot of duplicated code to maintain... It would be sad if this were the only way :/
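One possible way to limit that duplication, offered only as a sketch: build both ops from a small factory so the forward pass is written once and only the sign of the toy gradients differs. The names make_op, nat_op, reg_op and gradients_of are hypothetical and do not appear in the answer; the gradients mirror the toy values used in NatOp and RegOp.

```python
import tensorflow as tf

def make_op(sign):
    """Hypothetical helper: build a custom-gradient op whose backward pass is scaled by `sign`."""
    @tf.custom_gradient
    def op(inputs, w):
        output = w * inputs  # shared forward pass, written once

        def grad(dy):
            # Same toy gradients as in the answer: (+dy, +1) or (-dy, -1).
            return sign * dy, sign * tf.ones_like(w)

        return output, grad
    return op

nat_op = make_op(1.0)   # plays the role of NatOp
reg_op = make_op(-1.0)  # plays the role of RegOp

x = tf.constant(3.0)
w = tf.constant(2.0)

def gradients_of(op):
    # Watch the constants explicitly so the tape tracks them.
    with tf.GradientTape() as tape:
        tape.watch([x, w])
        y = op(x, w)
    return [float(g) for g in tape.gradient(y, [x, w])]

nat_grads = gradients_of(nat_op)
reg_grads = gradients_of(reg_op)
```

If the real ops differ in more than a sign, the factory could presumably take the differing lines as a callable instead; that part is an assumption about the actual code, which is not shown in this thread.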

Ziofil 2020-11-28 18:30:11

David, you gave me an idea though: if I decorate the grad function with @tf.function I can simply return 0 as gradient for the additional input (the nat_grad bool). I think it's working.
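A variant of this idea, sketched under the assumption that nat_grad is passed as a boolean tensor: selecting the sign with tf.where instead of a Python if means the gradient function should not need @tf.function at all. The names my_op and gradients_with are hypothetical, and the gradient values are the toy ones from the answer.

```python
import tensorflow as tf

@tf.custom_gradient
def my_op(inputs, w, nat_grad):
    output = w * inputs

    def grad(dy):
        # Pick the sign from the boolean tensor; no Python `if` is involved.
        sign = tf.where(nat_grad, 1.0, -1.0)
        # Third entry: zero gradient for the nat_grad flag, as discussed in the thread.
        return sign * dy, sign * tf.ones_like(w), tf.zeros_like(sign)

    return output, grad

x = tf.constant(3.0)
w = tf.constant(2.0)

def gradients_with(flag):
    # Watch the constants explicitly so the tape tracks them.
    with tf.GradientTape() as tape:
        tape.watch([x, w])
        y = my_op(x, w, tf.constant(flag))
    return [float(g) for g in tape.gradient(y, [x, w])]

nat_grads = gradients_with(True)
reg_grads = gradients_with(False)
```

Because the flag is an ordinary input here, the same op serves both modes, and the caller decides which gradient is used on each call.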

Ziofil 2020-11-29 14:25:45

If you'd like to update your answer (see my previous comment), I'll mark it as accepted.

David Vander Mijnsbrugge 2020-11-30 08:31:25

The first solution in the answer returns 0.0 as the gradient for the bool. I added the second solution because the first feels 'hacky' to me. On the other hand, if it solves your problem perfectly, it seems like the pragmatic thing to do.
