I want to implement the natural gradient in a keras layer. This has to happen inside the customized gradient that is already in place. I want to be able to choose which implementation should be computed (regular or natural gradient) when I call the optimizer.
The problem I'm facing is that when I pass the bool nat_grad=True
to the Op at training time (not at graph build time), AutoGraph isn't happy.
At the moment the pseudocode of what is happening is as follows:
@tf.custom_gradient
def MyOp(inputs, w, nat_grad=False):
output = w*inputs
def grad(dy):
if nat_grad:
return dy, 1.0
else:
return -dy, -1.0
return output, grad
class MyKerasLayer(tf.keras.layers.Layer):
def __init__(self):
super().__init__()
self.nat_grad = False
def build(self, input_shape):
self.w = self.add_weight("w", dtype=tf.float32, trainable=True, initializer=tf.random_normal_initializer)
super().build(input_shape)
def call(self, inputs):
return MyOp(inputs, self.w, self.nat_grad)
class MyModel(tf.keras.Sequential):
def __init__(self, num_layers):
super().__init__([tf.keras.Input(shape=[1], batch_size=None, dtype=tf.float32)]+[MyKerasLayer() for _ in range(num_layers)])
def optimize(model, X, Y, nat_grad:bool):
for layer in model.layers:
layer.nat_grad = nat_grad
model.fit(x=X, y=Y)
model = MyModel(5)
model.compile(optimizer='SGD', loss=lambda x,y:x-y, metrics=[])
X = np.array([1.0, 2.0, 3.0])
Y = np.array([1.0, 2.0, 3.0])
optimize(model, X, Y, nat_grad=True)
>>> OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
What is the right way to do this?
Tensorflow 2.x allows functions to be executed as tf.graphs [1].
So decorating grad(dy)
with @tf.function
should work, but you'll run into a new error since MyOp
takes nat_grad
as an input it will expect a gradient for this variable [2].
@tf.custom_gradient
def MyOp(inputs, w, nat_grad=False):
output = w*inputs
@tf.function
def grad(dy):
if nat_grad:
return dy, 1.0, 0.
else:
return -dy, -1.0, 0.
return output, grad
It seems to me that this is not the way to do this and rather split the gradient op in 2 parts and call them seperately in call
.
@tf.custom_gradient
def NatOp(inputs, w):
output = w*inputs
def grad(dy):
return dy, 1.0
return output, grad
@tf.custom_gradient
def RegOp(inputs, w):
output = w*inputs
def grad(dy):
return -dy, -1.0
return output, grad
class MyKerasLayer(tf.keras.layers.Layer):
def __init__(self):
super().__init__()
self.nat_grad = False
def build(self, input_shape):
self.w = self.add_weight("w", dtype=tf.float32, trainable=True, initializer=tf.random_normal_initializer)
super().build(input_shape)
def call(self, inputs):
return NatOp(inputs, self.w) if self.nat_grad else RegOp(inputs, self.w)
[1] https://www.tensorflow.org/api_docs/python/tf/function
[2] https://www.tensorflow.org/api_docs/python/tf/custom_gradient
Thank you for this suggestion. I also thought of doubling the ops, but I actually have several of them, and the only thing that differs are a few lines in the custom gradients. So I'd have a lot of doubled code to maintain... It would be sad if this were the only way : /
David, you gave me an idea though: if I decorate the grad function with
@tf.function
I can simply return 0 as gradient for the additional input (thenat_grad
bool). I think it's working.If you'd like to update your answer (see my previous comment), I'll mark it as accepted.
The first solution in the answer returns
0.0
as gradient for the bool. I added the second solution since I feel like this is 'hacky'. On the other hand if it's solves your problem perfectly it seems like the pragmatic thing to do.