
Keras: Load dataset and autocrop relevant area of image

Published on 2020-12-01 19:43:49

I'm working on signature verification, and there were a bunch of things I wanted to do using Keras/OpenCV/PIL but couldn't find relevant information. I have loaded the dataset folder using keras.preprocessing.image_dataset_from_directory and now need to:

  • Crop the signature out of each image in the dataset. Some images have a rectangular border (or only part of one), and the border pixels aren't the same across images.
  • Resize the images and also take care of augmentation of the signatures.

Example images: two sample signature scans were attached (First Image, Second Image).

Since I'm working in Keras, I thought of using its functions but couldn't find any. How can I auto-crop/extract the signatures in the dataset I've loaded? As for image augmentation, should I do it at this preprocessing stage, or implement it in the CNN model I'm using? I am new to image processing and Keras.
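For reference, a minimal sketch of the kind of auto-crop one could do with OpenCV outside the Keras pipeline: it assumes dark ink on a lighter background, thresholds the scan, ignores a thin margin around the edges so a scanned border/frame is not picked up, and crops to the bounding box of the remaining ink pixels. The border_margin and pad values are illustrative assumptions.

import cv2
import numpy as np

def autocrop_signature(path, border_margin=0.05, pad=10):
    # Assumes dark ink on a lighter background; border_margin and pad are illustrative
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Otsu threshold, inverted so ink pixels become white (255) in the mask
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Zero out a margin around the edges so a rectangular border is ignored
    h, w = mask.shape
    mh, mw = int(h * border_margin), int(w * border_margin)
    mask[:mh, :] = 0
    mask[h - mh:, :] = 0
    mask[:, :mw] = 0
    mask[:, w - mw:] = 0
    # Bounding box of the remaining ink pixels, with a little padding
    ys, xs = np.where(mask > 0)
    if len(xs) == 0:
        return gray          # nothing detected, return the image unchanged
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad, h)
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad, w)
    return gray[y0:y1, x0:x1]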

Also, because the entire training folder is loaded as one dataset, the labels are just "Genuine" and "Forged". However, there are multiple genuine and forged signatures per person, and there are multiple people. How should I divide the data?

Questioner: Aryan
Answered by Gerry P, 2020-12-02 11:19:07

Organize your directories as follows

main_dir
-train_dir
--person1_fake_dir
---person1 fake image
---person1 fake image
---etc
--person1_real_dir
---person1 real image
---person1 real image
---etc
--person2_fake_dir
---person2 fake image
---person2 fake image
---etc
--person2_real_dir
---person2 real image
---person2 real image
---etc
.
.
.
--personN_fake_dir
---personN fake image
---personN fake image
---etc
--personN_real_dir
---personN real image
---personN real image
---etc

-test_dir
same structure as train_dir but put test images here

-valid_dir
same structure as train_dir but put validation images here
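Since your current folders are just "Genuine" and "Forged", you will first have to split the files out per person. A rough sketch of one way to do that, assuming the person id can be parsed from each filename (the personN_... naming pattern and the get_person_id helper are assumptions for illustration):

import os, shutil

def get_person_id(filename):
    # Assumed naming pattern personN_<anything>.png; adjust to your real filenames
    return filename.split('_')[0]

def reorganize(src_dir, dst_dir, label):
    # label is 'fake' or 'real'; creates e.g. main_dir/train_dir/person1_fake_dir
    for fname in os.listdir(src_dir):
        person = get_person_id(fname)
        class_dir = os.path.join(dst_dir, person + '_' + label + '_dir')
        os.makedirs(class_dir, exist_ok=True)
        shutil.copy(os.path.join(src_dir, fname), class_dir)

reorganize('Forged', 'main_dir/train_dir', 'fake')
reorganize('Genuine', 'main_dir/train_dir', 'real')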

If you have N persons, you will have 2 x N classes.

You can then use tf.keras.preprocessing.image.ImageDataGenerator().flow_from_directory() to feed in your data (see the ImageDataGenerator documentation for details). You don't have to worry about cropping the images; just set the target size in flow_from_directory to something like (224,224). The code below shows the rest of the code you need:

# rescale pixel values to [0,1]
data_gen=tf.keras.preprocessing.image.ImageDataGenerator(rescale=1/255)
# flow_from_directory infers one class per sub-directory of the given directory
train_gen=data_gen.flow_from_directory(train_dir, target_size=(224,224), color_mode='grayscale')
valid_gen=data_gen.flow_from_directory(valid_dir, target_size=(224,224), color_mode='grayscale', shuffle=False)
test_gen=data_gen.flow_from_directory(test_dir, target_size=(224,224), color_mode='grayscale', shuffle=False)
model.compile(optimizer=tf.keras.optimizers.Adam(), loss=tf.keras.losses.CategoricalCrossentropy(), metrics=['accuracy'])
history=model.fit(train_gen, epochs=20, verbose=1)
accuracy=model.evaluate(test_gen)[1]*100
print('Model accuracy is ', accuracy)
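The snippet above assumes a model has already been defined before the compile call. A minimal sketch of a CNN that would fit these generators (the architecture itself is an assumption for illustration, not part of the original answer); the input shape matches the grayscale (224,224) images and the output layer has one unit per class:

num_classes = 2 * N   # N = number of persons; define N for your data
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 1)),               # grayscale images from the generators
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(num_classes, activation='softmax')  # one output per person/real-fake class
])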

Note that your model will not be able to tell fake from real in the general case; it should only work for persons 1 through N. You could try putting all the fake images in one class directory and all the real images in another and training on that, but I suspect it will not work well at telling real from fake in the general case.