
Keras: Load dataset and autocrop relevant area of image

Published on 2020-12-01 19:43:49

I'm working on signature verification, and there were a bunch of things I wanted to do using Keras/OpenCV/PIL but couldn't find relevant information. I have loaded the dataset folder using keras.preprocessing.image_dataset_from_directory and now need to:

  • Crop the signature out of each image in the dataset. Some images have a rectangular border (or only part of one), and the border pixels aren't the same across images.
  • Resize the images and also take care of augmentation of the signatures.

Example images: two sample signature scans were attached (First Image, Second Image).

Since I'm working in Keras, I thought of using its functions but couldn't find any. How can I auto-crop/extract the signatures in the dataset I've loaded? As for image augmentation, should I do it at this preprocessing stage, or implement it in the CNN model I'm using? I am new to image processing and Keras.
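For reference, a minimal sketch of the kind of auto-crop one could do with OpenCV outside the Keras pipeline: it assumes dark ink on a lighter background, thresholds the scan, ignores a thin margin around the edges so a scanned border/frame is not picked up, and crops to the bounding box of the remaining ink pixels. The border_margin and pad values are illustrative assumptions.

import cv2
import numpy as np

def autocrop_signature(path, border_margin=0.05, pad=10):
    # Assumes dark ink on a lighter background; border_margin and pad are illustrative
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Otsu threshold, inverted so ink pixels become white (255) in the mask
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Zero out a margin around the edges so a rectangular border is ignored
    h, w = mask.shape
    mh, mw = int(h * border_margin), int(w * border_margin)
    mask[:mh, :] = 0
    mask[h - mh:, :] = 0
    mask[:, :mw] = 0
    mask[:, w - mw:] = 0
    # Bounding box of the remaining ink pixels, with a little padding
    ys, xs = np.where(mask > 0)
    if len(xs) == 0:
        return gray          # nothing detected, return the image unchanged
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad, h)
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad, w)
    return gray[y0:y1, x0:x1]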

Also, because the entire training folder is loaded as one dataset, the labels are just "Genuine" and "Forged". However, there are multiple genuine and forged signatures per person, and there are multiple people. How should I divide the data?

Questioner: Aryan
Answered by Gerry P, 2020-12-02 11:19:07

Organize your directories as follows

main_dir
-train_dir
--person1_fake_dir
---person1 fake image
---person1 fake image
---etc
--person1_real_dir
---person1 real image
---person1 real image
---etc
--person2_fake_dir
---person2 fake image
---person2 fake image
---etc
--person2_real_dir
---person2 real image
---person2 real image
---etc
.
.
.
--personN_fake_dir
---personN fake image
---personN fake image
---etc
--personN_real_dir
---personN real image
---personN real image
---etc

-test_dir
same structure as train_dir but put test images here

-valid_dir
same structure as train_dir but put validation images here
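Since your current folders are just "Genuine" and "Forged", you will first have to split the files out per person. A rough sketch of one way to do that, assuming the person id can be parsed from each filename (the personN_... naming pattern and the get_person_id helper are assumptions for illustration):

import os, shutil

def get_person_id(filename):
    # Assumed naming pattern personN_<anything>.png; adjust to your real filenames
    return filename.split('_')[0]

def reorganize(src_dir, dst_dir, label):
    # label is 'fake' or 'real'; creates e.g. main_dir/train_dir/person1_fake_dir
    for fname in os.listdir(src_dir):
        person = get_person_id(fname)
        class_dir = os.path.join(dst_dir, person + '_' + label + '_dir')
        os.makedirs(class_dir, exist_ok=True)
        shutil.copy(os.path.join(src_dir, fname), class_dir)

reorganize('Forged', 'main_dir/train_dir', 'fake')
reorganize('Genuine', 'main_dir/train_dir', 'real')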

If you have N persons, you will have 2 x N classes.

You can then use tf.keras.preprocessing.image.ImageDataGenerator().flow_from_directory() to feed in your data (see the ImageDataGenerator documentation for details). You don't have to worry about cropping the images; just set the target size in flow_from_directory to something like (224,224). The code below shows the rest of the code you need:

# rescale pixel values to [0,1]
data_gen=tf.keras.preprocessing.image.ImageDataGenerator(rescale=1/255)
# flow_from_directory infers one class per sub-directory of the given directory
train_gen=data_gen.flow_from_directory(train_dir, target_size=(224,224), color_mode='grayscale')
valid_gen=data_gen.flow_from_directory(valid_dir, target_size=(224,224), color_mode='grayscale', shuffle=False)
test_gen=data_gen.flow_from_directory(test_dir, target_size=(224,224), color_mode='grayscale', shuffle=False)
model.compile(optimizer=tf.keras.optimizers.Adam(), loss=tf.keras.losses.CategoricalCrossentropy(), metrics=['accuracy'])
history=model.fit(train_gen, epochs=20, verbose=1)
accuracy=model.evaluate(test_gen)[1]*100
print('Model accuracy is ', accuracy)
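The snippet above assumes a model has already been defined before the compile call. A minimal sketch of a CNN that would fit these generators (the architecture itself is an assumption for illustration, not part of the original answer); the input shape matches the grayscale (224,224) images and the output layer has one unit per class:

num_classes = 2 * N   # N = number of persons; define N for your data
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 1)),               # grayscale images from the generators
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(num_classes, activation='softmax')  # one output per person/real-fake class
])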

Note that your model will not be able to tell fake from real in the general case; it should only work for persons 1 through N. You could try putting all the fake images in one class directory and all the real images in another and training on that, but I suspect it will not work well at telling real from fake in the general case.