I wonder why YOLO pictures need to have a bounding box.
Assume that we are using `Darknet`. Each image needs to have a corresponding `.txt` file with the same name as the image file. This is the same for all YOLO frameworks that use bounding boxes for labeling. Inside the `.txt` file, each line needs to look like this:
<object-class> <x> <y> <width> <height>
Where `x`, `y`, `width`, and `height` are relative to the image's width and height.
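As a minimal sketch of how such a line could be produced from a box given in pixels: the helper name `to_yolo_line` and its corner-based input are assumptions made for illustration, not part of Darknet. In the Darknet TXT format, `x` and `y` are the center of the box, and every value is divided by the image width or height.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to a Darknet/YOLO label line.

    Hypothetical helper: the format stores the box center and size,
    each normalized by the image width or height.
    """
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Example: a 200x200 pixel box inside a 416x416 image (made-up numbers)
print(to_yolo_line(0, 100, 50, 300, 250, 416, 416))
```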
For example, if we go to this page, press the `YOLO Darknet TXT` button, download the `.zip` file, and then go to the `train` folder, we can see these files:
IMG_0074_jpg.rf.64efe06bcd723dc66b0d071bfb47948a.jpg
IMG_0074_jpg.rf.64efe06bcd723dc66b0d071bfb47948a.txt
Where the `.txt` file looks like this:
0 0.7055288461538461 0.6538461538461539 0.11658653846153846 0.4110576923076923
1 0.5913461538461539 0.3545673076923077 0.17307692307692307 0.6538461538461539
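To see what these numbers mean in pixels, here is a small sketch that turns each label line back into corner coordinates. The function name `yolo_line_to_pixels` is hypothetical, and the 416x416 image size is taken from the dataset described in this question.

```python
def yolo_line_to_pixels(line, img_w, img_h):
    """Parse one Darknet label line into (class_id, x_min, y_min, x_max, y_max) in pixels."""
    class_id, x_c, y_c, w, h = line.split()
    # Scale the normalized center and size back to pixel units
    x_c, w = float(x_c) * img_w, float(w) * img_w
    y_c, h = float(y_c) * img_h, float(h) * img_h
    return int(class_id), x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2


labels = [
    "0 0.7055288461538461 0.6538461538461539 0.11658653846153846 0.4110576923076923",
    "1 0.5913461538461539 0.3545673076923077 0.17307692307692307 0.6538461538461539",
]
for label in labels:
    print(yolo_line_to_pixels(label, 416, 416))
```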
Every image has the size `416x416`. This image looks like this:
My idea is that every image should contain one class, and only one class. The image should be taken with a camera like this. The camera snapshot should be taken at `416x416`, like this:
And then the `.txt` file that corresponds to each image should look like this:
<object-class> 0 0 1 1
Question
Is this possible with e.g. `Darknet` or another framework that uses bounding boxes to label the classes?
Instead of letting the software, e.g. `Darknet`, upscale the bounding boxes to `416x416` for every class object, I would do it myself and change the `.txt` file to `x = 0, y = 0, width = 1, height = 1` for every image that contains only one class object.
Is it possible for me to create a training set in that way and train with it?
A small disclaimer: I am not an expert on this. I am part of a project that uses Darknet, so I have had some time to experiment.
If I understand correctly, you want to train with cropped single-class images that have full-image-sized bounding boxes.
It is possible, and I am doing something similar, but it is most likely not what you want.
Let me describe the problems and unexpected behaviour this method creates.
When you train with images that have full-image-sized bounding boxes, YOLO cannot make proper detections, because during training it also learns the backgrounds and empty spaces of your dataset. More specifically, the objects in your training dataset have to appear in the same context as in your real-life usage. If you train it with images of dogs in the jungle, it won't do a good job of predicting dogs in a house.
If you are only going to use it for classification, you can still train it like this and it still classifies fine, but the images you want to predict on should also look like your training dataset. Looking at your example, if you train on images like the cropped dog picture, your model won't be able to classify the dog in the first image.
As a better example: in my case detection wasn't required. I am working with food images and I only predict the meal on the plate, so I trained with full-image-sized bboxes, since every food image has one class. It classifies the food perfectly, but the bboxes are always predicted as the full image.
My understanding of the theory is this: if you feed the network only full-image bboxes, it learns that making the box as big as possible results in a lower error, so it optimizes that way. This wastes half of the algorithm, but it works for me.
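To put a rough number on how far a full-image box is from a tight one, here is a sketch that computes intersection over union (IoU), the usual overlap measure for detection boxes. The `iou` helper is written for this illustration and is not Darknet code; the tight box is the first label line from the question.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_center, y_center, width, height) boxes."""
    def corners(box):
        x, y, w, h = box
        return x - w / 2, y - h / 2, x + w / 2, y + h / 2

    ax1, ay1, ax2, ay2 = corners(box_a)
    bx1, by1, bx2, by2 = corners(box_b)
    # Overlap is zero when the boxes do not intersect on either axis
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


full_image = (0.5, 0.5, 1.0, 1.0)
tight_box = (0.7055288461538461, 0.6538461538461539,
             0.11658653846153846, 0.4110576923076923)
print(f"IoU: {iou(full_image, tight_box):.3f}")
```

The overlap comes out at roughly 0.05, so a model that always predicts the full image would score very poorly as a detector on a dataset labeled with tight boxes, even if its class prediction is right.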
Also, your images don't need to be 416x416. Darknet resizes whatever size you give it to that, and you can also change it in the cfg file.
I have some code that makes full-sized bboxes for all images in a directory if you want to try it quickly. (It overwrites existing annotations, so be careful.)
Finally, the boxes should look like this to be centered and full size. x and y are the center of the bbox, so they should be at the center of the image, i.e. half of its width and height.
<object-class> 0.5 0.5 1 1
```python
from imagepreprocessing.darknet_functions import create_training_data_yolo, auto_annotation_by_random_points
import os

main_dir = "datasets/my_dataset"

# auto annotating all images by their center points (x,y,w,h)
folders = sorted(os.listdir(main_dir))
for index, folder in enumerate(folders):
    auto_annotation_by_random_points(os.path.join(main_dir, folder), index, annotation_points=((0.5,0.5), (0.5,0.5), (1.0,1.0), (1.0,1.0)))

# creating required files
create_training_data_yolo(main_dir)
```
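If you would rather not install a package, the annotation step can be sketched with the standard library only. This is an illustrative alternative, not the code I actually use: the function name `write_full_image_labels` and the list of image extensions are assumptions, and it does not create the other files Darknet needs for training.

```python
import os

# Assumed set of image extensions; adjust to your dataset
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def write_full_image_labels(image_dir, class_id):
    """Write '<class_id> 0.5 0.5 1 1' next to every image in image_dir.

    Overwrites existing .txt annotations with the same base name.
    Returns the list of label file paths that were written.
    """
    written = []
    for name in sorted(os.listdir(image_dir)):
        base, ext = os.path.splitext(name)
        if ext.lower() not in IMAGE_EXTENSIONS:
            continue
        label_path = os.path.join(image_dir, base + ".txt")
        with open(label_path, "w") as f:
            f.write(f"{class_id} 0.5 0.5 1 1\n")
        written.append(label_path)
    return written
```

With one folder per class, you could call `write_full_image_labels(os.path.join(main_dir, folder), index)` inside the same kind of `enumerate` loop over the sorted folder names.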
I have changed my mind now. I will not scale the picture. I will just keep it normal, with a bounding box. By the way, Darknet Tiny needs pictures of size 416x416 for training. I'm building software that can automatically collect training data.
Oh, I didn't know Tiny YOLO needs resizing.
Yes. Yolo needs square size.
By the way! Can you try this? github.com/DanielMartensson/Darknet-Data-Creator 1. Download the project. Start the project with the command `mvn javafx:run`. You need to be inside the project folder. `mvn` is the Maven command. Run `sudo apt-get install maven` if you are using a `.deb`-based Linux system.
Well, I tried, but I don't have a USB camera, so all I can say is that the UI works 🤷🏻♂️. Are you going to run the training as a terminal process?