This article is reproduced from serverfault.com.

Can YOLO images have a bounding box that covers the whole image?

Posted on 2020-11-26 10:40:02

I wonder why YOLO images need to have a bounding box. Assume that we are using Darknet. Each image needs a corresponding .txt file with the same name as the image file. It's the same for all YOLO frameworks that use bounding boxes for labeling. Inside the .txt file, each line must have this format:

<object-class> <x> <y> <width> <height>

Where x, y, width, and height are relative to the image's width and height.
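As a rough illustration of what "relative" means here, the sketch below converts a box given in pixel corner coordinates into one label line. It assumes the Darknet convention that x and y are the center of the box; the function name `to_yolo_line` and the sample pixel values are made up for this example.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box into one Darknet label line.

    Assumes x and y in the label denote the box center, and that all
    four values are divided by the image width or height.
    """
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical box in a 416x416 image: corners (104, 52) and (312, 364).
line = to_yolo_line(0, 104, 52, 312, 364, 416, 416)
print(line)  # 0 0.500000 0.500000 0.500000 0.750000
```

Because every value is a fraction of the image size, the same label stays valid when the image is resized.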

For example, if we go to this page, press the YOLO Darknet TXT button, download the .zip file, and then go to the train folder, we can see these files:

IMG_0074_jpg.rf.64efe06bcd723dc66b0d071bfb47948a.jpg
IMG_0074_jpg.rf.64efe06bcd723dc66b0d071bfb47948a.txt

Where the .txt file looks like this

0 0.7055288461538461 0.6538461538461539 0.11658653846153846 0.4110576923076923 
1 0.5913461538461539 0.3545673076923077 0.17307692307692307 0.6538461538461539 

Every image has the size 416x416. This image looks like this:

enter image description here

My idea is that every image should have one class, and only one class. The image should be taken with a camera like this:

enter image description here

The camera snapshot should be produced as follows:

  1. Take a camera snapshot
  2. Crop the snapshot to the desired size
  3. Upscale it to a square 416x416 image

Like this:

enter image description here

And then the .txt file that corresponds to each image should look like this:

<object-class> 0 0 1 1

Question

Is this possible with e.g. Darknet or another framework that uses bounding boxes to label the classes?

Instead of letting the software, e.g. Darknet, upscale the bounding boxes to 416x416 for every class object, I would do it myself and change the .txt file to x = 0, y = 0, width = 1, height = 1 for every image that has only one class object.

Is it possible for me to create a training set that way and train with it?

Questioner: Daniel Mårtensson
Viewed: 11

Answered by can on 2020-11-28 10:58:04

A small disclaimer: I am not an expert on this. I am part of a project that uses Darknet, so I have had some time to experiment.

If I understand correctly, you want to train with cropped single-class images that have full-image-sized bounding boxes.

It is possible, and I am doing something similar, but it is most likely not what you want.

Let me tell you about the problems and unexpected behaviour this method creates.

When you train with images that have full-image-sized bounding boxes, YOLO cannot make proper detections, because during training it also learns the backgrounds and empty spaces of your dataset. More specifically, the objects in your training dataset have to appear in the same context as in your real-life usage. If you train it with images of dogs in the jungle, it won't do a good job of predicting dogs in a house.

If you are only going to use it for classification, you can still train it like this, and it still classifies fine. However, the images you predict on should also look like your training dataset. Looking at your example, if you train on images like the cropped dog picture, your model won't be able to classify the dog in the first image.

For a better example: in my case detection wasn't required. I work with food images and only predict the meal on the plate, so I trained with full-image-sized bboxes, since every food image has one class. It classifies the food perfectly, but the bboxes are always predicted as the full image.

My understanding of the theory is this: if you feed the network only full-image bboxes, it learns that making the box as big as possible results in a lower error, so it optimizes that way. This kind of wastes half of the algorithm, but it works for me.

Also, your images don't need to be 416x416. Darknet resizes them to that size whatever size you give it, and you can change the size in the cfg file.

I have code that creates full-size bboxes for all images in a directory, if you want to try it quickly. (It overwrites existing annotations, so be careful.)

Finally, the boxes should look like this to be centered and full size. x and y are the center of the bbox, so they should be at the center (half) of the image.

<object-class> 0.5 0.5 1 1
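To see why the center has to be 0.5 rather than 0, the following sketch converts a label from center format to corner coordinates. It assumes, as stated above, that x and y are the center of the box; the helper name `yolo_to_corners` is invented for this illustration.

```python
def yolo_to_corners(x_center, y_center, width, height):
    """Return (x_min, y_min, x_max, y_max) in relative image coordinates."""
    half_w, half_h = width / 2, height / 2
    return (x_center - half_w, y_center - half_h,
            x_center + half_w, y_center + half_h)


# Label "0.5 0.5 1 1": spans the image exactly.
full_image = yolo_to_corners(0.5, 0.5, 1.0, 1.0)
# Label "0 0 1 1": centered on the top-left corner, so most of it is outside.
corner_anchored = yolo_to_corners(0.0, 0.0, 1.0, 1.0)

print(full_image)       # (0.0, 0.0, 1.0, 1.0)
print(corner_anchored)  # (-0.5, -0.5, 0.5, 0.5)
```

Under that assumption, a label of `0 0 1 1` would cover only the top-left quarter of the image, with the rest of the box lying outside it.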
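The script mentioned above, which uses the imagepreprocessing package:

```python
import os

from imagepreprocessing.darknet_functions import create_training_data_yolo, auto_annotation_by_random_points

# Dataset root; each subfolder is annotated with its own class index.
main_dir = "datasets/my_dataset"

# Annotate every image with a centered, full-size box (x, y, w, h = 0.5, 0.5, 1.0, 1.0).
# The position of the folder in the sorted list is used as the class index.
# Existing annotations are overwritten.
folders = sorted(os.listdir(main_dir))
for index, folder in enumerate(folders):
    auto_annotation_by_random_points(
        os.path.join(main_dir, folder),
        index,
        annotation_points=((0.5, 0.5), (0.5, 0.5), (1.0, 1.0), (1.0, 1.0)),
    )

# Create the remaining files required for training.
create_training_data_yolo(main_dir)
```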
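If you would rather not depend on the imagepreprocessing package, the annotation step can be approximated with the standard library alone. This is only a minimal sketch under a few assumptions: one subfolder per class, the listed image extensions, and the function names `write_full_image_labels` and `annotate_dataset`, which are invented here. It writes the label files only and does not create the other files Darknet needs for training.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def write_full_image_labels(image_dir, class_id):
    """Write '<class_id> 0.5 0.5 1 1' next to every image in image_dir.

    An existing .txt file with the same name as an image is overwritten.
    Returns the list of label paths that were written.
    """
    written = []
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() in IMAGE_SUFFIXES:
            # Only the last suffix is replaced, so "a.rf.123.jpg" -> "a.rf.123.txt".
            label_path = image_path.with_suffix(".txt")
            label_path.write_text(f"{class_id} 0.5 0.5 1 1\n")
            written.append(label_path)
    return written


def annotate_dataset(main_dir):
    """Treat each subfolder of main_dir as one class, indexed in sorted order."""
    folders = sorted(p for p in Path(main_dir).iterdir() if p.is_dir())
    for class_id, folder in enumerate(folders):
        write_full_image_labels(folder, class_id)
    return [folder.name for folder in folders]
```

Calling `annotate_dataset("datasets/my_dataset")` would label every image in the first subfolder with class 0, the second with class 1, and so on, matching the class indexing used in the script above.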