In my previous blog post I showed how to detect a helmet with feature extraction. In this blog post, I will present a different view of this data.
The document arXiv 1801.04381 describes MobileNet V2 and how its individual layers are structured. To select a suitable layer, I sampled the values of different layers and checked which one reacted best. Conv_1/BatchNorm/FusedBatchNorm turned out to be one of those layers. As you can see in the picture, it consists of 7 x 7 x 1280 values, that is, 49 (7x7) lines with 1280 values each. But what do these values look like?
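The arithmetic can be checked with a short sketch. This is a minimal illustration using NumPy, with random data standing in for the real activation tensor:

```python
import numpy as np

# Fake activation tensor with the shape of Conv_1/BatchNorm/FusedBatchNorm: 7 x 7 x 1280
activations = np.random.uniform(-40, 25, size=(7, 7, 1280))

# Flattening the 7x7 spatial grid gives 49 lines of 1280 values each
lines = activations.reshape(49, 1280)

print(lines.shape)       # (49, 1280)
print(activations.size)  # 62720 values in total
```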
The values lie in a range from roughly minus 40 to plus 25. For my book, in which I also describe feature extraction, I had the idea of displaying the values as points: low values as dark points, high values as bright points.
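The mapping from value to brightness can be sketched like this. A minimal NumPy example; the value range of roughly -40 to +25 is taken from above, everything else is made up for illustration:

```python
import numpy as np

def values_to_gray(values, vmin=-40.0, vmax=25.0):
    """Map activation values to grayscale pixels: low -> dark, high -> bright."""
    clipped = np.clip(values, vmin, vmax)
    scaled = (clipped - vmin) / (vmax - vmin)  # 0.0 .. 1.0
    return (scaled * 255).astype(np.uint8)     # 0 (black) .. 255 (white)

pixels = values_to_gray(np.array([-40.0, -7.5, 25.0]))
print(pixels)  # [  0 127 255]
```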
In the picture above you can see the values when I am without and with a helmet. My head is detected with PoseNet, and the points for my eyes, nose and ears are visible in the picture. The area around the head is cropped with a certain margin and used for the feature extraction. In total there are 62720 values, so you don't see any difference. SAP Leonardo, for example, uses 128 values for face feature extraction, where the differences would be much easier to see.
For this reason, I subtracted the values of the respective images from each other and displayed the difference as dots as well. In the picture on the left I compare two pictures without a helmet. You can see differences, but they are weak (maybe you should clean your monitor if there are too many dots). On the right side the difference is greater, because there I compare a picture without a helmet to one with a helmet. The dots are clearly more visible.
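The difference images boil down to subtracting the two activation vectors and showing the absolute difference as brightness. A minimal NumPy sketch, with random stand-in data instead of the real activations (the noise levels are assumptions, chosen only to mimic "similar image" vs. "different image"):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the 62720 activation values of three images
no_helmet_a = rng.uniform(-40, 25, size=62720)
no_helmet_b = no_helmet_a + rng.normal(0, 0.5, size=62720)  # similar image: small deviation
with_helmet = no_helmet_a + rng.normal(0, 5.0, size=62720)  # different image: larger deviation

# Absolute difference per value; a brighter dot means a bigger difference
diff_same = np.abs(no_helmet_a - no_helmet_b)
diff_other = np.abs(no_helmet_a - with_helmet)

print(diff_same.mean() < diff_other.mean())  # True: helmet vs. no helmet differs more
```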
To display the values, I packed them into a roughly square block. In reality, as already mentioned, these are 49 lines with 1280 values each. The following picture shows the result of the last comparison (without and with helmet), with the values laid out as 49 lines of 1280 values each.
As you can see, there are a few lines with clear differences and lines without big differences. In a further step you could check whether the differing lines are always the same ones and use only one of them. This would reduce the values for the similarity calculation from 62720 to 1280.
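Picking out a single line could be sketched like this. The approach is an assumption on my part: choose the line where the two images differ most, again with random stand-in data (including an artificially modified line to play the role of the one that reacts to the helmet):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two fake activation tensors, already laid out as 49 lines of 1280 values
img_a = rng.uniform(-40, 25, size=(49, 1280))
img_b = img_a.copy()
img_b[17] += rng.normal(0, 5.0, size=1280)  # pretend line 17 reacts to the helmet

# Mean absolute difference per line, then keep only the strongest line
per_line = np.abs(img_a - img_b).mean(axis=1)
best = int(per_line.argmax())

print(best)               # 17
print(img_a[best].shape)  # (1280,) -> 1280 instead of 62720 values
```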
I think it’s an interesting subject. But as a child I also found the static on the TV interesting, because you could see ants 🐜🐜🐜.