Object Recognition Terminology
This is based on CS231n note and stackoverflow post.
1. Object Localization
It is used to locate an object from a given image, as shown in the second picture. It is called bounding box, consisting of the coordinates of the square-shaped space (x-axis and y-axis of its center, height, width) that fits the object compactly, such as the red square above.
2. Object Detection
It is almost the same as the object localization, but object detection can fetch each bounding box for multiple objects like the third picture above.
3. Semantic Segmentation
It does not distinguish instances of the same class. I mean, in the second figure below, the all chairs in the picture are labeled as blue - no difference between chairs (each chair = an instance).
The class is labeled per pixel because it focuses only on the pixel of the image and determines which class the pixel belongs to. Let’s just say that there is an image of 512*512, class 1 is a person, 2 is a motorcycle, and 3 is a car. Then the image is labeled like
000000000000
0000001111100
…
where 0 denotes background.
4. Instance Segmentation
Further to semantic segmentation, even the same class is expressed in different colors to distinguish the instance. In the fourth picture right above, each chair is distinguished by different colors.