Mayuresh Madiwale

Computer Vision in Brief.

What is Computer Vision?

Computer Vision (CV) is the field of deep learning that works primarily with images. It is a relatively new and still-evolving technology that is not only efficient but also quite eye-catching. It is a field of artificial intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos and other visual inputs, and to take actions or make recommendations based on that information. If AI enables computers to think, computer vision enables them to see, observe and understand.

Computer vision works much the same as human vision, except humans have a head start: a lifetime of context that teaches us how to tell objects apart, how far away they are, whether they are moving and whether something is wrong in an image. Computer vision trains machines to perform these functions, but it has to do so in much less time, using cameras, data and algorithms rather than retinas, optic nerves and a visual cortex. Because a system trained to inspect products or watch a production asset can analyze thousands of products or processes a minute, noticing defects or issues imperceptible to us, it can quickly surpass human capabilities.


What are the possibilities with CV?

There are several tasks that can be done with CV. They are listed below.

  1. Image Classification

  2. Localization

  3. Object Detection

  4. Object Identification

  5. Instance Segmentation

  6. Object Tracking

Let's see all of the above one by one.


Image Classification

Look at the image below and try to recognize the animal in it. Ready?



You must have answered it as soon as you saw it, right? This is because you have seen plenty of dogs, both in real life and in pictures. But this is not the case when the same thing is done by a computer. Computers also see the world differently than we do: they can only understand numbers and process them. So the above image is just a bunch of numbers to them, and by analyzing those numbers they have to tell what animal it is. Image classification is all about telling apart different classes of images.
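To see what that "bunch of numbers" looks like, here is a tiny sketch (assuming Pillow and NumPy are installed; the file name dog.jpg is just a placeholder):

```python
# An image is just an array of numbers to the computer.
from PIL import Image
import numpy as np

img = Image.open("dog.jpg")      # placeholder path to any image of a dog
pixels = np.array(img)           # convert the image to a NumPy array

print(pixels.shape)              # e.g. (height, width, 3) for an RGB image
print(pixels[0, 0])              # the top-left pixel is just three numbers (R, G, B)
```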


Localization




Now we can tell whether the image is of a dog or a cat. But what if we want to see where the animal is in the image? A plain classifier will not be able to tell where exactly the dog or cat is in the given image. This is where localization comes into the picture: localization puts a bounding box around the object of interest in the image.
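To make the idea concrete, here is a minimal sketch (assuming OpenCV is installed) of what a localization result looks like; the box coordinates are made-up numbers standing in for a model's prediction:

```python
import cv2

img = cv2.imread("dog.jpg")                    # placeholder image path
x, y, w, h = 50, 80, 200, 180                  # hypothetical box predicted by a model

# Draw a green bounding box and a label around the object of interest.
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.putText(img, "dog", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
cv2.imwrite("dog_localized.jpg", img)
```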

Still, there's a problem. What if several classes are present in a single image? We could set bounding boxes, but we would not be able to tell what is what in the image. This is now a case of object detection.


Object Detection

Now we can tell the classes apart, locate them in the image and say which class is where. As humans we can easily identify the objects in the image, but the computer still doesn't know what a dog or a cat is! For the computer it is still a bunch of numbers that forms a pattern, and from that pattern it tells dog and cat apart. To make the computer identify what it is dealing with, we will take a look at object identification.


Object Identification


Object identification is slightly different from object detection, although similar techniques are often used to achieve them both. In this case, given a specific object, the goal is to find instances of said object in images. It is not about classifying an image, as we saw previously, but about determining if the object appears in an image or not, and if it does appear, specifying the location(s) where it appears. An example may be searching for images that contain the logo of a specific company. Another example is monitoring real time images from security cameras to identify a specific person’s face.
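As a rough illustration of the logo-search example, here is a classical (non deep learning) sketch using OpenCV template matching; the file names and the 0.8 threshold are assumptions:

```python
import cv2

scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # image to search in (placeholder)
logo = cv2.imread("logo.jpg", cv2.IMREAD_GRAYSCALE)     # the object we are looking for

# Slide the logo over the scene and measure similarity at every position.
result = cv2.matchTemplate(scene, logo, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

if max_val > 0.8:                                        # assumed similarity threshold
    print("Logo found at", max_loc, "with score", round(max_val, 2))
else:
    print("Logo not found in this image")
```

Template matching only works when the object's appearance barely changes; the deep-learning detectors covered later handle this far more robustly.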


Instance Segmentation


Instance segmentation is a slightly more advanced version of localization. In localization, we draw a bounding box around the object; here, we draw a mask on the object of interest. You have seen an application of segmentation in Google Maps Street View: if you look closely, you will not see any person's face in the imagery. How is this done? Somewhere there is a model that identifies human faces in the imagery, and blurring is then used to hide them.
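The blurring step itself is simple once a model supplies the mask. Here is a simplified sketch (assuming OpenCV and NumPy), where a hand-drawn circle stands in for the face mask a segmentation model would produce:

```python
import cv2
import numpy as np

img = cv2.imread("street_view.jpg")             # placeholder image path
mask = np.zeros(img.shape[:2], dtype=np.uint8)
cv2.circle(mask, (320, 240), 60, 255, -1)       # stand-in for a face mask from a model

blurred = cv2.GaussianBlur(img, (51, 51), 0)    # heavily blurred copy of the whole image
img[mask == 255] = blurred[mask == 255]         # keep the blur only where the mask says "face"
cv2.imwrite("street_view_anonymized.jpg", img)
```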


Object Tracking


Let's look at something more complicated that can be done with CV. As seen in the GIF above, we can locate objects even when the input is a video rather than a stationary image, and also keep track of which object is in every frame and where it is. Keeping track of moving objects is easy for us humans, but the complexity of the same task is multiplied many times over when a computer has to do it.


What are the algorithms used for the above tasks?

Image Classification


For classification, a basic CNN can pull off the task easily and efficiently. It is a comparatively easy task, so the model-building complexity is low. To see an example of a classic CNN model, you can refer to the previous blog, in which a classifier was built to distinguish between "Forest Fire" and "No Fire", Here. Transfer learning can also make things easier by using pre-trained models such as the following (a minimal sketch follows the list):

  1. VGG

  2. MobileNet

  3. ResNet

  4. DenseNet

  5. EfficientNet
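As a minimal transfer-learning sketch (assuming TensorFlow/Keras is installed), a pre-trained MobileNetV2 can be used as a frozen feature extractor for a binary problem like "Fire" vs "No Fire"; the datasets and layer sizes here are assumptions, not the exact setup from the previous blog:

```python
import tensorflow as tf

# Pre-trained backbone with its ImageNet weights frozen.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False

# Small classification head on top of the frozen features.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary output: fire / no fire
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # train_ds / val_ds are assumed datasets
```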


Localization, Object Detection and Object Identification


All of these tasks can be done by the same algorithms, just by tweaking what you want the result to be. They are basically CNNs with a few components changed to produce the desired output. The common ones are listed below (a short sketch of running a pre-trained detector follows the list):

  1. Region based Convolutional Neural Network (RCNN)

  2. Fast RCNN

  3. Faster RCNN

  4. You Only Look Once (YOLO) (v1, v2... latest v7)

  5. Single Shot Detectors (SSD) (EfficientDet, etc)

  6. CenterNet
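To get a feel for how such detectors are used in practice, here is a short sketch (assuming PyTorch and a recent torchvision are installed) that runs a pre-trained Faster R-CNN on a single image; the image path and the 0.7 confidence threshold are placeholders:

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a detector pre-trained on the COCO dataset.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = to_tensor(Image.open("street.jpg").convert("RGB"))   # placeholder image
with torch.no_grad():
    pred = model([img])[0]          # dict with boxes, labels and scores

for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score > 0.7:                 # assumed confidence threshold
        print(label.item(), [round(v) for v in box.tolist()], round(score.item(), 2))
```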


Instance Segmentation


Segmentation, aka masking, is the task of identifying the object, finding its boundary and then either filling that area with a color or blurring it. This is helpful in cases such as blurring out sensitive information like a person's face or a car's license plate. The most famous algorithm is Mask R-CNN, whose output is bounding boxes along with masks.
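A similar sketch (again assuming torchvision, with a placeholder image path) shows how Mask R-CNN differs from a plain detector: its output contains a pixel mask for every detected instance in addition to the bounding boxes.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = to_tensor(Image.open("street.jpg").convert("RGB"))   # placeholder image
with torch.no_grad():
    out = model([img])[0]

masks = out["masks"] > 0.5          # one binary mask per detected instance
print(out["boxes"].shape, masks.shape)
```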


Object Tracking


Object tracking is still a fairly new concept, but there are a few algorithms that do it quite well. Here are some of them (a toy sketch of the core idea follows the list):

  1. Simple Online And Realtime Tracking (SORT)

  2. DeepSORT

  3. FairMOT

  4. TransMOT

  5. ByteTrack

  6. YOLOR
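To build intuition for tracking-by-detection (the idea behind SORT-style trackers), here is a toy sketch in plain Python: match each existing track to the detection in the new frame that overlaps it most. Real trackers add motion models (Kalman filters) and smarter assignment, so treat this purely as an illustration:

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter + 1e-9)

def update_tracks(tracks, detections, threshold=0.3):
    """Greedily keep each track ID on the detection it overlaps most."""
    assignments = {}
    for track_id, last_box in tracks.items():
        best = max(detections, key=lambda d: iou(last_box, d), default=None)
        if best is not None and iou(last_box, best) > threshold:
            assignments[track_id] = best
    return assignments

# One existing track, two detections in the next frame.
tracks = {1: (10, 10, 50, 50)}
detections = [(12, 11, 52, 51), (200, 200, 240, 240)]
print(update_tracks(tracks, detections))   # track 1 follows the nearby box
```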


Real Life Use Cases for CV

There are many real world problems that can be solved using CV. I have listed a few of them below.


Visual Search Engines


If you look at your Android phone, the Google widget on the home screen has a camera icon on it. This is Google Lens, a visual search engine developed by Google. Samsung's Bixby Vision does something similar. This is one of the most advanced use cases of CV.


Amazon GO


Recently, Amazon started its cashier-less shops in the US, which depend entirely on CV to do business. A purchase is registered when the customer picks up a product and puts it in the cart, and the money is deducted from their Amazon wallet, all of this even though there is no shopkeeper around.


Tesla Autopilot


This one blows my mind. The sophistication involved in Autopilot is just amazing. The car senses its surroundings with cameras and drives itself on the road without any human intervention. What sets Autopilot apart is its ability to predict accidents and take corrective action. The car literally avoids accidents that could hardly be avoided if a person were driving. This is all the magic of CV.


Microsoft InnerEye


In the healthcare sector, InnerEye by Microsoft is an incredibly valuable tool that assists radiologists, oncologists, and surgeons who work with radiology-based images. The primary goal of the tool is to accurately identify tumors among healthy anatomy in 3D images of cancerous growths.

In radiation therapy, for instance, InnerEye results make it possible to direct radiation specifically at the target tumor while preserving vital organs. These results also help radiologists better understand sequences of images and whether a disease is progressing, stable, or reacting favorably to treatment, while taking into account the evolution of a tumor’s size, among other things. In this way, medical imaging becomes a means of tracking and measuring.


Facebook Facial Recognition


Remember when you posted a picture with your friends and Facebook suggested people to tag in that picture? This is a use case of CV at Facebook. Facial recognition can also be used in more sophisticated ways, such as recognizing emotions from facial expressions. A similar system is used in Google Photos: if you navigate to the search tab in the app, you will see that images are clustered based on people's faces. This also enables customized searches in Google Photos, such as photos of you and your pet, where the AI will only show photos in which both you and your pet can be seen.


Final Words

The fact that the computer vision implementations of large companies are the most often discussed does not mean that you have to be Google or Amazon to benefit from this machine learning technology. Businesses of all sizes can leverage their data with computer vision techniques in order to become more efficient and effective at what they do, all while making better decisions.



. . .



Like and share if you found this helpful.


I am open to any suggestions, corrections, or comments, so please feel free to share them.


Also, connect with me on LinkedIn.


Open to entry-level jobs as a Data Scientist/Data Analyst. Please DM me on LinkedIn for my resume for any openings in the near future 🤗 🙏
