What is Computer Vision?



Computer vision systems have long been able to compete with the human eye. This is what you should know about the topic. […]

Autonomous driving, virtual reality and augmented reality are just some of the areas of application for computer vision.

Computer vision refers to systems that recognize objects in digital still and moving images and process them accordingly. The field has evolved significantly over the past twenty years: today’s computer vision systems achieve an accuracy of up to 99 percent and now also run on mobile devices.

To emulate the image processing performed by the visual cortex, computer vision researchers rely in particular on artificial neural networks. The breakthrough came in 1998 with Yann LeCun’s LeNet-5, a seven-layer convolutional neural network that recognizes handwritten digits in digitized images with a resolution of 32×32 pixels. This model has since been systematically extended: today’s image classification systems recognize entire object catalogs in HD resolution and in color.
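To make the LeNet-5 description more concrete, the following sketch traces how a 32×32 input shrinks as it passes through the network. The layer names and sizes (C1, S2, C3, S4, C5, F6, output) follow the commonly cited description of LeNet-5 and are not taken from this article; the helper names `conv_out` and `trace_shapes` are made up for illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of an unpadded convolution or pooling window."""
    return (size - kernel) // stride + 1

# (name, kind, feature maps or units, kernel size, stride)
LENET5 = [
    ("C1", "conv", 6, 5, 1),
    ("S2", "pool", None, 2, 2),
    ("C3", "conv", 16, 5, 1),
    ("S4", "pool", None, 2, 2),
    ("C5", "conv", 120, 5, 1),
    ("F6", "dense", 84, None, None),
    ("OUT", "dense", 10, None, None),
]

def trace_shapes(size=32, channels=1):
    """Return (layer, height, width, channels) for each of the seven layers."""
    shapes = []
    for name, kind, units, kernel, stride in LENET5:
        if kind == "dense":
            # Fully connected layers have no spatial extent, only units.
            size, channels = 1, units
        else:
            size = conv_out(size, kernel, stride)
            if kind == "conv":
                # Convolutions set the number of feature maps; pooling keeps it.
                channels = units
        shapes.append((name, size, size, channels))
    return shapes

for name, height, width, maps in trace_shapes():
    print(f"{name}: {height}x{width}x{maps}")
```

The final layer has ten outputs, one per digit class, which is why the network can classify handwritten digits.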

In addition to neural networks, experts in computer vision also rely on hybrid vision models that combine deep learning with classic machine learning algorithms.

Various public image databases are available to train computer vision models:

  • MNIST is one of the simplest and oldest databases and contains about 70,000 handwritten digits in ten classes. A model can easily be trained on the MNIST data set, even on a laptop without hardware acceleration.
  • COCO offers a large data set for tasks such as object recognition and image segmentation. More than 330,000 images in 80 object categories are available.
  • ImageNet contains approximately 1.5 million images, including labels and bounding boxes.
  • Open Images hosts the URLs for about nine million images, also including labels.
  • Google, Azure, and AWS each offer their own computer vision models trained on large data sets. These can either be used directly or fine-tuned on your own image data sets via transfer learning, which saves a lot of time compared with training a model from scratch.
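The idea behind transfer learning can be illustrated with a deliberately tiny, pure-Python sketch: a fixed feature extractor stands in for the pre-trained model and is never updated, while only a small new classification head is trained on your own data. Everything here is an assumption made for illustration (the 2×2 "images", the function names, the hand-written features); in practice you would use a deep learning framework and a real pre-trained network.

```python
import math

def frozen_features(image):
    """Stand-in for a pre-trained backbone; its parameters are fixed.
    image is a flat list [top_left, top_right, bottom_left, bottom_right]."""
    tl, tr, bl, br = image
    return [(tl + tr) - (bl + br),   # vertical contrast
            (tl + bl) - (tr + br)]   # horizontal contrast

def predict(head, image):
    """Probability of class 1 from the trainable head on the frozen features."""
    weights, bias = head
    z = sum(w * f for w, f in zip(weights, frozen_features(image))) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_head(images, labels, epochs=200, lr=0.5):
    """Fit only the new head (logistic regression) by gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for image, label in zip(images, labels):
            # Gradient of the log loss with respect to the head's pre-activation.
            error = predict((weights, bias), image) - label
            feats = frozen_features(image)
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias

# Hypothetical custom data set: 1 = bright top half, 0 = bright bottom half.
images = [[1.0, 0.9, 0.1, 0.0], [0.8, 1.0, 0.2, 0.1],
          [0.0, 0.1, 0.9, 1.0], [0.2, 0.0, 1.0, 0.8]]
labels = [1, 1, 0, 0]

head = train_head(images, labels)
print([round(predict(head, image)) for image in images])
```

Because only the three head parameters are learned, training needs very little data and compute, which is the time saving that transfer learning offers over training from scratch.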

Computer vision is not perfect, but the systems are accurate enough to be used in various industries.


Waymo, formerly Google’s flagship autonomous driving project, says it has trained its software with data from seven million kilometres driven. At least one accident involving a Waymo van is known so far, but the software is not believed to have been the cause.

Tesla’s vehicles are also known for being able to drive partly autonomously, relying on computer vision. After a fatal accident, the vehicle software was changed so that the driver’s hands must remain on the steering wheel at all times.


Amazon relies on self-service and computer vision in its Go stores: the system detects when a customer takes products off the shelf or puts them back, then identifies the purchases and bills them via a smartphone app. If the Amazon Go software misses a product, the customer gets it free of charge, and products that were billed incorrectly are credited.


Computer vision is also regularly used in healthcare, for example to analyze X-rays and images from other medical imaging systems.

Financial sector

In banking, computer vision is used, for example, to detect fraud or to recognize documents.


Computer vision also plays a role in Agriculture 4.0, for example in monitoring farmland.

Controversial applications

Computer vision is also used for controversial purposes. Facial recognition techniques in particular are highly valued, and not only in autocracies. Deepfakes and training bias are also frequently cited problem areas.

Most deep learning frameworks offer comprehensive support for computer vision, for example the Python-based frameworks TensorFlow, PyTorch or MXNet.

  • The image and video analysis service Amazon Rekognition can recognize objects, people, text and activities, including faces and custom labels.
  • The pre-trained Google Cloud Vision API detects objects and faces, reads printed and handwritten text, and adds metadata to image catalogs. Custom image models can also be trained with Google AutoML Vision.
  • Microsoft’s Computer Vision API can also detect objects. The Face API is available in the cloud or as a container at the network edge and can recognize faces as well as emotions.
  • IBM Watson Visual Recognition classifies images based on a pre-trained model and also enables transfer learning, object detection and counting. The IBM solution runs in the cloud or on iOS devices with Core ML.
  • With Matlab, MathWorks also offers an analysis package that handles image recognition based on machine learning and deep learning.
  • The Apple Vision Framework recognizes faces, text or barcodes. For image classification or object recognition, custom Core ML models can also be used.
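As a rough idea of what calling one of these services looks like, the sketch below sends a local image to Amazon Rekognition through the boto3 SDK and keeps only the confident labels. It assumes boto3 is installed and AWS credentials and a region are configured; the helper names, the 80 percent threshold and the sample response are illustrative and not part of the original article.

```python
def top_labels(response, min_confidence=80.0):
    """Return (name, confidence) pairs from a detect_labels-style response,
    highest confidence first."""
    labels = [(item["Name"], item["Confidence"])
              for item in response.get("Labels", [])
              if item["Confidence"] >= min_confidence]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)

def detect_labels_in_file(path, max_labels=10):
    """Send a local image file to Amazon Rekognition and return its top labels."""
    # Imported here so the parsing helper above works without boto3 installed.
    import boto3
    client = boto3.client("rekognition")
    with open(path, "rb") as handle:
        response = client.detect_labels(Image={"Bytes": handle.read()},
                                        MaxLabels=max_labels)
    return top_labels(response)

# Hand-written sample shaped like a detect_labels response (values are made up).
sample = {"Labels": [{"Name": "Person", "Confidence": 91.2},
                     {"Name": "Car", "Confidence": 98.7},
                     {"Name": "Tree", "Confidence": 55.0}]}
print(top_labels(sample))
```

Keeping the response parsing separate from the network call makes it possible to test the filtering logic offline, without an AWS account.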

Computer vision models have evolved rapidly since LeNet-5, and most of these models are artificial neural networks.

Computer vision is becoming more precise and reliable and can already compete with the human visual cortex in many cases. Due to the further development of frameworks and models as well as the possibility of transfer learning, you no longer need a doctorate to use computer vision.

This post is based on an article from our US sister publication InfoWorld.

* Martin Heller is a freelance writer for the sister publication InfoWorld.
