It is common to talk about image-based and video-based analysis in the same breath, as if there were little difference between them and either could be used interchangeably. In reality, this is far from true, especially for machine vision applications such as inspection and defect detection. Paradoxically, the clue lies in the very motive for which the image or video is being captured. Let us dive deeper into the subject to understand the subtleties and make an informed decision about which is better suited to a given purpose: an image-based or a video-based system.
Image and Video – A Bit about Both
Before discussing the distinguishing features of image- and video-based inspection, let us look at what images and videos are made of. An image is a visual representation of the form of a person or object, such as a painting or a photograph. In digital form, an image is composed of a finite number of elements, each holding an intensity value at its location. The smallest and most widely used element of a digital image is the pixel. The number of pixels in an image, and hence the level of detail it can capture, depends primarily on the resolution of the sensor on which it is captured. These pixels are arranged in a grid and displayed on screen as an image, which becomes the fundamental subject of analysis for machine learning and AI applications.
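As a minimal sketch, assuming Python with NumPy (the article does not prescribe any tooling), a digital image can be held as a grid of pixel intensities: a grayscale image as a height × width array of 8-bit values, and an RGB color image with a third axis holding three channels. The 480 × 640 dimensions are an arbitrary example.

```python
import numpy as np

# Hypothetical 480 x 640 sensor; every array element is one pixel.
HEIGHT, WIDTH = 480, 640

# Grayscale: one 8-bit intensity value (0-255) per pixel.
gray = np.zeros((HEIGHT, WIDTH), dtype=np.uint8)
gray[100, 200] = 255  # set the pixel at row 100, column 200 to full intensity

# RGB color: three 8-bit values (red, green, blue) per pixel.
rgb = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)
rgb[100, 200] = (255, 0, 0)  # a pure red pixel

print(gray.shape, gray.nbytes)  # (480, 640) 307200
print(rgb.shape, rgb.nbytes)    # (480, 640, 3) 921600
```

The color image needs three times the memory of the grayscale one at the same resolution, which is the starting point for the data-volume comparison later in the article.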
Videos are essentially pictures in motion: a series of images, commonly known as frames, played in sequence at a specified frame rate. Each frame is a single image in this sequence. Video is typically captured or played back at 24 or 30 frames per second (fps). Each frame therefore combines an image with the time at which it was captured. This gives video its spatiotemporal character, which must be taken into account during processing and analysis.
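To make the spatiotemporal point concrete, here is a small sketch in which each frame pairs an image with its capture time, assuming a constant frame rate so that frame n is captured at n / fps seconds. The `Frame` class and `make_frames` helper are hypothetical illustrations, not part of any particular video library.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Frame:
    index: int          # position of the frame in the sequence
    timestamp_s: float  # capture time in seconds (the temporal part)
    image: np.ndarray   # pixel grid (the spatial part)


def make_frames(images, fps):
    """Attach a capture time to each image, assuming a constant frame rate."""
    return [Frame(i, i / fps, img) for i, img in enumerate(images)]


# One second of blank 8-bit grayscale video at 30 fps.
images = [np.zeros((4, 4), dtype=np.uint8) for _ in range(30)]
frames = make_frames(images, fps=30)
print(len(frames), frames[15].timestamp_s)
```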
Comparative View of Image and Video Based Analysis
An image is discrete in nature, a one-shot event, and its analysis involves measuring the object’s shape, size, texture, color, etc., and any anomalies in them, at that instant. Video captures images continuously over a period of time. During this period, changes may occur in the object or its environment, and these are captured by the camera. These changes become the subject of video-based analysis. With the basics in place, let us now look at the distinguishing features of image- and video-based analysis.
Volume of Data
The most fundamental difference between image- and video-based inspection is the amount of data each one captures. In an 8-bit grayscale (monochrome) image, each pixel occupies 8 bits (1 byte). Color images typically use the RGB (red, green, blue) scheme with 1 byte per channel, i.e., 24 bits (3 bytes) per pixel. For video, this per-frame data is multiplied by the frame rate, so resolution and fps together govern how much data is captured every second. Because video carries far more data, it requires greater processing and transmission capacity. For example, uncompressed 8-bit 4:2:2 color video at 720x576 (PAL) resolution and 25 fps needs about 166 Mbps, and this rises to about 1.99 Gbps for 1920x1080 HD video at 60 fps.
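The bandwidth figures follow from a simple product: uncompressed data rate = width × height × bits per pixel × frame rate. The sketch below reproduces them, assuming 8-bit 4:2:2 chroma sampling (an average of 16 bits per pixel), which is the basis on which the quoted PAL and HD numbers work out. The 24-bit RGB case is included for comparison. Blanking intervals and compression are ignored.

```python
def uncompressed_bitrate(width, height, bits_per_pixel, fps):
    """Raw video data rate in bits per second (no compression, no blanking)."""
    return width * height * bits_per_pixel * fps


# 8-bit 4:2:2 averages 16 bits per pixel: 8-bit luma for every pixel,
# plus 8-bit Cb and Cr samples shared between horizontal pixel pairs.
pal = uncompressed_bitrate(720, 576, 16, 25)       # 165,888,000 bit/s
hd = uncompressed_bitrate(1920, 1080, 16, 60)      # 1,990,656,000 bit/s
pal_rgb = uncompressed_bitrate(720, 576, 24, 25)   # 248,832,000 bit/s

print(f"PAL 4:2:2 : {pal / 1e6:.0f} Mbps")      # PAL 4:2:2 : 166 Mbps
print(f"HD 4:2:2  : {hd / 1e9:.2f} Gbps")       # HD 4:2:2  : 1.99 Gbps
print(f"PAL RGB24 : {pal_rgb / 1e6:.0f} Mbps")  # PAL RGB24 : 249 Mbps
```

A single triggered 720x576 RGB still, by contrast, is only about 1.2 MB, and it is captured only when a part is actually present.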
Uncompressed images and videos deliver maximum quality, but at a very high data rate. Both can be compressed, in a lossy or a lossless format, to save storage and bandwidth: images by exploiting their spatial coherence and video by exploiting its spatiotemporal coherence. Lossy formats achieve higher compression, but the right choice depends entirely on the use case. For storage and performance, images are commonly saved as JPEG (lossy) or as PNG or TIFF (lossless), while BMP files are typically stored uncompressed. The most commonly used video compression standards are MPEG-2, MPEG-4 and H.264, and JPEG 2000, H.264 and H.265 also offer lossless modes.
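A toy illustration of the lossless/lossy distinction uses Python's standard `zlib` module (DEFLATE, the same algorithm PNG uses internally) on a synthetic gradient image. The "lossy" step here is plain quantization of the low-order bits, a deliberately crude stand-in. Real lossy codecs such as JPEG work very differently.

```python
import zlib

WIDTH, HEIGHT = 256, 256
# Synthetic 8-bit grayscale gradient: neighboring pixels are similar (spatial coherence).
pixels = bytes((x + y) % 256 for y in range(HEIGHT) for x in range(WIDTH))

# Lossless: after compressing and decompressing, the pixels are bit-for-bit identical.
lossless = zlib.compress(pixels, 9)
assert zlib.decompress(lossless) == pixels

# Crude lossy step: discard the 4 least significant bits of every pixel, then compress.
quantized = bytes(p & 0xF0 for p in pixels)
lossy = zlib.compress(quantized, 9)
restored = zlib.decompress(lossy)

print("raw / lossless / lossy bytes:", len(pixels), len(lossless), len(lossy))
print("max error after lossy round trip:", max(a - b for a, b in zip(pixels, restored)))
```

The lossless round trip is exact, while the quantized version can never recover the discarded detail (here up to 15 intensity levels per pixel). Whether that loss is acceptable is exactly the use-case question raised above.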
The volume of data to be processed translates directly into the compute power required. In image-based machine vision applications such as inspection and defect detection, the image of the object under study is compared with the ground truth, i.e., a reference of what a defect-free object looks like. Any anomaly is reported as a defect, and further action is taken based on a preset criterion.
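Here is a minimal sketch of such a comparison (often called golden-template matching), assuming the reference and inspected images are already aligned, the same size and 8-bit grayscale. The function name `inspect` and both threshold values are illustrative assumptions, not settings from any particular system. Practical systems usually add image registration, lighting normalization or learned models on top of this.

```python
import numpy as np


def inspect(reference, test, pixel_tol=30, max_defect_pixels=10):
    """Golden-template check: 'NO GO' if too many pixels deviate from the reference.

    pixel_tol: intensity difference above which a pixel counts as anomalous.
    max_defect_pixels: preset criterion for how many anomalous pixels are tolerated.
    """
    # Use a signed type so the subtraction of uint8 values cannot wrap around.
    diff = np.abs(reference.astype(np.int16) - test.astype(np.int16))
    n_bad = int((diff > pixel_tol).sum())
    return ("NO GO" if n_bad > max_defect_pixels else "GO"), n_bad


reference = np.full((100, 100), 128, dtype=np.uint8)  # ideal part: uniform gray
good_part = reference.copy()
scratched = reference.copy()
scratched[40:45, 10:60] = 20  # a dark 5 x 50 pixel scratch

print(inspect(reference, good_part))  # ('GO', 0)
print(inspect(reference, scratched))  # ('NO GO', 250)
```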
Data from a recorded video file or a live stream often needs to be compressed into a format that can be transmitted over limited bandwidth, and then decompressed for analysis. The video is read frame by frame, and features are extracted using AI/ML algorithms. These features are used to detect, locate and track one or more objects based on changes in their state or behavior.
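As one simple illustration of frame-by-frame processing, the sketch below swaps in plain frame differencing for the AI/ML feature extraction described above. It walks through a sequence of frames one at a time, uses the difference between consecutive frames as a crude feature, and reports the centroid of the changed pixels so that a moving object can be located and followed. The frames are synthetic NumPy arrays. With a real camera or file, they would come from a decoder such as OpenCV's `cv2.VideoCapture`, which is an assumption about tooling rather than something the article specifies.

```python
import numpy as np


def track_changes(frames, threshold=50):
    """Yield (frame_index, centroid) of the pixels that changed since the previous frame."""
    prev = None
    for i, frame in enumerate(frames):
        current = frame.astype(np.int16)  # signed, so differences do not wrap
        if prev is not None:
            changed = np.abs(current - prev) > threshold
            if changed.any():
                rows, cols = np.nonzero(changed)
                # The changed region spans the object's old and new positions.
                yield i, (float(rows.mean()), float(cols.mean()))
        prev = current


def synthetic_frames(n=5, size=64):
    """A bright 8 x 8 square that moves 4 pixels to the right in each frame."""
    for k in range(n):
        f = np.zeros((size, size), dtype=np.uint8)
        f[20:28, 4 * k:4 * k + 8] = 255
        yield f


for index, centroid in track_changes(synthetic_frames()):
    print(index, centroid)
```

The centroid's column advances by 4 pixels per frame, matching the object's motion. A learned detector would replace the differencing step but keep the same read-extract-track loop.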
Because of these differences, real-time image-based defect inference can often run on a standard CPU with good software optimization, whereas this is rarely, if ever, possible for video, which usually needs dedicated hardware such as GPUs for inference.
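One back-of-the-envelope way to see this is to compare the time budget per image with the inference time per image. The sketch assumes a hypothetical model that takes 50 ms per image on a CPU. That number is a placeholder, not a benchmark, and the check ignores batching and parallel pipelines.

```python
def frame_budget_ms(rate_hz):
    """Time available per image if each one must be processed before the next arrives."""
    return 1000.0 / rate_hz


def keeps_up(inference_ms, rate_hz):
    """True if a single inference pipeline can process images as fast as they arrive."""
    return inference_ms <= frame_budget_ms(rate_hz)


CPU_INFERENCE_MS = 50.0  # hypothetical per-image inference time on a CPU

# Triggered image inspection, e.g. 5 parts per second on a conveyor.
print(keeps_up(CPU_INFERENCE_MS, rate_hz=5))   # True: 200 ms available per image
# Continuous video at 30 fps, where every frame must be handled.
print(keeps_up(CPU_INFERENCE_MS, rate_hz=30))  # False: about 33.3 ms per frame
```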
Another major difference between the two approaches lies in the need for a triggering mechanism during acquisition. For example, when machine vision is used to inspect objects moving on a conveyor belt, the camera must acquire the image at precisely the right moment. This is achieved through a hardware- or software-driven “trigger input”. In “trigger mode”, the camera captures exactly one image when a trigger pulse arrives and then waits for the next pulse.
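The following simulated sketch shows trigger mode, assuming a photoelectric sensor whose output is sampled as a sequence of 0/1 values. The camera captures exactly one image on each rising edge (a 0 → 1 transition) of the trigger signal and otherwise stays idle. `capture_image` is a hypothetical stand-in for a real camera SDK call.

```python
def rising_edges(signal):
    """Return the sample indices where the trigger signal goes from low (0) to high (1)."""
    return [i for i in range(1, len(signal)) if signal[i - 1] == 0 and signal[i] == 1]


def capture_image(sample_index):
    """Hypothetical placeholder for a camera SDK call that grabs one frame."""
    return f"image@{sample_index}"


def run_trigger_mode(signal):
    """Capture exactly one image per trigger pulse, then wait for the next pulse."""
    return [capture_image(i) for i in rising_edges(signal)]


# Sensor output as two parts pass; it stays high while a part blocks the beam.
sensor = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
print(run_trigger_mode(sensor))  # ['image@2', 'image@7']
```

Reacting to the edge rather than the level is what keeps a part that blocks the beam for several samples from being photographed more than once.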
A video-based inspection system does not need a trigger, as it records the object continuously. The moment at which analysis is carried out is determined by predefined criteria, such as a change in the nature or behavior of the object or its surroundings.
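In a continuous stream, the "trigger" effectively moves into software. Below is a minimal sketch assuming the predefined criterion is simply that the mean absolute difference between consecutive frames exceeds a threshold. Real systems would use richer criteria, and the threshold here is an arbitrary example.

```python
import numpy as np


def frames_to_analyze(frames, threshold=5.0):
    """Return indices of frames whose mean change from the previous frame exceeds threshold."""
    selected = []
    prev = None
    for i, frame in enumerate(frames):
        current = frame.astype(np.float32)
        if prev is not None and np.mean(np.abs(current - prev)) > threshold:
            selected.append(i)  # hand this frame to the (heavier) analysis stage
        prev = current
    return selected


# A static scene for 10 frames, with an object entering the view in frame 6.
scene = np.full((48, 48), 100, dtype=np.uint8)
frames = [scene.copy() for _ in range(10)]
for f in frames[6:]:
    f[10:30, 10:30] = 250  # the object stays in place once it has entered

print(frames_to_analyze(frames))  # [6]
```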
Image-based machine vision applications are widely used in industry, for example automatic counting of items on a production line, reading barcodes and printed text (OCR), measuring dimensions, defect detection, biometrics, food grading, and detecting cracks in metal welds.
Video-based analysis is mostly used in larger spaces such as the factory shop floor, where it provides inputs on head count, identification of personnel, behaviors and gestures, activities being performed, whether workers are wearing the appropriate clothing and equipment on the job, and the movement of equipment, machinery and vehicles. These observations are compared with a predefined set of features, and any discrepancies detected are flagged accordingly.
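Once detections are available, comparing them with a predefined set of features can be as simple as a set difference. In the sketch below, the per-person detections are hard-coded stand-ins for the output of some upstream video model (not shown), and the required-equipment list is a hypothetical site rule.

```python
REQUIRED_PPE = {"helmet", "safety_vest", "gloves"}  # hypothetical site rule


def flag_discrepancies(detections):
    """Map each person ID to the required items that were not detected on them."""
    flags = {}
    for person_id, items in detections.items():
        missing = REQUIRED_PPE - set(items)
        if missing:
            flags[person_id] = sorted(missing)
    return flags


# Hard-coded example detections for one frame (stand-in for a real detector).
frame_detections = {
    "worker_1": ["helmet", "safety_vest", "gloves"],
    "worker_2": ["helmet"],
}
print(flag_discrepancies(frame_detections))  # {'worker_2': ['gloves', 'safety_vest']}
```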
Image and Video Based Inspection – The Decider
The decision lies in the ‘motive’. If an inspection requires continuous observation of a scene for decision making, a video must be recorded. The most common applications of video analysis in industrial settings are monitoring manufacturing processes and surveilling factory premises for adherence to safety and security rules and procedures. On the other hand, if the application is to inspect for defects, a single still image can support a ‘go’/‘no-go’ decision quickly enough for follow-up actions. Using video capture for discrete events such as the presence or absence of defects would be computationally wasteful and inefficient.
The scope for applying ML and AI techniques to image- and video-based inspection and defect detection is immense, but the choice depends on the use case and a matching system configuration. There is no ‘one size fits all’ solution.