Improving the quality of mine detection by a mine-detecting drone by selecting an architecture for classifying objects in visible-light camera images
Abstract
Unmanned aerial vehicles (drones) are increasingly used in humanitarian demining. Multiple sensors operating on different physical principles are used simultaneously to search for mines, which increases the probability of mine detection. To expedite the survey of an area, the initial processing of sensor signals is performed onboard the drone. Consequently, sensor signal processing algorithms must not only increase the probability of object detection but also offer a high ratio of classification accuracy to required computational effort. Visible-light cameras are among the sensors used. A large number of neural networks have been developed for classifying objects in images. However, a specific feature of their use in humanitarian demining is the lack of large datasets for training and testing. The challenge is therefore to find neural networks that offer a high ratio of classification accuracy to required computational effort while also being trainable on very small datasets. This paper compares convolutional neural networks and transformer-based networks. The networks were trained on a small dataset containing images of anti-tank mines, rocks, and various backgrounds. The study showed that convolutional neural networks train faster and are more resilient to image quality degradation. However, their potential is limited: increasing the number of layers does not significantly improve mine classification quality. Transformer-based neural networks offer greater flexibility and a wide range of options for architectural configuration, resulting in superior performance compared to convolutional networks. Although they are slower to train and may require an expanded training dataset, they are the preferred choice for implementing image processing on drones.
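The selection criterion named above, the ratio of classification accuracy to computational effort, can be illustrated with a minimal sketch. It assumes that computational effort is approximated by GFLOPs per image, which the abstract does not specify; the names `Candidate`, `accuracy_per_gflop`, and `rank_by_efficiency` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str        # hypothetical label for a network architecture
    accuracy: float  # classification accuracy on a held-out set, in [0, 1]
    gflops: float    # assumed proxy for computational effort, GFLOPs per image


def accuracy_per_gflop(c: Candidate) -> float:
    """Ratio of classification accuracy to the assumed compute cost."""
    if c.gflops <= 0:
        raise ValueError("gflops must be positive")
    return c.accuracy / c.gflops


def rank_by_efficiency(candidates):
    """Sort candidates from the highest to the lowest accuracy-per-GFLOP ratio."""
    return sorted(candidates, key=accuracy_per_gflop, reverse=True)
```

A ranking of this kind only captures the efficiency ratio; resilience to image degradation and behaviour on very small training sets, which the study also considers, would need separate evaluation.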

