Friday, May 2, 2025

These are the steps in a typical CNN-based vision processor for drone images. Let’s enumerate them:

1. Initialization: Drone images are 512x512-resolution frames and are not labeled in Pascal VOC format. Before each frame of the drone video is processed, the model is initialized as a 7-layer CNN with activation functions and a sigmoid output. Activation functions introduce non-linearity, allowing the network to learn complex patterns such as edges, textures, and shapes by adjusting neuron outputs before passing them to the next layer. The sigmoid is a mathematical function that squashes its input to values between 0 and 1, which makes it useful for probability-based tasks, including drawing the heat-maps discussed earlier. The specific loss used with this model combines the sigmoid and binary cross-entropy loss into a single operation for numerical stability in binary classification tasks. Hyperparameters for the model, such as the learning rate, targets, and masks, are set to default values. An optimizer is essential to a neural network: it updates the weights during training and helps find the set of weights that minimizes the loss function, where a loss function measures the difference between the predicted and actual values of the target variable. The optimizer used with this model implements the Adam algorithm.
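The fused sigmoid-plus-binary-cross-entropy loss described above (the same idea behind PyTorch's `BCEWithLogitsLoss`) owes its stability to an algebraic rearrangement that never exponentiates a large positive number. A minimal pure-Python sketch of that formula; the function names here are hypothetical, not from the original post:

```python
import math

def sigmoid(x):
    """Squash a real-valued input to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(logit, target):
    """Sigmoid and binary cross-entropy fused into one numerically
    stable expression: max(x, 0) - x*z + log(1 + exp(-|x|)).
    Because exp() is only ever applied to -|x|, it cannot overflow,
    even for very large logits."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))
```

For small logits this matches the naive `-[z*log(s) + (1-z)*log(1-s)]` computation, but it remains finite where the naive form would overflow or take `log(0)`.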

2. Each convolutional layer transforms its input channels into output channels. It uses the Rectified Linear Unit (ReLU) activation, which passes a value through only if it is positive and outputs 0 otherwise. During training, each layer defaults to no dropout, "same" padding, and batch normalization and transposed convolution turned off. Dropout prevents overfitting by randomly setting a fraction of neurons to zero. Padding adds extra pixels around the borders of an image before a convolution operation. Batch normalization normalizes activations over a mini-batch of data. A transposed convolution, often called deconvolution or upsampling, increases spatial dimensions, reversing the standard convolution process.
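The dropout described above is usually implemented as "inverted" dropout, where surviving activations are rescaled so their expected value is unchanged and inference needs no adjustment. A small sketch under that assumption (the function name is illustrative, not from the post):

```python
import random

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero each activation with probability p and
    scale survivors by 1/(1-p) so the expected output is unchanged.
    At inference time (training=False) activations pass through as-is."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if random.random() < keep else 0.0
            for a in activations]
```

With `p=0.5`, each surviving activation of 1.0 becomes 2.0, so a downstream layer sees the same expected signal whether dropout is on or off.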

Kernels and biases are also set for each layer. The kernel is 3x3, with an initializer that draws from a truncated normal distribution, and transforms the input channels into the output channels. Biases affect only the output channels and use a constant initializer.
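A truncated normal initializer draws from a normal distribution but resamples any value that lands too far from the mean (commonly beyond two standard deviations), which keeps initial weights bounded. A stdlib-only sketch; the parameter defaults here are common conventions, not values stated in the post:

```python
import random

def truncated_normal(n, mean=0.0, std=0.05, clip=2.0):
    """Draw n samples from Normal(mean, std), resampling any value
    that falls more than `clip` standard deviations from the mean."""
    out = []
    while len(out) < n:
        v = random.gauss(mean, std)
        if abs(v - mean) <= clip * std:
            out.append(v)
    return out

# Hypothetical 3x3 kernel for one input channel -> one output channel:
kernel = truncated_normal(3 * 3)
bias = [0.0]  # constant initializer, applied only to the output channel
```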

3. Location: Pixel coordinates are transformed to world coordinates. The alignment data is stored in the bounds, which is used to map detections in the raw frame into world coordinates. This involves a perspective transformation using OpenCV’s findHomography, which computes the homography matrix describing the transformation between two sets of corresponding points in two different images.
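Once the 3x3 homography matrix is known, applying it is just a matrix multiply in homogeneous coordinates followed by division by the third component. A NumPy sketch of that step; the matrix `H` below is a made-up example (in practice it would come from `cv2.findHomography` on corresponding points from the alignment bounds):

```python
import numpy as np

# Hypothetical pixel->world homography: scale by 2, translate by (10, 20).
H = np.array([[2.0, 0.0, 10.0],
              [0.0, 2.0, 20.0],
              [0.0, 0.0,  1.0]])

def pixel_to_world(points, H):
    """Apply a 3x3 homography to Nx2 pixel coordinates: lift to
    homogeneous coordinates, multiply by H, then divide by the
    third (w) component to return to 2D world coordinates."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homogeneous @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

The division by `w` is what makes this a perspective (rather than merely affine) transformation; for a general homography, `w` varies per point.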

