Friday, May 2, 2025

These are the steps in a typical CNN-based vision processor for drone images. Let's enumerate them:

1. Initialization: Drone images are 512x512 resolution images; they are not labeled in the Pascal VOC format. Before each image in the drone video is processed, the model is initialized as a 7-layer CNN with ReLU activations and a sigmoid output (see the sketch after this list). Activation functions introduce non-linearity into neural networks, allowing them to learn complex patterns such as edges, textures, and shapes by adjusting neuron outputs before passing them to the next layer. Sigmoid is a mathematical function that squashes input values to between 0 and 1, which makes it useful for probability-based tasks, including drawing the heat-maps discussed earlier. The specific loss used with this model combines sigmoid and binary cross-entropy into a single operation for numerical stability on binary classification tasks. Hyperparameters such as the learning rate, along with inputs such as targets and masks, are set to default values. Optimizers are essential to a neural network for updating its weights during training; they help find the set of weights that minimizes the loss function. A loss function measures the difference between the predicted and actual values of the target variable. The optimizer used with this model implements the Adam algorithm.

2. Each convolutional layer transforms its input channels into output channels. It uses the Rectified Linear Unit (ReLU) activation, which passes a value through only if it is positive and outputs 0 otherwise. During training, each layer defaults to no dropout, "same" padding, and batch normalization and transposed convolution turned off. Dropout prevents overfitting by randomly setting a fraction of neurons to zero. Padding adds extra pixels around the borders of an image before a convolution operation. Batch normalization normalizes activations across a mini-batch of data. Transposed convolution, often called deconvolution or upsampling, is used to increase spatial dimensions, reversing the standard convolution.

A kernel and biases are also set for each layer. The kernel is 3x3 with an initializer that generates a truncated normal distribution over the input channels for the transformation to output channels. Biases affect only the output channels and use a constant initializer.

3. Location: Pixel coordinates are transformed into world coordinates. The alignment data is stored in the bounds, which helps transform detections in the raw frame into world coordinates. This involves a perspective transformation using OpenCV's method for finding the homography matrix, which describes the transformation between two sets of corresponding points in two different images.
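Below is a minimal sketch of the initialization in steps 1 and 2, assuming PyTorch; the layer widths, the truncated-normal standard deviation, and the final 1x1 logit layer are illustrative assumptions, not the platform's actual values.

import torch
import torch.nn as nn

def make_layer(in_ch, out_ch):
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding="same")  # 3x3 kernel, "same" padding
    nn.init.trunc_normal_(conv.weight, std=0.02)  # truncated normal kernel initializer (std assumed)
    nn.init.constant_(conv.bias, 0.0)             # constant bias initializer
    return nn.Sequential(conv, nn.ReLU())         # ReLU; dropout/batchnorm/transpose off by default

channels = [3, 16, 32, 64, 64, 32, 16]            # assumed widths for the seven layers
layers = [make_layer(i, o) for i, o in zip(channels[:-1], channels[1:])]
layers.append(nn.Conv2d(channels[-1], 1, kernel_size=1))  # final layer emits per-pixel logits
model = nn.Sequential(*layers)

loss_fn = nn.BCEWithLogitsLoss()                  # sigmoid + binary cross-entropy in one op
optimizer = torch.optim.Adam(model.parameters())  # Adam with its default learning rate

And a sketch of the perspective transformation in step 3, assuming OpenCV; the pixel/world correspondences below are made up for illustration and would come from the stored bounds in practice.

import cv2
import numpy as np

# Four correspondences between the 512x512 frame corners and world coordinates
pixel_pts = np.float32([[0, 0], [511, 0], [511, 511], [0, 511]])
world_pts = np.float32([[30.10, -97.80], [30.10, -97.70],
                        [30.00, -97.70], [30.00, -97.80]])  # assumed bounds

H, _ = cv2.findHomography(pixel_pts, world_pts)   # homography between the two point sets

detection_px = np.float32([[[256, 256]]])         # a detection in pixel coordinates
detection_world = cv2.perspectiveTransform(detection_px, H)  # mapped to world coordinates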


Thursday, May 1, 2025

This is a summary of the book "Give to Grow: Invest in Relationships to Build Your Business and Grow Your Career" by Mo Bunnell, published by Bard Press in 2024. The author is a performance and growth consultant who offers a framework for developing relationships that boost productivity and growth. His book draws on his decades in business development and consulting, and it is an easy-to-read, high-impact book. The tenets of his framework include investing in client relationships to unlock growth and drive long-term success, increasing your value, rooting out self-limiting false beliefs, showing clients a genuine desire to understand their problems and help them, demonstrating your expertise, ensuring your success by always taking action and "thinking in bets," and thus growing your clients, growing your team, and growing at scale.

To achieve your full performance and growth potential, prioritize relationships as the foundation for long-term success. The Give to Grow framework guides individuals in the two components of high performance: "Doing the Work" by delivering outcomes to clients, and "Winning the Work" by developing relationship skills. Top performers distinguish themselves by their focus on long-term relationships, which provide opportunities for growth. Top performers in complex roles deliver between eight and thirty times the value of average employees. The key difference between top performers and others is their focus on growth strategies and actions. Top performers prioritize client conversations, engage in extensive research, and embrace an ethos of continuous improvement. They translate annual goals into weekly priorities and take time after each client meeting to reflect on what worked and what didn't.

Adam Grant's book Give and Take identifies three types of people: "Takers," who seek the best outcome for themselves; "Matchers," who negotiate fair deals; and "Givers," who are perpetually generous. Successful people are Givers who focus on their most important relationships and give without demanding anything in return. They maintain healthy boundaries to prevent burnout. To become a Strategic Giver, reach out to clients frequently, help them even when they aren't in a position to buy from you, and consistently become the client's first call when a need manifests. Expand your idea of growth by enlarging your network and investing in relationships. To reach your highest growth potential, identify the false beliefs about yourself that limit your growth, such as "I can't," "I don't know how," "I might do it wrong," "I'm too busy," and "I might look bad," and replace them with a growth mindset. Overcoming these fears helps you grow and become a more effective professional.

To effectively engage clients, show genuine interest and genuine engagement. Achieve this by setting up two-sided, enjoyable, and energizing conversations, keeping meetings productive, and offering different forms of support. Connect with clients by finding commonalities, reducing stress through humor, and celebrating incremental progress. Focus on their engagement and aim to "fall in love" with their problem, ensuring they feel seen and heard. Before each meeting, reflect on questions that will help you better understand the client's situation, and listen attentively. Demonstrate your expertise by giving potential clients a taste of what working with you would be like, such as providing a technical analysis free of charge. This groundwork positions you as the best candidate for the job. When meeting with clients, always give them a recommendation regarding their next steps, allowing them to make better decisions while placing you as a guide and expert. It is best to appear "passionately agnostic" and give them space to choose their next steps.

As you grow in your career, remember that you can always improve your situation, even in difficult circumstances. Strengthen relationships and respond to setbacks with compassion and generosity. Identify three high-impact tasks every week and schedule time for them, aligning with your vision of long-term growth. "Think in bets" by investing time and energy in the opportunities with the biggest payoff.

High performers experience three levels of business growth: growth in their client list, growth in their team, and growth at scale. In the first stage, make yourself indispensable by bringing in more business than it costs your organization to employ you. As you find success, build a team to support you, delegate more to free up time, and scale your success throughout your organization. As your business scales, view your impact holistically, focusing on helping others succeed.


Wednesday, April 30, 2025

An image processing pipeline can have any number of extensions or operators. It is not limited to proprietary models or techniques. In fact, if there are locations where you have already captured images and labeled the objects of interest, you can plug in your own model for processing the next round of images, say from a UAV swarm flight, which will prioritize your predictions in the test flight and route autonomously. This widens the strategy and purpose of developing applications that can leverage this pipeline for their specific use cases. Objects detected using a bring-your-own-model processor can still be registered to a world catalog.


As an example, preprocessing of drone images from a dataset based on 512x512 resolution images of highways, annotated in the Pascal VOC format, could leverage the following transforms (a short sketch of a few of them follows the list):


1. Filters using kernels. A kernel is any matrix A that, when multiplied with another matrix B, transforms B in a way that highlights a certain feature. Finding features in images can be helpful for classification.


2. CNN: A Convolutional Neural Network takes an image and produces a vector based on embeddings derived from its training. Most Landing.AI experiments with images leverage this technique. It applies different kernels across the image and constantly improves these kernels using gradient descent. MobileNet is an example of a model suitable for drone imagery. Another example is YOLOv3, and we sourced most of the runtime from it.


3. LSTM: a Long Short-Term Memory neural network uses previous predictions and occurrences as a basis for predicting the current input. This helps with temporal information such as movement.


4. Augmentation: certain shifts, jitters, and rotations applied to images as part of preprocessing before the CNN are covered by this operator, and this can be a great way to normalize all the input images to a common standard.


5. Gaussian blurring: a kernel that can be applied across the image to balance each pixel with its neighbors, thereby making transitions smoother. A 5x5 kernel with a standard deviation of 2 is an example blurring kernel.


6. Edge detection: comes in very handy for detecting road boundaries, which in turn can help analyze a variety of drone imagery and yield useful information. Canny is one such edge detection algorithm, but you can bring your own.


7. Heat-map: a variety of probability functions can be used to create a probability map over the image, color-coded or grayscale, so that lighter regions are areas of importance and darker regions are less important.
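Here is a minimal sketch of operators 1, 5, 6, and 7 above, assuming OpenCV; the kernel values, thresholds, and file name are illustrative.

import cv2
import numpy as np

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# 1. Filter with a hand-crafted kernel (a simple horizontal-edge highlighter)
kernel = np.array([[-1, -1, -1],
                   [ 0,  0,  0],
                   [ 1,  1,  1]], dtype=np.float32)
filtered = cv2.filter2D(img, -1, kernel)

# 5. Gaussian blurring with a 5x5 kernel and standard deviation 2
blurred = cv2.GaussianBlur(img, (5, 5), 2)

# 6. Canny edge detection on the smoothed image
edges = cv2.Canny(blurred, 100, 200)

# 7. A rough grayscale heat-map: squash filter responses to [0, 1] with a sigmoid
heatmap = 1.0 / (1.0 + np.exp(-filtered.astype(np.float32) / 255.0))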





Tuesday, April 29, 2025

 Multimodal image search

The following code snippet describes how multimodal search can be useful for searching images. The images are indexed and searched based on vector embeddings, but the query is text-based.

from dotenv import load_dotenv

import json

import http.client

import os

import urllib.parse

import requests

from tenacity import retry, stop_after_attempt, wait_fixed

from azure.core.credentials import AzureKeyCredential

from azure.identity import DefaultAzureCredential

from azure.search.documents import SearchClient

from azure.search.documents.indexes import SearchIndexClient

from azure.search.documents.models import (

    RawVectorQuery,

)

from azure.search.documents.indexes.models import (

    ExhaustiveKnnParameters,

    ExhaustiveKnnVectorSearchAlgorithmConfiguration,

    HnswParameters,

    HnswVectorSearchAlgorithmConfiguration,

    SimpleField,

    SearchField,

    SearchFieldDataType,

    SearchIndex,

    VectorSearch,

    VectorSearchAlgorithmKind,

    VectorSearchProfile,

)

from IPython.display import Image, display

load_dotenv()

DIR_PATH = os.getcwd()  # assumed base directory containing the images/ folder used below

service_endpoint = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")

index_name = os.getenv("AZURE_SEARCH_INDEX_NAME")

api_version = os.getenv("AZURE_SEARCH_API_VERSION")

key = os.getenv("AZURE_SEARCH_ADMIN_KEY")

aiVisionApiKey = os.getenv("AZURE_AI_VISION_API_KEY")

aiVisionRegion = os.getenv("AZURE_AI_VISION_REGION")

aiVisionEndpoint = os.getenv("AZURE_AI_VISION_ENDPOINT")

credential = AzureKeyCredential(key)

search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)

query_image_path = "images/PIC01.jpeg"

@retry(stop=stop_after_attempt(5), wait=wait_fixed(1))

def get_image_vector(image_path, key, region):

    headers = {

        'Ocp-Apim-Subscription-Key': key,

    }

    params = urllib.parse.urlencode({

        'model-version': '2023-04-15',

    })

    try:

        if image_path.startswith(('http://', 'https://')):

            headers['Content-Type'] = 'application/json'

            body = json.dumps({"url": image_path})

        else:

            headers['Content-Type'] = 'application/octet-stream'

            with open(image_path, "rb") as filehandler:

                image_data = filehandler.read()

                body = image_data

        conn = http.client.HTTPSConnection(f'{region}.api.cognitive.microsoft.com', timeout=3)

        conn.request("POST", "/computervision/retrieval:vectorizeImage?api-version=2023-04-01-preview&%s" % params, body, headers)

        response = conn.getresponse()

        data = json.load(response)

        conn.close()

        if response.status != 200:

            raise Exception(f"Error processing image {image_path}: {data.get('message', '')}")

        return data.get("vector")

    except (requests.exceptions.Timeout, http.client.HTTPException) as e:

        print(f"Timeout/Error for {image_path}. Retrying...")

        raise

# Image-to-image query: vectorize the sample image and use it as the query vector

vector_query = RawVectorQuery(vector=get_image_vector(query_image_path,

                                                      aiVisionApiKey,

                                                      aiVisionRegion),

                              k=3,

                              fields="image_vector")

def generate_embeddings(text, aiVisionEndpoint, aiVisionApiKey):

    url = f"{aiVisionEndpoint}/computervision/retrieval:vectorizeText"

    params = {

        "api-version": "2023-02-01-preview"

    }

    headers = {

        "Content-Type": "application/json",

        "Ocp-Apim-Subscription-Key": aiVisionApiKey

    }

    data = {

        "text": text

    }

    response = requests.post(url, params=params, headers=headers, json=data)

    if response.status_code == 200:

        embeddings = response.json()["vector"]

        return embeddings

    else:

        print(f"Error: {response.status_code} - {response.text}")

        return None

query = "farm"

vector_text = generate_embeddings(query, aiVisionEndpoint, aiVisionApiKey)

vector_query = RawVectorQuery(vector=vector_text,

                              k=3,

                              fields="image_vector")

# Perform vector search

results = search_client.search(

    search_text=query,

    vector_queries= [vector_query],

    select=["description"]

)

for result in results:

    print(f"{result['description']}")

    display(Image(DIR_PATH + "/images/" + result["description"]))

    print("\n")


Monday, April 28, 2025

 Image processing is made easy with platforms like landing.ai

As an example, the following is an application that counts cars in drone images. The dataset is based on 512x512 resolution images of highways and is annotated in the Pascal VOC format. The model is hosted and usable with a sample web request as follows:

from PIL import Image

from landingai.predict import Predictor

# Enter your API Key

endpoint_id = "11cb6c44-3b6a-4b47-bac9-031826bc80ea"

api_key = "YOUR_API_KEY"

# Load your image

image = Image.open("image.png")

# Run inference

predictor = Predictor(endpoint_id, api_key=api_key)

predictions = predictor.predict(image)


And it can even be invoked through an agentic AI framework as follows:

import requests

url = "https://api.va.landing.ai/v1/tools/agentic-object-detection"

files = {

  "image": open("{{path_to_image}}", "rb")

}

data = {

  "prompts": "{{prompt}}",

  "model": "agentic"

}

headers = {

  "Authorization": "Basic {{your_api_key}}"

}

response = requests.post(url, files=files, data=data, headers=headers)

print(response.json())


For context on the DFCS drone video sensing platform, please check the references.


Sunday, April 27, 2025

 Some more illustrations for drone imagery processing:

def stable_groups(keypoints, groups, threshold):
    # Assign each keypoint to an existing group when both its feature descriptor
    # and its optical-flow-predicted pixel location fall within the threshold;
    # otherwise start a new group. Helpers such as get_mean_feature,
    # get_recent_pixel, lucas_kanade_optical_flow, and create_group are assumed.
    for kp in keypoints:
        matched = False
        for group in groups:
            mean_feature = get_mean_feature(group)
            recent_pixel = get_recent_pixel(group)
            if (kp.feature - mean_feature < threshold and
                    abs(lucas_kanade_optical_flow(recent_pixel) - kp.pixel) < threshold):
                group.add(kp)
                matched = True
                break
        if not matched:
            groups.add(create_group(kp))


def global_groups(stable_groups, global_groups, threshold):
    # Merge each stable group into a global group when their mean features fall
    # within the threshold and a least-squares fit on their trajectories agrees;
    # otherwise start a new global group.
    for stable_group in stable_groups:
        matched = False
        for global_group in global_groups:
            mean_feature = get_mean_feature(global_group)
            if (get_mean_feature(stable_group) - mean_feature < threshold and
                    delta_least_squares(stable_group, global_group)):
                global_group.add(stable_group)
                matched = True
                break
        if not matched:
            global_groups.add(create_global_group(stable_group))


def spherical_gps_to_position_n_orientation(gps, frame):
    # Convert a spherical GPS reading into the drone's planar position (d.x, d.y)
    # and heading h for this frame; the conversion itself is elided here.
    return (d.x, d.y, h)


from math import atan, tan

def camera_angle(keypoint, resolutionW, resolutionH, field_of_view):
    # Horizontal angle from the camera axis to the keypoint, from the pinhole model:
    # the keypoint's x offset scaled by tan(fov/2), over half the image width.
    return atan((keypoint.x * tan(field_of_view / 2)) / (resolutionW / 2))


def world_coordinates(keypoint, drone_frame):
    # Solve these two equations for the world point s = (s.x, s.y, s.h):
    #   1. (d_i.h - s_i.h) * tan(theta_i_x) = s.x - d_i.x
    #   2. (d_i.h - s_i.h) * tan(theta_i_y) = s.y - d_i.y
    # where d_i is the drone position and theta_i the camera angles for frame i.
    # return (s.x, s.y, s.h)
    pass


Saturday, April 26, 2025

This is an illustration of SIFT feature extraction:

import cv2

# SIFT lives in the main namespace in OpenCV >= 4.4; older builds need
# cv2.xfeatures2d.SIFT_create() from opencv-contrib.
sift = cv2.SIFT_create()

def compute_one(im):
    return sift.detectAndCompute(im, None)

def compute_sift(frames):
    # Compute SIFT keypoints and descriptors for every third frame; the other
    # frames keep (None, None) placeholders.
    print('get sift features')
    sift_features = [(None, None) for _ in frames]
    for frame_idx, im in enumerate(frames):
        if im is None or frame_idx % 3 != 0:
            continue
        print('... sift {}/{}'.format(frame_idx, len(frames)))
        keypoints, descs = compute_one(im)
        sift_features[frame_idx] = (keypoints, descs)
    return sift_features