Wednesday, July 31, 2024

 Problem 4

The relationship "friend" is often symmetric, meaning that if I am your friend, you are my friend. Implement a MapReduce algorithm to check whether this property holds. Generate a list of all non-symmetric friend relationships.


Map Input

Each input record is a 2 element list [personA, personB] where personA is a string representing the name of a person and personB is a string representing the name of one of personA's friends. Note that it may or may not be the case that the personA is a friend of personB.


Reduce Output

The output should be all pairs (friend, person) such that (person, friend) appears in the dataset but (friend, person) does not.


You can test your solution to this problem using friends.json:


Answer:

import MapReduce
import json
import sys

# Part 1
mr = MapReduce.MapReduce()

people = ["Myriel","Geborand", "Champtercier", "Count", "OldMan", "Valjean", "Napoleon", "MlleBaptistine", "MmeMagloire", "Labarre", "Marguerite", "MmeDeR", "Isabeau", "Fantine", "Cosette", "Simplice", "Woman1", "Judge", "Woman2", "Gillenormand", "MlleGillenormand", "Babet", "Montparnasse"]

# Part 2
def mapper(record):
    for friend1 in people:
        for friend2 in people:
            if friend1 == friend2:
                continue
            if friend1 == record[0] and friend2 == record[1]:
                # this directed edge appears in the dataset
                mr.emit_intermediate((friend1, friend2), 1)
            elif friend1 == record[1] and friend2 == record[0]:
                # the reverse edge appears in the dataset
                mr.emit_intermediate((friend1, friend2), 2)
            else:
                mr.emit_intermediate((friend1, friend2), 0)

# Part 3
def reducer(key, list_of_values):
    # emit (friend, person) only when the reverse edge was seen
    # but this edge itself never appears in the dataset
    if 1 not in list_of_values and 2 in list_of_values:
        mr.emit(key)

# Part 4
inputdata = open(sys.argv[1])
mr.execute(inputdata, mapper, reducer)


Sample output:

["MlleBaptistine", "Myriel"]

["MlleBaptistine", "MmeMagloire"]

["MlleBaptistine", "Valjean"]

["Fantine", "Valjean"]

["Cosette", "Valjean"]               


#codingexercise: https://1drv.ms/w/s!Ashlm-Nw-wnWhPIxEJaEe9_uKGDHgg?e=mrXrYM

Monday, July 29, 2024

 Problem 1

Create an Inverted index. Given a set of documents, an inverted index is a dictionary where each word is associated with a list of the document identifiers in which that word appears.


Mapper Input

The input is a 2-element list: [document_id, text], where document_id is a string representing a document identifier and text is a string representing the text of the document. The document text may have words in upper or lower case and may contain punctuation. You should treat each token as if it was a valid word; that is, you can just use value.split() to tokenize the string.


Reducer Output

The output should be a (word, document ID list) tuple where word is a String and document ID list is a list of Strings.


You can test your solution to this problem using books.json:



     python inverted_index.py books.json

You can verify your solution against inverted_index.json.


Answer:

import MapReduce

import json

import sys

# Part 1

mr = MapReduce.MapReduce()


# Part 2

def mapper(record):

    for word in record[1].split():

        mr.emit_intermediate(word, record[0])


# Part 3

def reducer(key, list_of_values):

    mr.emit((key, list_of_values))


# Part 4

inputdata = open(sys.argv[1])

mr.execute(inputdata, mapper, reducer)


Sample Output:

["all", ["milton-paradise.txt", "blake-poems.txt", "melville-moby_dick.txt"]]

["Rossmore", ["edgeworth-parents.txt"]]

["Consumptive", ["melville-moby_dick.txt"]]

["forbidden", ["milton-paradise.txt"]]

["child", ["blake-poems.txt"]]


Sunday, July 28, 2024

 Problem statement:

Assume you have two matrices A and B in a sparse matrix format, where each record is of the form i, j, value. Design a MapReduce algorithm to compute the matrix multiplication A x B


Map Input

The input to the map function will be a row of a matrix represented as a list. Each list will be of the form [matrix, i, j, value] where matrix is a string and i, j, and value are integers.


The first item, matrix, is a string that identifies which matrix the record originates from. This field has two possible values: "a" indicates that the record is from matrix A and "b" indicates that the record is from matrix B.


Reduce Output

The output from the reduce function will also be a row of the result matrix represented as a tuple. Each tuple will be of the form (i, j, value) where each element is an integer.


Answer:

#!/usr/bin/python

import MapReduce

import json

import sys

# Part 1

mr = MapReduce.MapReduce()


# A has dimensions L,M

# B has dimensions M,N

L = 5

M = 5

N = 5

# Part 2

def mapper(record):

    print(f"record={record} \t + {record[0]} + \t + {record[1]} + \t + {record[2]} + \t + {record[3]}")

    matrix_index  = record[0]

    row           = record[1]

    col           = record[2]

    value         = record[3]

    if matrix_index == "a":

        for i in range(0, N):

            key = f"{row},{i}"

            mr.emit_intermediate(key, ("a", row, col, value))

    if matrix_index == "b":

        for j in range(0, L):

            key = f"{j},{col}"

            mr.emit_intermediate(key, ("b", row, col, value))


# Part 3

def reducer(key, list_of_values):

    # one reducer per output cell of destination matrix

    # print(f"{key},{list_of_values}")

    total = 0

    line = ""

    for k in range(0,M):

        left = getcolumn(list_of_values, k, "a")

        right = getrow(list_of_values, k, "b")

        total += left*right

        line += f"{left}*{right}={left*right} +"

    line += f"= {total}"

    print(line)

    mr.emit((int(key.split(',')[0]), int(key.split(',')[1]), total))


def getcolumn(values, k, matrix_type):

    result = 0

    for item in values:

        mtype = item[0]

        row = item[1]

        col = item[2]

        value = item[3]

        if mtype == matrix_type and col == k:

           result = value

           break

    return result


def getrow(values, k, matrix_type):

    result = 0

    for item in values:

        mtype = item[0]

        row = item[1]

        col = item[2]

        value = item[3]

        if matrix_type == mtype and row == k:

           result = value

           break

    return result


# Part 4

inputdata = open(sys.argv[1])

mr.execute(inputdata, mapper, reducer)


Output:

python3 multiply1.py matrix.json

record=['a', 0, 0, 63]   + a +   + 0 +   + 0 +   + 63

record=['a', 0, 1, 45]   + a +   + 0 +   + 1 +   + 45

record=['a', 0, 2, 93]   + a +   + 0 +   + 2 +   + 93

record=['a', 0, 3, 32]   + a +   + 0 +   + 3 +   + 32

record=['a', 0, 4, 49]   + a +   + 0 +   + 4 +   + 49

record=['a', 1, 0, 33]   + a +   + 1 +   + 0 +   + 33

record=['a', 1, 3, 26]   + a +   + 1 +   + 3 +   + 26

record=['a', 1, 4, 95]   + a +   + 1 +   + 4 +   + 95

record=['a', 2, 0, 25]   + a +   + 2 +   + 0 +   + 25

record=['a', 2, 1, 11]   + a +   + 2 +   + 1 +   + 11

record=['a', 2, 3, 60]   + a +   + 2 +   + 3 +   + 60

record=['a', 2, 4, 89]   + a +   + 2 +   + 4 +   + 89

record=['a', 3, 0, 24]   + a +   + 3 +   + 0 +   + 24

record=['a', 3, 1, 79]   + a +   + 3 +   + 1 +   + 79

record=['a', 3, 2, 24]   + a +   + 3 +   + 2 +   + 24

record=['a', 3, 3, 47]   + a +   + 3 +   + 3 +   + 47

record=['a', 3, 4, 18]   + a +   + 3 +   + 4 +   + 18

record=['a', 4, 0, 7]    + a +   + 4 +   + 0 +   + 7

record=['a', 4, 1, 98]   + a +   + 4 +   + 1 +   + 98

record=['a', 4, 2, 96]   + a +   + 4 +   + 2 +   + 96

record=['a', 4, 3, 27]   + a +   + 4 +   + 3 +   + 27

record=['b', 0, 0, 63]   + b +   + 0 +   + 0 +   + 63

record=['b', 0, 1, 18]   + b +   + 0 +   + 1 +   + 18

record=['b', 0, 2, 89]   + b +   + 0 +   + 2 +   + 89

record=['b', 0, 3, 28]   + b +   + 0 +   + 3 +   + 28

record=['b', 0, 4, 39]   + b +   + 0 +   + 4 +   + 39

record=['b', 1, 0, 59]   + b +   + 1 +   + 0 +   + 59

record=['b', 1, 1, 76]   + b +   + 1 +   + 1 +   + 76

record=['b', 1, 2, 34]   + b +   + 1 +   + 2 +   + 34

record=['b', 1, 3, 12]   + b +   + 1 +   + 3 +   + 12

record=['b', 1, 4, 6]    + b +   + 1 +   + 4 +   + 6

record=['b', 2, 0, 30]   + b +   + 2 +   + 0 +   + 30

record=['b', 2, 1, 52]   + b +   + 2 +   + 1 +   + 52

record=['b', 2, 2, 49]   + b +   + 2 +   + 2 +   + 49

record=['b', 2, 3, 3]    + b +   + 2 +   + 3 +   + 3

record=['b', 2, 4, 95]   + b +   + 2 +   + 4 +   + 95

record=['b', 3, 0, 77]   + b +   + 3 +   + 0 +   + 77

record=['b', 3, 1, 75]   + b +   + 3 +   + 1 +   + 75

record=['b', 3, 2, 85]   + b +   + 3 +   + 2 +   + 85

record=['b', 4, 1, 46]   + b +   + 4 +   + 1 +   + 46

record=['b', 4, 2, 33]   + b +   + 4 +   + 2 +   + 33

record=['b', 4, 3, 69]   + b +   + 4 +   + 3 +   + 69

record=['b', 4, 4, 88]   + b +   + 4 +   + 4 +   + 88

63*63=3969 +45*59=2655 +93*30=2790 +32*77=2464 +49*0=0 += 11878

63*18=1134 +45*76=3420 +93*52=4836 +32*75=2400 +49*46=2254 += 14044

63*89=5607 +45*34=1530 +93*49=4557 +32*85=2720 +49*33=1617 += 16031

63*28=1764 +45*12=540 +93*3=279 +32*0=0 +49*69=3381 += 5964

63*39=2457 +45*6=270 +93*95=8835 +32*0=0 +49*88=4312 += 15874

33*63=2079 +0*59=0 +0*30=0 +26*77=2002 +95*0=0 += 4081

33*18=594 +0*76=0 +0*52=0 +26*75=1950 +95*46=4370 += 6914

33*89=2937 +0*34=0 +0*49=0 +26*85=2210 +95*33=3135 += 8282

33*28=924 +0*12=0 +0*3=0 +26*0=0 +95*69=6555 += 7479

33*39=1287 +0*6=0 +0*95=0 +26*0=0 +95*88=8360 += 9647

25*63=1575 +11*59=649 +0*30=0 +60*77=4620 +89*0=0 += 6844

25*18=450 +11*76=836 +0*52=0 +60*75=4500 +89*46=4094 += 9880

25*89=2225 +11*34=374 +0*49=0 +60*85=5100 +89*33=2937 += 10636

25*28=700 +11*12=132 +0*3=0 +60*0=0 +89*69=6141 += 6973

25*39=975 +11*6=66 +0*95=0 +60*0=0 +89*88=7832 += 8873

24*63=1512 +79*59=4661 +24*30=720 +47*77=3619 +18*0=0 += 10512

24*18=432 +79*76=6004 +24*52=1248 +47*75=3525 +18*46=828 += 12037

24*89=2136 +79*34=2686 +24*49=1176 +47*85=3995 +18*33=594 += 10587

24*28=672 +79*12=948 +24*3=72 +47*0=0 +18*69=1242 += 2934

24*39=936 +79*6=474 +24*95=2280 +47*0=0 +18*88=1584 += 5274

7*63=441 +98*59=5782 +96*30=2880 +27*77=2079 +0*0=0 += 11182

7*18=126 +98*76=7448 +96*52=4992 +27*75=2025 +0*46=0 += 14591

7*89=623 +98*34=3332 +96*49=4704 +27*85=2295 +0*33=0 += 10954

7*28=196 +98*12=1176 +96*3=288 +27*0=0 +0*69=0 += 1660

7*39=273 +98*6=588 +96*95=9120 +27*0=0 +0*88=0 += 9981

[0, 0, 11878]

[0, 1, 14044]

[0, 2, 16031]

[0, 3, 5964]

[0, 4, 15874]

[1, 0, 4081]

[1, 1, 6914]

[1, 2, 8282]

[1, 3, 7479]

[1, 4, 9647]

[2, 0, 6844]

[2, 1, 9880]

[2, 2, 10636]

[2, 3, 6973]

[2, 4, 8873]

[3, 0, 10512]

[3, 1, 12037]

[3, 2, 10587]

[3, 3, 2934]

[3, 4, 5274]

[4, 0, 11182]

[4, 1, 14591]

[4, 2, 10954]

[4, 3, 1660]

[4, 4, 9981]
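
The getcolumn and getrow helpers rescan list_of_values for every k, making each reducer quadratic in M. A single-pass variant (a sketch under the same key scheme and the same M defined at the top of the script) collects both operands into dictionaries first:

def reducer(key, list_of_values):
    a_vals = {}  # k -> A[i,k] for this output row i
    b_vals = {}  # k -> B[k,j] for this output column j
    for (mtype, row, col, value) in list_of_values:
        if mtype == "a":
            a_vals[col] = value
        else:
            b_vals[row] = value
    i, j = (int(part) for part in key.split(","))
    total = sum(a_vals.get(k, 0) * b_vals.get(k, 0) for k in range(M))
    mr.emit((i, j, total))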


#codingexercise: https://1drv.ms/w/s!Ashlm-Nw-wnWhM0bmlY_ggTBTNTYxQ?e=s7hP7W

Saturday, July 27, 2024

 Given that tweets have location, find the happiest state:

Answer: happiest_state.py:

import sys


def hw():
    afinnfile = open("AFINN-111.txt")
    scores = {}  # initialize an empty dictionary
    for line in afinnfile:
        term, score = line.split("\t")  # the file is tab-delimited
        scores[term] = int(score)  # convert the score to an integer
    print(scores.items())

    import json
    outputfile = open("output.txt")
    tweets = []
    for line in outputfile:
        tweets += [json.loads(line)]

    nonsentiment_scores = []
    for item in tweets:
        if item.get("text"):
            sentence = item["text"].strip()
            words = sentence.split()
            for i, word in enumerate(words):
                term = word.strip().lower()
                if term not in scores:
                    # infer a score for the unknown term from its scored neighbors
                    score = 0
                    if i - 1 >= 0 and is_present(scores, words[i - 1]):
                        score += get_score(scores, words[i - 1])
                    if i + 1 < len(words) and is_present(scores, words[i + 1]):
                        score += get_score(scores, words[i + 1])
                    score = score / 3
                    nonsentiment_scores.append((term, score))

    for item in nonsentiment_scores:
        print(item)


def is_present(scores, word):
    term = word.strip().lower()
    return term in scores

def get_score(scores, word):
    term = word.strip().lower()
    if term not in scores:
        return 0
    if scores[term] > 0:
        return 1
    elif scores[term] < 0:
        return -1
    return 0


def lines(fp):
    print(len(fp.readlines()))


states = {

        'AK': 'Alaska',

        'AL': 'Alabama',

        'AR': 'Arkansas',

        'AS': 'American Samoa',

        'AZ': 'Arizona',

        'CA': 'California',

        'CO': 'Colorado',

        'CT': 'Connecticut',

        'DC': 'District of Columbia',

        'DE': 'Delaware',

        'FL': 'Florida',

        'GA': 'Georgia',

        'GU': 'Guam',

        'HI': 'Hawaii',

        'IA': 'Iowa',

        'ID': 'Idaho',

        'IL': 'Illinois',

        'IN': 'Indiana',

        'KS': 'Kansas',

        'KY': 'Kentucky',

        'LA': 'Louisiana',

        'MA': 'Massachusetts',

        'MD': 'Maryland',

        'ME': 'Maine',

        'MI': 'Michigan',

        'MN': 'Minnesota',

        'MO': 'Missouri',

        'MP': 'Northern Mariana Islands',

        'MS': 'Mississippi',

        'MT': 'Montana',

        'NA': 'National',

        'NC': 'North Carolina',

        'ND': 'North Dakota',

        'NE': 'Nebraska',

        'NH': 'New Hampshire',

        'NJ': 'New Jersey',

        'NM': 'New Mexico',

        'NV': 'Nevada',

        'NY': 'New York',

        'OH': 'Ohio',

        'OK': 'Oklahoma',

        'OR': 'Oregon',

        'PA': 'Pennsylvania',

        'PR': 'Puerto Rico',

        'RI': 'Rhode Island',

        'SC': 'South Carolina',

        'SD': 'South Dakota',

        'TN': 'Tennessee',

        'TX': 'Texas',

        'UT': 'Utah',

        'VA': 'Virginia',

        'VI': 'Virgin Islands',

        'VT': 'Vermont',

        'WA': 'Washington',

        'WI': 'Wisconsin',

        'WV': 'West Virginia',

        'WY': 'Wyoming'

}


def main():

    sent_file = open(sys.argv[1])

    tweet_file = open(sys.argv[2])

    hw()

    lines(sent_file)

    lines(tweet_file)


if __name__ == '__main__':

    main()
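
The post's stated goal, finding the happiest state, still needs a final aggregation; a sketch that does it, assuming each tweet carries a place object whose full_name ends in a state abbreviation from the states table above (the place parsing is a simplifying assumption):

def happiest_state(tweets, scores):
    totals, counts = {}, {}
    for tweet in tweets:
        place = (tweet.get("place") or {}).get("full_name", "")
        abbrev = place.split(",")[-1].strip()
        if abbrev not in states or not tweet.get("text"):
            continue
        # sum the AFINN scores of the tweet's words
        score = sum(scores.get(w.strip().lower(), 0) for w in tweet["text"].split())
        totals[abbrev] = totals.get(abbrev, 0) + score
        counts[abbrev] = counts.get(abbrev, 0) + 1
    # highest average sentiment wins
    return max(totals, key=lambda s: totals[s] / counts[s]) if totals else None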


Friday, July 26, 2024

 Tweet sentiment analyzer:

import sys


def hw():
    afinnfile = open("AFINN-111.txt")
    scores = {}  # initialize an empty dictionary
    for line in afinnfile:
        term, score = line.split("\t")  # the file is tab-delimited
        scores[term] = int(score)  # convert the score to an integer
    print(scores.items())

    import json
    outputfile = open("output.txt")
    tweets = []
    for line in outputfile:
        tweets += [json.loads(line)]

    for item in tweets:
        if item.get("text"):
            sentence = item["text"].strip()
            words = sentence.split()
            score = 0
            for word in words:
                term = word.strip().lower()
                if term in scores:
                    if scores[term] > 0:
                        score += 1
                    elif scores[term] < 0:
                        score -= 1
            if len(words) > 0:
                score = score / len(words)
            print(score)
        else:
            print(0)


def lines(fp):
    print(len(fp.readlines()))


def main():

    sent_file = open(sys.argv[1])

    tweet_file = open(sys.argv[2])

    hw()

    lines(sent_file)

    lines(tweet_file)


if __name__ == '__main__':

    main()


Thursday, July 25, 2024

This is a continuation of previous articles on Azure resources, their IaC deployments, and trends in data infrastructure. The previous article touched upon data platforms and how they go out of their way to recommend that customer data remain proprietary and not be handed to vendors, or even to the platform itself. This section continues that line of discussion by elaborating on understanding data.

The role of data in modern business operations is changing, with organizations facing the challenge of harnessing their potential and safeguarding it with utmost care. Data governance is crucial for businesses to ensure the protection, governance, and effective management of their data assets. Compliance frameworks like the EU's AI Act highlight the importance of maintaining high-quality data for successful AI integration and utilization.

The complex web of data governance presents multifaceted challenges, especially in the realm of data silos and disparate governance mechanisms. Tracking data provenance, ensuring data visibility, and implementing robust protection schemes are crucial for mitigating cybersecurity risks and ensuring data integrity across various platforms and applications.

The evolution of artificial intelligence (AI) introduces new dimensions to data management practices, as organizations explore the transformative potential of AI and machine learning technologies. Leveraging AI for tasks like backup recovery, compliance, and data protection plans offers unprecedented opportunities for enhancing operational efficiencies and driving innovation within businesses.

The future of data management lies at the intersection of compliance, resilience, security, backup, recovery, and AI integration. By embracing these foundational pillars, businesses can navigate the intricate landscape of data governance with agility and foresight, paving the way for sustainable data-driven strategies and robust cybersecurity protocols.

Prioritizing data management practices that align with compliance standards and cybersecurity best practices is key. By embracing the transformative potential of AI while maintaining a steadfast commitment to data protection, businesses can navigate the complexities of the digital landscape with confidence and resilience.

References:

Previous article explaining a catalog: IaCResolutionsPart148.docx

https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html#enable-workspace 

https://docs.databricks.com/en/data-governance/unity-catalog/create-metastore.html


#codingexercise: https://1drv.ms/w/s!Ashlm-Nw-wnWhPIMgfH3QDAPfwCW6Q?e=dM89NH


Wednesday, July 24, 2024

The shift from DBMS to catalogs is already underway. Earlier, the databases were the veritable access grantors, but with heterogeneous data stores, this has shifted to catalogs like the Unity Catalog for Databricks and the Horizon catalog for Snowflake. This is a deliberate attempt from the perspective of these platforms even though they fight for their ecosystems. The end-users and the organizations that empower them are rapidly making this shift themselves. 


For example, the Databricks Unity Catalog offers centralized access control, auditing, lineage, and data discovery capabilities across multiple Databricks workspaces. It includes user management, a metastore, clusters, SQL warehouses, and a standards-compliant security model based on ANSI SQL. The catalog also includes built-in auditing and lineage, allowing for user-level audit logs and data discovery. The metastore is the top-level container, while the data catalog has a three-level namespace, namely catalog.schema.table. The Catalog Explorer allows for the creation of tables and views, while volumes provide governance for non-tabular data. The catalog is multi-cloud friendly, allowing for federation across multiple cloud vendors and unified access. The idea here is that you can define once and secure anywhere. 


Databricks Unity Catalog consists of a metastore and a catalog. The metastore is the top-level logical container for metadata, storing data assets like tables or models and defining the namespace hierarchy. It handles access control policies and auditing. The catalog is the first-level organizational unit within the metastore, grouping related data assets and providing access controls. However, only one metastore per deployment is used. Each Databricks region requires its own Unity Catalog metastore.  


There is a Unity catalog quick start notebook in Python. The key steps include creating a workspace with the Unity Catalog meta store, creating a catalog, creating a managed schema, managing a table, and using the Unity catalog in the Pandas API on Spark. The code starts with creating a catalog, selecting show, and then creating a managed schema. The next step involves creating and managing schemas, extending them, and granting permissions. The table is managed using the schema created earlier, and the table is shown and all available tables are shown. The final step involves using the Pandas API on Spark, which can be found in the official documentation for Databricks. This quick start is a great way to get a feel for the process and to toggle back and forth with the key steps inside the code. 
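
A condensed sketch of those steps as they might run in a notebook cell follows; the catalog, schema, and table names are placeholders rather than the quickstart's own, and spark is the ambient SparkSession in a Databricks notebook.

# assumes a workspace already enabled for Unity Catalog
spark.sql("CREATE CATALOG IF NOT EXISTS quickstart_catalog")
spark.sql("SHOW CATALOGS").show()
spark.sql("CREATE SCHEMA IF NOT EXISTS quickstart_catalog.quickstart_schema")
spark.sql("GRANT USE SCHEMA ON SCHEMA quickstart_catalog.quickstart_schema TO `account users`")
spark.sql("CREATE TABLE IF NOT EXISTS quickstart_catalog.quickstart_schema.quickstart_table (id INT, name STRING)")
spark.sql("SHOW TABLES IN quickstart_catalog.quickstart_schema").show()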


The Unity Catalog system employs object security best practices, including access control lists (ACLs) for granting or restricting access to specific users and groups on securable objects. ACLs provide fine-grained control, protecting access to sensitive data and objects. Least privilege is applied, limiting access to the minimum required and avoiding broad groups like All Users unless necessary. Access is revoked once the purpose is served, and policies are reviewed regularly for relevance. This technique enhances data security and compliance, prevents unnecessary broad access, and limits the blast radius in case of security breaches. 


The Databricks Unity Catalog system offers best practices for catalogs. First, create a separate catalog for loose coupling, managing access and compliance at the catalog level. Align catalog boundaries with business domains or applications, such as marketing analytics or HR. Customize security policies and governance within the catalog to drill down into specific domains. Create access control groups and roles specific to a catalog, fine-tune read-write privileges, and customize settings like resource quotas. These fine-grained policies provide the best of security and functionality in catalogs. 


To ensure security and manage external connections, limit visibility by granting access only to specific users, groups, and roles, and by applying least privilege. Limit access to only the necessary users and groups using granular access control lists, or ACLs. Be aware of team activities and avoid giving teams unnecessary access to external resources. Tag connections effectively for discovery using source categories or data classifications, and discover connections by use case for organizational visibility. This approach enhances security, prevents unintended data access, and simplifies external connection discovery and management. 


Databricks Unity Catalog business unit best practices emphasize the importance of providing dedicated sandboxes for each business unit, allowing independent development environments, and preventing interference between different workflows. Centralizing shareable data into production catalogs ensures consistency and reduces the need for duplicate data. Discoverability is crucial, with meaningful naming conventions and metadata best practices. Federated queries via the Lakehouse architecture unify data access across silos, governed securely via contracts and permissions. This approach supports autonomy for units, increases productivity through reuse, and maintains consistency with collaborative governance. 


In conclusion, the Unity catalog standard allows centralized data governance and best practices for catalogs, connections, and business units. 


https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html#enable-workspace 


https://docs.databricks.com/en/data-governance/unity-catalog/create-metastore.html


Tuesday, July 23, 2024

This is a summary of the book titled “Active Listening Techniques – 30 Practical Tools to Hone Your Communication Skills,” written by Nixaly Leonardo and published by Callisto in 2020. The author offers insights into active listening, building off a decade of social work. She covers listening skills such as mindfulness, empathy, non-verbal cues, and effective questioning techniques, all of which lead to a deeper understanding of others. Her five-point agenda includes empathizing with others before interacting, being aware of tensions so as to respond rather than react, acknowledging one's negative emotions, involving loved ones in the journey, and writing journal entries about our reactions to stay aware of our emotional state. This helps us adjust our communications, persuade others by acknowledging their needs, projecting confidence, and choosing the right words, and deal with stressful situations by validating other people's emotions, easing tension, and refocusing the conversation.

Active listening is a crucial communication skill that involves paying attention, understanding people's emotions, and giving time for others to talk. It is applied in various situations, including work, personal relationships, and therapy. Active listening helps individuals feel supported and heard, and it demonstrates respect for others. To improve communication skills, seven fundamentals can be applied: paraphrasing, using nonverbal language, emotional labeling, silence, redirection, mirroring, and validating.

Paraphrasing involves restating what someone says to ensure understanding, while nonverbal cues like eye contact, gestures, posture, and facial expressions help convey the message. Emotional labeling involves noticing and naming what others feel, while silence allows time to think and to express thoughts without interruption. Redirecting the conversation back to the original topic helps maintain direction and reduce tension. Mirroring involves matching the speaker's body language and tone of voice to create a sense of connection and rapport. Validating others' emotions allows them to experience their emotions and hold their beliefs, making them feel understood and supported.

Active listening involves being present and mindful during conversations, ignoring distractions and staying open-minded. It helps us accept that we all experience negative emotions and stress and understand how our experiences shape our perceptions and interpretations of others' messages. To challenge and move through assumptions, empathize with others, be aware of tension, apologize when you react negatively, involve loved ones, and write journal entries about your reactions.

Be aware of your emotional state during conversations, as strong emotions can interfere with attentive listening. Adjust your communication to ensure others hear and understand you, considering other people's communication styles and preferences. Navigate situations tactfully by asking questions instead of directly challenging your supervisor's idea, describing or praising their vision, and seeking details to address your concerns without undermining their creativity or judgment.

Know your audience wisely, choosing when and where to raise critical issues and choosing the appropriate mode of communication. Electronic communication such as texting and email can be more effective than face-to-face conversations. By following these steps, you can become a better active listener and maintain a productive dialogue.

Persuasion involves acknowledging others' needs, projecting confidence, and choosing the right words. It is a matter of give and take, and understanding why someone might not agree with your viewpoint is crucial. Acknowledging their needs helps build respect and a stronger bond. Using precise language is essential in handling sensitive situations, avoiding hurting others while conveying your intended message. Confidence is key, and projecting confidence even when you do not feel it can help.

To deal with stressful situations, validate others' emotions, ease tension, and refocus the conversation. Addressing emotional concerns fosters stronger connections and genuine conversations. You can calm others and ease tensions by recognizing escalating situations, lowering your tone, seeking clarification, taking responsibility for your contribution, and addressing the speaker's concerns. If tensions continue to rise, repeat the steps or suggest a break. Set boundaries and communicate potential consequences if the conversation escalates. 

When a conversation goes awry, refocus on the original subject to avoid defensiveness and resolve the issue. Address communication challenges by rephrasing statements, acknowledging shifts, asking for thoughts, and validating the listener's feelings. This ensures both parties hear and understand each other, preventing a recurrence of arguments. By following these steps, you can ensure effective communication.


Summarizing Software: SummarizerCodeSnippets.docx


Monday, July 22, 2024

The well-known Knuth-Morris-Pratt algorithm.

The algorithm can be explained in terms of matching the pattern against the input text as follows:


#include <string>
#include <vector>
using namespace std;

int* PreProcess(string pattern);  // forward declaration

void KMP(string pattern, string text, vector<int> *positions) {
    int patternLength = pattern.length();
    int textLength = text.length();
    int* next = PreProcess(pattern);
    if (next == 0) return;
    int i = 0;  // current state: number of pattern characters matched
    int j = 0;  // current position in the text
    while (j < textLength)
    {
        while (true)
            if (text[j] == pattern[i])  // matches
            {
                i++;  // yes, move on to the next state
                if (i == patternLength)  // maybe that was the last state
                {
                    // found a match; record its starting position
                    positions->push_back(j - (i - 1));
                    i = next[i];
                }
                break;
            }
            else if (i == 0) break;  // no match in state 0, give up on this position
            else i = next[i];  // fall back to the next-widest border
        j++;
    }
    delete[] next;
}

int* PreProcess(string pattern) {
    int patternLength = pattern.length();
    if (patternLength == 0) return 0;
    int* next = new int[patternLength + 1];
    if (next == 0) return 0;
    next[0] = -1;  // set up for the loop below; unused by KMP
    int i = 0;
    while (i < patternLength) {
        next[i + 1] = next[i] + 1;
        while (next[i + 1] > 0 &&
               pattern[i] != pattern[next[i + 1] - 1])
            next[i + 1] = next[next[i + 1] - 1] + 1;
        i++;
    }
    return next;
}

Usage: DroneDataAddition.docx

Sunday, July 21, 2024

Knuth-Morris-Pratt method of string matching

public void kmpMatcher(String text, String pattern) {
    int n = text.length();
    int m = pattern.length();
    int[] prefixes = computePrefixFunction(pattern);
    int noOfCharMatched = 0;  // number of pattern characters matched so far
    for (int i = 0; i < n; i++) {
        while (noOfCharMatched > 0 && pattern.charAt(noOfCharMatched) != text.charAt(i))
            noOfCharMatched = prefixes[noOfCharMatched - 1];  // fall back to the longest border
        if (pattern.charAt(noOfCharMatched) == text.charAt(i))
            noOfCharMatched = noOfCharMatched + 1;
        if (noOfCharMatched == m) {
            System.out.println("Pattern occurs at " + (i - m + 1));
            noOfCharMatched = prefixes[noOfCharMatched - 1];  // continue searching for overlapping matches
        }
    }
}

public int[] computePrefixFunction(String pattern) {
    int m = pattern.length();
    int[] prefixes = new int[m];  // prefixes[q] = length of the longest proper border of pattern[0..q]
    prefixes[0] = 0;
    int k = 0;
    for (int q = 1; q < m; q++) {
        while (k > 0 && pattern.charAt(k) != pattern.charAt(q))
            k = prefixes[k - 1];
        if (pattern.charAt(k) == pattern.charAt(q))
            k = k + 1;
        prefixes[q] = k;
    }
    return prefixes;
}


Saturday, July 20, 2024

 The steps to create a machine learning pipeline in Azure Machine Learning Workspace:

1. Create an Azure Machine Learning Workspace:

If you don't have one already, create an Azure Machine Learning workspace. This serves as the central hub for managing your machine learning resources.

2. Set Up Datastores:

Datastores allow you to access data needed in your pipeline. By default, each workspace has a default datastore connected to Azure Blob storage. You can register additional datastores if necessary.

3. Define Your Pipeline Steps:

Break down your ML task into manageable components (steps). Common steps include data preparation, model training, and evaluation.

Use the Azure Machine Learning SDK to create these steps. You can define them as PythonScriptStep or other relevant step types; see the sketch after these steps.

4. Configure Compute Targets:

Set up the compute targets where your pipeline steps will run. Options include Azure Machine Learning Compute, Azure Databricks, or other compute resources.

5. Orchestrate the Pipeline:

Use the Azure Machine Learning pipeline service to automatically manage dependencies between steps.

Specify the order in which steps should execute and how they interact.

6. Publish the Pipeline:

Once your pipeline is ready, publish it. This makes it accessible for later use or sharing with others.

7. Monitor and Track Performance:

Monitor your pipeline's performance in real-world scenarios.

Detect data drift and adjust your pipeline as needed.
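
The SDK flow for steps 3 through 6 might look like the following minimal sketch using the v1 azureml-* packages; the script names, compute cluster name, and pipeline name below are placeholders rather than anything prescribed above.

from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()  # assumes a local config.json for the workspace

# attach to an existing compute cluster (step 4)
compute_target = ComputeTarget(workspace=ws, name="cpu-cluster")

# define the pipeline steps (step 3)
prep_step = PythonScriptStep(name="prepare_data", script_name="prep.py",
                             compute_target=compute_target, source_directory="./src")
train_step = PythonScriptStep(name="train_model", script_name="train.py",
                              compute_target=compute_target, source_directory="./src")
train_step.run_after(prep_step)  # orchestrate the ordering (step 5)

# build, publish, and submit the pipeline (step 6)
pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])
published = pipeline.publish(name="training-pipeline")
Experiment(ws, "pipeline-run").submit(pipeline)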


This workspace provides an environment to create and manage the end-to-end life cycle of Machine Learning models. Unlike general-purpose software, machine learning workloads have significantly different requirements, such as the use of a wide variety of technologies, libraries, and frameworks, the separation of training and testing phases before a model is deployed and used, and iterations for model tuning independent of model creation and training. Azure Machine Learning's compatibility with open-source frameworks and platforms like PyTorch and TensorFlow makes it an effective all-in-one platform for integrating and handling data and models, which tremendously relieves the onus on the business to develop new capabilities. Azure Machine Learning is designed for all skill levels, with advanced MLOps features and simple no-code model creation and deployment. 


Friday, July 19, 2024

 Sample program to count the number of different triplets (a, b, c) in which a occurs before b and b occurs before c from a given array.

Solution: Generate all combinations in positional lexicographical order for the given array using the getCombinations method shown below. Select those with size 3. When selecting the elements, save only their indexes so that we can confirm they are in increasing positional order. 

 

import java.util.*;

class solution {

 

public static void getCombinations(List<Integer> elements, int N, List<List<Integer>> combinations) { 

      for (int i = 0; i < (1<<N); i++) { 

          List<Integer> combination = new ArrayList<>(); 

          for (int j = 0; j < elements.size(); j++) { 

              if ((i & (1 << j)) > 0) { 

                combination.add(j); 

              } 

          } 

          List<Integer> copy = new ArrayList<Integer>(combination); 

          combinations.add(copy); 

      } 

   }

 public static void main (String[] args) {

                  List<Integer> elements = Arrays.asList(1,2,3,4);

List<List<Integer>> indices = new ArrayList<>();

        getCombinations(elements, elements.size(), indices);

        indices.stream().filter(x -> x.size() == 3)  

                      .filter(x -> x.get(0) < x.get(1)  && x.get(1) < x.get(2)) 

                       .forEach(x -> printList(elements, x)); 

       }

 

public static void printList(List<Integer> elements, List<Integer> indices) {

                 StringBuilder sb = new StringBuilder();

                 for (int i = 0; i < indices.size(); i++) {

                        sb.append(elements.get(indices.get(i)) + " "); 

                 }

                 System.out.println(sb.toString());

}

}

/* sample output:

1 2 3 

1 2 4 

1 3 4 

2 3 4

 */
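
For distinct elements, every 3-element subset of indices is automatically in increasing positional order, so the count is C(n,3); with n = 4 this gives C(4,3) = 4 triplets, matching the sample output.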


Thursday, July 18, 2024

 

This is a summary of the book titled “The Canary Code: A Guide to Neurodiversity, Dignity and Intersectional Belonging at Work,” written by Ludmila Praslova and published by Berrett-Koehler in 2024. This book is about how to foster an inclusive workplace that celebrates neurodiversity and intersectional dignity, and where everyone feels valued and respected. Neurodivergent people, with conditions such as autism spectrum disorder, attention deficit disorder, dyslexia, or obsessive-compulsive disorder that affect the way the brain processes information, have struggled to keep up with the rest of the workforce because they don't fit the status quo. The flexibility provided during Covid-19 came to their rescue, and more steps can be taken to promote inclusivity, such as hiring and onboarding them with an understanding of their needs, heeding their input on office space design, and offering workflow flexibility. Leaders should create psychologically safe workplaces by listening, communicating clearly, and aiming for objective performance reviews. When these same employees become leaders, they can pass on the same benefits to others.

Inclusive workplaces promote diverse interaction, communication, and productivity styles by challenging neuronormative standards. Neurodiversity acknowledges the vast variations in human cognition, emotion, and perception, including conditions like ADHD, autism, and dyslexia. Myths about neurodiversity perpetuate exclusion in the workplace, as they lump people together while ignoring neurodivergent needs and preferences. To enable neurodivergent employees to do their best work, organizations must create flexible, inclusive environments that respect individual differences in social, cognitive, emotional, and sensory needs.

The "Canary Code" framework promotes inclusivity for neurodivergent employees. It emphasizes the importance of involving marginalized employees in decision-making, focusing on outcomes, ensuring flexibility, promoting organizational justice, and maintaining transparency, and using appropriate decision-making tools. By adopting these principles, organizations can create a more productive and inclusive workplace.

Companies like Deloitte, Infinite Flow, Legalite, Call Yachol, Ultranauts, and Dell have implemented these principles to ensure a diverse workforce. These practices have improved onboarding processes, engagement, and overall performance. Companies like Dell have also implemented neurodiversity programs, allowing candidates to showcase their abilities. Overall, these principles promote a more inclusive and productive workplace.

To make hiring and onboarding more inclusive, organizations should understand the needs of neurodivergent employees and conduct thorough analyses to ensure job descriptions accurately reflect the position's requirements. This includes separating essential qualifications from desirable ones, using plain language, and focusing on outcomes rather than methods. Onboarding should integrate new employees into the organization, offering a quality "preboarding" experience, providing clear information, and tailoring training methods.

Inclusive office spaces should accommodate a wide range of sensory, physical, and cognitive needs, with employees' input in the design process. Flexible work arrangements, such as flexible schedules, remote work options, and hybrid models, can enhance productivity for neurodivergent employees. Psychologically safe workspaces should be created by listening, communicating clearly, and aiming for objective performance reviews. A toxic work environment features non-inclusive, disrespectful, unethical, cutthroat, and abusive behaviors, which can negatively impact all employees' well-being and performance. By implementing these strategies, organizations can create a more inclusive and supportive work environment for all employees.

Neurodivergent leaders can create more inclusive workspaces by overcoming myths and stigmas that limit recognition and development of diverse leadership talents. By leveraging individual strengths, creating growth tracks, and fostering a culture that values diverse perspectives, companies can unlock innovation, improve morale, and build more resilient leadership teams. Neurodivergent leaders can overcome biases by embracing unique experiences, fostering empathy, and fostering inclusivity within their teams. By remaining authentic, trusting in their unique perspective, and focusing on transparent communication, neurodivergent leaders can inspire others and promote a culture of acceptance and understanding.

Wednesday, July 17, 2024

 Problem Statement: Given an integer array arr, in one move you can select a palindromic subsequence arr[i], ..., arr[j] where 0 <= i <= j < arr.length, and remove that subsequence from the given array. Note that after removing a subarray, the elements move to remove the gap between them.

 

Return the minimum number of moves needed to remove all numbers from the array.

 

Solution:

import java.util.*;

class Solution {

    public int minimumMoves(int[] arr) {

        int N = arr.length;

        int max = 1;

        int min = Integer.MAX_VALUE;

        List<Integer> A = new ArrayList<>();

        for (int i = 0; i < arr.length; i++) A.add(arr[i]);

        int count = 0;

        while(A.size() > 0) {

           boolean hasPalindrome = false; 

           List<Integer> elements = new ArrayList<>();

           for (int i = 0; i < (1<<N); i++) { 

               

               List<Integer> combination = new ArrayList<>(); 

                for (int j = 0; j < A.size(); j++) { 

                  if ((i & (1 << j)) > 0) { 

                    combination.add(j); 

                  } 

                } 

                if (isPalindrome(A, combination) && (combination.size() > max) && getCharactersToRemove(A, combination) < min) {

                      hasPalindrome = true;

                      max = combination.size();

                      min = getCharactersToRemove(A, combination);

                      elements = new ArrayList<>(combination);                

                      if (getCharactersToRemove(A, combination) == 0) { break;}

                } else {

                    // System.out.println("A: " + print(A) + " Elements: " + print(elements) + " Combination: " + print(combination) + "isPalindrome=" + String.valueOf(isPalindrome(A, combination)) + " getCharsToRemove=" + getCharactersToRemove(A, combination) + " min = " + min);

                }

           }            

           if (!hasPalindrome) {

               count += 1;

               A.remove(A.size() - 1);

           } else {

               count += getCharactersToRemove(A, elements) + 1;

               A = removeCharacters(A, elements);

               // System.out.println("Removing " + count + " characters at indices:" + print(elements) + " and remaining elements: " + print(A));

               // elements = new ArrayList<>();

               max = 1;

               min = Integer.MAX_VALUE;

           }

        }

        return count;

    }

    public boolean isPalindrome(List<Integer> A, List<Integer> combination) {

        int start = 0;

        int end = combination.size()-1;

        while (start <= end) {

            if (!A.get(combination.get(start)).equals(A.get(combination.get(end)))) {

                return false;

            }

            start++;

            end--;

        }

        return true;

    }

    public int getCharactersToRemove(List<Integer> A, List<Integer> combination){

        if (combination.size() < 2) return 0;

        List<Integer> clone = new ArrayList<>(A); 

        return removeCharacters(clone, combination).size();

    }

    public List<Integer> removeCharactersAtIndices(List<Integer> A, List<Integer> combination) {
        // remove exactly the positions listed in combination
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < A.size(); i++) {
            if (!combination.contains(i)) {
                result.add(A.get(i));
            }
        }
        return result;
    }

    public List<Integer> removeCharacters(List<Integer> A, List<Integer> combination) {

        int start = combination.get(0);

        int end = combination.get(combination.size()-1);

        List<Integer> result = new ArrayList<>();

        if (start > 0){

            result.addAll(A.subList(0, start));

        }

        if (end < A.size() - 1) {

            result.addAll(A.subList(end + 1,A.size()));

        }

        return result;

    }

    public String print(List<Integer> elements){

        StringBuilder sb = new StringBuilder();

        for (int i = 0; i < elements.size(); i++) {

            sb.append(elements.get(i) + " ");

        }

        return sb.toString();

    }

}


Examples:

A = [-1,0,1]           => 3

A = [-1,0,-1]          => 1

A = [-1]                    => 1

A = [-1,0]                => 2

A = [0,0]                 => 1

A = [1,0,1,2,3]     => 3

A = [-2,-1,0,1,0]   => 3


Tuesday, July 16, 2024

 These are more use cases targeted for a commercial drone fleet management software that scales elastically and helps drones manage their flight path in real-time.

Case 8: Safety enforcement on failures across multiple drone units is a scenario that should not be the norm, but it is one the platform must dedicate capability to, given that it operates different fleets. Take the specific example of 55 drone units failing out of 200 for a 4th of July show, where the failed units landed in Angle Lake close to SeaTac airport and sank to the bottom, some with their lights on. It was a technical glitch where multiple airborne units failed with “no global positioning” and, instead of falling from the sky, made controlled landings into the lake. If there were an override to the GPS failure, it could have resulted in runaways, injury, or damage. Each of the drones cost about $2600 to the Great Lakes Drone Co. based out of Coloma, Michigan, which had engaged in safety policies, procedures, and programming to make controlled landings. External interference such as radio deterrence devices, including radio frequency jammers, or internal malfunctions such as system compromise were not ruled out. The point of this use case is that the responsibility for the flight of the drones lies not just with the controller but also with the platform relaying commands to the drones. Such a use case that spans all drone fleets and their formations and flight paths across tenants is one that must be tried out with drills and controlled environments. Some amount of Chaos Engineering practice applies to the handling of these drones via the portal.

Case 9: One of the benefits of a cloud-native platform for drone fleet management is that this pseudo-resource can make use of other user-friendly services, such as OpenAI for chatbot-like interaction with the software, and Cognitive Services for multimedia analysis of drone captures and for voice translation into commands for the drone fleet. From the mundane task of using a data lake to stash all sensor captures from the drone fleet, organized by individual drones as folders and with data plane access separate from control plane access, to more advanced and sophisticated use cases of correlating and automating service requests against drone fleet inventory and dynamic control of fleet crew, the possibilities are endless. Even workflow management becomes easier with cloud resources and integration. Customized automations are facilitated by the cloud's powerful methods of interaction, namely API, SDK, command line, and portal.

Previous use cases: DFCSUseCasesList.docx : https://1drv.ms/w/s!Ashlm-Nw-wnWhPB1Ov2NRhBtAQFyNQ?e=GhfAMw

References: https://1drv.ms/w/s!Ashlm-Nw-wnWhPA9saJLYQGA7q2Wiw?e=AONTxo for data architecture 

https://1drv.ms/w/s!Ashlm-Nw-wnWhO4OGADjCj0GVLyFTA?e=UGMEpB for software description.


Monday, July 15, 2024

 Drone Data Architecture:

One of the advantages of a cloud platform for real-time drone data capture and analysis is that the businesses who sign up for it do not have to reinvent it for themselves. In fact, the cloud data architecture, deployment, and data-driven applications can be stood up with full IaC and data seeding without even involving any drones. When this proof-of-concept has succeeded, scaled, performed, and been optimized for lowest cost, it is a veritable proprietary, patentable asset and one that can serve the world for various use cases. As of today, just a little over 10% of enterprises enact a data strategy that works. Drone data is a niche, but any investment in its planning and architecture will win over mindshare that can only grow. Data is often regarded as a byproduct of operations, but it can become a driver of business value. With that come many choices for data infrastructure and tools in a vast ecosystem. Cloud continues to dominate this space with ever-increasing storage and computing power to crunch the data. With drone data, the options for being tied down with legacy or on-premises silos just do not exist, so there is an opportunity to start right. Some consider data architecture to be an oxymoron because databases, data warehouses, and data lakes do not eliminate one another and evolve their own architectures, so that is again an opportunity to start right for drone data without complexity, and even to be ready for the foreseeable increase in regulatory requirements on drone data. Data democratization will favor data literacy, which in turn will foster a data culture. Also, this builds a single source of truth with purview, and that matters.

The previous articles regarding storage of Drone Data enumerated the following components or lines of data organization for workloads such as business analytics, data engineering, streaming and Machine Learning:

Traditional databases for inventory including progressive states and timestamps along with drone capabilities such as degrees of freedom which is inherently relational 

Vector database for training and inferences for both self-organizing maps and CNNs

Performance databases leveraging embedded, unstructured, QoS and cloud databases

Graph databases to serve graph analytics between drones 

Cache infrastructure to mitigate the load on the data tier 

Streaming Data services such as Apache Flink Live and others.

Leaving the microservices, APIs, UI, scriptability, and analytics stacks out of the data access discussion, this section describes the workloads in terms of updates and search. The data architecture drawn out from the previous articles strives to drive the most relevant, real-time application experience, supporting data acquisition, metadata filtering, and the highest F-score search results. A single query must be answered using vector search, text search, and metadata filtering, and consequently spans multiple data sources which may not have been virtualized or made amenable to a single SQL-like query interface. While vector embeddings using large language models are becoming popular on the web and in cloud computing, a model for drones is subject to more specific domain data and filters and draws on interdisciplinary science from traffic engineering, computer networks, pattern recognition, and database systems. This translates to the following requirements:

Fast complex search – spanning vectors, text, geo, and custom JSON data

Data and index changes – spanning large numbers of vectors

Rankings – from various ranking algorithms

Real-time and historical updates – with the ability to update and delete any data, including vectors, in milliseconds without incurring reindexing costs

Hyperconverged indexing that includes different types of indexes such as vector index, range index, column store and documents will be sought-after for this kind of platform to manage drone fleet.
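
As a toy illustration of the metadata-filter-plus-vector-ranking query shape described above (a sketch only; the record layout is hypothetical, and a production deployment would use a vector database with such hyperconverged indexes):

import numpy as np

records = [
    {"id": "capture-1", "geo": "zone-a", "embedding": np.array([0.1, 0.9, 0.2])},
    {"id": "capture-2", "geo": "zone-b", "embedding": np.array([0.8, 0.1, 0.5])},
    {"id": "capture-3", "geo": "zone-a", "embedding": np.array([0.2, 0.8, 0.1])},
]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def hybrid_search(query_vec, geo_filter, k=2):
    candidates = [r for r in records if r["geo"] == geo_filter]  # metadata filter
    # vector ranking over the filtered candidates
    return sorted(candidates, key=lambda r: cosine(query_vec, r["embedding"]), reverse=True)[:k]

print([r["id"] for r in hybrid_search(np.array([0.15, 0.85, 0.15]), "zone-a")])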

#codingexercise: CodingExercise-07-15-2024.docx


Sunday, July 14, 2024

 Drone:

- SysId

- ID

- Name

- Status

- Created

- Modified

- CreatedBy

- ModifiedBy

- Degree-of-freedom

- Max speed

- FuelType

- BatteryLife

- NumberOfFans

- FanStatus1-8

- Location

- Weight

- PayloadWeight

- Mileage

- Color

- LightSensor

- HeatSensor

- IRSensor

- Camera

- Zoom

- Pitch

- Yaw

- Roll

- Display0-255

Fleet:

- ID

- Name

- Drones

- Created

- Modified

- CreatedBy

- ModifiedBy

Location:

- Geo coordinates

- GPS Tracking

- LastUpdated

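
A minimal sketch of these entities as Python dataclasses (a small field subset; the types are assumptions, not part of the original model):

from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple

@dataclass
class Location:
    geo_coordinates: Tuple[float, float]  # (latitude, longitude)
    gps_tracking: bool
    last_updated: datetime

@dataclass
class Drone:
    sys_id: str
    id: str
    name: str
    status: str
    degrees_of_freedom: int
    max_speed: float
    battery_life: float
    payload_weight: float = 0.0
    location: Optional[Location] = None

@dataclass
class Fleet:
    id: str
    name: str
    drones: List[Drone] = field(default_factory=list)
    created: Optional[datetime] = None
    modified: Optional[datetime] = None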


Friday, July 12, 2024

 These are more use cases targeted for a commercial drone fleet management software that scales elastically and helps drones manage their flight path in real-time.

Case 6: So far the use cases have elaborated conveniences for drone makers, fleet operators, and businesses leveraging drones for delivery, but the following use case focuses on end-users. A drone unit and a small personal fleet of drones can come in helpful for individuals such as farmers, or for home security where the units fly to take aerial photo/video around a specific landmark such as a home or barn. Today a single drone is sold along with its remote-operated controller for manual flying, but a small fleet of drones can also be sold if they are registered with and controlled by cloud software for specific limited activities. The end-user may download an application and scan the QR codes off the back of the drones so that the application registers them with the cloud and is able to relay commands individually to them. Then a set of predetermined algorithms can help navigate the drones for aerial surveillance and footage. They can be launched on an ad hoc basis or scheduled periodically and matched with image recognition for durations when active monitoring might help. With the potential to build different models of drones for different range and speed, and the possibility to scale up the number of drones in the fleet seamlessly, this software could meet a variety of usages across sectors such as fire and rescue, pet surveillance, pest control, and improved security, and many more that do not fall under the small-and-medium business and enterprise categories of use cases mentioned earlier. With topologies that allow a dedicated controller for a fleet of lightweight drones, with the controller receiving input from the cloud software and relaying it to the drones while sending back the sensor data from the drones, the entire compute, storage, and network available from the cloud can be leveraged to overcome the limitations of what is shipped out-of-box today, or to enhance existing fleets of popular appliances like vacuum robots. A high volume of sensor data traffic over the internet might be considered slow for near real-time responses to drones, but these messages are more for guidance or changes to schedule, so to speak, for mundane activities in these limited scopes. The cloud is uniquely positioned to scale up to the demands of real-time traffic.

Case 7: Federal regulatory compliance and governance of drone usage and activities across tenants is a specific use case dedicated to this platform, and one that the businesses signing up as tenants can be relieved of. The platform is uniquely positioned to provide a full data purview including genealogy, history, archival, and auditing of every operation associated with the drones registered with the platform. Information requested and required to be disclosed under a subpoena from a law enforcement agency is easy for the platform to draft and respond with. For the businesses themselves, change data capture of the drone data as well as real-time information from logs and metrics stores are something they can view for themselves, and thus enforce the necessary role-based access controls and auditing for their governance requirements.

Previous use cases: DFCSUseCasesList.docx 


Thursday, July 11, 2024

These are more use cases for commercial drone fleet management software that scales elastically and helps drones manage their flight paths in real time.

Case 5: Fleet operators looking to serve various businesses want to leverage a common platform for the management and operation of their fleets. Earlier, controllers were assigned to partitioned fleets, logistics depended on each controller, and isolation was provided between controllers. As a multitenant platform, this Drone Formation Commercial software no longer restricts either isolation or scalability for its customers. Fleet operators can add as many tenants as they can accommodate, and it is easy to roll inventory from one tenant to another. Decisions taken for the operation of the fleet are captured in the same single pane of glass for management, and the outcomes are easy to roll up into consolidated reports from the platform while maintaining drill-down on authorized queries.



Wednesday, July 10, 2024

These are more use cases for commercial drone fleet management software that scales elastically and helps drones manage their flight paths in real time.

Case 4: As drone manufacturers make incredible progress in the scope and purpose of drone activities, they would like to outsource software targeting repeated drills across models and versions of their drones, including the ability to plug in customizations for enhanced intelligence. These manufacturers are experimenting with a variety of shapes, compositions, and degrees of freedom, both individually and collectively, for formations and their ability to overcome obstacles. Cloud software that can be reached ubiquitously by the drones themselves or by relayers will be cost-effective in maximizing the output of these drones. Among the features the manufacturers expect are the ability to repeat tests, maintain test history, and generate test reports. A single transparent pane of glass for observability and manageability of the drones, along with these reports and dashboards, will be very helpful to them when updating their drones. Initially the fleets of most of these manufacturers might be small, limited to research and development, but deployments on their customers' premises could be large and require dial-home capabilities from the units or their controllers and relayers. The software must support different kinds of topology involving different types of units participating in overall fleet management and formations.




Tuesday, July 9, 2024

These are more use cases for commercial drone fleet management software that scales elastically and helps drones manage their flight paths in real time.

Case 3. Many businesses outsource their fleet to a fleet provider that maintains a large inventory of hybrid models continually updated in their degrees of motion, capabilities, and inputs. These businesses expect to have one interface with the fleet provider and another with the fleet management service, so that they can organize their fleet dynamically and pass the fleet details to their management service for commands relayed to fleet members. They expect a detailed, comprehensive management dashboard or portal where they can monitor the fleet's key performance indicators (KPIs) and drill down. They expect these programmability interfaces to be compatible in a way that requires little or no processing beyond relaying the commands or updating the goals on the management dashboard.



Monday, July 8, 2024

These are the use cases for commercial drone fleet management software that scales elastically and helps drones manage their flight paths in real time.

Case 1. A retail company owns and operates a fleet of drones with proprietary software to deliver merchandise purchased by its customers to their residences. The software tracks each drone individually, providing route and schedule information independent of the others. The number of drones exceeds hundreds of thousands. The software has the dual goals of assigning non-overlapping, contention-free, flow-maximizing flight manifests to each drone and keeping the cost of operating a single drone deployed to serve a customer low. Controllers for the drones are placed in multiple cities worldwide; each operates its own fleet independently and covers a specific non-overlapping geographic region. The software differs from the one used to manage the fleet of robots in the company's warehouses in that the space of operation is not a grid but a hub-and-spoke model, and the destinations are not internally managed but received in a streaming manner from an order-processing service bearing address, volume, weight, and delivery-date information. The software does not differentiate between fleet members and requires the ability to form the fleet dynamically by replacing, adding, or removing members. When this proprietary software becomes general-purpose for fleets of varying sizes, operating loads, and qualities of service, it becomes promising as a multitenant platform, so that different companies do not need to reinvent the wheel. This platform must keep the use of cloud resources as efficient as possible to pass the savings on to businesses, and it must streamline the consumption of additional resources when any single parameter is stretched. Additionally, it must allow any interface of its components to be offloaded for customization by these businesses; for example, they can supply their own scheduling plug-in rather than being bound to filling in values on predetermined scheduling templates. The volume and rate of drone command updates must cover six sigmas of the normalized probability distribution of current and foreseeable future fleet sizes.

Case 2. A small or medium-sized business is looking to improve the efficiency of scheduling for its fleet and wants to download models that it can operate on its own premises or invoke remotely through APIs. This controller-offload-to-the-cloud scenario demands world-class performance and availability at an affordable price. The business does not expect to grow its fleet or see much churn in it, but it appreciates full-service outsourcing of its fleet management so that it merely controls the inventory and the commands to fleet members. It also expects a clause covering liability for drone damage resulting from faulty instructions. It might want to change scheduling algorithms as business priorities shift in ways not fully covered by the scheduling parameters.
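
Cases 1 and 2 both hinge on scheduling being swappable rather than template-driven. A minimal sketch of what that plug-in seam could look like, with names and signatures that are assumptions for illustration only:

from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Delivery:
    address: str
    volume: float
    weight: float
    due_date: str

class SchedulingPlugin(ABC):
    """Businesses supply an implementation instead of filling in templates."""

    @abstractmethod
    def assign(self, deliveries: list[Delivery],
               drone_ids: list[str]) -> dict[str, list[Delivery]]:
        """Return a manifest mapping each drone ID to its ordered deliveries."""

class RoundRobinScheduler(SchedulingPlugin):
    """A naive default that spreads deliveries evenly across the fleet."""

    def assign(self, deliveries, drone_ids):
        manifests = {drone_id: [] for drone_id in drone_ids}
        for i, job in enumerate(deliveries):
            manifests[drone_ids[i % len(drone_ids)]].append(job)
        return manifests

The platform would call assign on whichever implementation the tenant registers, so changing business priorities means swapping the class rather than renegotiating scheduling templates.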


References: https://1drv.ms/w/s!Ashlm-Nw-wnWhPA9saJLYQGA7q2Wiw?e=AONTxo for data architecture 

https://1drv.ms/w/s!Ashlm-Nw-wnWhO4OGADjCj0GVLyFTA?e=UGMEpB for software description.



Sunday, July 7, 2024

A self-organizing map (SOM) algorithm for scheduling meeting times expressed as availabilities and bookings: a map is a low-dimensional representation of a training sample comprising elements e, and it is represented by nodes n. The map is transformed by a regression operation that moves the nodes' positions toward one element of the model (e) at a time. With preferences translating to nodes and availabilities to elements, the map comes to match the sample space more closely with each epoch/iteration.

from sys import argv

import numpy as np

from io_helper import read_xyz, normalize
from neuron import generate_network, get_neighborhood, get_boundary
from distance import select_closest, euclidean_distance, boundary_distance
from plot import plot_network, plot_boundary

def main():
    if len(argv) != 2:
        print("Correct use: python src/main.py <filename>.xyz")
        return -1

    problem = read_xyz(argv[1])

    boundary = som(problem, 100000)

    problem = problem.reindex(boundary)

    distance = boundary_distance(problem)

    print('Boundary found of length {}'.format(distance))

def som(problem, iterations, learning_rate=0.8):
    """Solve the scheduling problem using a Self-Organizing Map."""

    # Obtain the set of timeslots (normalization to [0,1] is optional here)
    timeslots = problem.copy()
    # print(timeslots)
    #timeslots[['X', 'Y', 'Z']] = normalize(timeslots[['X', 'Y', 'Z']])

    # The population size is 8 times the number of timeslots
    n = timeslots.shape[0] * 8

    # Generate an adequate network of neurons:
    network = generate_network(n)
    print('Network of {} neurons created. Starting the iterations:'.format(n))

    for i in range(iterations):
        if not i % 100:
            print('\t> Iteration {}/{}'.format(i, iterations), end="\r")
        # Choose a random timeslot
        timeslot = timeslots.sample(1)[['X', 'Y', 'Z']].values
        winner_idx = select_closest(network, timeslot)
        # Generate a filter that applies changes around the winner
        gaussian = get_neighborhood(winner_idx, n//10, network.shape[0])
        # Update the network's weights (closer to the timeslot)
        network += gaussian[:,np.newaxis] * learning_rate * (timeslot - network)
        # Decay the variables
        learning_rate = learning_rate * 0.99997
        n = n * 0.9997

        # Check for plotting interval
        if not i % 1000:
            plot_network(timeslots, network, name='diagrams/{:05d}.png'.format(i))

        # Check if any parameter has completely decayed.
        if n < 1:
            print('Radius has completely decayed, finishing execution',
            'at {} iterations'.format(i))
            break
        if learning_rate < 0.001:
            print('Learning rate has completely decayed, finishing execution',
            'at {} iterations'.format(i))
            break
    else:
        print('Completed {} iterations.'.format(iterations))

    # plot_network(timeslots, network, name='diagrams/final.png')

    boundary = get_boundary(timeslots, network)
    plot_boundary(timeslots, boundary, 'diagrams/boundary.png')
    return boundary

if __name__ == '__main__':
    main()
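
The imports at the top (io_helper, neuron, distance, plot) are local modules from the repository referenced below. For reading the loop without the repo, here is a rough sketch of the central helpers, reconstructed from how main.py uses them; treat the bodies as assumptions rather than the repository's actual code:

import numpy as np

def generate_network(size):
    # Neurons start at random positions in the unit cube (X, Y, Z).
    return np.random.rand(size, 3)

def get_neighborhood(center, radix, domain):
    # Gaussian bump over neuron indices, centered on the winner and
    # wrapping around the ring of neurons.
    if radix < 1:
        radix = 1
    deltas = np.absolute(center - np.arange(domain))
    distances = np.minimum(deltas, domain - deltas)
    return np.exp(-(distances * distances) / (2 * (radix * radix)))

def select_closest(candidates, origin):
    # Index of the neuron nearest to the sampled timeslot.
    return np.linalg.norm(candidates - origin, axis=1).argmin()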


Reference: 

https://github.com/raja0034/som4drones


DroneCommercialSoftware.docx


Saturday, July 6, 2024

Problem Statement: You are given a 0-indexed integer array nums.

Adjacent elements of nums may be swapped any number of times.

A valid array meets the following conditions:

The largest element (any of the largest elements if there are multiple) is at the rightmost position in the array.

The smallest element (any of the smallest elements if there are multiple) is at the leftmost position in the array.

Return the minimum swaps required to make nums a valid array.

 

Example 1:

Input: nums = [3,4,5,5,3,1]

Output: 6

Explanation: Perform the following swaps:

- Swap 1: Swap the 3rd and 4th elements, nums is then [3,4,5,3,5,1].

- Swap 2: Swap the 4th and 5th elements, nums is then [3,4,5,3,1,5].

- Swap 3: Swap the 3rd and 4th elements, nums is then [3,4,5,1,3,5].

- Swap 4: Swap the 2nd and 3rd elements, nums is then [3,4,1,5,3,5].

- Swap 5: Swap the 1st and 2nd elements, nums is then [3,1,4,5,3,5].

- Swap 6: Swap the 0th and 1st elements, nums is then [1,3,4,5,3,5].

It can be shown that 6 is the minimum number of swaps required to make the array valid.

Example 2:

Input: nums = [9]

Output: 0

Explanation: The array is already valid, so we return 0.

 

Constraints:

1 <= nums.length <= 10^5

1 <= nums[i] <= 10^5

Solution: 

import java.util.Arrays;
import java.util.stream.Collectors;

class Solution {

    public int minimumSwaps(int[] nums) {
        int min = Arrays.stream(nums).min().getAsInt();
        int max = Arrays.stream(nums).max().getAsInt();
        int count = 0;
        // Nothing to do when both ends are already in place; this also
        // covers the all-equal case, where min == max.
        if (nums[0] != min || nums[nums.length - 1] != max) {
            // Bubble the rightmost occurrence of the maximum to the end.
            var numsList = Arrays.stream(nums).boxed().collect(Collectors.toList());
            int end = numsList.lastIndexOf(max);
            for (int i = end; i < nums.length - 1; i++) {
                swap(nums, i, i + 1);
                count++;
            }
            // Bubble the leftmost occurrence of the minimum to the front.
            // If the minimum sat to the right of the maximum, the pass above
            // already moved it one step left, saving one swap.
            numsList = Arrays.stream(nums).boxed().collect(Collectors.toList());
            int start = numsList.indexOf(min);
            for (int j = start; j >= 1; j--) {
                swap(nums, j, j - 1);
                count++;
            }
        }
        return count;
    }

    public void swap(int[] nums, int i, int j) {
        int temp = nums[j];
        nums[j] = nums[i];
        nums[i] = temp;
    }
}


Input

nums =

[3,4,5,5,3,1]

Output

6

Expected

6


Input

nums =

[9]

Output

0

Expected

0
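
The solution above counts the swaps by performing them. The count can also be derived directly: move the leftmost minimum to the front and the rightmost maximum to the back, subtracting one swap when their paths cross. A minimal Python sketch of that counting argument:

def minimum_swaps(nums):
    n = len(nums)
    min_idx = nums.index(min(nums))                  # leftmost minimum
    max_idx = n - 1 - nums[::-1].index(max(nums))    # rightmost maximum
    swaps = min_idx + (n - 1 - max_idx)
    if min_idx > max_idx:
        swaps -= 1   # the crossing swap serves both moves
    return swaps

assert minimum_swaps([3, 4, 5, 5, 3, 1]) == 6
assert minimum_swaps([9]) == 0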


Friday, July 5, 2024

Find the minimum in a rotated sorted array:

class Solution {

    public int findMin(int[] A) {
        if (A == null || A.length == 0) { return Integer.MIN_VALUE; }
        int start = 0;
        int end = A.length - 1;
        while (start < end) {
            int mid = (start + end) / 2;

            // The window is already monotonically increasing.
            if (A[start] <= A[end] && A[start] <= A[mid] && A[mid] <= A[end]) { return A[start]; }

            // The window has narrowed to just [start, end].
            if (mid == start || mid == end) { if (A[start] < A[end]) return A[start]; else return A[end]; }

            // Detect which side holds the rotation point.
            if (A[start] > A[mid]) {
                end = mid;
            } else {
                if (A[mid] > A[mid + 1]) return A[mid + 1];
                start = mid + 1;
            }
        }
        return A[0];
    }
}

Works for:

[0 1 4 4 5 6 7]

[7 0 1 4 4 5 6]

[6 7 0 1 4 4 5]

[5 6 7 0 1 4 4]

[4 5 6 7 0 1 4]

[4 4 5 6 7 0 1]

[1 4 4 5 6 7 0]

[1 0 0 0 0 0 1]
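
For a quick cross-check, a Python transcription of the same binary search can be asserted against min() on each of the arrays above:

def find_min(A):
    # Same binary search as the Java version above.
    start, end = 0, len(A) - 1
    while start < end:
        mid = (start + end) // 2
        if A[start] <= A[end] and A[start] <= A[mid] <= A[end]:
            return A[start]
        if mid == start or mid == end:
            return min(A[start], A[end])
        if A[start] > A[mid]:
            end = mid
        else:
            if A[mid] > A[mid + 1]:
                return A[mid + 1]
            start = mid + 1
    return A[0]

tests = [
    [0, 1, 4, 4, 5, 6, 7],
    [7, 0, 1, 4, 4, 5, 6],
    [6, 7, 0, 1, 4, 4, 5],
    [5, 6, 7, 0, 1, 4, 4],
    [4, 5, 6, 7, 0, 1, 4],
    [4, 4, 5, 6, 7, 0, 1],
    [1, 4, 4, 5, 6, 7, 0],
    [1, 0, 0, 0, 0, 0, 1],
]
for t in tests:
    assert find_min(t) == min(t)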