Sunday, April 30, 2023

 A script to copy attributes and tags for objects from an on-premises S3 store to the Azure public cloud when the preserve-metadata option in the ADF Copy activity does not suffice. This article follows the one on customizing ADF with the WebHook activity to include functionality from external services.

#! /usr/bin/bash 

#------------------- 

# This script is equally applicable to windows 

#------------------- 

throw() { 

  echo "$*" >&2 

  (exit 33) && true 

} 

  

STORAGE_ACCOUNT_NAME= 

STORAGE_ACCOUNT_KEY= 

CONTAINER_NAME= 

LOCAL_FOLDER_PATH= 

REMOTE_FOLDER_PREFIX= 

ARM_TENANT_ID=f66b7197-eb94-49fa-80fb-6df9fa346b46 

RCLONE_CONNECTION_NAME= 

  

usage() { 

  echo 

  echo "Usage: $(basename "$0") -b arg -k arg -c arg -l arg -r arg -x arg [-h]" 

  echo 

  echo "-b - The name of the blob storage account." 

  echo "-c - The name of the container." 

  echo "-l - The name of the local folder path." 

  echo "-r - The name of the remote folder path." 

  echo "-x - The name of the rclone connection." 

  echo "-k - The key for the storage account." 

  echo "-h - This help text." 

  echo 

} 

  

parse_options() { 

while getopts ':b:l:c:r:x:k:h' opt; do 

  case "$opt" in 

    b) 

      STORAGE_ACCOUNT_NAME="$OPTARG" 

      ;; 

  

    k) 

      STORAGE_ACCOUNT_KEY="$OPTARG" 

      ;; 

  

    l) 

      LOCAL_FOLDER_PATH="$OPTARG" 

      ;; 

  

    r) 

      REMOTE_FOLDER_PREFIX="$OPTARG" 

      ;; 

  

    c) 

      CONTAINER_NAME="$OPTARG" 

      ;; 

  

    x) 

      RCLONE_CONNECTION_NAME="$OPTARG" 

      ;; 

  

    h) 

      echo "Processing option 'h'" 

      usage 

      exit 0 

      ;; 

  

    :) 

      echo "Option -$OPTARG requires an argument." >&2 

      usage 

      exit 33 

      ;; 

  

    \?) 

      echo "Invalid command option: -$OPTARG" >&2 

      usage 

      exit 33 

      ;; 

  esac 

done 

shift "$(($OPTIND -1))" 

} 

  

  

parse_options "$@" 

# All six options are required; stop here instead of running with empty values. 

if [[ -z "$LOCAL_FOLDER_PATH" || -z "$REMOTE_FOLDER_PREFIX" || -z "$STORAGE_ACCOUNT_NAME" || -z "$CONTAINER_NAME" || -z "$RCLONE_CONNECTION_NAME" || -z "$STORAGE_ACCOUNT_KEY" ]]; then 

  echo "Invalid command." >&2 

  usage 

  exit 33 

fi 

# az login 

key="$STORAGE_ACCOUNT_KEY" 

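# Read one path per line so that names containing spaces stay intact. 

mapfile -t items < <(rclone lsf "$RCLONE_CONNECTION_NAME:$LOCAL_FOLDER_PATH" --recursive) 

echo "LENGTH=${#items[@]}" 

# Iterate over every element; a bare $items expands to the first element only. 

for item in "${items[@]}" 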

do 

  [[ "$item" == */ ]] && continue 

  tagsJson=$(rclone lsf --format M "$RCLONE_CONNECTION_NAME:$LOCAL_FOLDER_PATH/$item") 

  [[ -z ${tagsJson} ]] && continue  

  #{"btime":"2023-03-30T15:57:08.66Z","content-type":"application/octet-stream","owner":"you","test-dataset":""} 

  # Rename hyphens to underscores in the keys only: blob metadata names must be valid identifiers, while values may keep their hyphens. 

  keyValues=$(echo "$tagsJson" | jq -r 'to_entries | map("\(.key | gsub("-"; "_"))=\(.value | tostring)") | join(" ")') 

  [[ -z ${keyValues} ]] && continue 

  # Split into one key=value word per metadata entry; values that contain spaces are not handled by this simple split. 

  read -r -a agreeableKeyValues <<< "$keyValues" 

  [[ ${#agreeableKeyValues[@]} -eq 0 ]] && continue 

  existsJson=$(az storage blob exists --account-name "$STORAGE_ACCOUNT_NAME" --account-key "$key" --container-name "$CONTAINER_NAME" --name "$REMOTE_FOLDER_PREFIX/$LOCAL_FOLDER_PATH/$item") 

  exists=$(echo "$existsJson" | jq .exists) 

  #echo $exists 

  #{ 

  #  "exists": true 

  #} 

  if [[ $exists == *"true"* ]]; then 

    az storage blob metadata update --account-name "$STORAGE_ACCOUNT_NAME" --account-key "$key" --container-name "$CONTAINER_NAME" --name "$REMOTE_FOLDER_PREFIX/$LOCAL_FOLDER_PATH/$item" --metadata "${agreeableKeyValues[@]}" 

#{ 

#  "client_request_id": "819eed5c-e557-11ed-9b75-8ef5922a9146", 

#  "date": "2023-04-27T23:59:11+00:00", 

#  "encryption_key_sha256": null, 

#  "encryption_scope": null, 

#  "etag": "\"0x8DB477B663E0F10\"", 

#  "last_modified": "2023-04-27T23:59:12+00:00", 

#  "request_id": "8fa92f3a-e01e-0014-6564-79ae0b000000", 

#  "request_server_encrypted": true, 

#  "version": "2021-06-08", 

#  "version_id": null 

#} 

    newMetadata=$(az storage blob metadata show --account-name "$STORAGE_ACCOUNT_NAME" --account-key "$key" --container-name "$CONTAINER_NAME" --name "$REMOTE_FOLDER_PREFIX/$LOCAL_FOLDER_PATH/$item") 

    echo "$newMetadata" 

#{ 

#  "btime": "2023-03-30T15:56:48.161Z", 

#  "content_type": "application/octet-stream", 

#  "owner": "you", 

#  "test_dataset": "" 

#} 

  else 

    echo "$item not found" 

  fi 

done 

# crontab -e 

# */5 * * * * bash /path/to/this_script.sh <options> 

 


Saturday, April 29, 2023

 

Epidemic algorithms:

These are a class of algorithms that disseminate existing information while addressing reliability and scalability problems. Dissemination protocols in general are hampered by scalability problems, and there is usually a tradeoff between reliability and scalability. Epidemic algorithms are applicable to use cases where the scale runs to millions of nodes and real-time information dissemination is not required.

The term epidemic refers to the way a disease or infection spreads through a population of individuals and the rate at which it does so. It starts from one individual and spreads by transmission. The goal of these protocols is likewise to spread an update as quickly and as completely as possible.

There are two styles of epidemic protocols. The first is Anti-entropy and the second is Rumor-mongering.

In Anti-entropy algorithms, each peer p periodically contacts a random partner q selected from the current population. Then, p and q engage in an information exchange protocol, where updates known to p but not to q are transferred from p to q (push), or vice versa (pull), or in both directions (push-pull).
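
As a rough illustration of the three exchange styles, here is a minimal sketch in Java. The class AntiEntropyPeer and its methods are hypothetical names chosen for this example, and it assumes each peer simply keeps a set of update identifiers; a real protocol would also choose q at random and compare digests rather than whole sets.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical peer that stores the identifiers of the updates it knows. */
class AntiEntropyPeer {
    final Set<String> updates = new HashSet<>();

    /** Push: updates known to this peer but not to q are copied to q. */
    void push(AntiEntropyPeer q) {
        q.updates.addAll(updates);
    }

    /** Pull: updates known to q but not to this peer are copied to this peer. */
    void pull(AntiEntropyPeer q) {
        updates.addAll(q.updates);
    }

    /** Push-pull: after the exchange both peers hold the union of their updates. */
    void pushPull(AntiEntropyPeer q) {
        push(q);
        pull(q);
    }

    public static void main(String[] args) {
        AntiEntropyPeer p = new AntiEntropyPeer();
        AntiEntropyPeer q = new AntiEntropyPeer();
        p.updates.add("u1");
        q.updates.add("u2");
        p.pushPull(q);
        System.out.println(p.updates + " " + q.updates);
    }
}
```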

In Rumor-mongering algorithms, peers are initially ignorant. When a peer learns of an update, the update becomes a hot rumor. While a peer holds a hot rumor, it periodically chooses a random peer from the current population and sends (pushes) the rumor to it. Eventually, a peer loses interest in spreading the rumor, and there are different modes for deciding when. With a counter, the peer loses interest after k contacts; with a coin (random), it loses interest with a probability of 1/k after each contact. Independently, the loss of interest can be driven by feedback or be blind: in feedback mode, a contact counts toward losing interest only if the recipient already knew the rumor, while in blind mode every contact counts regardless of the recipient.
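
The counter rule in feedback mode can be tried out with a small simulation. This is only a sketch under stated assumptions: the class RumorMongering, the method spread, and the fixed seed are invented for the example, peers are plain array indices, and a peer that picks itself simply skips that turn.

```java
import java.util.Random;

class RumorMongering {
    /**
     * Simulates push rumor mongering with the counter rule in feedback mode.
     * Returns how many of the n peers know the rumor once no peer is hot anymore.
     */
    static int spread(int n, int k, long seed) {
        if (n < 2) throw new IllegalArgumentException("need at least two peers");
        Random random = new Random(seed);
        boolean[] knows = new boolean[n];
        boolean[] hot = new boolean[n];
        int[] wastedContacts = new int[n]; // contacts with peers that already knew
        knows[0] = true;
        hot[0] = true;
        int hotCount = 1;
        while (hotCount > 0) {
            for (int p = 0; p < n; p++) {
                if (!hot[p]) continue;
                int q = random.nextInt(n);
                if (q == p) continue; // simplification: skip this turn
                if (knows[q]) {
                    // Feedback mode: only a recipient that already knew counts.
                    if (++wastedContacts[p] >= k) {
                        hot[p] = false; // lost interest after k such contacts
                        hotCount--;
                    }
                } else {
                    knows[q] = true;
                    hot[q] = true; // the rumor is hot for the new holder
                    hotCount++;
                }
            }
        }
        int informed = 0;
        for (boolean b : knows) if (b) informed++;
        return informed;
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 4; k++) {
            System.out.println("k=" + k + " informed=" + spread(1000, k, 7L) + " of 1000");
        }
    }
}
```

Running it for a few values of k gives a feel for how the number of peers left ignorant changes as peers stay interested longer.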

Epidemic protocols scale because each participant's load is independent of the system size and information spreads in O(log N) time, where N is the system size. Because they scale so well, they find use in aggregation, membership management, and topology management.
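
The logarithmic spreading time can be observed with a small experiment. The sketch below assumes a synchronous push-only model in which every informed peer sends exactly one message per round, so the per-peer load does not depend on n; the class PushGossipRounds and the seed are hypothetical choices for the example.

```java
import java.util.Random;

class PushGossipRounds {
    /** Returns the number of synchronous push rounds until all n peers are informed. */
    static int roundsToInformAll(int n, long seed) {
        Random random = new Random(seed);
        boolean[] informed = new boolean[n];
        informed[0] = true;
        int informedCount = 1;
        int rounds = 0;
        while (informedCount < n) {
            // Work on a copy so peers informed in this round start pushing in the next one.
            boolean[] next = informed.clone();
            for (int p = 0; p < n; p++) {
                if (!informed[p]) continue;
                int q = random.nextInt(n); // one message per informed peer per round
                if (!next[q]) {
                    next[q] = true;
                    informedCount++;
                }
            }
            informed = next;
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 10_000, 100_000}) {
            double log2 = Math.log(n) / Math.log(2);
            System.out.printf("n=%d rounds=%d log2(n)=%.1f%n", n, roundsToInformAll(n, 42L), log2);
        }
    }
}
```

Since the informed set can at most double per round in this model, log2(n) is a lower bound on the number of rounds; the printed values show how close the simulation stays to it as n grows by factors of ten.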

The design space for such algorithms can be described by the three axes of a 3D space: Peer Selection, View Propagation, and View Selection. Peer Selection ranges from the Rand method (uniform random selection) to the Tail method (selection of the peer with the highest age). View Propagation ranges from push mode to push-pull mode. View Selection ranges from Blind mode to Healer mode to Swapper mode.

A gossip-based peer sampling protocol shuffles views between chosen peers through requests and responses. State updates then progress from these peers.

Newscast, as a peer sampling example, uses uniform random peer selection, push-pull view propagation, and Healer view selection. A Newscast peer picks a random peer from its view, and the two send each other their views, each along with its own fresh link. Each peer then keeps the c freshest links after removing its own information and duplicates. The information is tracked by sender id and virtual time.
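
The view update described above can be sketched as a merge step. The names NewscastView, Link, and merge are hypothetical, and the sketch assumes the received view already contains the partner's fresh link; it is an illustration of the Healer-style "keep the c freshest" rule, not a full Newscast implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NewscastView {
    /** A link to a peer, stamped with the virtual time at which it was created. */
    static final class Link {
        final String peerId;
        final long time;
        Link(String peerId, long time) {
            this.peerId = peerId;
            this.time = time;
        }
    }

    /** Merges both views, drops own info and duplicates, and keeps the c freshest links. */
    static List<Link> merge(String self, List<Link> ownView, List<Link> receivedView, int c) {
        List<Link> all = new ArrayList<>(ownView);
        all.addAll(receivedView);
        Map<String, Link> freshest = new HashMap<>();
        for (Link link : all) {
            if (link.peerId.equals(self)) continue; // remove own info
            // For duplicates keep only the link with the larger timestamp.
            freshest.merge(link.peerId, link, (a, b) -> a.time >= b.time ? a : b);
        }
        List<Link> result = new ArrayList<>(freshest.values());
        result.sort((a, b) -> Long.compare(b.time, a.time)); // freshest first
        return new ArrayList<>(result.subList(0, Math.min(c, result.size())));
    }

    public static void main(String[] args) {
        List<Link> own = List.of(new Link("A", 1), new Link("B", 5));
        List<Link> received = List.of(new Link("B", 3), new Link("C", 7), new Link("Q", 10));
        for (Link link : merge("P", own, received, 2)) {
            System.out.println(link.peerId + "@" + link.time);
        }
    }
}
```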

Cyclon, as a peer sampling example, uses the highest-age Tail strategy for peer selection, push-pull mode for view propagation, and Swapper mode for view selection. A Cyclon peer picks the oldest peer from its view and removes it from the view. It then exchanges some of its neighbors with that peer according to the swap policy, and the active peer also sends its own fresh address.
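
The active peer's side of such a shuffle might look roughly like the sketch below. Everything here is an assumption made for illustration: the class CyclonShuffle, the Entry type, the swapLength parameter, and the particular swap policy (received entries take free slots first and otherwise replace entries that were sent). The view is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class CyclonShuffle {
    static final class Entry {
        final String peerId;
        int age;
        Entry(String peerId, int age) {
            this.peerId = peerId;
            this.age = age;
        }
    }

    /** Tail selection: age every entry, then remove and return the oldest one. */
    static Entry removeOldest(List<Entry> view) {
        for (Entry e : view) e.age++;
        Entry oldest = view.get(0);
        for (Entry e : view) if (e.age > oldest.age) oldest = e;
        view.remove(oldest);
        return oldest;
    }

    /** Builds the message: own fresh address plus up to swapLength - 1 random neighbors. */
    static List<Entry> selectToSend(String self, List<Entry> view, int swapLength, Random random) {
        List<Entry> shuffled = new ArrayList<>(view);
        Collections.shuffle(shuffled, random);
        List<Entry> toSend = new ArrayList<>();
        toSend.add(new Entry(self, 0)); // fresh address of the active peer
        toSend.addAll(shuffled.subList(0, Math.min(swapLength - 1, shuffled.size())));
        return toSend;
    }

    /** Swapper selection: fill free slots first, then replace entries that were sent. */
    static void integrate(String self, List<Entry> view, List<Entry> sent,
                          List<Entry> received, int viewSize) {
        List<Entry> replaceable = new ArrayList<>(sent);
        for (Entry in : received) {
            if (in.peerId.equals(self) || contains(view, in.peerId)) continue;
            if (view.size() < viewSize) {
                view.add(in);
                continue;
            }
            while (!replaceable.isEmpty()) {
                Entry out = replaceable.remove(0);
                if (view.remove(out)) { // true only if the sent entry is still in the view
                    view.add(in);
                    break;
                }
            }
        }
    }

    static boolean contains(List<Entry> view, String peerId) {
        for (Entry e : view) if (e.peerId.equals(peerId)) return true;
        return false;
    }
}
```

A peer would call removeOldest to pick its partner, selectToSend to build the request, and integrate once the partner's reply arrives; the passive peer would answer with a random subset of its own view and integrate the request in the same way.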

#codingexercise

 

You are assigned to put some amount of boxes onto one truck. You are given a 2D array boxTypes, where boxTypes[i] = [numberOfBoxes_i, numberOfUnitsPerBox_i]:

  • numberOfBoxes_i is the number of boxes of type i.
  • numberOfUnitsPerBox_i is the number of units in each box of the type i.

You are also given an integer truckSize, which is the maximum number of boxes that can be put on the truck. You can choose any boxes to put on the truck as long as the number of boxes does not exceed truckSize.

Return the maximum total number of units that can be put on the truck.

 

Example 1:

Input: boxTypes = [[1,3],[2,2],[3,1]], truckSize = 4

Output: 8

Explanation: There are:

- 1 box of the first type that contains 3 units.

- 2 boxes of the second type that contain 2 units each.

- 3 boxes of the third type that contain 1 unit each.

You can take all the boxes of the first and second types, and one box of the third type.

The total number of units will be = (1 * 3) + (2 * 2) + (1 * 1) = 8.

Example 2:

Input: boxTypes = [[5,10],[2,5],[4,7],[3,9]], truckSize = 10

Output: 91

 

Constraints:

  • 1 <= boxTypes.length <= 1000
  • 1 <= numberOfBoxes_i, numberOfUnitsPerBox_i <= 1000
  • 1 <= truckSize <= 10^6

 

 

import java.util.ArrayList;

import java.util.List;

import java.util.TreeMap;

 

class Solution {

    public int maximumUnits(int[][] boxTypes, int truckSize) {

        // Group the box counts by units per box; the TreeMap keeps the keys sorted.

        var boxMap = new TreeMap<Integer, List<Integer>>();

        for (int i = 0; i < boxTypes.length; i++) {

            var boxCount = boxTypes[i][0];

            var countOfUnits = boxTypes[i][1];

            boxMap.computeIfAbsent(countOfUnits, k -> new ArrayList<Integer>()).add(boxCount);

        }

        int total = 0;      // boxes loaded so far

        int totalUnits = 0; // units loaded so far

        // Greedy: load the boxes that hold the most units first.

        for (var entry : boxMap.descendingMap().entrySet()) {

            var countOfUnits = entry.getKey();

            for (var boxCount : entry.getValue()) {

                if (total >= truckSize) {

                    return totalUnits;

                }

                // Take as many boxes of this type as still fit on the truck.

                int taken = Math.min(boxCount, truckSize - total);

                total += taken;

                totalUnits += taken * countOfUnits;

            }

        }

        return totalUnits;

    }

}

 

Input

boxTypes =

[[1,3],[2,2],[3,1]]

truckSize =

4

Output

8

Expected

8

 

Input

boxTypes =

[[5,10],[2,5],[4,7],[3,9]]

truckSize =

10

Output

91

Expected

91
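
The same greedy idea can also be sketched with a sort instead of a TreeMap. This is an alternative written for illustration, not the submitted solution; the class name SortedGreedy is made up for the example.

```java
import java.util.Arrays;

class SortedGreedy {
    static int maximumUnits(int[][] boxTypes, int truckSize) {
        int[][] sorted = boxTypes.clone(); // shallow copy keeps the caller's row order
        // Sort box types by units per box, largest first.
        Arrays.sort(sorted, (a, b) -> Integer.compare(b[1], a[1]));
        int remaining = truckSize;
        int totalUnits = 0;
        for (int[] boxType : sorted) {
            if (remaining == 0) break;
            int taken = Math.min(boxType[0], remaining); // boxes of this type that fit
            totalUnits += taken * boxType[1];
            remaining -= taken;
        }
        return totalUnits;
    }

    public static void main(String[] args) {
        System.out.println(maximumUnits(new int[][] {{1, 3}, {2, 2}, {3, 1}}, 4));           // 8
        System.out.println(maximumUnits(new int[][] {{5, 10}, {2, 5}, {4, 7}, {3, 9}}, 10)); // 91
    }
}
```

For Example 2 the sorted order is 10, 9, 7, 5 units per box, so the truck takes 5 boxes of 10, 3 boxes of 9, and 2 boxes of 7: 50 + 27 + 14 = 91.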