Tuesday, March 24, 2026

 The following sample script illustrates how to apply POSIX-style ACLs to containers and folders in an Azure Data Lake Storage account, so that users who hold only Reader control-plane access can still be granted data access at a fine-grained level.

Script begins:

subscriptionid=$1

az account set --subscription "$subscriptionid"

accountkey=$2

accountname=$3

cradle=$4

domesticrw=$5

domesticro=$6

globalro=$7

globalrw=$8

# Translate an Entra ID group display name into its object id (GUID), if a name rather than a GUID was passed.

# Uses a bash nameref (requires bash 4.3+).

translate_group() {

  local -n ref=$1

  if [[ -n "$ref" ]] && ! [[ "$ref" =~ ^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$ ]]; then

    echo "translating $1=$ref"

    ref=$(az ad group list --filter "displayName eq '$ref'" --query "[0].id" --output tsv)

    echo "$1=$ref"

  fi

}

translate_group domesticrw

translate_group domesticro

translate_group globalrw

translate_group globalro

echo "create container, if not exists"

az storage container create -n "$cradle" --account-name "$accountname" --account-key "$accountkey"

echo "container exists, acling..."

az storage fs access set --acl "group:$globalrw:r-x,group:$globalro:r-x,group:$domesticro:r-x" -p "/" -f "$cradle" --account-name "$accountname" --account-key "$accountkey"

[[ -n "$domesticrw" ]] && az storage fs access update-recursive --acl "group:$domesticrw:rwx,default:group:$domesticrw:rwx" -p "/" -f "$cradle" --account-name "$accountname" --account-key "$accountkey"

echo "container acl'ed."

echo "creating global and domestic folders..."

az storage fs directory create -n domestic -f "$cradle" --account-name "$accountname" --account-key "$accountkey" --only-show-errors

az storage fs directory create -n global -f "$cradle" --account-name "$accountname" --account-key "$accountkey" --only-show-errors

echo "folders exist, acling..."

[[ -n "$domesticrw" ]] && az storage fs access update-recursive --acl "group:$domesticrw:rwx,default:group:$domesticrw:rwx" -p "domestic" -f "$cradle" --account-name "$accountname" --account-key "$accountkey"

[[ -n "$domesticro" ]] && az storage fs access update-recursive --acl "group:$domesticro:r-x,default:group:$domesticro:r-x" -p "domestic" -f "$cradle" --account-name "$accountname" --account-key "$accountkey"

[[ -n "$globalrw" ]] && az storage fs access update-recursive --acl "group:$globalrw:rwx,default:group:$globalrw:rwx" -p "global" -f "$cradle" --account-name "$accountname" --account-key "$accountkey"

[[ -n "$globalro" ]] && az storage fs access update-recursive --acl "group:$globalro:r-x,default:group:$globalro:r-x" -p "global" -f "$cradle" --account-name "$accountname" --account-key "$accountkey"

echo "folders acl'ed."
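Where the Python SDK is preferred over the CLI, the same ACL strings can be assembled programmatically. This is a minimal sketch, assuming the azure-storage-file-datalake package is available; `build_acl` and all account, container, and group values below are illustrative, not taken from the script above:

```python
from typing import Dict, Optional


def build_acl(group_perms: Dict[str, str],
              default_group_perms: Optional[Dict[str, str]] = None) -> str:
    """Assemble an ACL spec string such as 'group:<id>:r-x,default:group:<id>:rwx'."""
    parts = [f"group:{gid}:{perm}" for gid, perm in group_perms.items()]
    for gid, perm in (default_group_perms or {}).items():
        parts.append(f"default:group:{gid}:{perm}")
    return ",".join(parts)


# Example ACL string for a hypothetical group object id
acl = build_acl({"11111111-2222-3333-4444-555555555555": "r-x"})

# Applying it would look roughly like this (placeholder account/container/key):
# from azure.storage.filedatalake import DataLakeServiceClient
# svc = DataLakeServiceClient(account_url="https://<account>.dfs.core.windows.net",
#                             credential="<account-key>")
# directory = svc.get_file_system_client("<container>").get_directory_client("/")
# directory.set_access_control(acl=acl)
```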

#codingexercise: CodingExercise-03-24-2026.docx

Sunday, March 22, 2026

 

This is a summary of a book titled “The AI Revolution in Customer Service and Support: A Practical Guide to Impactful Deployment of AI to Best Serve Your Customers” written by Ross Smith, Emily McKeon and Mayte Cubino and published by Pearson Education (USA) in 2024. This book examines how artificial intelligence is reshaping customer service at a moment when expectations for speed, personalization, and convenience are higher than ever. The authors argue that customer service has become a defining factor in how organizations are judged, often as important as the products or services themselves. Many traditional support models struggle to meet contemporary demands, leaving customers frustrated by long wait times and inefficient interactions. Against this backdrop, the authors position AI as a tool capable of transforming customer service into something more responsive, consistent, and closely aligned with individual customer needs.

 

Drawing parallels with earlier technological shifts such as electrification and industrial automation, the book situates AI within a broader pattern of innovation that alters how work is organized and value is delivered. In customer service, AI systems can process vast amounts of data to provide personalized assistance at scale, often more quickly and reliably than human agents alone. While implementing such systems can require significant upfront investment, the authors suggest that long-term efficiencies and improved customer satisfaction can offset these costs.

 

Organizations are encouraged to develop a clear vision for how AI fits into their long-term strategy rather than treating it as a short-term efficiency fix. This vision should articulate what success looks like several years into the future and should be communicated clearly to all stakeholders, including employees and customers. The authors emphasize that leadership commitment must be visible and consistent, and that AI initiatives should be grounded in a realistic understanding of both technological capabilities and organizational needs. Setting concrete, measurable goals allows companies to move beyond abstract enthusiasm and toward meaningful outcomes.

 

Before deploying AI, the authors stress the need to understand existing customer service operations. Establishing a baseline helps organizations evaluate whether AI adoption is actually improving performance. This involves identifying gaps between current service levels and customer expectations, prioritizing areas for improvement, and quantifying desired changes in metrics such as customer satisfaction. During development, AI systems should be tested iteratively with different customer segments, assessed for integration with existing tools, and reviewed regularly from an ethical standpoint. Validation should include basic accuracy checks, stress testing under real-world conditions, and confirmation that systems comply with regulatory and internal ethical standards.

 

Once deployed, AI systems must be accessible across the channels customers already use and adaptable to the needs of both customers and employees. Successful integration depends not only on technical infrastructure but also on education and change management. The authors note that while customers ultimately benefit from faster and more consistent service, some may be concerned about losing human interaction. Transparency about when and how AI is used, along with clear pathways to human support, can help address these concerns. Employee responses to AI adoption also vary, ranging from enthusiasm to anxiety about job security. The book emphasizes that AI should be framed as a tool that supports human work rather than replaces it, and that employees should be encouraged to engage with and learn from the technology.

 

Ethical considerations run throughout the authors’ discussion. As AI systems become more influential, the risks associated with bias, lack of accountability, and opaque decision-making increase. The book argues that responsible AI use must be grounded in human values, with explicit commitments to fairness, transparency, security, and accountability. Organizations are urged to take responsibility for the outputs of their AI systems and to address any harms that arise from their use, rather than treating ethical issues as secondary or abstract concerns.

 

Cultural factors also play a significant role in how AI is received. Resistance to new technology often stems from fear or misunderstanding, and the authors suggest that organizational culture can either amplify or mitigate these reactions. A culture that values learning and adaptation is more likely to view AI as an opportunity rather than a threat. Generational differences can shape expectations as well, with younger customers and employees generally more comfortable with automation than older ones. Addressing these differences thoughtfully, such as by showing how AI can reduce routine work and allow for deeper human engagement, can ease adoption.

 

The book also explores how AI changes the nature of customer support roles. As organizations map their customer journeys and introduce AI into specific touchpoints, employee responsibilities shift toward more complex, judgment-based tasks. Training becomes essential, particularly in teaching staff how to work effectively with AI systems and interpret their outputs. At the same time, new roles emerge, including specialists focused on data, model performance, ethics, and content management. These roles help ensure that AI systems remain aligned with organizational goals and customer needs.

 

The authors argue that leadership itself must evolve. Leaders in customer service are tasked not only with managing operations but also with guiding their organizations through ongoing technological change. This requires openness to learning, attentiveness to employee concerns, and a willingness to address the broader social implications of AI use. By emphasizing transparency, accountability, and respect for data privacy, leaders can build trust among customers, employees, and other stakeholders as AI becomes an integral part of customer service and support.


#codingexercise: CodingExercise-03-22-2026.docx

Saturday, March 21, 2026

 This is a summary of a book titled “The Mentally Strong Leader: Build the Habits to Productively Regulate Your Emotions, Thoughts, and Behaviors” written by Scott Mautz and published by Peakpoint Press in 2024. This book says that mentally strong leaders are distinguished by their capacity for self-regulation. They are intentional about their behavior, their thoughts, and their emotional responses, and this intentionality translates into self-discipline, confidence, decisiveness, and clarity of purpose. Rather than reacting impulsively, they choose responses that align with their values and long-term goals. Mental strength shows up in observable ways, including fortitude in adversity, boldness in pursuing meaningful goals, sustained focus, and the ability to motivate others through clear and credible messages.

Mautz treats mental strength as something that can be developed, much like physical fitness. With consistent effort and the right practices, leaders can expand their capacity to think expansively, stay positive, and make sound decisions. Experience and maturity naturally contribute to this growth, but progress accelerates when leaders adopt deliberate habits that help them recover from setbacks, learn from failure, and maintain momentum.

Fortitude, the ability to withstand adversity without losing direction or resolve, is essential. Building fortitude begins with discipline and with reframing how challenges are perceived. Leaders are encouraged to see difficulties not as threats but as problems to be worked through and, potentially, as sources of learning. This shift requires tolerating routine and pressure, confronting uncomfortable situations, planning for setbacks, and resisting the pull of victim thinking. It also involves welcoming disagreement and engaging in hard conversations rather than avoiding them.

When facing challenges, Mautz advises leaders to slow down their thinking and look for what a situation might teach them or how it could open new options. Perspective is gained by connecting current problems to past experiences and remembering previous moments of difficulty that were ultimately resolved. Emotional reactions are acknowledged but not allowed to dominate; action is emphasized over rumination, and leaders are urged to move forward even when they feel discouraged.

Problem-solving starts with honestly admitting that a problem exists while maintaining confidence in one’s ability to address it. Effective leaders focus on the issue itself rather than personalizing conflict, remain flexible in their approach, and ask questions that clarify root causes. They generate multiple possible solutions, narrow their options, commit to a course of action, and follow through. Under sustained pressure, they concentrate on immediate priorities instead of worst-case scenarios and build on what is already working. Mautz offers a simple mental model to guide leaders through turbulent moments: being candid to reduce uncertainty, serving as a steady anchor for others, providing clear direction, and paying attention to how people are responding.

Avoiding a victim mindset is another aspect of fortitude. Leaders are encouraged to question the comfort that self-pity can provide and to replace “Why me?” with “Why not me?” By owning their role in a situation and letting go of the expectation that everything must be fair, they regain a sense of agency. In disagreements, they treat differing views as opportunities for better ideas, provided the environment feels safe, everyone is heard, and discussions stay focused on facts rather than personal attacks. Difficult conversations are not postponed indefinitely but scheduled and approached with preparation for emotional reactions.

Confidence and intentionality form the next pillar of mentally strong leadership. Mautz frames mistakes as part of an ongoing learning process rather than as verdicts on competence. Leaders build confidence by learning how to receive criticism without defensiveness, extracting useful lessons, and deciding consciously which feedback to act on. Improvement is integrated into daily work through intentional practices, self-correction, optimism, and a willingness to accept oneself as imperfect but capable.

Managing self-doubt requires balance. Leaders are cautioned against both overconfidence and excessive fear and encouraged to aim for a grounded sense of self-belief. Feelings of inadequacy or impostorism are addressed by challenging negative self-talk, focusing on the value one brings, and seeking perspective from trusted colleagues. Comparison with others is discouraged, since outward success rarely tells the whole story; the only meaningful comparison is with one’s own past performance. Optimism is cultivated by recognizing that responses to events shape their impact and by remembering that difficulties, while uncomfortable, often contribute to growth.

Boldness is not recklessness but informed courage. Leaders grow by expanding their thinking, questioning limiting narratives, and allowing themselves to imagine ambitious possibilities. They identify their strengths, study examples of success, draw on others’ insights, and take purposeful leaps forward. Bold leaders challenge assumptions, remain open to new information, adjust quickly as conditions change, and replace wishful thinking with effort. Risk is treated as a skill to be developed, with learning and even failed attempts viewed as necessary investments. Change itself is framed as an opportunity for improvement, and leaders help others navigate it by articulating a clear and realistic vision of the future.

Communication is presented as a critical expression of mental strength. The way leaders speak, listen, and frame messages shapes trust and motivation. When emotions run high, mentally strong leaders pause to name what they are feeling and consciously choose a more constructive response. They listen with the intent to understand and to signal genuine interest, staying present and attentive even when conversations are challenging. Positivity is expressed through acceptance, forgiveness, gratitude, and encouragement, while transparency and integrity guide what leaders share and how they act. Clear personal values serve as reference points for decisions and behavior.

The book also connects mental strength to decision-making quality. Leaders are urged to recognize and counter common cognitive biases, to replace unhelpful habits with better ones, and to bring discipline to group decisions by clarifying roles and expectations. Data is used thoughtfully rather than indiscriminately, with attention paid to bias, missing information, and what truly matters. Leaders are encouraged to look beyond binary choices, consider timing, delegate appropriately, and ensure that decisions align with broader goals. Once a decision is made, confidence and follow-through reinforce credibility.

#codingexercise: https://1drv.ms/w/c/d609fb70e39b65c8/IQCQ5_vu_UCOQaAtWXPEFpCSAXzIyKBI6t9U2EAvyBiM88s?e=A2iIz0 

Friday, March 20, 2026

 The following Databricks notebook demonstrates a sample data transfer from SharePoint to ADLS:

# Databricks notebook source

import os

os.environ["SHAREPOINT_CLIENT_ID"] = ""

os.environ["SHAREPOINT_CLIENT_SECRET"] = ""

os.environ["SHAREPOINT_TENANT_ID"] = ""

# COMMAND ----------

import json

import time

import logging

import requests

from msal import ConfidentialClientApplication

from azure.identity import DefaultAzureCredential

from azure.storage.blob import BlobServiceClient, ContentSettings

from urllib.parse import quote

from typing import Dict, Set, List

# COMMAND ----------

# Logging

logging.basicConfig(level=logging.INFO)

logger = logging.getLogger("sp-to-adls")

# COMMAND ----------

client_id = os.environ["SHAREPOINT_CLIENT_ID"]

client_secret = os.environ["SHAREPOINT_CLIENT_SECRET"]

tenant_id = os.environ["SHAREPOINT_TENANT_ID"]

# COMMAND ----------

os.environ["STORAGE_CONNECTION_STRING"] = ""

# COMMAND ----------

connection_string = os.environ["STORAGE_CONNECTION_STRING"]

# COMMAND ----------

def get_graph_token():

    authority = f"https://login.microsoftonline.com/{tenant_id}"

    app = ConfidentialClientApplication(client_id, authority=authority, client_credential=client_secret)

    scope = "https://graph.microsoft.com/.default"

    token = app.acquire_token_for_client(scopes=[scope])

    return token["access_token"]

# COMMAND ----------

def get_drive_name(site_id: str, drive_id: str, token: str) -> str:

    """

    Return the drive name for the given site_id and drive_id using Microsoft Graph.

    Requires a get_graph_token() function that returns a valid app-only access token.

    """

    access_token = token if token else get_graph_token()

    headers = {"Authorization": f"Bearer {access_token}"}

    url_drive = f"https://graph.microsoft.com/v1.0/drives/{drive_id}"

    resp = requests.get(url_drive, headers=headers)

    if resp.status_code == 200:

        drive = resp.json()

        return drive.get("name")

    elif resp.status_code in (401, 403):

        # Token issue or permission problem; try refreshing token once

        access_token = get_graph_token()

        headers = {"Authorization": f"Bearer {access_token}"}

        resp = requests.get(url_drive, headers=headers)

        resp.raise_for_status()

        return resp.json().get("name")

    elif resp.status_code == 404:

        # Fallback: list drives under the site and match by id

        url_site_drives = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drives"

        resp2 = requests.get(url_site_drives, headers=headers)

        resp2.raise_for_status()

        drives = resp2.json().get("value", [])

        for d in drives:

            if d.get("id") == drive_id:

                return d.get("name")

        raise RuntimeError(f"Drive id {drive_id} not found under site {site_id}")

    else:

        # Raise for other unexpected statuses

        resp.raise_for_status()

# COMMAND ----------

def get_drive_name_or_none(site_id: str, drive_id: str, token: str):

    try:

        drive_name = get_drive_name(site_id, drive_id, token)

        return drive_name.strip('/')

    except Exception as e:

        logger.warning("Failed to resolve drive name: %s", e)

        return None

# COMMAND ----------

def get_site_id_drive_ids(hostname="myazure.sharepoint.com", site_path="/sites/site1/Deep/to/EI"):

    access_token = get_graph_token()

    headers = {"Authorization": f"Bearer {access_token}"}

    # 1) Resolve site by path -> site-id

    site_url = f"https://graph.microsoft.com/v1.0/sites/{hostname}:{site_path}"

    r = requests.get(site_url, headers=headers)

    r.raise_for_status()

    site = r.json()

    site_id = site["id"]

    print("Site id:", site_id)

    # 2) List drives (document libraries) for the site -> find drive id

    drives_url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drives"

    r = requests.get(drives_url, headers=headers)

    r.raise_for_status()

    drives = r.json().get("value", [])

    drive_ids = []

    for d in drives:

        print("Drive name:", d["name"], "Drive id:", d["id"])

        drive_ids += [d["id"]]

    return site_id, drive_ids

# COMMAND ----------

site_id, drive_ids = get_site_id_drive_ids()

print(f"site_id={site_id}, drive_ids={drive_ids}")


# COMMAND ----------

from typing import List, Dict

def list_children_for_drive_item(drive_id: str, item_id: str, token: str) -> List[Dict]:

    """

    List children for a drive item (folder) using Graph API with pagination.

    - drive_id: the drive id (not site id)

    - item_id: 'root' or a folder item id

    - token: Graph access token

    Returns list of item dicts (raw Graph objects).

    """

    headers = {"Authorization": f"Bearer {token}"}

    # Use the drive children endpoint; for root use /drives/{drive_id}/root/children

    if item_id == "root":

        url = f"https://graph.microsoft.com/v1.0/drives/{drive_id}/root/children"

    else:

        url = f"https://graph.microsoft.com/v1.0/drives/{drive_id}/items/{item_id}/children"

    items = []

    while url:

        resp = requests.get(url, headers=headers)

        resp.raise_for_status()

        payload = resp.json()

        items.extend(payload.get("value", []))

        url = payload.get("@odata.nextLink")

    return items

def enumerate_drive_recursive(site_id: str, drive_id: str, token: str) -> List[Dict]:

    """

    Recursively enumerate all items in the specified drive and return a list of items

    with an added 'relativePath' key for each item.

    - site_id: Graph site id (not used directly in the drive calls but kept for signature parity)

    - drive_id: Graph drive id (document library)

    Returns: List of dicts with keys: id, name, relativePath, file/folder, lastModifiedDateTime, parentReference, ...

    """

    # Assumes get_graph_token() is defined elsewhere in your notebook and returns a valid app-only token

    if not token:

        token = get_graph_token()

    results = []

    stack = [("root", "")] # (item_id, relative_path)

    while stack:

        current_id, current_rel = stack.pop()

        try:

            children = list_children_for_drive_item(drive_id, current_id, token)

        except requests.HTTPError as e:

            # If token expired or transient error, refresh token and retry once

            if e.response.status_code in (401, 403):

                token = get_graph_token()

                children = list_children_for_drive_item(drive_id, current_id, token)

            else:

                raise

        for child in children:

            name = child.get("name", "")

            # Build relative path: if current_rel is empty, child_rel is name; else join with slash

            child_rel_path = f"{current_rel}/{name}".lstrip("/")

            # Attach relativePath to the returned item dict

            item_with_path = dict(child) # shallow copy

            item_with_path["relativePath"] = child_rel_path

            results.append(item_with_path)

            # If folder, push onto stack to enumerate its children

            if "folder" in child:

                stack.append((child["id"], child_rel_path))

    return results

# COMMAND ----------

# Retry settings

MAX_RETRIES = 5

BASE_BACKOFF = 2 # seconds

# COMMAND ----------

# Azure Storage destination

storage_account = "deststoraccount"

container_name = "ctr1"

blob_service = BlobServiceClient.from_connection_string(connection_string)

container_client = blob_service.get_container_client(container_name)

# Checkpoint blob path inside container

checkpoint_blob_path = "_checkpoints/sharepoint_to_adls_checkpoint.json"

# Checkpoint structure: { item_id: lastModifiedDateTime }

checkpoint_blob = container_client.get_blob_client(checkpoint_blob_path)

def load_checkpoint() -> Dict[str, str]:

    try:

        data = checkpoint_blob.download_blob().readall()

        return json.loads(data)

    except Exception:

        logger.info("No checkpoint found, starting fresh.")

        return {}

def save_checkpoint(checkpoint: Dict[str, str]):

    checkpoint_blob.upload_blob(json.dumps(checkpoint), overwrite=True)

    logger.info("Checkpoint saved with %d entries", len(checkpoint))

# COMMAND ----------

import os

import mimetypes

from io import BytesIO

from typing import Optional, Dict

import requests

from azure.core.exceptions import ResourceExistsError, ServiceRequestError, ClientAuthenticationError, HttpResponseError

from azure.storage.blob import (

    BlobServiceClient,

    BlobClient,

    ContentSettings

)

# ----------------------------

# Azure Blob upload helpers

# ----------------------------

def _get_blob_service_client(

    *,

    connection_string: Optional[str] = None,

    account_url: Optional[str] = None,

    sas_token: Optional[str] = None

) -> BlobServiceClient:

    """

    Create a BlobServiceClient from one of:

      - connection_string

      - account_url + sas_token (e.g., https://<acct>.blob.core.windows.net/?<sas>)

    """

    if connection_string:

        return BlobServiceClient.from_connection_string(connection_string)

    if account_url and sas_token:

        # Ensure sas_token starts with '?'

        sas = sas_token if sas_token.startswith('?') else f'?{sas_token}'

        return BlobServiceClient(account_url=account_url, credential=sas)

    raise ValueError("Provide either connection_string OR (account_url AND sas_token).")

def download_file(drive_id, item_id, token = None):

    if not token:

        token = get_graph_token()

    headers = {"Authorization": f"Bearer {token}"}

    url = f"https://graph.microsoft.com/v1.0/drives/{drive_id}/items/{item_id}/content"

    resp = requests.get(url, headers=headers)

    resp.raise_for_status()

    return resp.content

def upload_bytes_to_blob(

    data: bytes,

    *,

    container_name: str,

    blob_name: str,

    blob_service: Optional[BlobServiceClient] = None,

    connection_string: Optional[str] = None,

    account_url: Optional[str] = None,

    sas_token: Optional[str] = None,

    content_type: Optional[str] = None,

    overwrite: bool = False,

    metadata: Optional[Dict[str, str]] = None

) -> str:

    """

    Uploads a bytes object to Azure Blob Storage and returns the blob URL.

    Parameters:

        data (bytes): File content.

        container_name (str): Target container name.

        blob_name (str): Target blob name (e.g., 'reports/file.pdf').

        connection_string/account_url/sas_token: Auth options.

        content_type (str): MIME type; if None, guessed from blob_name.

        overwrite (bool): Replace if exists.

        metadata (dict): Optional metadata key/value pairs.

    Returns:

        str: The URL of the uploaded blob.

    """

    # print(f"blob_name={blob_name}")

    # Guess content type if not provided

    if content_type is None:

        guessed, _ = mimetypes.guess_type(blob_name)

        content_type = guessed or "application/octet-stream"

    # Build client

    bsc = blob_service

    if bsc is None:

        bsc = _get_blob_service_client(

            connection_string=connection_string,

            account_url=account_url,

            sas_token=sas_token

        )

    container_client = bsc.get_container_client(container_name)

    # Ensure container exists (idempotent)

    try:

        container_client.create_container()

    except ResourceExistsError:

        pass # already exists

    blob_client: BlobClient = container_client.get_blob_client(blob_name)

    # Upload

    try:

        content_settings = ContentSettings(content_type=content_type)

        # Use a stream to be memory-friendly for large files, though we already have bytes

        stream = BytesIO(data)

        blob_client.upload_blob(

            stream,

            overwrite=overwrite,

            metadata=metadata,

            content_settings=content_settings

        )

    except ResourceExistsError:

        if not overwrite:

            raise

    except ClientAuthenticationError as e:

        raise RuntimeError(f"Authentication failed when uploading blob: {e}") from e

    except (ServiceRequestError, HttpResponseError) as e:

        raise RuntimeError(f"Blob upload failed: {e}") from e

    # Construct and return URL (works for both conn string and SAS)

    return blob_client.url

def with_retries(func):

    def wrapper(*args, **kwargs):

        backoff = BASE_BACKOFF

        for attempt in range(1, MAX_RETRIES + 1):

            try:

                return func(*args, **kwargs)

            except Exception as e:

                logger.warning("Attempt %d/%d failed for %s: %s", attempt, MAX_RETRIES, func.__name__, e)

                if attempt == MAX_RETRIES:

                    logger.error("Max retries reached for %s", func.__name__)

                    raise

                time.sleep(backoff)

                backoff *= 2

    return wrapper

download_file_with_retries = with_retries(download_file)

upload_bytes_to_blob_with_retries = with_retries(upload_bytes_to_blob)
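As a quick offline check of the exponential-backoff pattern used above, here is a self-contained, parameterized variant; the `retrying` name and the very short backoff are illustrative only, chosen so the sketch runs instantly:

```python
import time
from functools import wraps


def retrying(max_retries: int = 5, base_backoff: float = 2.0):
    """Retry the wrapped function with exponential backoff, re-raising on the last attempt."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            backoff = base_backoff
            for attempt in range(1, max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise
                    time.sleep(backoff)
                    backoff *= 2
        return wrapper
    return decorate


calls = {"n": 0}

@retrying(max_retries=3, base_backoff=0.01)
def flaky():
    # Fails twice, then succeeds on the third attempt
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

# flaky() → "ok", after exactly 3 attempts
```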

# ----------------------------

# Orchestrator

# ----------------------------

def download_and_upload_to_blob(

    *,

    drive_id: str,

    item_id: str,

    token: str,

    container_name: str,

    blob_name: str,

    blob_service: Optional[BlobServiceClient] = None,

    connection_string: Optional[str] = None,

    account_url: Optional[str] = None,

    sas_token: Optional[str] = None,

    content_type: Optional[str] = None,

    overwrite: bool = False,

    metadata: Optional[Dict[str, str]] = None

) -> str:

    """

    Downloads a file from Microsoft Graph using the provided item_id and uploads it to Azure Blob Storage.

    Returns the blob URL.

    """

    # 1) Download bytes from Graph

    file_bytes = download_file_with_retries(drive_id, item_id, token=token)

    # 2) Upload to Blob

    blob_url = upload_bytes_to_blob_with_retries(

        file_bytes,

        container_name=container_name,

        blob_name=blob_name,

        blob_service=blob_service,

        connection_string=connection_string,

        account_url=account_url,

        sas_token=sas_token,

        content_type=content_type,

        overwrite=overwrite,

        metadata=metadata,

    )

    return blob_url

# COMMAND ----------

import base64

from azure.storage.blob import ContentSettings

def copy_site_items_to_storage_account(hostname="myazure.sharepoint.com", site_path="/sites/site1/Deep/to/EI", container_name = "ctr1", destination_folder = "domestic/EI"):

    site_id, drive_ids = get_site_id_drive_ids(hostname,site_path)

    checkpoint = load_checkpoint() # item_id -> lastModifiedDateTime

    for drive_id in drive_ids:

        print(f"Processing drive_id={drive_id}")

        token = get_graph_token()

        items = enumerate_drive_recursive(site_id, drive_id, token)

        items = [it for it in items if "file" in it]

        len_items = len(items)

        if len_items == 0:

            continue

        max_size = max(entry["size"] for entry in items)

        sum_size = sum(entry["size"] for entry in items)

        print(f"max_size={max_size}, sum_size={sum_size}, len_items={len_items}")

        drive_name = get_drive_name_or_none(site_id, drive_id, token)

        if not drive_name:

            continue

        for item in items:

            item_id = item["id"]

            last_mod = item.get("lastModifiedDateTime")

            rel_path = item.get("relativePath")

            try:

                # Skip items without a relative path (drive_name was already checked above)

                if not rel_path:

                    continue

                blob_name = f"{destination_folder}/{drive_name}/{rel_path}"

                download_and_upload_to_blob(drive_id=drive_id, item_id=item_id, blob_service=blob_service, token=token, container_name=container_name, blob_name=blob_name, connection_string=connection_string)

                # Update checkpoint after successful upload

                checkpoint[item_id] = last_mod

                save_checkpoint(checkpoint)

                logger.info("Copied and checkpointed %s", rel_path)

            except Exception as e:

                print(f"Error: {e}, item_id={item['id']}, item_url={item['webUrl']}")


# COMMAND ----------

copy_site_items_to_storage_account()
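Note that the copy loop above uses the checkpoint as a dict (item_id -> lastModifiedDateTime), whereas the `load_checkpoint`/`save_checkpoint` pair shown in the March 18 post below round-trips a set of ids. A dict-compatible pair, written here with the blob client passed in as a parameter for testability (the notebook itself closes over the module-level `checkpoint_blob`), could look like:

```python
import json

def load_checkpoint(blob_client):
    """Return {item_id: lastModifiedDateTime}; empty dict if no checkpoint exists yet."""
    try:
        data = blob_client.download_blob().readall()
        return json.loads(data)
    except Exception:  # first run: checkpoint blob does not exist
        return {}

def save_checkpoint(blob_client, checkpoint):
    """Persist the checkpoint dict as JSON, replacing any previous version."""
    blob_client.upload_blob(json.dumps(checkpoint), overwrite=True)
```

The loop could then also skip items whose `lastModifiedDateTime` matches the checkpointed value, making re-runs incremental.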

Reference: previous article: https://1drv.ms/w/c/d609fb70e39b65c8/IQBiE_AAtirtRbK2Ur7XROmCAYCKwfMgvdfvbvFjw0j_o5Q?e=EmfLlu


Thursday, March 19, 2026

Coding Exercise: Number of centered subarrays

You are given an integer array nums.

A subarray of nums is called centered if the sum of its elements is equal to at least one element within that same subarray.

Return the number of centered subarrays of nums.

Example 1:

Input: nums = [-1,1,0]

Output: 5

Explanation:

• All single-element subarrays ([-1], [1], [0]) are centered.

• The subarray [1, 0] has a sum of 1, which is present in the subarray.

• The subarray [-1, 1, 0] has a sum of 0, which is present in the subarray.

• Thus, the answer is 5.

Example 2:

Input: nums = [2,-3]

Output: 2

Explanation:

Only single-element subarrays ([2], [-3]) are centered.

class Solution {

    public int centeredSubarrays(int[] nums) {

        int count = 0;

        // Brute force: examine every subarray [i..j].

        for (int i = 0; i < nums.length; i++) {

            for (int j = i; j < nums.length; j++) {

                long sum = 0;

                for (int k = i; k <= j; k++) {

                    sum += nums[k];

                }

                // Centered: some element of the window equals the window's sum.

                // Compare int against long directly; casting sum to int could

                // produce a false match if the sum overflows the int range.

                for (int k = i; k <= j; k++) {

                    if (nums[k] == sum) {

                        count++;

                        break;

                    }

                }

            }

        }

        return count;

    }

}

Accepted

1042 / 1042 testcases passed
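The accepted solution is O(n^3) because each subarray is summed from scratch. The same count can be computed in O(n^2) by growing each window one element at a time while maintaining its running sum and a set of its elements; a Python sketch of that approach:

```python
def centered_subarrays(nums):
    """Count subarrays whose sum equals at least one of their own elements.

    O(n^2): for each start index i, extend the window rightward, keeping a
    running sum and the set of elements seen, so the membership test is O(1).
    """
    count = 0
    for i in range(len(nums)):
        window_sum = 0
        seen = set()
        for j in range(i, len(nums)):
            window_sum += nums[j]
            seen.add(nums[j])
            if window_sum in seen:
                count += 1
    return count
```

On the examples above: `centered_subarrays([-1, 1, 0])` returns 5 and `centered_subarrays([2, -3])` returns 2.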


Wednesday, March 18, 2026

Sample code to import SharePoint data to an Azure Storage Account:

# Databricks notebook source

import os

os.environ["PROD_DSS_SHAREPOINT_CLIENT_ID"] = ""

os.environ["PROD_DSS_SHAREPOINT_CLIENT_SECRET"] = ""

os.environ["PROD_DSS_SHAREPOINT_TENANT_ID"] = ""

# COMMAND ----------

# Optional: use dbutils secrets for sensitive values

# client_secret = dbutils.secrets.get(scope="my-scope", key="sharepoint-client-secret")

# client_id = dbutils.secrets.get(scope="my-scope", key="sharepoint-client-id")

# tenant_id = dbutils.secrets.get(scope="my-scope", key="sharepoint-tenant-id")

# COMMAND ----------

import requests

from msal import ConfidentialClientApplication

from azure.identity import DefaultAzureCredential

from azure.storage.blob import BlobServiceClient

import json

import time

# === SharePoint App Registration ===

import os

client_id = os.environ["PROD_DSS_SHAREPOINT_CLIENT_ID"]

client_secret = os.environ["PROD_DSS_SHAREPOINT_CLIENT_SECRET"]

tenant_id = os.environ["PROD_DSS_SHAREPOINT_TENANT_ID"]

siteId = "site01/EI/"

listId = "Links"

authority = f"https://login.microsoftonline.com/{tenant_id}"

scope = ["https://graph.microsoft.com/.default"]

app = ConfidentialClientApplication(

    client_id,

    authority=authority,

    client_credential=client_secret

)

def get_graph_token():

    token = app.acquire_token_for_client(scopes=scope)

    if "access_token" not in token:

        raise RuntimeError(f"Token acquisition failed: {token.get('error_description', token)}")

    return token["access_token"]

# COMMAND ----------

# === Azure Storage ===

storage_account = "someaccount01"

container_name = "container01"

credential = DefaultAzureCredential()

blob_service = BlobServiceClient(

    f"https://{storage_account}.blob.core.windows.net",

    credential=credential

)

container_client = blob_service.get_container_client(container_name)

# COMMAND ----------

checkpoint_blob = container_client.get_blob_client("_checkpoints/sharepoint_copied.json")

def load_checkpoint():

    try:

        data = checkpoint_blob.download_blob().readall()

        return set(json.loads(data))

    except Exception:

        return set()

def save_checkpoint(copied_ids):

    checkpoint_blob.upload_blob(

        json.dumps(list(copied_ids)),

        overwrite=True

    )

# COMMAND ----------

import requests

from msal import ConfidentialClientApplication

# The SharePoint host and site path from your URL

hostname = "uhgazure.sharepoint.com"

site_path = "/sites/site01/EI" # server-relative path (no trailing path/to/lists)

# MSAL app-only token

authority = f"https://login.microsoftonline.com/{tenant_id}"

app = ConfidentialClientApplication(client_id, authority=authority, client_credential=client_secret)

token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])

access_token = token.get("access_token")

headers = {"Authorization": f"Bearer {access_token}"}

# 1) Resolve site by path -> site-id

site_url = f"https://graph.microsoft.com/v1.0/sites/{hostname}:{site_path}"

r = requests.get(site_url, headers=headers)

r.raise_for_status()

site = r.json()

site_id = site["id"]

print("Site id:", site_id)

# 2) List drives (document libraries) for the site -> find drive id

drives_url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drives"

r = requests.get(drives_url, headers=headers)

r.raise_for_status()

drives = r.json().get("value", [])

for d in drives:

    print("Drive name:", d["name"], "Drive id:", d["id"])

# If you know the library name, pick it:

target_library = "Documents" # or the library name you expect

drive_id = next((d["id"] for d in drives if d["name"] == target_library), None)

print("Selected drive id:", drive_id)

# COMMAND ----------

drive_id = "pick-a-value-from-above-output"

def list_all_items():

    token = get_graph_token()

    headers = {"Authorization": f"Bearer {token}"}

    url = f"https://graph.microsoft.com/v1.0/drives/{drive_id}/root/children"

    items = []

    while url:

        resp = requests.get(url, headers=headers)

        resp.raise_for_status()

        data = resp.json()

        items.extend(data.get("value", []))

        url = data.get("@odata.nextLink") # pagination

    return items
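`list_all_items` only enumerates the immediate children of the drive root. The March 24 post above calls an `enumerate_drive_recursive` helper that also descends into subfolders; that helper is not shown in this excerpt, but the recursion might be sketched as below, with the Graph call abstracted behind a `fetch_children(item_id)` callable so it can be faked in tests (the real callable would page through `GET /drives/{drive_id}/items/{item_id}/children` the same way `list_all_items` does):

```python
def enumerate_recursive(fetch_children, root_id="root", prefix=""):
    """Depth-first walk of a drive.

    fetch_children(item_id) must return the list of driveItem dicts for that
    folder (handling auth and @odata.nextLink paging internally). Each file
    item is returned with a 'relativePath' key added, mirroring the field the
    copy loop in the March 24 post expects.
    """
    items = []
    for child in fetch_children(root_id):
        path = f"{prefix}{child['name']}"
        if "folder" in child:  # driveItems carry a 'folder' facet when they are folders
            items.extend(enumerate_recursive(fetch_children, child["id"], f"{path}/"))
        else:
            child["relativePath"] = path
            items.append(child)
    return items
```

Wiring it to Graph is then a matter of passing a closure over `drive_id` and the token that performs the paged `requests.get` loop.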

# COMMAND ----------

def download_file(item_id):

    token = get_graph_token()

    headers = {"Authorization": f"Bearer {token}"}

    url = f"https://graph.microsoft.com/v1.0/drives/{drive_id}/items/{item_id}/content"

    resp = requests.get(url, headers=headers)

    resp.raise_for_status()

    return resp.content

# COMMAND ----------

os.environ["STORAGE_CONNECTION_STRING"] # sanity check: raises KeyError if the secret is not set

# COMMAND ----------

connection_string = os.environ["STORAGE_CONNECTION_STRING"]

# COMMAND ----------

import os

import mimetypes

from io import BytesIO

from typing import Optional, Dict

import requests

from azure.core.exceptions import ResourceExistsError, ServiceRequestError, ClientAuthenticationError, HttpResponseError

from azure.storage.blob import (

    BlobServiceClient,

    BlobClient,

    ContentSettings

)

# ----------------------------

# Azure Blob upload helpers

# ----------------------------

def _get_blob_service_client(

    *,

    connection_string: Optional[str] = None,

    account_url: Optional[str] = None,

    sas_token: Optional[str] = None

) -> BlobServiceClient:

    """

    Create a BlobServiceClient from one of:

      - connection_string

      - account_url + sas_token (e.g., https://<acct>.blob.core.windows.net/?<sas>)

    """

    if connection_string:

        return BlobServiceClient.from_connection_string(connection_string)

    if account_url and sas_token:

        # Ensure sas_token starts with '?'

        sas = sas_token if sas_token.startswith('?') else f'?{sas_token}'

        return BlobServiceClient(account_url=account_url, credential=sas)

    raise ValueError("Provide either connection_string OR (account_url AND sas_token).")

def upload_bytes_to_blob(

    data: bytes,

    *,

    container_name: str,

    blob_name: str,

    connection_string: Optional[str] = None,

    account_url: Optional[str] = None,

    sas_token: Optional[str] = None,

    content_type: Optional[str] = None,

    overwrite: bool = False,

    metadata: Optional[Dict[str, str]] = None

) -> str:

    """

    Uploads a bytes object to Azure Blob Storage and returns the blob URL.

    Parameters:

        data (bytes): File content.

        container_name (str): Target container name.

        blob_name (str): Target blob name (e.g., 'reports/file.pdf').

        connection_string/account_url/sas_token: Auth options.

        content_type (str): MIME type; if None, guessed from blob_name.

        overwrite (bool): Replace if exists.

        metadata (dict): Optional metadata key/value pairs.

    Returns:

        str: The URL of the uploaded blob.

    """

    # Guess content type if not provided

    if content_type is None:

        guessed, _ = mimetypes.guess_type(blob_name)

        content_type = guessed or "application/octet-stream"

    # Build client

    bsc = _get_blob_service_client(

        connection_string=connection_string,

        account_url=account_url,

        sas_token=sas_token

    )

    container_client = bsc.get_container_client(container_name)

    # Ensure container exists (idempotent)

    try:

        container_client.create_container()

    except ResourceExistsError:

        pass # already exists

    blob_client: BlobClient = container_client.get_blob_client(blob_name)

    # Upload

    try:

        content_settings = ContentSettings(content_type=content_type)

        # Use a stream to be memory-friendly for large files, though we already have bytes

        stream = BytesIO(data)

        blob_client.upload_blob(

            stream,

            overwrite=overwrite,

            metadata=metadata,

            content_settings=content_settings

        )

    except ResourceExistsError:

        if not overwrite:

            raise

    except ClientAuthenticationError as e:

        raise RuntimeError(f"Authentication failed when uploading blob: {e}") from e

    except (ServiceRequestError, HttpResponseError) as e:

        raise RuntimeError(f"Blob upload failed: {e}") from e

    # Construct and return URL (works for both conn string and SAS)

    return blob_client.url

# ----------------------------

# Orchestrator

# ----------------------------

def download_and_upload_to_blob(

    *,

    item_id: str,

    container_name: str,

    blob_name: str,

    connection_string: Optional[str] = None,

    account_url: Optional[str] = None,

    sas_token: Optional[str] = None,

    content_type: Optional[str] = None,

    overwrite: bool = False,

    metadata: Optional[Dict[str, str]] = None

) -> str:

    """

    Downloads a file from Microsoft Graph using the provided item_id and uploads it to Azure Blob Storage.

    Returns the blob URL.

    """

    # 1) Download bytes from Graph

    file_bytes = download_file(item_id)

    # 2) Upload to Blob

    blob_url = upload_bytes_to_blob(

        file_bytes,

        container_name=container_name,

        blob_name=blob_name,

        connection_string=connection_string,

        account_url=account_url,

        sas_token=sas_token,

        content_type=content_type,

        overwrite=overwrite,

        metadata=metadata,

    )

    return blob_url

# COMMAND ----------

items = list_all_items()

# COMMAND ----------

print(len(items))

# COMMAND ----------

print(items[0:3])

# COMMAND ----------

import json

print(json.dumps(items[0], indent=4))

# COMMAND ----------

from urllib.parse import urlparse, parse_qs, unquote

def after_ei_from_xml_location(url: str, *, decode: bool = True) -> str:

    """

    Extracts the substring after '/EI/' from the XmlLocation query parameter.

    Args:

        url: The full URL containing the XmlLocation query parameter.

        decode: If True, URL-decodes the result (default True).

    Returns:

        The substring after '/EI/' from XmlLocation, or an empty string if not found.

    """

    parsed = urlparse(url)

    qs = parse_qs(parsed.query)

    xml_loc_values = qs.get("XmlLocation")

    # print(f"xml_loc_values={xml_loc_values}")

    if not xml_loc_values:

        return "" # XmlLocation not present

    # Take the first XmlLocation value

    xml_loc = xml_loc_values[0]

    if decode:

        xml_loc = unquote(xml_loc)

    # print(f"xml_loc={xml_loc}")

    marker = "/EI/"

    if marker not in xml_loc:

        return "" # No /EI/ in the XmlLocation value

    return xml_loc.split(marker, 1)[1]

# COMMAND ----------

download_and_upload_to_blob(item_id=items[0]["id"], container_name="iris", blob_name="domestic/EI/" + after_ei_from_xml_location(url=items[0]['webUrl']), connection_string=connection_string)

# COMMAND ----------

max_size = max(entry["size"] for entry in items)

sum_size = sum(entry["size"] for entry in items)

len_items = len(items)

print(f"max_size={max_size}, sum_size={sum_size}, len_items={len_items}")

# COMMAND ----------

for item in items:

    try:

        download_and_upload_to_blob(item_id=item["id"], container_name="iris", blob_name="domestic/EI/" + after_ei_from_xml_location(url=item['webUrl']), connection_string=connection_string)

        print(item["webUrl"])

    except Exception as e:

        print(f"Error: {e}, item_id={item['id']}, item_url={item['webUrl']}")

Reference: previous article for context: https://1drv.ms/w/c/d609fb70e39b65c8/IQBV3Sd02qPlRa_y13mxnxHSAa6mrM4rmM3pnvbWPW7RpIE?e=b1nGOi
