Saturday, November 18, 2023

 

Perspectives on data organization following takeaways from Microsoft Ignite 2023

Building on the takeaways shared in an earlier article, this section describes how AI will fundamentally change data organization within enterprises that want to embrace the semantic search enabled by models such as Llama 2 and Phi-2. These models bring intelligence to workflows and interactions, and enterprises will inevitably strive to apply this kind of understanding to their data, which was not possible until now. AI needs data, and the rich history of data organizations and platforms, along with how they evolved, holds many lessons for the coming wave of AI across all forms of data usage, a wave that will manifest as the turnkey creation of custom AI applications.

When data warehouses emerged in the 1980s, structured business data and SQL-style query syntax grew so popular that they remain ubiquitous even after unstructured data began to accumulate around 2010. To cover both forms of data, data lakes became popular, and by 2015 enterprises needed both data warehouses and data lakes. This dual-platform setup increased the challenges around governance, security, reliability, and management. While open formats such as Parquet files and Delta Lake tables could serve both kinds of platforms, a unified form of security, governance, and cataloguing was still required. Now, as generative AI helps to understand the semantics of data, a new layer is required to democratize access to data, automate manual administration, and seamlessly bridge categories of data platforms.

The challenges that have plagued the data industry will not disappear, and the major ones will remain:

-          Technical skills in SQL, Python, or Business Intelligence tools that pose a steep learning curve

-          Enterprises struggling to curate data or scrub it to maintain accuracy

-          The complexity of managing distinct systems, which increases costs and requires dedicated staff

-          Privacy, governance, and compliance requirements that demand maintaining lineage and visibility

Adding to this, emerging data analytics will require developing and tuning LLMs on platforms that are separate from the data and must be connected to it manually. The data platforms do not understand how enterprise data is organized, and data scientists do not see why they should be burdened with connecting the data.

A foundation that unifies querying and managing all forms of data across the enterprise is a great start for overcoming the syntax associated with contents, metadata, queries, reports, lineage, and more. However, the deeper understanding enabled by custom AI solutions will demand yet another, albeit necessary, layer that provides semantic organization and querying. Its requirements include:

-          Natural language processing and natural language interfaces for the data

-          Semantic cataloguing and discovery, because models can articulate discrepancies in the data

-          Automated management, since models can optimize data layout, partitioning, and indexing

-          Enhanced governance and privacy, as platforms detect, classify, and prevent misuse of sensitive data while simplifying management through natural language

-          First-class support for AI workloads, so that they deliver accurate results
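As a rough illustration of the semantic cataloguing and discovery item, here is a minimal Python sketch, not a description of any product's API. The catalog entries and the `embed`, `cosine`, and `search_catalog` functions are hypothetical. `embed` uses plain word counts as a stand-in for the vector an embedding model would produce, so this version only matches shared words; replacing it with a real embedding model is what would make the matching semantic.

```python
import math
import re
from collections import Counter

# Hypothetical catalog: table name -> human-written description.
CATALOG = {
    "sales.orders": "customer orders with order date, amount and shipping region",
    "hr.employees": "employee names, salaries and department assignments",
    "web.clickstream": "raw page view events and session identifiers from the website",
}


def embed(text: str) -> Counter:
    """Toy stand-in for a model embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search_catalog(query: str, catalog: dict, top_k: int = 2) -> list:
    """Rank catalog entries by similarity between the question and each description."""
    q = embed(query)
    scored = [(name, cosine(q, embed(desc))) for name, desc in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for name, score in search_catalog("which table has customer orders by region?", CATALOG):
        print(f"{name}: {score:.2f}")
```

The point of the design is that a user asks a question in plain language and the catalog, not the user, resolves it to a table such as `sales.orders`, which is the kind of discovery that removes the SQL and BI learning curve listed among the challenges.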

There is some precedent for this in the Business Intelligence layer, but that layer focused on a narrow slice of workloads, whereas this one is broader and more sweeping. Ultimately, enterprises will look for such a layer in the products and cloud services they use, or they will build one themselves.
