Perspectives on data organization following takeaways from Microsoft Ignite 2023
Based on the takeaways shared in an earlier article, this section describes how
AI will fundamentally change data organizations within enterprises that want to
embrace the semantic search facilitated by models such as Llama 2 and Phi-2.
These models bring intelligence to workflows and interactions, and it is
inevitable that enterprises will strive to leverage a form of understanding of
data that was not possible until now. AI needs data, and the rich history of
data organizations, platforms and their evolution holds many lessons for the
forthcoming wave of AI pervading data usage, which will manifest as the turnkey
creation of custom AI applications.
When data warehouses emerged in the 1980s, structured business data and the
syntax for querying it grew in popularity to the point that they continue to be
ubiquitous even after the accumulation of unstructured data by 2010. To cover
both forms of data, data lakes became popular, and by 2015 both data warehouses
and data lakes were required. This dual platform increased the challenges
around governance, security, reliability, and management. While file formats
such as Parquet and Delta Lake supported schema-based data, a unified form of
security, governance and cataloguing was still required. Now, as generative AI
helps to understand the semantics of data, a new layer is required to
democratize access to data, automate manual administration and seamlessly
bridge categories of data platforms.
The challenges that have plagued the data industry will not disappear, and the
major ones will remain:
- Technical skills in SQL, Python, or Business Intelligence that pose a steep
  learning curve.
- Enterprises struggling to curate or scrub data to maintain accuracy.
- Complexity of managing distinct systems, which increases costs and requires
  dedicated staff.
- Privacy, governance and compliance, which require maintaining lineage and
  visibility.
In addition, emerging data analytics models will require developing and tuning
LLMs on platforms that are separate from the data and must be connected to it
manually. The data platforms do not understand enterprise data organization,
and the data scientists do not understand why they must be bothered with
connecting data.
Building on a foundation that unifies querying and managing all forms of data
across the enterprise is a great start for overcoming the syntax associated
with contents, metadata, queries, reports, lineage and the like. The deeper
understanding facilitated by custom AI solutions, however, will demand further
features that add yet another, albeit necessary, layer for semantic
organization and querying. These demands include:
- Natural language processing adaptability and interfaces for the data.
- Semantic cataloguing and discovery, because the models can articulate
  discrepancies in data.
- Automated management, as models can optimize data layout, partitioning, and
  indexing.
- Enhanced governance and privacy, as platforms detect, classify, and prevent
  misuse of sensitive data while simplifying management using natural language.
- First-class support for AI workloads, because they deliver accurate results.
There is some precedent for such a layer in the Business Intelligence layer,
but that was focused on a narrow slice of workloads, while this one is broader
and more sweeping. Ultimately, enterprises will either seek such a layer from
the products and cloud services they use or build one themselves.
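As a rough illustration of the semantic cataloguing and discovery demand listed
above, the following Python sketch matches a natural language question against
free-text dataset descriptions. Everything in it is hypothetical: the catalog
entries, the function names, and above all the bag-of-words vectorizer, which
merely stands in for the embeddings a real language model would produce. It is
a toy under those assumptions, not how any particular product implements
discovery.

```python
import math
import re
from collections import Counter

# Hypothetical catalog: dataset name -> free-text description.
CATALOG = {
    "sales_orders": "customer orders with product, quantity, price and order date",
    "support_tickets": "customer support tickets with issue text and resolution status",
    "web_clickstream": "raw page view and click events from the website",
}


def vectorize(text):
    """Assumed stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_catalog(question, catalog=CATALOG, top_k=1):
    """Return the names of the top_k datasets most similar to the question."""
    query = vectorize(question)
    ranked = sorted(
        catalog,
        key=lambda name: cosine(query, vectorize(catalog[name])),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    print(search_catalog("which customer support issue tickets are unresolved"))
```

Word overlap only rewards shared vocabulary, so a question phrased with synonyms
would miss. That gap is what the semantic understanding of an actual model is
expected to close, and it is why the vectorizer is isolated in one function that
could be swapped out.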