Saturday, December 31, 2022

 

Data Modernization

In recent years, data technologies have popularized both structured and unstructured storage, fueled by applications that are embracing cloud resources. The two trends are happening simultaneously and reinforce each other.

Data modernization means moving data from legacy databases to modern databases. It comes at a time when many organizations are doubling their digital footprint. Unstructured data is the biggest contributor to this growth and includes images, audio, video, social media comments, clinical notes and the like. Organizations have shifted from a data architecture based on relational enterprise data warehouses to data lakes built on big data technologies. If surveys of IT spending are to be believed, a great majority of organizations are already on their way toward data modernization, with financial services firms leading the way. These organizations reported data security planning as part of their data modernization activities, and they rank the tools and technology available in the marketplace as the third most important factor in their decision making.

Drivers for one-time data modernization plans include security and governance, strategy and planning, tools and technology, and talent. Data modernization is a key component of, or reason for, migrating to the cloud. About 44% of these organizations engage external services for data planning and implementation.

Data Lakes are popular for storing and handling Big Data and IoT events. A data lake is not a massive virtual data warehouse, but it powers a lot of analytics and is the centerpiece of most solutions that conform to the Big Data architectural style. A data lake must store petabytes of data while handling bandwidths of up to gigabytes of data transfer per second. The hierarchical namespace of the object storage helps organize objects and files into a deep hierarchy of folders for efficient data access. The naming convention recognizes these folder paths by including the folder separator character in the name itself. With this organization and direct folder access to the object store, the overall performance of the data lake improves. A mere shim over the Data Lake Storage interface that supports file system semantics over blob storage is welcome for organizing and accessing such data.

Data management and analytics form the core scenarios supported by a Data Lake. For multi-region deployments, it is recommended to have the data land in one region and then replicate it globally. The best practices for a Data Lake involve evaluating feature support and known issues, optimizing for data ingestion, considering data structures, performing ingestion, processing and analysis from several data sources, and leveraging monitoring telemetry.

When the Data Lake supports query acceleration and an analytics framework, it significantly improves data processing by retrieving only the data that is relevant to an operation. This cascades into reduced time and processing power for the end-to-end scenarios that are necessary to gain critical insights into stored data. Both filtering predicates and column projections are enabled, and SQL can be used to describe them; only the data that meets these conditions is transmitted. A request processes only one file, so joins, aggregates and other multi-file query operators are not supported, but the file can be in a format such as CSV or JSON. The query acceleration feature isn't limited to Data Lake Storage; it is supported even on blobs in the storage accounts that form the persistence layer below the containers of the data lake, including accounts without a hierarchical namespace. Because query acceleration is part of the data lake, applications can be switched with one another and the data selectivity and improved latency carry over across the switch. Since the processing happens on the Data Lake side, the pricing model for query acceleration differs from the normal transactional model. Fine-grained access control lists and Active Directory integration round out the data security considerations.
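As an illustration of the filtering predicates and column projections described above, the sketch below pushes a SQL filter down to the storage layer so that only matching rows travel over the wire. It assumes the Python azure-storage-blob SDK and its query_blob call; the connection string, container and blob names are placeholders, the CSV is assumed to have no header row (columns referenced by position), and defaults may vary by SDK version, so treat this as a minimal sketch rather than a definitive implementation.

from azure.storage.blob import BlobClient

# Placeholders: point this at a real storage account, container and CSV blob.
blob = BlobClient.from_connection_string(
    conn_str="<connection-string>",
    container_name="telemetry",
    blob_name="events/2022/12/31/readings.csv",
)

# The predicate and projection are evaluated by the storage service;
# only the selected columns of the matching rows are transmitted.
reader = blob.query_blob("SELECT _1, _2 FROM BlobStorage WHERE _3 > 90")
filtered_csv = reader.readall()
print(filtered_csv.decode("utf-8"))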

Friday, December 30, 2022

 Migrating sensitive data to the cloud – a detailed look

A checklist helps with migrating sensitive data to the cloud and helps overcome the common pitfalls regardless of the source of the data. It serves merely as a blueprint for a smooth, secure transition.


Characterizing permitted use is the first step data teams need to take to address data protection for reporting. Modern privacy laws specify not only what constitutes sensitive data but also how the data can be used. Data teams must classify the usages and the consumers. Once sensitive data is classified and purpose-based usage scenarios are addressed, role-based access control must be defined to protect future growth. Examples of sensitive data include confidential corporate information, information licensed for use under a data use agreement, privileged attorney-client data, export-controlled research, details of implemented security controls, credit card and other payment information, public safety information, username/password combinations, calendars and individual schedules, email, intellectual property and trade secrets, and corporate operations improvement. Sensitive data detection and classification involves both automated and human assessment to review potentially sensitive data and categorize it into the appropriate data set. Understanding the data consumers and their data usages simplifies building access control policies, which in turn reduces administration and enhances security.
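The automated part of detection and classification can start with simple pattern scanning. The snippet below is a hypothetical illustration that flags likely credit card numbers and email addresses with regular expressions; the patterns and category labels are invented for illustration and are nowhere near a production-grade classifier.

import re

# Hypothetical patterns for two of the sensitive categories listed above.
PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify(text):
    """Return the set of sensitive categories detected in the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}

print(classify("Contact jane.doe@example.com, card 4111 1111 1111 1111"))
# {'credit_card', 'email'} (set order may vary)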

Devising a strategy for governance is the next step. This is meant to keep intruders out and to boost data protection by means of encryption and database management. Fine-grained access controls, such as attribute- or purpose-based ones, also help in this regard. By separating storage from compute, legacy access control policies tied to the old platform can be retired. With virtually unlimited storage and computational resources, and with data virtualization simplifying data models, access controls can be reframed to accommodate these characteristics.


Embracing a standard for defining data access policies can help to limit the explosion of mappings between users and the permissions for data access. This gains significance when a monolithic data management environment is migrated to the cloud. Failure to establish a standard for defining data access policies can lead to unauthorized data exposure. This is also an opportunity to simplify and standardize data access policies by replacing user-based policies with abstractions that operate on data attributes, abstract roles and defined contexts.
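To make that abstraction concrete, here is a hypothetical attribute-based policy check in Python. The roles, purposes and data classifications are invented for illustration; the point is that a handful of attribute rules can replace a large user-to-permission mapping.

from dataclasses import dataclass

@dataclass
class Request:
    role: str          # abstract role, e.g. "analyst"
    purpose: str       # declared purpose, e.g. "reporting"
    data_class: str    # classification of the data set, e.g. "pii"

# Hypothetical policy: (role, purpose) pairs permitted per data classification.
POLICY = {
    "pii":    {("analyst", "reporting"), ("steward", "audit")},
    "public": {("analyst", "reporting"), ("engineer", "development")},
}

def is_allowed(req: Request) -> bool:
    """Allow access only if the role/purpose pair is permitted for the data class."""
    return (req.role, req.purpose) in POLICY.get(req.data_class, set())

print(is_allowed(Request("analyst", "reporting", "pii")))     # True
print(is_allowed(Request("engineer", "development", "pii")))  # False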


When migrating to the cloud, a single-stage, all-at-once data migration must be avoided as it is operationally risky. It is critical to develop a plan for incremental migration that facilitates development, testing and deployment of a data protection framework which can be applied to ensure proper governance. Decoupling data protection and security policies from the underlying platform allows organizations to tolerate subsequent migrations. Data protection and governance can be made portable, which allows a move from one cloud service provider to another. The data security and protection aspect of any migration plan must be kept simple enough to be easily understood by teams as members rotate.


There are different types of sanitization, such as redaction, masking, obfuscation, encryption, tokenization and format-preserving encryption. Among these, static protection, in which clear-text values are sanitized and stored in their modified form, and dynamic protection, in which clear-text data is transformed into ciphertext at the time of access, are the most used. Static data protection is not flexible enough to meet the opposing demands for visibility from different sets of consumers. Dynamic enforcement, on the other hand, avoids the need to make copies and enables control based on user attributes.
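A minimal sketch of the dynamic enforcement idea, assuming a hypothetical consumer role attribute that decides how much of a value is revealed at read time; the masking rule and role names are invented for illustration.

def mask_ssn(value: str, consumer_role: str) -> str:
    """Dynamically sanitize a social security number based on the consumer's role.

    The clear text is never copied or stored in a modified form; the
    transformation happens on read, per request.
    """
    if consumer_role == "auditor":        # hypothetical privileged role
        return value                      # full visibility
    if consumer_role == "analyst":        # partial visibility
        return "***-**-" + value[-4:]
    return "***-**-****"                  # everyone else sees a redacted value

print(mask_ssn("123-45-6789", "analyst"))  # ***-**-6789
print(mask_ssn("123-45-6789", "intern"))   # ***-**-****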

Finally, defining and implementing data protection policies brings several additional processes such as validation, monitoring, logging, reporting, and auditing. Having the right tools and processes in place when migrating sensitive data to the cloud will allay concerns about compliance and provide proof that can be submitted to oversight agencies.

Compliance goes beyond applying rules; it becomes an ongoing process to verify that the laws are observed.


Thursday, December 29, 2022

 

Migrating sensitive data to the cloud
This part of the application modernization journey begins with the classification step. Even with the economic, performance and scalability benefits of cloud computing, data breaches can go unnoticed until it is too late. Part of the planning for the modernization of the application involves preparation and awareness of all the data, either at rest or in transit. The emergence of data protection laws in geographical areas including Europe and the United States, such as the GDPR and the CCPA, aims to protect personally identifiable information, aka PII. These laws add complexity around consumer rights over data use and data sharing, restricting how the data may be handled. Development teams often regard these regulations as a pain point, but building full transparency that enables detailed audits and reports at the data level is just as important. The data teams must build a level of compliance with information on what data was accessed, by whom, when and for what purpose. As personal and sensitive data proliferate to satisfy ever-increasing business requirements, the potential for internal misuse of data, along with the diligence required to comply with data regulations, poses significant challenges and must be tamed during the planning stage itself. This helps data engineers who fear clauses about their personal liability and promotes mechanisms for managing consent for using the data. Traditional applications might not have prepared for these regulations and consents, so application modernization is an opportunity to tackle them along with the migration and modernization stages.
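As a sketch of the audit transparency described above, the snippet below records who accessed which data, when and for what purpose. The record fields and the print sink are assumptions chosen for illustration; a real system would write to the platform's log store.

import json
import datetime

def audit_access(user, dataset, purpose, sink=print):
    """Append a who/what/when/why record for every data access (sketch).

    'sink' stands in for whatever log store the platform provides.
    """
    record = {
        "user": user,
        "dataset": dataset,
        "purpose": purpose,
        "accessed_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    sink(json.dumps(record))

audit_access("jane.doe", "claims_2022", "quarterly_reporting")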
A caveat about these regulations must be called out. Many laws and regulations dictate different aspects of data protection, such as the disclosure of financial data or documentation for food and drug production, and research and other industries might have standards that augment existing regulations. The public cloud comes with certain built-in considerations and guarantees for data protection; however, the checklist of certifications to be met must still be ratified by the stakeholders. All these rules require careful handling and protection of data against exposure. The legal and ethical implications of mishandling sensitive data are left out of scope here and belong to the data privacy engineering discipline.
That said, a checklist to help with migrating sensitive data to the cloud can still provide benefits to overcome the common pitfalls regardless of the source of the data. It serves merely as a blueprint and lays the foundation for a smooth, secure transition.
Characterizing permitted use is the first step data teams need to take to address data protection for reporting. Modern privacy laws specify not only what constitutes sensitive data but also how the data can be used. Data obfuscation and redacting can help with protecting against exposure. In addition, data teams must classify the usages and the consumers. Once sensitive data is classified, and purpose-based usage scenarios are addressed, role-based access control must be defined to protect future growth.
Devising a strategy for governance is the next step; this is meant to prevent intruders and is meant to boost data protection by means of encryption and database management. Fine grained access control such as attribute or purpose-based ones also help in this regard.
Embracing a standard for defining data access policies can help to limit the explosion of mappings between users and the permissions for data access; this gains significance when a monolithic data management environment is migrated to the cloud. Failure to establish a standard for defining data access policies can lead to unauthorized data exposure.
When migrating to the cloud, a single-stage, all-at-once data migration must be avoided as it is operationally risky. It is critical to develop a plan for incremental migration that facilitates development, testing and deployment of a data protection framework which can be applied to ensure proper governance. Decoupling data protection and security policies from the underlying platform allows organizations to tolerate subsequent migrations.
There are different types of sanitization, such as redaction, masking, obfuscation, encryption, tokenization and format-preserving encryption. Among these, static protection, in which clear-text values are sanitized and stored in their modified form, and dynamic protection, in which clear-text data is transformed into ciphertext at the time of access, are the most used.
Finally, defining and implementing data protection policies brings several additional processes such as validation, monitoring, logging, reporting and auditing. Having the right tools and processes in place when migrating sensitive data to the cloud will allay concerns about compliance and provide proof that can be submitted to oversight agencies.

Wednesday, December 28, 2022

 

This is a continuation of a series of articles on Application Modernization. In this section, we take a slightly different perspective on the Build-Deploy-Run aspect of the new software.

When we modernize an existing application, we ease our move to the cloud and open up the full promise of cloud technology. With a cloud-native microservice approach, the scalability and flexibility inherent to the cloud can be taken advantage of. Modernizing to cloud-native applications enables them to run concurrently and connect seamlessly with existing investments. Barriers that prohibit productivity and integration are removed.

One of the tenets of modernizing involves "Build-once-and-deploy-on-any-cloud". This process begins with assessing the existing application, building the applications quickly, automating the deployments for productivity and running and consistently managing the modernized application.

Identifying applications that can be readily moved into the cloud platform and those that require refactoring is the first step because the treatments of lift-and-shift and refactoring are quite different. Leveraging containers as the foundation for applications and services is another aspect.

Automating deployments for productivity with a DevOps pipeline makes it quick and reliable.  A common management approach to consolidate operations for all applications ensures faster problem resolution.

When application readiness is assessed, there are four tracks of investigation: cloud migration, cost reduction, agile delivery and innovation. These result in virtual machines in the cloud for the migration path, or in containers for repackaging, re-platforming and refactoring respectively, all within the build phase of build-deploy-run. While VMs are handled by migration accelerators in the deploy phase, containers are handled by modern DevOps pipelines. The modern application runtimes for containers also differ from the common operations on virtual machines, distinguishing the migration and modernization paths in the run phase. Finally, migration results in a relocated but still complex traditional application, while modernization results in a traditional application via repackaging, a cloud-ready application via re-platforming, or a cloud-native application via refactoring.

Application modernization goes together with cutting costs. Containers and microservices lower costs not as an afterthought but by design. The modernization journey admits many approaches between the assessment and deployment phases. These can be graded as follows:

1.       Containerizing the whole application simplifies the transition to the cloud. This is a migration-based approach that takes the least toll.

2.       Exposing on-premises assets with an API helps to replace a monolith with SaaS. This strategy works well for legacy assets that are difficult to move to the cloud (a minimal sketch follows after this list).

3.       When costs must be driven even lower, refactoring the monolith into the services helps a lot.

4.       It might even make sense to add new microservices, to innovate incrementally, reduce complexity, and establish success early. This is a transform and innovate approach.

5.       Finally, with agile delivery, we can sunset the monolith we started out with.

It helps to increase the delivery velocity throughout the modernization journey with one or more of these approaches.
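As a minimal sketch of the API approach in item 2 above, the snippet below wraps a stand-in legacy lookup behind an HTTP endpoint using Flask. The route, port and legacy function are hypothetical; a real facade would call into the on-premises system and add authentication.

from flask import Flask, jsonify

app = Flask(__name__)

def legacy_lookup(order_id):
    # Stand-in for a call into the on-premises system of record.
    return {"order_id": order_id, "status": "shipped"}

@app.route("/orders/<order_id>")
def get_order(order_id):
    # Expose the legacy asset through a modern, consumable API.
    return jsonify(legacy_lookup(order_id))

if __name__ == "__main__":
    app.run(port=8080)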

A trusted foundation often helps with re-platforming. Container platforms like Kubernetes and OpenShift help both the developers and the operations staff. Native cloud services are also popular for certain resources and for hosting. Often, with continuous integration and continuous deployment involving infrastructure-as-code, the destination can be one cloud or another while requiring the investment only once. Modernization platforms, tools and services are not yet mature enough to onboard customer legacy applications directly, so this must be done by hand today, possibly with container platforms and other cloud resources.

Application, data, integration, automation, multi-cloud management, and security are all considered to enable a faster and more reliable way to move to the cloud.

It’s best to invest in a strategy that involves the intersection of innovation, modernization, and DevOps. Developing innovative cloud native applications, modernizing, and leveraging investments, and creating an agile DevOps culture are part of this strategy.


 

Tuesday, December 27, 2022

 Reducing Trials and Errors 

Model: When trials and errors are scattered in their results, an objective function that can measure the cost or benefit will help with convergence. If the samples are large, a batch analysis mode is recommended. Minimizing or maximizing the objective function is also possible via gradient descent methods, but simulated annealing can escape local minima because it will accept a higher-cost move with a certain probability. In simulated annealing, the cost of the current solution and the cost of a randomly perturbed neighbor are compared; if the new cost is lower, the move is accepted, otherwise it is accepted with a probability that shrinks as the temperature cools. The temperature decreases on every iteration according to the cooling schedule. 
Sample implementation follows: 

import math
import random

def annealingoptimize(domain, costf, T=10000.0, cool=0.95, step=1):
    # Initialize the solution vector with random values within each domain range
    vec = [float(random.randint(domain[i][0], domain[i][1]))
           for i in range(len(domain))]
    while T > 0.1:
        # Choose one of the indices
        i = random.randint(0, len(domain) - 1)
        # Choose a direction to change it
        dir = random.randint(-step, step)
        # Create a new list with one of the values changed
        vecb = vec[:]
        vecb[i] += dir
        # Clamp the changed value back into its domain
        if vecb[i] < domain[i][0]:
            vecb[i] = domain[i][0]
        elif vecb[i] > domain[i][1]:
            vecb[i] = domain[i][1]

        # Calculate the current cost and the new cost
        ea = costf(vec)
        eb = costf(vecb)
        # Acceptance probability for a worse solution: exp(-(eb - ea) / T)
        p = math.exp(-(eb - ea) / T)
        # Accept if the new solution is better, or probabilistically if worse
        if eb < ea or random.random() < p:
            vec = vecb
        # Decrease the temperature according to the cooling schedule
        T = T * cool
    return vec
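
A quick usage sketch, assuming a toy quadratic cost function; the domain and target point are invented for illustration.

# Hypothetical cost: squared distance from the target point (3, 7).
def cost(vec):
    return (vec[0] - 3) ** 2 + (vec[1] - 7) ** 2

# Each variable is allowed to range over 0..10.
domain = [(0, 10), (0, 10)]

best = annealingoptimize(domain, cost)
print(best)  # typically converges near [3.0, 7.0]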