Saturday, April 30, 2022

 

This is a continuation of a series of articles on a crowdsourcing application, including the most recent article. The original problem statement is included again for context. 

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums that remain in public view. It is also difficult to find the right forums or audience where responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowd-sourcing opinions for a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement. 

The previous article continued the elaboration on the usage of public cloud services for provisioning the queue, document store, and compute. It talked a bit about the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedents in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd. The previous article described selective fan-out. When the clients wake up, they can request their state to be refreshed. This makes the write update unnecessary because the data does not need to be sent out until it is asked for. If the queue sends messages back to the clients, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens in both writing and loading, and it can be made selective in either case, so it can be limited during both pull and push. Disabling the writes to all devices can significantly reduce the cost; those devices can load the updates only when reading. It is also helpful to keep track of which clients are active over a period so that only those devices get preference.
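The selective fan-out above can be sketched minimally in Python. All names here are illustrative assumptions, not part of any product API: updates are written once, pushed only to recently active clients, and lazily loaded by the rest on their next pull.

```python
import time

class SelectiveFanout:
    """Push updates to recently active clients; others pull on demand."""

    def __init__(self, active_window_seconds=3600):
        self.active_window = active_window_seconds
        self.last_seen = {}     # client_id -> last check-in timestamp
        self.pending = {}       # client_id -> updates not yet delivered

    def check_in(self, client_id, now=None):
        """Client wakes up: record activity and drain its pending updates (pull)."""
        now = now if now is not None else time.time()
        self.last_seen[client_id] = now
        return self.pending.pop(client_id, [])

    def publish(self, update, client_ids, now=None):
        """Write once; push only to active clients and queue for the rest."""
        now = now if now is not None else time.time()
        pushed, deferred = [], []
        for cid in client_ids:
            if now - self.last_seen.get(cid, 0) <= self.active_window:
                pushed.append(cid)          # fan-out push to an active device
            else:
                self.pending.setdefault(cid, []).append(update)
                deferred.append(cid)        # loaded later, on the client's pull
        return pushed, deferred
```

The active window is the knob that trades push cost against pull latency: shrinking it disables writes to more devices and shifts the fan-out to read time.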

Any cloud service or application is not complete without manageability and reporting. The service or application can choose to offer a set of packaged queries for the user to choose from a dropdown menu, while internalizing all query processing, execution, and the return of results. One of the restrictions that comes with packaged queries exported via REST APIs is their ability to scale, since they consume significant resources on the backend and continue to run for a long time. These restrictions cannot be relaxed without some reduction in their resource usage. The API must provide a way for consumers to launch several queries with trackers, and the queries should complete reliably even if they are done one by one. This is facilitated with the help of a reference to the query and a progress indicator. The reference is merely an opaque identifier that only the system issues and uses to look up the status. The indicator could be another API that takes the reference and returns the status. It is relatively easy for the system to separate read-only status information from read-write operations, so the number of times the status indicator is called does not degrade the rest of the system. There is a clean separation between the status part of the system, which is usually periodically collected or pushed, and the rest of the system. The separation of read-write from read-only also allows the two to be treated differently. For example, it is possible to replace the technology for the read-only side separately from the technology for the read-write side, and even the read-only technology can be swapped from one to another for improvements.
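A sketch of the reference-and-progress pattern described above, with names that are assumptions for illustration: launching a query returns an opaque reference, and a separate read-only call reports status without touching the read-write path.

```python
import uuid

class QueryTracker:
    """Launch long-running packaged queries; expose status via an opaque reference."""

    def __init__(self):
        self._status = {}    # reference -> "queued" | "done"
        self._results = {}   # reference -> result payload

    def launch(self, packaged_query):
        """Issue an opaque identifier that only the system can interpret."""
        reference = uuid.uuid4().hex
        self._status[reference] = "queued"
        return reference

    def complete(self, reference, result):
        """Backend finishes queries one by one, reliably recording the outcome."""
        self._status[reference] = "done"
        self._results[reference] = result

    def status(self, reference):
        """Read-only status check; safe to poll without degrading the backend."""
        return self._status.get(reference, "unknown")

    def result(self, reference):
        return self._results.get(reference)
```

Because `status` touches only the read-only side, its storage can be swapped out independently of the read-write path, as the paragraph above suggests.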

The design of all REST APIs generally follows a convention. This practice gives well-recognized URI qualifier patterns, query parameters, and methods. Exceptions, errors, and logging are typically handled with the best usage of the HTTP protocol.

Friday, April 29, 2022

 

This is a continuation of the article that introduces a crowdsourcing application. The original problem statement is included again for context.

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums that remain in public view. It is also difficult to find the right forums or audience where responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowd-sourcing opinions for a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

The previous approach leveraged public cloud services for provisioning the queue and document store. It talked a bit about the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedents in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd.

In this section, we refer to the compute requirements for these posts and their responses. The choice of products, cloud services, their mode of deployment, or their SKU is left out of this discussion. The queue can support millions of requests of a few hundred bytes each. The state of the document whose responses are to be collected is kept in a document store, and the state can be changed both by virtue of the processing of requests in a queue and by administrative actions on the document. The database does not have any exposure to the clients other than through the queue. This enables the database to be the source of truth for the client state. The queue can carry questions or crowdsourced answers, and the update to a document is bidirectional. When the clients wake up, they can request their state to be refreshed. This makes the write update unnecessary because the data does not need to be sent out until it is asked for. If the queue sends messages back to the clients, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens in both writing and loading, and it can be made selective as well; it can be limited during both pull and push. Disabling the writes to all devices can significantly reduce the cost; those devices can load the updates only when reading. It is also helpful to keep track of which devices are active over a period so that only those devices get preference. 
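The relationship between the queue and the document store can be sketched as follows (a minimal sketch with illustrative names, not a product API): producers enqueue small messages, a consumer drains them, and only the consumer mutates the document store, which stays the source of truth.

```python
from collections import deque

class DocumentStore:
    """Source of truth for client state; mutated only via queue processing or admin action."""

    def __init__(self):
        self.docs = {}  # doc_id -> {"version": int, "responses": list}

    def apply(self, doc_id, response):
        doc = self.docs.setdefault(doc_id, {"version": 0, "responses": []})
        doc["responses"].append(response)
        doc["version"] += 1          # every change bumps the document version
        return doc["version"]

queue = deque()
store = DocumentStore()

# Producers enqueue small messages (a few hundred bytes each).
queue.append({"doc": "campaign-1", "response": "yes"})
queue.append({"doc": "campaign-1", "response": "no"})

# A consumer drains the queue; clients never touch the database directly.
while queue:
    msg = queue.popleft()
    store.apply(msg["doc"], msg["response"])
```

Because all writes funnel through the queue consumer, administrative actions and request processing converge on a single versioned record per document.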

The library that automates the translation of states to messages and back supports parallelization, so that each worker can take one message or client state at a time and perform the conversion. The translation between state and message is a one-to-one mapping, and the workers are assigned ownership of the translation so that there is no overlap between the tasks executed by the workers. The conversion can happen multiple times, so the workers can support multi-stage workflows independent of the clients simply by constructing internal messages for other workers to pick up. All the activities of the workers are logged with the timestamp of the message, the identity of the client for which the state is being synchronized, and the identity of the worker. These logs are stored in a way that they can be indexed and searched by these identifiers for troubleshooting purposes.
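One way to guarantee non-overlapping ownership is deterministic assignment of each message to exactly one worker, with every conversion logged under searchable identifiers. This is a sketch under that assumption; the function and field names are illustrative.

```python
import hashlib
import time

def owner_for(message_id, worker_count):
    """Deterministic ownership: a message always maps to the same single worker,
    so no two workers ever execute the same translation task."""
    digest = hashlib.sha256(message_id.encode()).hexdigest()
    return int(digest, 16) % worker_count

log = []

def translate(message_id, client_id, worker_id):
    """One-to-one state-to-message conversion; the activity is logged with the
    timestamp, client identity, and worker identity for troubleshooting."""
    log.append({
        "timestamp": time.time(),
        "client": client_id,
        "worker": worker_id,
        "message": message_id,
    })
    return {"message": message_id, "client": client_id}
```

A multi-stage workflow falls out naturally: a worker's output can itself be an internal message whose `owner_for` value points at the next worker.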

The workers can also execute web requests to target the clients directly. They have access to the queue, the database, and the clients. The background jobs that create these workers can be scheduled, periodic, or, in some cases, driven by polling the queue so that a message on arrival can be associated with a worker. This completes the system of using background workers to automate the posting of feeds to clients. With a one-to-one mapping between messages and workers, and with several workers, it becomes easy to scale the system to handle many clients. Clients are unique by installation on a phone, a mobile handheld device, or a web browser.

Thursday, April 28, 2022

Part 2 of previous post

 

This is a continuation of the article that introduces a crowdsourcing application. The original problem statement is included again for context:

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums that remain in public view. It is also difficult to find the right forums or audience where responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowd-sourcing opinions for a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

The solution proposed in the linked article suggests that crowdsourcing is collaboration software, much like the social engineering applications that are powered by massive compute and storage services in the backend, with the clients as mobile device applications. But the deployment of the software does not have to go directly to several datacenters. Instead, its roadmap could involve the use of a public cloud where the services are already available, enabling less overhead and a lower total cost of ownership.

When a user poses a question, she creates a campaign. This campaign is represented by a document that must support collaborative editing. So, it must be versioned, and merges must be resolved by incrementing the version. Cosmos DB works well as a document store and enables services to implement updates and merge resolutions. It is also highly available and scales to large storage for hundreds of users. It comes with a read-only and read-write separation of regions so that the traffic can be efficiently separated. Since a single document will not likely be answered by more than a thousand responders for the duration of the question, and there are likely to be a large number of documents, this problem is solved sufficiently by the suggested database.
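The versioned-document behavior described above resembles optimistic concurrency control. A minimal sketch, assuming a simple in-memory document rather than Cosmos DB's actual ETag mechanism:

```python
class VersionedDocument:
    """Collaborative campaign document: merges are resolved by incrementing the version."""

    def __init__(self):
        self.version = 0
        self.responses = []

    def try_update(self, base_version, response):
        """Optimistic concurrency: accept an edit only if it is based on the current version."""
        if base_version != self.version:
            return False            # stale edit; the caller must re-read and merge
        self.responses.append(response)
        self.version += 1
        return True

    def merge(self, response):
        """Merge resolution: re-read the latest version, then retry the update."""
        while not self.try_update(self.version, response):
            pass
        return self.version
```

Cosmos DB provides analogous machinery natively (conditional writes keyed on a document's ETag), so a service built on it would delegate the version check rather than reimplement it.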

Communication between clients relies on one of the messaging frameworks. Unlike storage, this has a sliding scale of performance and suitable technology, ranging up to massively scalable queues and topics. The Service Bus messaging architecture works well in this regard. The emphasis is on scale, so we can consider a few technologies for comparison.

The System.Messaging library transparently exposes the underlying Windows Message Queuing APIs. For example, it provides the GetPublicQueues method that enumerates the public message queues. It takes message queue criteria as a parameter. These criteria can be specified with parameters such as category and label, and can also include the machine name or cluster name and the created and modified times as filter parameters. The GetPublicQueuesEnumerator method is available to provide an enumerator to iterate over the results. 

Amazon Simple Queue Services offers an almost unparalleled fully managed message queuing service that makes it easy to integrate with microservices and serverless applications. SQS also makes it simple and cost-effective to intermediate between these producers and consumers.

Then, there is the highly performant Erlang-based messaging framework behind WhatsApp. It offers free texting in an SMS world. There are no ads, no gimmicks, no games, and no feeds. They have hundreds of nodes, thousands of cores, and hundreds of terabytes of RAM, and they serve billions of smartphones. Their server infrastructure is based on Erlang/FreeBSD. Erlang is a programming language used to build massively scalable soft real-time systems with requirements of high availability. With this programming language, they send out about 70 million messages per second.

The WhatsApp architecture is famous for “it just works”: only a phone number is used as the address to send messages to, and messages can be voice, video, images, or text. It also features voice messages, group chats, and read receipts.

Together with messaging and storage, we address most of the dependencies for compute.

Wednesday, April 27, 2022

Writing an App to crowd-source free advice:


Problem statement:

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums that remain in public view. It is also difficult to find the right forums or audience where responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowd-sourcing opinions for a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

Solution:

Let us say that most users can frame their questions in the form of one that can be answered with a yes or no. Then the problem of finding an answer is merely one of crowdsourcing it. Finding the audience is not at all hard if people are rewarded for their participation. For example, answering a certain number of questions buys the opportunity to ask a question. With a large sample set, the answers can be considered as grounded as possible. With this basic scenario, we now describe the tenets of an application that can serve this purpose.
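The reward scheme above (answering a certain number of questions buys the opportunity to ask one) can be sketched as a small credit ledger. The class name and the exchange rate are assumptions of this sketch, not part of the proposal.

```python
class CreditLedger:
    """Answering questions earns credit; enough answers buy one ask."""

    ANSWERS_PER_QUESTION = 5   # illustrative exchange rate, an assumption

    def __init__(self):
        self.answers = {}  # user -> unspent answer credits

    def record_answer(self, user):
        self.answers[user] = self.answers.get(user, 0) + 1

    def can_ask(self, user):
        return self.answers.get(user, 0) >= self.ANSWERS_PER_QUESTION

    def spend(self, user):
        """Consume credits to ask a question."""
        if not self.can_ask(user):
            raise ValueError("not enough answered questions to ask one")
        self.answers[user] -= self.ANSWERS_PER_QUESTION
```

The rate doubles as a throttle: raising it increases the answer supply per question asked, which keeps the sample set large enough for the answers to be considered grounded.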

Let us visualize a landscape of several clients that have a question-and-answer web page with the bidirectional option of responding to questions from others and asking questions of one's own. As with all social engineering applications, this requires sharing with many other clients to solicit a reply. These applications are fundamentally designed with large-scale storage, a relational database for initial lookup, and web interfaces to serve the query and response. Presto, for example, makes it easy to query via SQL against a backdrop of big data.

Let us instead consider a cloud-native web service and storage that facilitate this data processing. Then the problem of crowdsourcing simplifies to one of synchronization between many publishers and subscribers. We can take the example of the Azure public cloud to discuss a solution for synchronization, but the approach is by no means limited to one cloud or technology stack. Microsoft Intune is a cloud-based service that manages devices and their applications. These devices can include mobile phones, tablets, and notebooks. It can help configure specific policies to control applications. It allows people in an organization to use their devices for school or work. The data stays protected, and the organizational data can be isolated from the personal data on the same device. It is part of Microsoft’s Enterprise Mobility and Security (EMS) suite. It integrates with Azure Active Directory to control who has access and what they can access, and with Azure Information Protection for data protection. So, if we view an organization as the company that offers this crowdsourcing application, with an active directory to manage its members, then we can leverage an out-of-box technology to achieve synchronization between thousands of clients. The quantitative analysis of the application instances, data, and compute is left out of the scope of this article, for the sake of elaboration only on the principles of synchronization and push notifications.

Since it is a cloud service, it can work directly with clients over the internet, or be co-managed with Configuration Manager and Intune. Rules and configuration settings can be set on personal and organization-owned devices to access data and networks. Authenticated applications can be deployed on devices. The company information can be protected by controlling the way users access and share information. The devices and applications can be made compliant with the security requirements. The users must opt in to management with Intune using their devices. Users can opt in for partial or full control by organization administrators. These administrators can add and assign mobile apps to user groups and devices, configure apps to start or run with specific settings enabled, update existing apps already on the device, see reports on which apps are used and track their usage, and do a selective wipe by removing only organization data from apps. App protection policies include using Azure AD identity to isolate organization data from personal data, helping secure access on personal devices, and enrolling devices.

Tuesday, April 26, 2022

 Q: How does one add a claim to an HTTP request? 

A: The ClaimProvisioning topic has been explained here. A claim is a combination of a claim type, right, and a value. A claim set is a set of claims issued by an issuing authority. A claim can be of a DNS, email, hash, name, RSA, sid, SPN, system, thumbprint, Uri, or X500DistinguishedName type. An evaluation context is a context in which an authorization policy is evaluated. It contains properties and claim sets, and once the evaluation of the authorization policies is complete, it results in an authorization context. An authorization policy is a set of rules for mapping a set of input claims to a set of output claims, and when it is evaluated, the resulting authorization context has a set of claim sets and zero or more properties. An identity claim in an authorization context makes a statement about the identity of the entity. A group of authorization policies can be compared to a machine that makes keys. When the policies are evaluated, a set of claims is generated; it is like the shape of the key. This key is stored in the form of an authorization context and can be used to open some locks. The granting of access is the opening of a lock. The identity model does not mandate how the claims should be authored, but it requires that the set of required claims be satisfied by those in the authorization context.   

 

An HTTP request is made with the help of an authentication header. This has a bearer token when the request needs to be authenticated and authorized. The token is issued to an identity and Windows lists many forms of identity including but not restricted to UserCredential, ApplicationCertificateThumbprint, ApplicationClientId, ApplicationKey, ApplicationNameForTracing, ApplicationToken, Authority, EmbeddedManagedIdentity, ApplicationCertificateSubjectDistinguishedName, ApplicationCertificateIssuerDistinguishedName, and ApplicationCertificateSendPublicCertificate. These are all convertible to a claims principal as follows: 

var claimsPrincipal = HttpContext.Current.User as ClaimsPrincipal; 

This ClaimsPrincipal can now be given an additional ClaimsIdentity carrying the extra claims. The following is an example of doing just that. 

using System.Collections.Generic;
using System.Security.Claims;
using System.Security.Cryptography.X509Certificates;
using System.Security.Principal;
using System.Web;

bool found = false;
var claimsPrincipal = HttpContext.Current.User as ClaimsPrincipal;
foreach (var claim in claimsPrincipal.Claims)
{
    if (claim.ToString().Contains("Administrator"))
    {
        found = true;
    }
}
if (!found)
{
    var claimValue = string.Format("claim://{0}", "Administrator");
    claimsPrincipal.AddIdentity(new ClaimsIdentity(new List<Claim>() { new Claim(ClaimTypes.Role, claimValue) }));
    HttpContext.Current.User = claimsPrincipal;
}


The HttpContext is resolved on the server side. 

Note that on the client side, the user is usually resolved as System.Security.Principal.WindowsIdentity.GetCurrent()

And then this can be used as a ClaimsPrincipal. 

Again, once the claimsPrincipal has been granted all the additional claims, the token can be retrieved with the new ClaimsPrincipal. 

The issuing authority for a security token does not have to be of the same type as the consumer. Domain controllers issue Kerberos tickets, and X.509 certificate authorities issue chained certificates. A token that contains claims is issued by a web application or web service that is dedicated to this purpose. The relying parties are the claims-aware applications and the claims-based applications. These can also be web applications and services, but they are usually different from the issuing authorities. When they get a token, the relying parties extract claims from the tokens to perform specific identity-related tasks.

Interoperability between issuing authorities and relying parties is maintained by a set of industry standards. A policy for the interchange is retrieved with the help of a metadata exchange and the policy itself is structured. Sample standards include Security Assertion Markup Language which is an industry recognized XML vocabulary to represent claims. 

A claim-to-token conversion service is common to an identity foundation. It extracts the user's principal name as a claim from heterogeneous devices, applications, and services and generates an impersonation token granting user-level access to those entities.

Once a claim has been added to the claimsPrincipal, a token can be retrieved like the example here with suitable parameters for the AcquireTokenAsync call:

private static string GetToken()
{
    string clientID = "Client ID";
    string AuthEndPoint = "https://login.microsoftonline.com/{0}/oauth2/token";
    string TenantId = "Tenant ID";
    string redirectUri = "https://login.microsoftonline.com/common/oauth2/nativeclient";
    string resourceUri = "https://analysis.windows.net/powerbi/api";
    string authority = string.Format(CultureInfo.InvariantCulture, AuthEndPoint, TenantId);
    AuthenticationContext authContext = new AuthenticationContext(authority);
    string token = authContext.AcquireTokenAsync(resourceUri, clientID, new Uri(redirectUri), new PlatformParameters(PromptBehavior.Auto)).Result.AccessToken;
    return token;
}

Monday, April 25, 2022

Improving queries part 9


 

This is a continuation of the best practices in writing Kusto queries as introduced here and continued with the log queries for Analytics, such as with Azure Monitor Logs. While parts 5, 6, and 7 recognized optimizations, best practices, and the management aspects of Kusto queries, this discusses expressions that come in useful in log queries that span workspaces and apps. Such queries give a system-wide view of the data. The following section describes the translations between SQL queries and Kusto.

Kusto supports a subset of the SQL language. These translations do not necessarily need to be worked out by hand: the Kusto operator EXPLAIN, when prefixed to a SQL statement, translates it to the corresponding Kusto query.

Some operators are now called out with tips for best performance. Selection of data from a SQL table usually involves a row set or a column set. These can be correspondingly expressed in KQL with the where filter for selecting rows and the project operator for selecting columns. The rule referred to in the earlier parts of this article about aggressively reducing the size of the data holds here as well.

The popular test for data to be null can be tried out with the isnotnull() operator. Such data checks should come toward the end, after the query has utilized the schema to narrow down the result sets.

Comparisons with dates are simplified with operators like ago(1d). A datetime range can be described with the <= and >= operators. The time filters in a query should be applied earliest in the set of filters.

The tests for data to contain literals and patterns also have equivalents with the has, contains, and approximation operators.

The grouping operations are substituted with the summarize operator. The dcount() aggregation uses the HyperLogLog algorithm. These can be computationally expensive, so the data spread should be reduced whenever possible.

Column aliases and calculations are made possible with the extend operator.

The ordering operators and behavior remain similar.

Packing and unpacking data between JSON and tabular form is made possible with the mv-expand, bag_remove_keys, and extractjson operators.

Convert to JSON:

T
| extend PackedRecord = pack_all()
| summarize Result = make_list(PackedRecord)

And back with:

let result =
    print myDynamicValue = dynamic(<jsonArray>)
    | mvexpand myDynamicValue
    | evaluate bag_unpack(myDynamicValue);
result
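For intuition only, the same pack-and-unpack round trip can be sketched outside Kusto, in Python: each row becomes a property bag, the bags are collected into one JSON array, and expanding the array restores one row per bag.

```python
import json

rows = [{"name": "alice", "score": 1}, {"name": "bob", "score": 2}]

# pack_all + make_list analog: each row becomes a property bag,
# and the bags are collected into a single JSON array.
packed = json.dumps(rows)

# mvexpand + bag_unpack analog: expand the array and restore
# one row per property bag.
unpacked = [dict(bag) for bag in json.loads(packed)]
```

The round trip is lossless as long as every bag carries the same keys, which is what lets bag_unpack recover a uniform table schema.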

 

The JSONPath notations are supported, with ‘$’ for the root object, ‘.’ or ‘[]’ for a child object, and ‘[]’ for an array subscript.

Reference: Sample queries

Sunday, April 24, 2022

 Improving queries part 8 

This is a continuation of the best practices in writing Kusto queries as introduced here and continued with the log queries for Analytics, such as with Azure Monitor Logs. While parts 5, 6, and 7 recognized optimizations, best practices, and the management aspects of Kusto queries, this discusses expressions that come in useful in log queries that span workspaces and apps. Such queries give a system-wide view of the data. 

When the data is stored in multiple workspaces and apps, the log queries can either specify the workspace and app details, or they can use resource-context queries and query in the context of a specific resource. It is important to remember that the limit of a hundred resources and workspaces applies to a single query. Workspaces can be referred to by their resource name, qualified name, workspace ID, or Azure Resource ID. 

The app() expression is used to refer to a Workspace-based application insights resource and is used to retrieve data from a specific application in the same resource group, another resource group, or another subscription. An App can show all the requests it has received as well as heartbeats. 

The workspace() expression is used to refer to a workspace in the same resource group, another resource group, or another subscription. Read access to the workspace is required. 

The resource() expression is used in an Azure Monitor query scoped to a resource to retrieve data from other resources. The URL path for the resource from the Azure Portal can be used to refer to the resource, and the same holds for the resource group. Read access to the resource is required, and multiple resources can be queried. 

 

The schema differences between Application Insights and a Log Analytics workspace show in their properties. UserId, AppId, and name will be found in both.  

Availability count, type, duration, message, run location, id, name, and timestamp in Log Analytics have a corresponding itemCount, duration, message, location, id, name, and timestamp in the Application Insights resource properties. Browser, city, client, and country, pertaining to the client, are prefixed with ‘client_’ in the Application Insights resource properties. Custom event count, dimensions, and name also have equivalent properties without the ‘custom’ prefix. The device model and device type are available as client_Model and client_Type. Exception count, HandledAt, Message, and Type have equivalents in the latter. Operation and OperationName are similarly named. Pageview count, duration, and name have their pageview prefix dropped. Similarly, request count, duration, id, name, and success have their prefix dropped. The role, RoleInstance, and SessionId properties are also similar. SessionId, source system, telemetry type, URL, and user AccountId also have underscored equivalents. 
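A sketch of this property-name translation as a lookup table. The exact column names vary by schema version, so every entry here is illustrative, drawn only from the prose above rather than from an authoritative schema reference.

```python
# Illustrative mapping from Log Analytics property names to their
# Application Insights resource equivalents; names are assumptions.
LOG_ANALYTICS_TO_APP_INSIGHTS = {
    "Browser": "client_Browser",
    "City": "client_City",
    "Country": "client_CountryOrRegion",
    "DeviceModel": "client_Model",
    "DeviceType": "client_Type",
    "AvailabilityCount": "itemCount",
    "RunLocation": "location",
}

def translate_property(name):
    """Return the Application Insights equivalent, or the name unchanged
    when both schemas already share it (e.g., UserId, AppId)."""
    return LOG_ANALYTICS_TO_APP_INSIGHTS.get(name, name)
```

A table like this is handy when porting a query from one schema to the other: unmapped names pass through untouched, so shared properties need no entries.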

 

Azure Monitor supports cross-service queries between Azure Data Explorer, Application Insights, and Log Analytics. This enables a Kusto cluster and database to be made available to the Log Analytics/Application Insights tools and referred to in a cross-service query. The .show functions, .show function {functionName}, and .show database {DatabaseName} schema as JSON commands are all supported with cross-service queries. This is a really powerful feature because it brings external data sources directly into the tool.  

 

Reference: Sample queries