Tuesday, July 30, 2013

Stored procedure to get stores by latitude and longitude

USE Store
GO

/****** Object:  StoredProcedure [dbo].[GetStoresByGeo]    Script Date: 7/30/2013 6:36:37 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:           Ravishankar Rajamani
-- Create date: July 2013
-- Description:      Get a list of all Stores by co-ordinates
-- =============================================

CREATE PROCEDURE [dbo].[GetStoresByGeo]
@Latitude VARCHAR(50),
@Longitude VARCHAR(50),
@Total int OUTPUT
AS
BEGIN
SET NOCOUNT ON;

      -- Build the center point from the input co-ordinates.
      -- Note: WKT lists longitude before latitude.
      DECLARE @g geography;
      SET @g = geography::STPointFromText('POINT(' + @Longitude + ' ' + @Latitude + ')', 4326);

      -- Buffer the center out to a 1000-mile radius (1609.344 meters per mile)
      DECLARE @n geography;
      SET @n = @g.STBuffer(1000 * 1609.344);

      -- Get the count of stores within the buffer
      -- (the STWithin check against the buffer could be a persisted computed column)
      SELECT @Total = COUNT(*)
      FROM Store (NOLOCK)
      WHERE Coordinates.STWithin(@n) = 1;

      SELECT @Total AS Total;

      -- Get a table of StoreIds of the results
      DECLARE @StoreIds TABLE (ID int);
      INSERT INTO @StoreIds
      SELECT StoreID
      FROM Store (NOLOCK)
      WHERE Coordinates.STWithin(@n) = 1
      ORDER BY StoreID;

      -- Get the stores
      SELECT S.*
      FROM Store (NOLOCK) AS S
      INNER JOIN @StoreIds AS Ids ON Ids.ID = S.StoreID;
END

Caveat: this stored procedure has not been tried; it merely illustrates an idea from a similar real-world example.
Here's the location usage in the NerdDinner sample that we could modify with the geography data type as per the previous post:


        public IQueryable<Dinner> FindByLocation(float latitude, float longitude)
        {
            var dinners = from dinner in FindUpcomingDinners()
                          join i in NearestDinners(latitude, longitude)
                          on dinner.DinnerID equals i.DinnerID
                          select dinner;

            return dinners;
        }


/****** Object:  UserDefinedFunction [dbo].[NearestDinners]    Script Date: 7/30/2013 7:39:11 AM ******/
DROP FUNCTION [dbo].[NearestDinners]
GO

/****** Object:  UserDefinedFunction [dbo].[NearestDinners]    Script Date: 7/30/2013 7:39:11 AM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[NearestDinners]
(
@lat real,
@long real
)
RETURNS TABLE
AS
RETURN
SELECT     Dinners.DinnerID
FROM         Dinners
WHERE dbo.DistanceBetween(@lat, @long, Latitude, Longitude) < 100

GO
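The dbo.DistanceBetween scalar function referenced above is not shown here; in the NerdDinner sample it computes the great-circle distance in miles between two points. A minimal sketch of that computation in Python (the haversine formula; the function name and the mile-based radius are assumptions, not the sample's exact code):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def distance_between(lat1, long1, lat2, long2):
    """Great-circle distance in miles between two lat/long points (haversine)."""
    lat1, long1, lat2, long2 = map(radians, (lat1, long1, lat2, long2))
    dlat = lat2 - lat1
    dlong = long2 - long1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlong / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

# A dinner qualifies as "nearest" when this distance is under 100,
# mirroring the WHERE clause in the table-valued function above.
print(distance_between(47.651, -122.349, 47.656, -122.343))
```

Because this runs per row over the whole table, it is exactly the kind of scan that a geography column with STDistance (and a spatial index) avoids.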


Monday, July 29, 2013

The advantage of using the geography data type for REST-based location services is that the URI can remain the same for the customer. The latitude and longitude are in any case represented in a well-known text (WKT) format described by the Open Geospatial Consortium.
For example,
DECLARE @g geography;
SET @g = geography::STPointFromText('POINT(-122.34900 47.65100)', 4326);
SELECT @g.ToString();
Internally these WKTs are translated to Geography data type instances and stored like regular columns along with the attributes of the location.
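Note the ordering in the WKT above: POINT takes longitude first, then latitude. A small helper in Python that formats a co-ordinate pair into that well-known text form (the function name is hypothetical):

```python
def to_wkt_point(latitude, longitude):
    """Format a co-ordinate pair as an OGC well-known text POINT.

    WKT lists longitude (x) before latitude (y), the reverse of the
    spoken 'latitude, longitude' convention, so swapping the two is
    a common source of bugs when building these strings.
    """
    return f"POINT({longitude:.5f} {latitude:.5f})"

print(to_wkt_point(47.651, -122.349))  # POINT(-122.34900 47.65100)
```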
They are queried using functions based on the geography data type. The queries could be based on the distance between two points or on membership in a polygon, such as one representing a zip code.
They invoke corresponding methods on the geography data type, for example STDistance():
DECLARE @g geography;
DECLARE @h geography;
SET @g = geography::STGeomFromText('LINESTRING(-122.360 47.656, -122.343 47.656)', 4326);
SET @h = geography::STGeomFromText('POINT(-122.34900 47.65100)', 4326);
SELECT @g.STDistance(@h);
or find all points that intersect a polygon, as follows:
DECLARE @Seattle geography = 'POLYGON(....)'; 
SELECT c.customerid FROM customer c WHERE c.geog.STIntersects(@Seattle.STBuffer(10 * 1609.344));
In either case we can retrieve the points found in the vicinity of a location of interest.
By changing the column data type, we may have to handle all dependencies on the earlier data type. If location data was used for any index creation or used in conjunction with other tables or database objects, they would need to be modified to use the geography data type.  In general, this should not be a problem since pretty much all the semantic operations performed on the co-ordinates are supported by the geography data type.
Also, one more thing to point out: the geography column does not require index creation up front, though SQL Server does support spatial indexes on it when query volume demands one. In fact, this might be a good opportunity to refactor all references to the geography column and limit their usage only to the database objects that support querying by the user.
Finally, database unit tests (DUTs) must be authored before the changes are made so that we can verify the same behaviour before and after the change. These DUTs could include specific use-case scenarios, column scanning to see whether any of the conversions failed, and checks that relative distances between selected points are maintained with the modified column type.

Location services using spatial data

In applications like the NerdDinner sample posted online, location data is often represented with latitudes and longitudes stored in the database as non-nullable doubles or floats. Stores or restaurants in the neighborhood of a location are found by the Cartesian distance between the latitude and longitude of the location and those of the store or restaurant, usually implemented as a database function or stored procedure. The table of stores or restaurants with their locations is then scanned to compute the distance to the input location and sorted. This is then filtered to the top handful and displayed on a map, such as with the Google Maps API. Application developers find it easy to write controllers that use these database objects, or LINQ to SQL, to display the appropriate views.
However, there are some limitations with this approach. First, it doesn't scale when there are hundreds of stores or restaurants in the radius of interest. Second, it doesn't answer often-repeated queries such as finding the points within a polygon, e.g. the one formed by a zip code. Queries could also find the distance between two points more efficiently if the location were stored as the GEOGRAPHY or GEOMETRY data type in SQL Server. The geography data type stores ellipsoidal data such as GPS latitude and longitude, while the geometry data type stores Euclidean (flat) co-ordinates. One could then have a table such as:
ZipCodes
 - ZipCodeId
 - Code
 - StateID
 - Boundary
 - Center Point
Boundary could be considered the polygon formed by the zip code, and Center Point the central location within it. Distances between stores and their membership in a zip can be calculated from this center point. The geography data type also lets you perform clustering analytics, answering questions such as how many stores or restaurants satisfy a certain spatial condition and/or match certain attributes. These are implemented using R-tree data structures, which support such clustering techniques.
Spatial data types such as the geography data type now enjoy support in Entity Framework 5 and are therefore available in LINQ. .NET also supports these data types with the SqlGeography and SqlGeometry types for easy translation and mapping to their equivalent SQL Server data types.
One approach is to store the co-ordinates in a separate table where the primary key is the latitude/longitude pair, declared unique so that a pair does not repeat. Such an approach is questionable because the uniqueness constraint for locations carries a maintenance overhead. For example, two records could refer to the same point, and the unreferenced rows might then need to be cleaned up. Locations also change ownership: store A could take over a location previously owned by store B, while B never updates its record. Moreover, stores could undergo renames or conversions. Thus it may be better to keep the spatial data associated, in a repeatable way, with the information about the location.
Map APIs such as Google Maps or Bing Maps let you work with spatial data types, with their usual caveat not to store or cache locations independently.
References: StackOverflow and MSDN.


Sunday, July 28, 2013

Let us look at an implementation for the map showing the coffee stores in your neighborhood and your friends who had visited these same stores in the past day, month or year.
The APIs to retrieve store information in your neighborhood are given by the Starbucks Location API. The friends' information is given by the Facebook API. Location updates by your friends are made on Facebook through automatic push notifications, including from a proposed Starbucks application.
Different applications make Facebook posts on their customers' walls using the permission requested from them. These show up under location updates on the FB profiles. The same can be queried for other Facebook friends for a given location. A list of such friends who have the given location on their wall is collected, and this should be rendered along with the store mentioned.
The Starbucks API provides the following services:
1) OAuth API: provides access tokens based on one or more methods to acquire them, relying on a) password, b) client id and user id, c) authorization code, or d) client credentials. Refresh tokens are also provided.
2) Loyalty API: provides licensees with a way to integrate with their Siebel loyalty systems
3) Location API : Provides methods to list all stores, stores near a geographic point, stores by specific ids, stores within a region.
4) Account API: such as to create US Starbucks account and non-US Starbucks account, to validate and recover credentials, to create account profiles and to update social profile data.
5) Card API: Provides customer's card information to keep track of their purchases. This gives information such as Card URL, Card details, Card balances, and the ability to reload a card.
6) Payment methods : Enables collection of sums from the customer using a payment method such as a credit card or PayPal
7) Rewards API: provides rewards summary and history information based on application user, points needed for next level, points needed for next free reward, points needed for re-evaluation etc.
8) Barcode API: The barcode API generates Barcodes to communicate with in-store scanners and applications.
9) Content API: These are used to retrieve localized contents for any web client. Content is stored and retrieved in site core.
10) Transaction History API: These are used to retrieve the history data for a specified user, for all cards.
11) eGift API: These are used to retrieve the themes associated with eGifts.
12) Device Relationship API: These are used to insert or update a device, to get reporting actions such as touch-to-pay to display the barcode for scanning and touch-when-done to report when completing the barcode for scanning.
13) Product API : to browse food, beverage, Coffee, eGifts etc.


Saturday, July 27, 2013

API Explorer for Starbucks and a Graph API implementation

Starbucks APIs are OAuth enabled. This means that they don't just grant access based on API keys but require an access token provided by an OAuth provider. Starbucks APIs are available from Mashery, which provides a redirect to the Starbucks authorization endpoint, and this is where API users get their access tokens. OAuth enables one of four different workflows to get access tokens.
Implicit Grant - such as when a mobile application tries to get an access token from the authorization endpoint based on client id and user id.
Authorization Code grant - such as when a user logs in to an IIS-hosted site and the user's browser is redirected to Starbucks' authorization endpoint to get a one-time, short-lived authorization code. The client can then exchange the code for an access token.
Credentials Grant - such as when a user provides his or her username and password for a token.
Client Credentials Grant - such as when an application from a secured kiosk or site provides context regardless of the user.
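Whichever grant is used, the token request itself is an ordinary form-encoded POST to the authorization server's token endpoint. A minimal sketch of a client credentials request in Python; the endpoint URL and the credentials are hypothetical placeholders, not the actual Starbucks/Mashery contract, and only the standard RFC 6749 parameters are assumed:

```python
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://api.example.com/oauth/token"  # hypothetical endpoint

def build_token_request(client_id, client_secret):
    """Build a client-credentials token request (RFC 6749, section 4.4)."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return Request(
        TOKEN_URL,
        data=body.encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

req = build_token_request("my-client-id", "my-secret")
print(req.get_method(), req.full_url)
```

The server's JSON response would carry the access token (and, for the grants that support it, a refresh token) to attach to subsequent API calls.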
In building an explorer for Starbucks API, we will need to get an access token to make the API calls. Since this application that we call the API explorer enables API users to try out the different APIs based on input parameters and responses, we will choose either the client credentials grant or the implicit grant to retrieve an access token at push button demand. Both XML and JSON responses can be displayed in the text area panel of the API explorer. This is conceived to be very similar to the Graph API Explorer from Facebook.

Another application of Starbucks API could be a deeper integration with the FaceBook's location data. For example Starbucks customers would like to know which of their friends from FaceBook frequented the same Starbucks store the same day as the one they are at currently. Starbucks mobile application today maintains card history and rewards on their application. If they could push FaceBook location updates on purchases that they track with their mobile application at the store that they visit, then Facebook friends could see where each other have been on a given day. This could encourage more sales at the Starbucks store as friends try to catch up with each other and at the very least provides useful knowledge to the Starbucks coffee customer of who else has been doing the same at this store. Additionally Starbucks mobile application need not take the user to their Facebook page to view or post this data, but offer a tip or balloon notification of which of the application user's friends had been at this store and when, if any. Such tips are non-invasive, information only and enables the coffee experience to be an avenue for social networking.  Interested users could be taken to a map that displays not just the stores but the Facebook friends that have visited that store in the past day, week or month.

Localization and globalization testing of websites

Usually referred to by the notations L10N and I18N, locale-specific website rendering is a significant test concern, both in terms of the resources required for the testing and the time it consumes. The primary considerations for this testing are linguistic, cosmetic, or basic functionality issues in displaying information in a culture-specific manner. Some languages such as German require around 30% more space, while Chinese, for instance, requires around 30% less. Moreover, right-to-left languages such as Arabic and Hebrew require proper alignment, indentation, and layout. Since UI resources for a website are typically collected and stored in resx files, their collation and translation are made easy with tools such as resgen.exe. However, the content alone does not guarantee its appropriateness in the website rendering, hence additional testing is required. As with any variation of the website, a full test pass using functionality tests and load tests is incurred. These sites also require significant environment resources to be allocated, including culture-specific domain name registrations and associated servers. Each such resource requires setup, maintenance, and constant tracking in various measurement and reporting systems. Such tasks increase the matrix of web testing. Fundamentally, these tests are rigorous, end-to-end, and repeated for each locale. What would be desirable is to unify the testing for the common content and factor out the testing specific to the locale. By unifying the tests upstream for much of the content and its display, significant savings are made in test cost. Consider the steps involved in culture-specific testing today, as depicted below. Each is a full iteration over common content with repeated functionality and load tests, even though the locale-specific testing is focused on linguistic translation and cosmetics.
test-en-us  : setup ->deployment->functionality testing->load testing->translation and cosmetics->completion
test-uk-en : setup ->deployment->functionality testing->load testing->translation and cosmetics->completion
test-de-de : setup ->deployment->functionality testing->load testing->translation and cosmetics->completion
If there were a solution that enables a common test bed for much of the redundancy, such as below:
test-neutral: setup -> deployment -> functionality testing -> load testing -> linguistic and translation tests
                                                                           -> layout, width, height, indentation checks from static resource checking
                                                                           -> variations of locale for repeating the above
This way, the redundancies are removed, testing is more streamlined and focused on explicit culture specific tasks.
Moreover, in the earlier model, test failures in one locale's environment could differ from another's on a case-by-case basis. By unifying the resources and the operations, much of this triage and variation can be avoided. The blog posts on the Pseudoizer can be very helpful here.
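A pseudoizer rewrites resource strings into accented, padded look-alikes so that layout overflow and hard-coded strings surface before any real translation exists. A minimal sketch in Python; the character map, the ~30% padding factor (motivated by the German expansion noted above), and the bracket markers are illustrative choices, not the Pseudoizer's actual scheme:

```python
# Map vowels to accented look-alikes; readable to testers, obviously "translated".
ACCENTED = str.maketrans("aeiouAEIOU", "àéîõüÀÉÎÕÜ")

def pseudoize(text):
    """Accent the vowels and pad ~30% to simulate longer translations;
    the surrounding brackets make truncation easy to spot on screen."""
    padded = text.translate(ACCENTED) + "!" * max(1, len(text) * 3 // 10)
    return "[" + padded + "]"

print(pseudoize("Save changes"))  # [Sàvé chàngés!!!]
```

Running every resx string through such a transform before layout testing exercises the locale-neutral pipeline without waiting on translators.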