Cluster computing

Wednesday, July 31, 2013

The datetime conversions (SQL Server) include the following functions :
The Convert function is a good general choice but DateName and DatePart provide flexibility.
SET DATEFORMAT dmy is used
SELECT CAST('7/31/2013' AS DateTime)
You can also use different styles to convert to and from text:
SELECT CONVERT(VARCHAR, CONVERT(DATETIME, '31-Jul-2013', 106), 106)
will display 31 Jul 2013
The allowed styles include 0, 100 Default:mon dd yyyy hh:mi AM (or PM)
Style 101 : USA mm/dd/yyyy
Style 102 : ANSI yyyy. mm. dd
Style 103 : British /French dd/mm/yyyy
Style 104: German dd.mm.yyyy
Style 105: Italian dd-mm-yyyy
Style 106: dd mon yyyy
Style 107: mon dd, yyyy
Style 108 : hh:mm:ss
Style 9, 109: Default with milliseconds
Style 110 USA:mm-dd-yyyy
Style 111 : Japan: yyyy/mm/dd
Style 112: ISO:yyyymmdd
Style 13, 113: Europe default with milliseconds and 24hour clock
Style 114: hh:mm:ss:mmm with a 24 hour clock
Style 20, 120 ODBC canonical, 24 hour clock: yyyy-mm-dd hh:mi:ss
Style 21, 121 ODBC canonical with milliseconds, 24 hour clock:yyyy-mm-dd hh:mi:ss.mmm

A look at some of the SQL constructs in no particular order:
1) common table expression - This is a temporary result set maintained for the duration of a query execution such as an insert, update or delete.
It can be thought of as a substitute for a view and doesn't require the view definition to be stored in the metadata. Follows the syntax: With CTE_Name(Col1, Col2, Col3) (SELECT query to populate) specify the outer query.
A CTE can be written in a recursive manner as follows : With CTE_name(column_name) AS
(CTE_query_definition -- anchor member is defined
union all
CTE_query_definition -- recursive member is defined in referencing cte_name
)
-- statement using the CTE
SELECT * from cte_name
For example With DirectReports (ManagerID, EmployeeID, Level) AS (
select e.ManagerID, e.EmployeeID, 0 as Level from dbo.Employee as e where ManagerID is null
union all
select e.ManagerID, e.EmployeeID, Level + 1 from dbo.Employee as e inner join DirectReports as d on e.ManagerId = d.EmployeeID
)
SELECT ManagerID, EmployeeID, Level from DirectReports
2) Some query hint options include
hash or order group which specifies if the aggregations use hash or order group
Merge or hash or concat specifies which should be used for union
Loop or merge or hash specifies which should be used for join
FAST number_rows that specifies fast retrieval of number_rows
FORCE ORDER specifies that the default source order is preserved.
MAXDOP overrides the max degree of parallelization setting, 0 => max
optimize for unknown informs the query optimizer to use statistical data instead of the initial values for variables
RECOMPILE discards the plan generated for the query after it executes, forcing the query optimizer to recompile

Tuesday, July 30, 2013

stored procedure to get stores by latitude and longitude

USE Store
GO

/****** Object: StoredProcedure [dbo].[GetStoresByGeo] Script Date: 7/30/2013 6:36:37 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: Ravishankar Rajamani
-- Create date: July 2013
-- Description: Get a list of all Stores by co-ordinates
-- =============================================

CREATE PROCEDURE [dbo].[GetStoresByGeo]
@Latitude VARCHAR(50),
@Longitude VARCHAR(50),
@Total int OUTPUT
AS
BEGIN
SET NOCOUNT ON;

DECLARE @Total int; -- get neighboring coordinates DECLARE @g geography; -- center SET @g = geography::STPointFromText('POINT( ' + '49.261923' + ' ' + '-123.068225' + ')', 4326); DECLARE @n geography; -- neighbors SET @n = @g.STBuffer(1000 * 1609.344); -- get count SELECT @Total = count(*) FROM Store (nolock) WHERE Coordinates.STWithin(@n) = 1 -- the STBuffer computed column could be persisted.
SELECT @Total as Total -- Get a table of StoreIds of results DECLARE @StoreIds Table(ID int); INSERT INTO @StoreIds SELECT StoreID FROM Store (nolock) as S WHERE Coordinates.STWithin(@n) = 1 ORDER BY StoreID -- Get stores SELECT * FROM Store (nolock) as S inner join @StoreIds as Ids on Ids.ID = S.StoreID END

Caveat : This stored procedure has not been tried and merely tries to illustrate an idea from a similar real world example.

Here's the location usage in NerdDinner sample that we could modify with the geography data type as per the previous post:

public IQueryable<Dinner> FindByLocation(float latitude, float longitude)
{
var dinners = from dinner in FindUpcomingDinners()
join i in NearestDinners(latitude, longitude)
on dinner.DinnerID equals i.DinnerID
select dinner;

return dinners;
}

/****** Object: UserDefinedFunction [dbo].[NearestDinners] Script Date: 7/30/2013 7:39:11 AM ******/
DROP FUNCTION [dbo].[NearestDinners]
GO

/****** Object: UserDefinedFunction [dbo].[NearestDinners] Script Date: 7/30/2013 7:39:11 AM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[NearestDinners]
(
@lat real,
@long real
)
RETURNS TABLE
AS
RETURN
SELECT Dinners.DinnerID
FROM Dinners
WHERE dbo.DistanceBetween(@lat, @long, Latitude, Longitude) <100

GO

Monday, July 29, 2013

The advantage of using Geography data types for the REST based location services is that the URI can continue to remain the same for the customer. The latitude and longitude is anyways represented in a well-known text format that is described by the Open Geospatial Consortium.
For example,

DECLARE @g geography;
SET @g = geography::STPointFromText('POINT(-122.34900 47.65100)', 4326);
SELECT @g.ToString();

Internally these WKTs are translated to Geography data type instances and stored like regular columns along with the attributes of the location.
They are queried using functions based on the geography data types. The functions could be based on distance between two points or their membership to a polygon such as one representing a zip.
They invoke corresponding static methods on the geography data type such as for example STDistance()

DECLARE @g geography;
DECLARE @h geography;
SET @g = geography::STGeomFromText('LINESTRING(-122.360 47.656, -122.343 47.656)', 4326);
SET @h = geography::STGeomFromText('POINT(-122.34900 47.65100)', 4326);
SELECT @g.STDistance(@h);

or find all intersection points with a polygon such as follows

DECLARE @Seattle geography = 'POLYGON(....)'; 
SELECT c.customerid FROM customer c WHERE c.geog.STIntersects(@Seattle.STBuffer(10 * 1609.344));

In either case the we can retrieve the points that are found in the vicinity of a location of interest.
By changing the column data type, we may have to handle all dependencies on the earlier data type. If location data was used for any index creation or used in conjunction with other tables or database objects, they would need to be modified to use the geography data type. In general, this should not be a problem since pretty much all the semantic operations performed on the co-ordinates are supported by the geography data type.
Also, one more thing to point out is that the geography column does not need an index creation. In fact, this might be a good opportunity to refactor all references to the geography column and limit their usage only to the database objects that support querying by the user.
Finally, database unit-tests (DUT) must be authored before the changes are made so that we can keep the same behaviour before and after we make the changes. These DUTs could include specific use case scenarios, as well as column scanning to see if any of the conversions failed, or relative distances between select points are maintained with the modified column type.

Location services using spatial data

In applications like the NerdDinner samples posted online, the location data is often represented with latitudes and longitudes which are stored in the database as not nullable doubles or floating points. Stores or restaurants in the neighborhood of a location are found by their cartesian distance between the latitude and longitude of the location and the store / restaurant usually implemented as a database function or stored procedure. The table for the stores or restaurants with their locations are then scanned to find the distances with the input location and ordered . This is then filtered for the top handful and displayed on the maps such as with Google API. Application developers find it easy to write the controllers to use the said database objects or LINQ to SQL to display the appropriate views.
However, there are some limitations with these approach. First this doesn't scale when there are hundreds of stores or restaurants in the radius of interest. Second, it doesn't answer the often repeated queries such as the points in a polygon such as that formed by a zipcode. Queries could also more efficiently find distance between two points if the data type to store location was say GEOGRAPHY and GEOMETRY data type in SQL Server. The Geography data type stores ellipsoidal data such as GPS Latitude and Longitude and the geometry data type stores Euclidean (flat) co-ordinate system One could then have a table such as :
ZipCodes
- ZipCodeId
- Code
- StateID
- Boundary
- Center Point
Boundary could be considered the polygon formed by the zip and the Center point is the central location in this zip. Distances between stores and their membership to a zip can be calculated based on this center point. Geography data type also lets you perform clustering analytics which answer questions such as the number of stores or restaurants satisfying a certain spatial condition and/or matching certain attributes. These are implemented using R-Tree data structures which support such clustering techniques.
Spatial data types such as the Geography data type now enjoys support in Entity Framework 5 as described here and therefore is available in LINQ and explained here. .Net also supports these data types with the SQLGeography and SQLGeometry data types for easy translation and mapping to their equivalent sql server data types.
One approach is to store the co-ordinates in a separate table where the primary keys are saved as the pair of latitude and longitude and then to describe them as unique such that a pair of latitude and longitude does not repeat. Such approach is questionable because the uniqueness constraint for locations has a maintenance overhead. For example, two locations could refer to the same point and then unreferenced rows might need to be cleaned up. Locations also change ownership for example, a store A could own a location that was previously owned by store B but B never updates its location. Moreover, stores could undergo renames or conversions. Thus it may be better to keep the spatial data associated in a repeatable way along with the information about the location.
Map APIs such as Google Maps or Bing Maps lets you work with spatial data types along with their usual caveat of not to store or cache locations independently.
Referred StackOverflow and MSDN

Sunday, July 28, 2013

Let us look at an implementation for the map showing the coffee stores in your neighborhood and your friends who had visited these same stores in the past day, month or year.

The APIs to retrieve store information in your neighborhood is given by the Starbucks location API. The friends information is given by the Facebook API. The location updates by your friends are made on Facebook through automatic push notifications including a proposed Starbucks application.
Different applications make Facebook posts on their customer's wall using the permission requested from them. These show up under Location updates on the FB profiles. The same can be queried for other facebook friends for a given location. A list of such friends who have the given location on their wall is collected and this should be rendered alongwith the store mentioned.
The Starbucks API provides the following services:
1) OAuth APIs : provides access tokens based on one or more methods to acquire them, relying on
a) password b) client id and user id c) authentication code, d) client credentials. Refresh tokens are also provided.
2) Loyalty API : The loyalty API provides Licenses with a way to integrate their Siebel loyalty systems
3) Location API : Provides methods to list all stores, stores near a geographic point, stores by specific ids, stores within a region.
4) Account API: such as to create US Starbucks account and non-US Starbucks account, to validate and recover credentials, to create account profiles and to update social profile data.
5) Card API: Provides customer's card information to keep track of their purchases. This gives information such as Card URL, Card details, Card balances, and the ability to reload a card.
6) Payment methods : Enables collection of sums from the customer using a payment method such as a credit card or PayPal
7) Rewards API: provides rewards summary and history information based on application user, points needed for next level, points needed for next free reward, points needed for re-evaluation etc.
8) Barcode API: The barcode API generates Barcodes to communicate with in-store scanners and applications.
9) Content API: These are used to retrieve localized contents for any web client. Content is stored and retrieved in site core.
10) Transaction History API: These are used to retrieve the history data for a specified user, for all cards.
11) eGift API: These are used to retrieve the themes associated with eGifts.
12) Device Relationship API: These are used to insert or update a device, to get reporting actions such as touch-to-pay to display the barcode for scanning and touch-when-done to report when completing the barcode for scanning.
13) Product API : to browse food, beverage, Coffee, eGifts etc.