Friday, July 21, 2017

Having discussed content databases, document libraries, and S3 storage together with their benefits for a text summarization service, we now review NoSQL databases as a store for the summaries. We argued that the summaries can be represented as JSON documents, that the store can grow arbitrarily large, to the tune of 40 GB per user, and that the text of the original document may also be stored. Consequently, one approach could be to use a MongoDB store, which can be a wonderful on-premise solution but one that requires significant processing for analysis and reporting, especially when the datasets are large. However, we talked about the text summarization service as a truly cloud-based offering. Given the challenges of scaling MongoDB and the desire for a solution that does not impose restrictions around limits, we would do well to review the case study of DoubleDown and its migration of data from MongoDB to Snowflake. Towards the end of the discussion, we will reflect on whether Snowflake could be built on top of DynamoDB or CosmosDB.
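As a rough illustration of the MongoDB approach, here is a minimal sketch of persisting one summary as a JSON document, assuming the official MongoDB .NET driver (MongoDB.Driver package); the database, collection, and field names are hypothetical and not from the original discussion.

using MongoDB.Bson;
using MongoDB.Driver;

class SummarySketch
{
    // Persist one summary as a JSON document; database, collection and field names are illustrative only.
    static void SaveSummary()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var db = client.GetDatabase("textsummarization");
        var summaries = db.GetCollection<BsonDocument>("summaries");
        var doc = BsonDocument.Parse(@"{
            ""userId"": ""user-123"",
            ""documentId"": ""doc-456"",
            ""summary"": ""A short summary of the original text."",
            ""originalText"": ""The full text could also be stored here or referenced in S3.""
        }");
        summaries.InsertOne(doc);
    }
}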
DoubleDown is an internet casino games provider that was acquired by International Game Technology. Its games are available through different applications and platforms. While the games are free to play, it makes money from in-game purchases and advertising partners. For existing and new games, it performs analysis to gain insights that influence game design, marketing campaign evaluation, and management. The analysis improves the understanding of player behaviour, assesses user experience, and uncovers bugs and defects. The data comes from MySQL databases, internal production databases, cloud-based game servers, internal Splunk servers, and several third-party vendors. These continuous data feeds amount to about 3.5 terabytes per day, arriving over separate data paths, passing through ETL transformations, and landing as large JSON files. MongoDB was used for processing this data and supported a collection of collectors and aggregators. The data was then pulled into a staging area where it was cleaned, transformed, and conformed to a star schema before being loaded into a data warehouse. This warehouse supported analysis and reporting tools including Tableau. Snowflake not only replaced MongoDB but also streamlined the data operation, expediting a process that previously took nearly a day. Snowflake brought the following advantages:
Its query language is SQL, so the pace of development was rapid.
It loads JSON natively, so several lossy data transformations were avoided.
It was able to stage and store highly granular data in S3, which made it very effective.
It was able to process JSON data using SQL, which did away with a lot of map-reduce (see the sketch after this list).
Snowflake utilizes micro-partitions to store customer data securely and efficiently.
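As a rough sketch of that JSON-over-SQL point, fields of a JSON document can be addressed directly in a Snowflake query; the example below assumes the Snowflake .NET connector (Snowflake.Data package) and a hypothetical game_events table with a VARIANT column named raw, so the table, columns, and credentials are placeholders rather than anything from the case study.

using System;
using System.Data;
using Snowflake.Data.Client;

class SnowflakeJsonSketch
{
    // Query JSON stored in a VARIANT column with plain SQL; names and credentials are placeholders.
    static void QueryPurchases()
    {
        using (IDbConnection conn = new SnowflakeDbConnection())
        {
            conn.ConnectionString = "account=myaccount;user=myuser;password=****;db=games;schema=public";
            conn.Open();
            var cmd = conn.CreateCommand();
            // Colon notation reads fields out of the JSON document directly, with no map-reduce step.
            cmd.CommandText = @"SELECT raw:player.id::string AS player_id,
                                       raw:purchase.amount::number AS amount
                                FROM game_events
                                WHERE raw:event.type::string = 'purchase'";
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine($"{reader.GetString(0)}: {reader.GetValue(1)}");
            }
        }
    }
}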
#codingexercise
int GetMaxValue(List<int> A)
{
    // Returns the maximum value in the list; assumes A is non-empty.
    var max = int.MinValue;
    for (int i = 0; i < A.Count; i++)
        if (A[i] > max)
            max = A[i];
    return max;
}
