We continue our discussion of system design for an online store,
as mentioned here and here.
We now discuss the data storage aspects across all services from the point of
view of scalability. We assume the store will eventually have a very large,
effectively unbounded number of users and plan accordingly. The services will
need to store large volumes of data, comprising both user data and logs. The
user portal may be composed of data from many different data sources. Images
and large static content will likely be served from storage that is optimized
for blobs. Most of the per-user information is stored in sharded relational
databases.
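
To make the idea of sharding per-user data concrete, here is a minimal sketch of routing a user to a shard. It assumes a numeric user id and a fixed number of shards; the `ShardRouter` class, the `GetShard` method and the modulo scheme are hypothetical illustrations, not part of the design discussed here.

```csharp
using System;

// Hypothetical helper: maps a user id to one of a fixed number of shards.
public static class ShardRouter
{
    public static int GetShard(long userId, int shardCount)
    {
        if (shardCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(shardCount));
        // The % operator can return a negative remainder for negative ids,
        // so shift the result back into the range [0, shardCount).
        long remainder = userId % shardCount;
        return (int)(remainder < 0 ? remainder + shardCount : remainder);
    }
}
```

A plain modulo scheme like this moves most users to a different shard whenever the shard count changes, so a real deployment would likely use a lookup directory or consistent hashing instead.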
High-volume short text, such as from community feedback forums, social
engineering transcripts, chats and messages, and ticket and case
troubleshooting conversations, will likely be stored in a large distributed
key-value store. A conventional relational database may be used as a queuing
system on top of this store. Almost all of this data still corresponds to
individual users: it is data generated by the user.

Unlike user data, log data is generated by the system from the various
operations of the services in the form of log events. Log events help with
analytics. For example, the log events may be used for correlation and as
feedback, which then leads to improvements in the operations of the services.
This virtuous cycle of feedback and improvement can go on regardless of which
user is using the system. Log events translate to feedback only with
analytics. For example, users may be shown the trending bestsellers or
newcomers to the store. This may require correlation and collaborative
filtering to provide a ranked list. Analytics usually come with charts that
make the results easy to read.

User and log data may also be used in many other ways. Data may appear in the
form of feeds to users to improve the shopping experience around a product.
Data may come in the form of recommendations such as "people who liked this
also liked that". Data may be represented as a graph and used with search.
Data may also be used for the integrity of the site. Ads, reviews and insights
may also appear as additional data while being separate and distinct in their
purpose or usage. Data expands possibilities for the business and hence it
eventually becomes the center of gravity. For example, logging may be used to
channel all logs to a central repository, which may grow with time in a time
series database on a dedicated cluster or in a data warehouse.

As data expands, scalability concerns grow. Systems may become mature, but
when size grows, even architectures change. The embrace of Big Data over
relational databases is a trend that comes directly from scalability.
Developers may find it enticing to use SQL statements instead of map-reduce to
get to the same result.
Consequently they may require an additional stack over the data. Visualization
will pull data and will also come with its own stack. Data tools may evolve
over the data stack, and both the tools and the stack will evolve to better
suit scalability and functionality. The design discussion here is borrowed
from what has already been shown to work in companies like Facebook that have
grown significantly.
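
The "people who liked this also liked that" style of recommendation mentioned above can be illustrated with a simple co-occurrence count. This is only a sketch under stated assumptions: the `AlsoLiked` class, the `Recommend` method and the in-memory dictionary of likes are hypothetical, and a production system would compute this over the log and user data stores rather than in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AlsoLiked
{
    // likes: for each user, the set of product ids that user liked
    // (a hypothetical input shape chosen for illustration).
    // Returns the products most often liked together with `product`,
    // ranked by how many users liked both.
    public static List<string> Recommend(
        IDictionary<string, HashSet<string>> likes, string product, int top)
    {
        var counts = new Dictionary<string, int>();
        foreach (var liked in likes.Values)
        {
            if (!liked.Contains(product)) continue;
            foreach (var other in liked)
            {
                if (other == product) continue;
                counts[other] = counts.TryGetValue(other, out var c) ? c + 1 : 1;
            }
        }
        // Highest co-occurrence first; ties broken by product id for a stable order.
        return counts
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => kv.Key)
            .Take(top)
            .ToList();
    }
}
```

Raw co-occurrence favors products that are popular overall, so a real ranking would usually normalize the counts in some way.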
#codingexercise
input [2,3,1,4]
output [12,8,24,6]
For each position, multiply all elements except the one at that position.
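using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Returns, for each position, the product of all the other elements.
// Uses division, so the input must not contain a zero.
List<int> GetSumProduct(List<int> A)
{
    Debug.Assert(A.All(x => x != 0));
    var product = 1;
    A.ForEach(x => { product *= x; });         // product of every element
    var ret = new List<int>();
    A.ForEach(x => { ret.Add(product / x); }); // divide out the element itself
    return ret;
}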
If we were to avoid division, we could, for each position, multiply together every entry other than the one at that position, which takes quadratic time. To make it linear and still avoid division, we would keep track of the front and rear products in separate passes and combine them for the result. Each running product needs to start at 1.
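
The linear, division-free variant described above can be sketched as follows. The class and method names are hypothetical; the front pass stores the product of everything before each position, and the rear pass multiplies in the product of everything after it.

```csharp
using System;
using System.Collections.Generic;

public static class ProductExceptSelf
{
    // Linear time, no division:
    // ret[i] = (product of A[0..i-1]) * (product of A[i+1..n-1]).
    public static List<int> GetProductsWithoutDivision(List<int> A)
    {
        var ret = new List<int>(A.Count);

        // Front pass: ret[i] holds the product of everything before i.
        var front = 1;
        for (int i = 0; i < A.Count; i++)
        {
            ret.Add(front);
            front *= A[i];
        }

        // Rear pass: multiply in the product of everything after i.
        var rear = 1;
        for (int i = A.Count - 1; i >= 0; i--)
        {
            ret[i] *= rear;
            rear *= A[i];
        }
        return ret;
    }
}
```

For the input [2,3,1,4] the front pass produces [1,2,6,6] and the rear pass turns it into [12,8,24,6]. Because nothing is divided, this version also tolerates zeros in the input.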