Today we continue discussing another System Design question. We have been covering data architectures and storage services, and we were reviewing the Payment Systems at Airbnb. Airbnb is a global online marketplace, and its in-house payment system is one of the largest of its kind. It consisted of three standalone systems: a backend billing API, a payment gateway to external processors, and a financial pipeline for Spark-based financial reporting. Platform events from billing and payment events from the gateway flow to the pipeline via Kafka. The billing interface allowed items beyond the usual reservation and payment, such as reservation nights, fees, taxes, photography costs, and coupons, to be added by representing them with abstract data models and attributes. The checkout flow thus becomes general to all items and streamlined, while publishing events to all interested subscribers.
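To make the abstraction concrete, here is a minimal sketch in Python (not Airbnb's actual stack; the class and attribute names are hypothetical) of how heterogeneous billing items can share one generic model so the checkout flow treats them uniformly:

```python
from dataclasses import dataclass, field

# Hypothetical generic billing item: reservation nights, fees, taxes,
# photography costs, and coupons all share one shape, differing only
# in their type tag and free-form attributes.
@dataclass
class BillingItem:
    item_type: str               # e.g. "reservation_night", "fee", "coupon"
    amount_cents: int            # negative for discounts such as coupons
    currency: str = "USD"
    attributes: dict = field(default_factory=dict)

def checkout_total(items):
    """A checkout step that is general to all item types."""
    return sum(item.amount_cents for item in items)

items = [
    BillingItem("reservation_night", 12000, attributes={"date": "2018-07-01"}),
    BillingItem("cleaning_fee", 3000),
    BillingItem("coupon", -1500, attributes={"code": "SUMMER"}),
]
print(checkout_total(items))  # 13500
```

Because every item carries its specifics in `attributes`, adding a new item type requires no change to the checkout logic, only a new type tag and its attributes.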
We were comparing their Kafka- and data-pipeline-based solution with SQL database clusters. A SQL database cluster is usually deployed as a failover cluster, not a performance or parallel cluster. Although it uses a SAN, that storage is shared, so the cluster is only as available as the SAN itself. Clustering does not save space or effort for backup and maintenance, it does not scale out reads for the database, and it does not guarantee 100% uptime. Similar considerations apply when the database server is hosted in a container with the database on a shared volume. A database server is, however, capable of scaling up for large online transaction processing. The reporting stack is usually a pull-and-transform operation over any database and is generally independent of the data manipulation from online transactions.
The analytics platform can now be built on top of the data pipeline. With Apache Spark, for instance, we can run SQL queries over batch data as well as queries over streamed events.
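The batch-versus-stream point can be sketched in plain Python rather than Spark: the same aggregation (here, payment volume per currency, with invented sample events) can be computed as a one-shot query over stored events or incrementally as events arrive, which is what Spark SQL and Spark Structured Streaming offer over the same data.

```python
from collections import defaultdict

# Invented sample payment events for illustration.
events = [
    {"currency": "USD", "amount": 120},
    {"currency": "EUR", "amount": 80},
    {"currency": "USD", "amount": 40},
]

def batch_totals(events):
    """One-shot aggregation, analogous to a SQL GROUP BY over batch data."""
    totals = defaultdict(int)
    for e in events:
        totals[e["currency"]] += e["amount"]
    return dict(totals)

class StreamTotals:
    """Incremental aggregation, analogous to a streaming query's state."""
    def __init__(self):
        self.totals = defaultdict(int)
    def on_event(self, e):
        self.totals[e["currency"]] += e["amount"]

stream = StreamTotals()
for e in events:
    stream.on_event(e)

print(batch_totals(events))   # {'USD': 160, 'EUR': 80}
print(dict(stream.totals))    # same totals, computed incrementally
```

Either mode yields the same answer; the difference is whether the computation runs after the fact over the stored events or keeps running state as events flow through the pipeline.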
#codingexercise
Check if a subset of an array has a given sum
bool HasSubsetSum(List<int> A, int n, int sum)
{
    if (A == null) return false;
    // the empty subset sums to zero
    if (sum == 0)
        return true;
    // no elements left and a non-zero target remains
    if (n == 0)
        return false;
    // the last element is too large to be included (assumes non-negative values)
    if (A[n-1] > sum)
        return HasSubsetSum(A, n-1, sum);
    // either exclude or include the last element
    return HasSubsetSum(A, n-1, sum) ||
           HasSubsetSum(A, n-1, sum - A[n-1]);
}
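The recursion above re-solves overlapping subproblems and is exponential in the worst case. A bottom-up dynamic-programming table brings the cost down to O(n * sum). A sketch in Python (a translation of the exercise, again assuming non-negative integers):

```python
def has_subset_sum(a, target):
    """Bottom-up subset-sum: possible[s] is True when some subset of the
    elements seen so far sums to s. Assumes non-negative integers."""
    possible = [False] * (target + 1)
    possible[0] = True  # the empty subset sums to zero
    for x in a:
        # iterate downward so each element is used at most once
        for s in range(target, x - 1, -1):
            if possible[s - x]:
                possible[s] = True
    return possible[target]

print(has_subset_sum([3, 34, 4, 12, 5, 2], 9))   # True (4 + 5)
print(has_subset_sum([3, 34, 4, 12, 5, 2], 30))  # False
```

The table uses O(sum) space because each element only needs the previous row, read from high sums down to low.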