Cluster computing

These are a few SQL exercises to demonstrate querying beyond spreadsheets:

Let us consider that there are

-- Employees at Company X

-- their names must be unique.

CREATE TABLE employees (

id SERIAL PRIMARY KEY,

name TEXT,

);

-- Contacts are people an employee knows

CREATE TABLE contacts (

id SERIAL PRIMARY KEY,

name TEXT,

email TEXT

);

-- employees will refer people they know

-- must be cleaned up after an employee leaves

CREATE TABLE referrals (

employee_id INT references employees(id),

contact_id INT references contacts(id)

);

Q.1: Which of the following statements is TRUE?

Creating employees with the same name will generate a SQL Error (FALSE)

Creating employees with the same name will not generate a SQL Error (TRUE)

Q.2: Which of the following statements is TRUE?

When an employee or contact is deleted, referrals will be deleted (FALSE)

When an employee or contact is deleted, referrals will NOT be deleted (TRUE)

Query ONE

SELECT * FROM (

SELECT

employees.name AS employee_name

contact_id,

contacts.name as contacts_name,

contacts.email as contacts_email,

(SELECT COUNT (*) FROM referrals as r WHERE r.contact_id = referrals.contact_id) as num_referrals

FROM referrals

JOIN employees on employees.id= employee_id

JOIN contacts on contacts.id = contact_id

) As q

where q.num_referrals > 1;

Query TWO

SELECT

employees.name AS employee_name

contact_id,

contacts.name as contacts_name,

contacts.email as contacts_email,

(SELECT COUNT (*) FROM referrals as r WHERE r.contact_id = referrals.contact_id) as num_referrals

FROM referrals

JOIN employees on employees.id= employee_id

JOIN contacts on contacts.id = contact_id

WHERE

contact_id IN (SELECT contact_id FROM REFERRALS GROUP BY contact_id HAVING COUNT (*) > 1);

Q.3: Which of the two queries is more efficient?

Query ONE (TRUE)

Query TWO (FALSE)

Q.4: A ride hailing company has their DB structured in 3 major tables as described in the SCHEMA

section below.

Write a query to fetch the top 100 users who traveled the most distance using the service.

The output should be structured as: users.name distance_traveled

Sort the output by distance_traveled in descending order, then by user name in ascending order. Show

only the top 100 users, ignoring the ties at the last position.

Note: There could be multiple users with the same name but they will have different IDs

CITIES: id string assigned id to the city represented as 32 characters UUID

name string the name of the city

USERS: id string assigned id to the city represented as 32 characters UUID

city_id string the id of the city in which this user resides.

name string the name of the user.

email string the email of the user.

SELECT DISTINCT u.name, SUM(r.distance) as distance_traveled

FROM RIDES r

JOIN USERS u

ON r.user_id = u.id

GROUP BY r.user_id, u.name

ORDER BY distance_traveled DESC, u.name ASC

LIMIT 0,100;

Cluster computing

Sunday, January 29, 2023

No comments:

Post a Comment