Corrigé TP - Extensions OLAP et Fenêtres de Partitionnement PostgreSQL
Section 1 : Premiers pas
Exercice 1.1 : Chargement et vérification de la base
-- Vérifier la liste des tables
\dt
-- Vérifier le schéma de quelques tables importantes
\d orders
\d orderdetails
\d customers
\d suppliers
Exercice 1.2 : Identification de la table de faits et dimension récursive
Réponse :
- Table de faits :
orderdetails(elle contient les mesures quantitatives : quantity, unitprice, discount) - Dimension récursive :
employees(l’attributreportstofait référence àemployeeiddans la même table, créant une hiérarchie managériale)
Section 2 : Extensions OLAP de la clause GROUP BY
Exercice 2.3.1 : Nombre de clients par pays
SELECT country, COUNT(*) as nb
FROM customers
GROUP BY country
ORDER BY country;
Exercice 2.3.2 : Nombres de commandes par pays et ville avec totaux (ROLLUP)
-- Solution avec ROLLUP
SELECT shipcountry AS ship_country,
shipcity AS ship_city,
COUNT(*) AS nborders
FROM orders
GROUP BY ROLLUP(shipcountry, shipcity)
ORDER BY shipcountry, shipcity;
-- Alternative avec GROUPING SETS (plus explicite)
SELECT shipcountry AS ship_country,
shipcity AS ship_city,
COUNT(*) AS nborders
FROM orders
GROUP BY GROUPING SETS (
(shipcountry, shipcity), -- Détail par pays et ville
(shipcountry), -- Total par pays
() -- Total général
)
ORDER BY shipcountry, shipcity;
Exercice 2.3.3 : Commandes et quantités par paire (pays Client, pays Fournisseur)
SELECT c.country AS c_country,
s.country AS s_country,
SUM(od.quantity) AS quantity,
COUNT(DISTINCT o.orderid) AS nborder
FROM orders o
JOIN customers c ON o.customerid = c.customerid
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
JOIN suppliers s ON p.supplierid = s.supplierid
GROUP BY c.country, s.country
ORDER BY c.country, s.country;
Exercice 2.3.4 : Totaux à tous les niveaux (ROLLUP complet)
SELECT c.country AS c_country,
s.country AS s_country,
SUM(od.quantity) AS quantity,
COUNT(DISTINCT o.orderid) AS nborder
FROM orders o
JOIN customers c ON o.customerid = c.customerid
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
JOIN suppliers s ON p.supplierid = s.supplierid
GROUP BY ROLLUP(c.country, s.country)
ORDER BY c.country, s.country;
Exercice 2.3.5 : Prix total avec fournisseur français (CUBE vs ROLLUP)
-- Solution 1 : avec CUBE (tous les croisements possibles)
SELECT shipcountry AS ship_country,
shipregion AS ship_region,
shipcity AS ship_city,
ROUND(SUM(od.unitprice * od.quantity * (1 - od.discount))::numeric, 2) AS price
FROM orders o
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
JOIN suppliers s ON p.supplierid = s.supplierid
WHERE s.country = 'France'
GROUP BY CUBE(shipcountry, shipregion, shipcity)
HAVING GROUPING(shipcountry) = 0 -- Toujours afficher le pays
AND NOT (GROUPING(shipcountry) = 1 AND GROUPING(shipregion) = 1 AND GROUPING(shipcity) = 1) -- Exclure le total général
ORDER BY shipcountry, shipregion, shipcity;
-- Solution 2 : avec GROUPING SETS (contrôle précis des niveaux)
SELECT shipcountry AS ship_country,
shipregion AS ship_region,
shipcity AS ship_city,
ROUND(SUM(od.unitprice * od.quantity * (1 - od.discount))::numeric, 2) AS price
FROM orders o
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
JOIN suppliers s ON p.supplierid = s.supplierid
WHERE s.country = 'France'
GROUP BY GROUPING SETS (
(shipcountry, shipregion, shipcity), -- Détail complet
(shipcountry, shipregion), -- Par pays et région
(shipcountry) -- Par pays seulement
)
ORDER BY shipcountry, shipregion, shipcity;
Exercice 2.3.6 : Remplacer NULL par ‘whole country’
SELECT shipcountry AS s_country,
COALESCE(shipcity, 'whole country') AS s_city,
COUNT(*) AS nborders
FROM orders
GROUP BY ROLLUP(shipcountry, shipcity)
ORDER BY shipcountry, shipcity;
-- Alternative avec CASE et GROUPING()
SELECT shipcountry AS s_country,
CASE
WHEN GROUPING(shipcity) = 1 THEN 'whole country'
ELSE shipcity
END AS s_city,
COUNT(*) AS nborders
FROM orders
GROUP BY ROLLUP(shipcountry, shipcity)
ORDER BY shipcountry, GROUPING(shipcity), shipcity;
Section 3 : Fenêtres de partitionnement
Exercice 3.4.1 : Statistiques par commande avec fenêtres multiples
SELECT orderid AS order_id,
orderdate AS order_date,
shipcity AS ship_city,
shipcountry AS ship_country,
freight,
COUNT(*) OVER (PARTITION BY shipcity) AS n_v,
COUNT(*) OVER (PARTITION BY shipcountry) AS n_p,
SUM(freight) OVER (PARTITION BY shipcountry) AS poids_p,
SUM(freight) OVER (PARTITION BY shipcountry, orderdate) AS poids_p_j
FROM orders
ORDER BY orderid;
Exercice 3.4.2 : Comptages cumulatifs
SELECT orderid AS order_id,
orderdate AS order_date,
shipcity AS ship_city,
shipcountry AS ship_country,
COUNT(*) OVER (PARTITION BY shipcity ORDER BY orderdate
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc_v,
COUNT(*) OVER (PARTITION BY shipcountry ORDER BY orderdate
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc_p,
COUNT(*) OVER (ORDER BY orderdate
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc,
COUNT(*) OVER (PARTITION BY shipcountry, orderdate) AS n_pj
FROM orders
ORDER BY orderdate, orderid;
Exercice 3.4.3 : Ajout du comptage excluant la date courante
SELECT orderid AS order_id,
orderdate AS order_date,
shipcity AS ship_city,
shipcountry AS ship_country,
COUNT(*) OVER (PARTITION BY shipcity ORDER BY orderdate
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc_v,
COUNT(*) OVER (PARTITION BY shipcountry ORDER BY orderdate
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc_p,
COUNT(*) OVER (ORDER BY orderdate
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc,
COUNT(*) OVER (PARTITION BY shipcountry, orderdate) AS n_pj,
COUNT(*) OVER (PARTITION BY shipcountry ORDER BY orderdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS nc_p2
FROM orders
ORDER BY orderdate, orderid;
Exercice 3.4.4 : Commandes précédentes avec LAG
SELECT orderid AS order_id,
LAG(orderid) OVER (ORDER BY orderdate, orderid) AS num_p,
LAG(orderid) OVER (PARTITION BY shipcity ORDER BY orderdate, orderid) AS num_p_v
FROM orders
ORDER BY orderdate, orderid;
Exercice 3.4.5 : Statistiques par pays et ville
SELECT DISTINCT
shipcountry AS ship_country,
shipcity AS ship_city,
COUNT(*) OVER (PARTITION BY shipcountry, shipcity) AS nborders,
COUNT(*) OVER (PARTITION BY shipcountry) AS nbordcty,
MAX(COUNT(*) OVER (PARTITION BY shipcountry, shipcity))
OVER (PARTITION BY shipcountry) AS nbormaxcty
FROM orders
ORDER BY shipcountry, shipcity;
Exercice 3.4.6 : Rang des villes par pays (DENSE_RANK)
WITH city_counts AS (
SELECT shipcountry, shipcity, COUNT(*) AS nborders
FROM orders
GROUP BY shipcountry, shipcity
)
SELECT shipcountry AS ship_country,
shipcity AS ship_city,
nborders,
DENSE_RANK() OVER (PARTITION BY shipcountry ORDER BY nborders) AS rank
FROM city_counts
ORDER BY shipcountry, nborders, shipcity;
Exercice 3.4.7 : Ajout du pourcentage
WITH city_counts AS (
SELECT shipcountry, shipcity, COUNT(*) AS nborders
FROM orders
GROUP BY shipcountry, shipcity
)
SELECT shipcountry AS ship_country,
shipcity AS ship_city,
nborders,
DENSE_RANK() OVER (PARTITION BY shipcountry ORDER BY nborders) AS rank,
ROUND(nborders::numeric / SUM(nborders) OVER (PARTITION BY shipcountry), 2) AS percentg
FROM city_counts
ORDER BY shipcountry, nborders, shipcity;
Exercice 3.4.8 : Filtrage basé sur comparaison avec LAG
WITH order_prices AS (
SELECT o.orderid,
SUM(od.unitprice * od.quantity * (1 - od.discount)) AS price
FROM orders o
JOIN orderdetails od ON o.orderid = od.orderid
GROUP BY o.orderid
),
filtered_orders AS (
SELECT orderid,
price,
LAG(price) OVER (ORDER BY orderid) AS prev_price
FROM order_prices
)
SELECT orderid AS order_id,
ROUND(price::numeric, 3) AS price
FROM filtered_orders
WHERE prev_price IS NULL
OR price <= prev_price * 1.1
ORDER BY orderid;
Exercice 3.4.9 : Produits les plus vendus par année
-- Solution 1 : avec fenêtrage
WITH yearly_products AS (
SELECT EXTRACT(YEAR FROM o.orderdate) AS year,
p.productname,
SUM(od.quantity) AS qtity
FROM orders o
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
GROUP BY EXTRACT(YEAR FROM o.orderdate), p.productname
),
ranked_products AS (
SELECT year,
productname,
qtity,
ROW_NUMBER() OVER (PARTITION BY year ORDER BY qtity DESC) AS rn
FROM yearly_products
)
SELECT year,
productname AS product_name,
qtity
FROM ranked_products
WHERE rn = 1
ORDER BY year;
-- Solution 2 : sans fenêtrage (avec sous-requête corrélée)
WITH yearly_products AS (
SELECT EXTRACT(YEAR FROM o.orderdate) AS year,
p.productname,
SUM(od.quantity) AS qtity
FROM orders o
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
GROUP BY EXTRACT(YEAR FROM o.orderdate), p.productname
)
SELECT yp.year,
yp.productname AS product_name,
yp.qtity
FROM yearly_products yp
WHERE yp.qtity = (
SELECT MAX(qtity)
FROM yearly_products yp2
WHERE yp2.year = yp.year
)
ORDER BY year;
Section 4 : Fonctions de classement
Exercice 4.1 : RANK avec ex-aequo
SELECT orderid AS order_id,
orderdate AS order_date,
freight,
RANK() OVER (ORDER BY freight DESC) AS rank
FROM orders
ORDER BY freight DESC, orderid;
Exercice 4.2 : DENSE_RANK sans saut de position
SELECT orderid AS order_id,
orderdate AS order_date,
freight,
DENSE_RANK() OVER (ORDER BY freight DESC) AS rank
FROM orders
ORDER BY freight DESC, orderid;
Exercice 4.3 : ROW_NUMBER pour numérotation unique
SELECT orderid AS order_id,
orderdate AS order_date,
freight,
ROW_NUMBER() OVER (ORDER BY freight DESC) AS row_number
FROM orders
ORDER BY freight DESC, orderid;
Exercice 4.4 : NTILE pour segmentation
SELECT orderid AS order_id,
orderdate AS order_date,
freight,
NTILE(120) OVER (ORDER BY freight DESC) AS tile
FROM orders
ORDER BY orderid;
Exercice 4.5 : LEAD pour valeur suivante
SELECT orderid AS order_id,
orderdate AS order_date,
freight,
LEAD(freight, 2) OVER (ORDER BY freight DESC) AS next_freight
FROM orders
ORDER BY orderid;
Exercice 4.6 : LAG pour valeur précédente
SELECT orderid AS order_id,
orderdate AS order_date,
freight,
COALESCE(LAG(freight) OVER (ORDER BY orderdate, orderid), 0) AS previous_freight
FROM orders
ORDER BY orderid;
Exercice 4.7 : FIRST_VALUE
SELECT orderid AS order_id,
orderdate AS order_date,
freight,
FIRST_VALUE(freight) OVER (ORDER BY orderdate, orderid
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_freight
FROM orders
ORDER BY orderid;
Exercice 4.8 : LAST_VALUE
SELECT orderid AS order_id,
orderdate AS order_date,
freight,
LAST_VALUE(freight) OVER (ORDER BY orderdate, orderid
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_freight
FROM orders
ORDER BY orderid;
Section 5 : Requêtes récursives
Exercice 5.1 : Table des entiers de 1 à 60
WITH RECURSIVE numbers AS (
-- Cas de base
SELECT 1 AS n
UNION ALL
-- Cas récursif
SELECT n + 1
FROM numbers
WHERE n < 60
)
SELECT n FROM numbers;
Exercice 5.2 : Hiérarchie des employés
WITH RECURSIVE employee_hierarchy AS (
-- Cas de base : le grand patron (celui qui n'a pas de supérieur)
SELECT employeeid,
firstname || ' ' || lastname AS fullname,
reportsto,
0 AS level,
firstname || ' ' || lastname AS path,
ARRAY[employeeid] AS path_array -- Pour gérer l'ordre
FROM employees
WHERE reportsto IS NULL
UNION ALL
-- Cas récursif : les subordonnés
SELECT e.employeeid,
e.firstname || ' ' || e.lastname AS fullname,
e.reportsto,
eh.level + 1 AS level,
eh.path || ' > ' || e.firstname || ' ' || e.lastname AS path,
eh.path_array || e.employeeid AS path_array
FROM employees e
INNER JOIN employee_hierarchy eh ON e.reportsto = eh.employeeid
)
SELECT LPAD('', level * 4, ' ') || fullname AS employee, -- Indentation
path AS hierarchy_path
FROM employee_hierarchy
ORDER BY path_array; -- Ordre préfixe (parcours en profondeur)
-- Alternative avec numérotation pour l'ordre préfixe
WITH RECURSIVE employee_hierarchy AS (
SELECT employeeid,
firstname || ' ' || lastname AS fullname,
reportsto,
0 AS level,
firstname || ' ' || lastname AS path,
'001'::text AS sort_path -- Numérotation pour tri
FROM employees
WHERE reportsto IS NULL
UNION ALL
SELECT e.employeeid,
e.firstname || ' ' || e.lastname AS fullname,
e.reportsto,
eh.level + 1 AS level,
eh.path || ' > ' || e.firstname || ' ' || e.lastname AS path,
eh.sort_path || '.' || LPAD(ROW_NUMBER() OVER (PARTITION BY eh.employeeid ORDER BY e.employeeid)::text, 3, '0')
FROM employees e
INNER JOIN employee_hierarchy eh ON e.reportsto = eh.employeeid
)
SELECT REPEAT(' ', level) || fullname AS employee,
path AS hierarchy_path
FROM employee_hierarchy
ORDER BY sort_path;
Notes importantes et bonnes pratiques
1. Performance
- Indexes : Assurez-vous que les colonnes utilisées dans les JOIN et WHERE ont des indexes appropriés
- EXPLAIN ANALYZE : Utilisez cette commande pour analyser les plans d’exécution
- Matérialized Views : Pour les requêtes complexes fréquemment exécutées
2. Différences entre fonctions de fenêtrage
- ROWS : Compte les lignes physiques (utile pour LAG/LEAD)
- RANGE : Considère les valeurs (utile pour les agrégations sur des plages de dates)
- UNBOUNDED PRECEDING/FOLLOWING : Depuis/jusqu’à l’extrémité de la partition
3. GROUPING SETS vs ROLLUP vs CUBE
- ROLLUP(a,b,c) : Génère (a,b,c), (a,b), (a), ()
- CUBE(a,b,c) : Génère toutes les combinaisons possibles (2^n groupes)
- GROUPING SETS : Contrôle explicite des groupes générés
4. Fonctions utiles
- GROUPING() : Renvoie 1 si la colonne est un total agrégé, 0 sinon
- COALESCE() : Remplace les NULL par une valeur par défaut
- NULLIF() : Renvoie NULL si deux valeurs sont égales
5. Optimisation des CTE récursifs
- Limitez toujours la récursion avec une condition d’arrêt claire
- Évitez les cycles avec des vérifications appropriées
- Utilisez des arrays pour stocker les chemins et détecter les cycles
6. Debugging
Pour déboguer les requêtes complexes :
- Décomposez en CTE intermédiaires
- Testez chaque CTE individuellement
- Utilisez des LIMIT pour tester sur des échantillons
- Vérifiez les types de données et les conversions implicites