Jan 1, 0001

Corrigé TP - Extensions OLAP et Fenêtres de Partitionnement PostgreSQL

Section 1 : Premiers pas

Exercice 1.1 : Chargement et vérification de la base

-- Vérifier la liste des tables
\dt

-- Vérifier le schéma de quelques tables importantes
\d orders
\d orderdetails
\d customers
\d suppliers

Exercice 1.2 : Identification de la table de faits et dimension récursive

Réponse :

Table de faits : orderdetails (elle contient les mesures quantitatives : quantity, unitprice, discount)
Dimension récursive : employees (l’attribut reportsto fait référence à employeeid dans la même table, créant une hiérarchie managériale)

Section 2 : Extensions OLAP de la clause GROUP BY

Exercice 2.3.1 : Nombre de clients par pays

SELECT country, COUNT(*) as nb
FROM customers
GROUP BY country
ORDER BY country;

Exercice 2.3.2 : Nombres de commandes par pays et ville avec totaux (ROLLUP)

-- Solution avec ROLLUP
SELECT shipcountry AS ship_country, 
       shipcity AS ship_city,
       COUNT(*) AS nborders
FROM orders
GROUP BY ROLLUP(shipcountry, shipcity)
ORDER BY shipcountry, shipcity;

-- Alternative avec GROUPING SETS (plus explicite)
SELECT shipcountry AS ship_country, 
       shipcity AS ship_city,
       COUNT(*) AS nborders
FROM orders
GROUP BY GROUPING SETS (
    (shipcountry, shipcity),  -- Détail par pays et ville
    (shipcountry),             -- Total par pays
    ()                         -- Total général
)
ORDER BY shipcountry, shipcity;

Exercice 2.3.3 : Commandes et quantités par paire (pays Client, pays Fournisseur)

SELECT c.country AS c_country,
       s.country AS s_country,
       SUM(od.quantity) AS quantity,
       COUNT(DISTINCT o.orderid) AS nborder
FROM orders o
JOIN customers c ON o.customerid = c.customerid
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
JOIN suppliers s ON p.supplierid = s.supplierid
GROUP BY c.country, s.country
ORDER BY c.country, s.country;

Exercice 2.3.4 : Totaux à tous les niveaux (ROLLUP complet)

SELECT c.country AS c_country,
       s.country AS s_country,
       SUM(od.quantity) AS quantity,
       COUNT(DISTINCT o.orderid) AS nborder
FROM orders o
JOIN customers c ON o.customerid = c.customerid
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
JOIN suppliers s ON p.supplierid = s.supplierid
GROUP BY ROLLUP(c.country, s.country)
ORDER BY c.country, s.country;

Exercice 2.3.5 : Prix total avec fournisseur français (CUBE vs ROLLUP)

-- Solution 1 : avec CUBE (tous les croisements possibles)
SELECT shipcountry AS ship_country,
       shipregion AS ship_region,
       shipcity AS ship_city,
       ROUND(SUM(od.unitprice * od.quantity * (1 - od.discount))::numeric, 2) AS price
FROM orders o
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
JOIN suppliers s ON p.supplierid = s.supplierid
WHERE s.country = 'France'
GROUP BY CUBE(shipcountry, shipregion, shipcity)
HAVING GROUPING(shipcountry) = 0  -- Toujours afficher le pays
   AND NOT (GROUPING(shipcountry) = 1 AND GROUPING(shipregion) = 1 AND GROUPING(shipcity) = 1)  -- Exclure le total général
ORDER BY shipcountry, shipregion, shipcity;

-- Solution 2 : avec GROUPING SETS (contrôle précis des niveaux)
SELECT shipcountry AS ship_country,
       shipregion AS ship_region,
       shipcity AS ship_city,
       ROUND(SUM(od.unitprice * od.quantity * (1 - od.discount))::numeric, 2) AS price
FROM orders o
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
JOIN suppliers s ON p.supplierid = s.supplierid
WHERE s.country = 'France'
GROUP BY GROUPING SETS (
    (shipcountry, shipregion, shipcity),  -- Détail complet
    (shipcountry, shipregion),             -- Par pays et région
    (shipcountry)                          -- Par pays seulement
)
ORDER BY shipcountry, shipregion, shipcity;

Exercice 2.3.6 : Remplacer NULL par ‘whole country’

SELECT shipcountry AS s_country,
       COALESCE(shipcity, 'whole country') AS s_city,
       COUNT(*) AS nborders
FROM orders
GROUP BY ROLLUP(shipcountry, shipcity)
ORDER BY shipcountry, shipcity;

-- Alternative avec CASE et GROUPING()
SELECT shipcountry AS s_country,
       CASE 
           WHEN GROUPING(shipcity) = 1 THEN 'whole country'
           ELSE shipcity 
       END AS s_city,
       COUNT(*) AS nborders
FROM orders
GROUP BY ROLLUP(shipcountry, shipcity)
ORDER BY shipcountry, GROUPING(shipcity), shipcity;

Section 3 : Fenêtres de partitionnement

Exercice 3.4.1 : Statistiques par commande avec fenêtres multiples

SELECT orderid AS order_id,
       orderdate AS order_date,
       shipcity AS ship_city,
       shipcountry AS ship_country,
       freight,
       COUNT(*) OVER (PARTITION BY shipcity) AS n_v,
       COUNT(*) OVER (PARTITION BY shipcountry) AS n_p,
       SUM(freight) OVER (PARTITION BY shipcountry) AS poids_p,
       SUM(freight) OVER (PARTITION BY shipcountry, orderdate) AS poids_p_j
FROM orders
ORDER BY orderid;

Exercice 3.4.2 : Comptages cumulatifs

SELECT orderid AS order_id,
       orderdate AS order_date,
       shipcity AS ship_city,
       shipcountry AS ship_country,
       COUNT(*) OVER (PARTITION BY shipcity ORDER BY orderdate 
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc_v,
       COUNT(*) OVER (PARTITION BY shipcountry ORDER BY orderdate 
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc_p,
       COUNT(*) OVER (ORDER BY orderdate 
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc,
       COUNT(*) OVER (PARTITION BY shipcountry, orderdate) AS n_pj
FROM orders
ORDER BY orderdate, orderid;

Exercice 3.4.3 : Ajout du comptage excluant la date courante

SELECT orderid AS order_id,
       orderdate AS order_date,
       shipcity AS ship_city,
       shipcountry AS ship_country,
       COUNT(*) OVER (PARTITION BY shipcity ORDER BY orderdate 
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc_v,
       COUNT(*) OVER (PARTITION BY shipcountry ORDER BY orderdate 
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc_p,
       COUNT(*) OVER (ORDER BY orderdate 
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc,
       COUNT(*) OVER (PARTITION BY shipcountry, orderdate) AS n_pj,
       COUNT(*) OVER (PARTITION BY shipcountry ORDER BY orderdate 
                      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS nc_p2
FROM orders
ORDER BY orderdate, orderid;

Exercice 3.4.4 : Commandes précédentes avec LAG

SELECT orderid AS order_id,
       LAG(orderid) OVER (ORDER BY orderdate, orderid) AS num_p,
       LAG(orderid) OVER (PARTITION BY shipcity ORDER BY orderdate, orderid) AS num_p_v
FROM orders
ORDER BY orderdate, orderid;

Exercice 3.4.5 : Statistiques par pays et ville

SELECT DISTINCT 
       shipcountry AS ship_country,
       shipcity AS ship_city,
       COUNT(*) OVER (PARTITION BY shipcountry, shipcity) AS nborders,
       COUNT(*) OVER (PARTITION BY shipcountry) AS nbordcty,
       MAX(COUNT(*) OVER (PARTITION BY shipcountry, shipcity)) 
           OVER (PARTITION BY shipcountry) AS nbormaxcty
FROM orders
ORDER BY shipcountry, shipcity;

Exercice 3.4.6 : Rang des villes par pays (DENSE_RANK)

WITH city_counts AS (
    SELECT shipcountry, shipcity, COUNT(*) AS nborders
    FROM orders
    GROUP BY shipcountry, shipcity
)
SELECT shipcountry AS ship_country,
       shipcity AS ship_city,
       nborders,
       DENSE_RANK() OVER (PARTITION BY shipcountry ORDER BY nborders) AS rank
FROM city_counts
ORDER BY shipcountry, nborders, shipcity;

Exercice 3.4.7 : Ajout du pourcentage

WITH city_counts AS (
    SELECT shipcountry, shipcity, COUNT(*) AS nborders
    FROM orders
    GROUP BY shipcountry, shipcity
)
SELECT shipcountry AS ship_country,
       shipcity AS ship_city,
       nborders,
       DENSE_RANK() OVER (PARTITION BY shipcountry ORDER BY nborders) AS rank,
       ROUND(nborders::numeric / SUM(nborders) OVER (PARTITION BY shipcountry), 2) AS percentg
FROM city_counts
ORDER BY shipcountry, nborders, shipcity;

Exercice 3.4.8 : Filtrage basé sur comparaison avec LAG

WITH order_prices AS (
    SELECT o.orderid,
           SUM(od.unitprice * od.quantity * (1 - od.discount)) AS price
    FROM orders o
    JOIN orderdetails od ON o.orderid = od.orderid
    GROUP BY o.orderid
),
filtered_orders AS (
    SELECT orderid,
           price,
           LAG(price) OVER (ORDER BY orderid) AS prev_price
    FROM order_prices
)
SELECT orderid AS order_id,
       ROUND(price::numeric, 3) AS price
FROM filtered_orders
WHERE prev_price IS NULL 
   OR price <= prev_price * 1.1
ORDER BY orderid;

Exercice 3.4.9 : Produits les plus vendus par année

-- Solution 1 : avec fenêtrage
WITH yearly_products AS (
    SELECT EXTRACT(YEAR FROM o.orderdate) AS year,
           p.productname,
           SUM(od.quantity) AS qtity
    FROM orders o
    JOIN orderdetails od ON o.orderid = od.orderid
    JOIN products p ON od.productid = p.productid
    GROUP BY EXTRACT(YEAR FROM o.orderdate), p.productname
),
ranked_products AS (
    SELECT year,
           productname,
           qtity,
           ROW_NUMBER() OVER (PARTITION BY year ORDER BY qtity DESC) AS rn
    FROM yearly_products
)
SELECT year,
       productname AS product_name,
       qtity
FROM ranked_products
WHERE rn = 1
ORDER BY year;

-- Solution 2 : sans fenêtrage (avec sous-requête corrélée)
WITH yearly_products AS (
    SELECT EXTRACT(YEAR FROM o.orderdate) AS year,
           p.productname,
           SUM(od.quantity) AS qtity
    FROM orders o
    JOIN orderdetails od ON o.orderid = od.orderid
    JOIN products p ON od.productid = p.productid
    GROUP BY EXTRACT(YEAR FROM o.orderdate), p.productname
)
SELECT yp.year,
       yp.productname AS product_name,
       yp.qtity
FROM yearly_products yp
WHERE yp.qtity = (
    SELECT MAX(qtity)
    FROM yearly_products yp2
    WHERE yp2.year = yp.year
)
ORDER BY year;

Section 4 : Fonctions de classement

Exercice 4.1 : RANK avec ex-aequo

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       RANK() OVER (ORDER BY freight DESC) AS rank
FROM orders
ORDER BY freight DESC, orderid;

Exercice 4.2 : DENSE_RANK sans saut de position

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       DENSE_RANK() OVER (ORDER BY freight DESC) AS rank
FROM orders
ORDER BY freight DESC, orderid;

Exercice 4.3 : ROW_NUMBER pour numérotation unique

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       ROW_NUMBER() OVER (ORDER BY freight DESC) AS row_number
FROM orders
ORDER BY freight DESC, orderid;

Exercice 4.4 : NTILE pour segmentation

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       NTILE(120) OVER (ORDER BY freight DESC) AS tile
FROM orders
ORDER BY orderid;

Exercice 4.5 : LEAD pour valeur suivante

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       LEAD(freight, 2) OVER (ORDER BY freight DESC) AS next_freight
FROM orders
ORDER BY orderid;

Exercice 4.6 : LAG pour valeur précédente

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       COALESCE(LAG(freight) OVER (ORDER BY orderdate, orderid), 0) AS previous_freight
FROM orders
ORDER BY orderid;

Exercice 4.7 : FIRST_VALUE

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       FIRST_VALUE(freight) OVER (ORDER BY orderdate, orderid 
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_freight
FROM orders
ORDER BY orderid;

Exercice 4.8 : LAST_VALUE

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       LAST_VALUE(freight) OVER (ORDER BY orderdate, orderid 
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_freight
FROM orders
ORDER BY orderid;

Section 5 : Requêtes récursives

Exercice 5.1 : Table des entiers de 1 à 60

WITH RECURSIVE numbers AS (
    -- Cas de base
    SELECT 1 AS n
    
    UNION ALL
    
    -- Cas récursif
    SELECT n + 1
    FROM numbers
    WHERE n < 60
)
SELECT n FROM numbers;

Exercice 5.2 : Hiérarchie des employés

WITH RECURSIVE employee_hierarchy AS (
    -- Cas de base : le grand patron (celui qui n'a pas de supérieur)
    SELECT employeeid,
           firstname || ' ' || lastname AS fullname,
           reportsto,
           0 AS level,
           firstname || ' ' || lastname AS path,
           ARRAY[employeeid] AS path_array  -- Pour gérer l'ordre
    FROM employees
    WHERE reportsto IS NULL
    
    UNION ALL
    
    -- Cas récursif : les subordonnés
    SELECT e.employeeid,
           e.firstname || ' ' || e.lastname AS fullname,
           e.reportsto,
           eh.level + 1 AS level,
           eh.path || ' > ' || e.firstname || ' ' || e.lastname AS path,
           eh.path_array || e.employeeid AS path_array
    FROM employees e
    INNER JOIN employee_hierarchy eh ON e.reportsto = eh.employeeid
)
SELECT LPAD('', level * 4, ' ') || fullname AS employee,  -- Indentation
       path AS hierarchy_path
FROM employee_hierarchy
ORDER BY path_array;  -- Ordre préfixe (parcours en profondeur)

-- Alternative avec numérotation pour l'ordre préfixe
WITH RECURSIVE employee_hierarchy AS (
    SELECT employeeid,
           firstname || ' ' || lastname AS fullname,
           reportsto,
           0 AS level,
           firstname || ' ' || lastname AS path,
           '001'::text AS sort_path  -- Numérotation pour tri
    FROM employees
    WHERE reportsto IS NULL
    
    UNION ALL
    
    SELECT e.employeeid,
           e.firstname || ' ' || e.lastname AS fullname,
           e.reportsto,
           eh.level + 1 AS level,
           eh.path || ' > ' || e.firstname || ' ' || e.lastname AS path,
           eh.sort_path || '.' || LPAD(ROW_NUMBER() OVER (PARTITION BY eh.employeeid ORDER BY e.employeeid)::text, 3, '0')
    FROM employees e
    INNER JOIN employee_hierarchy eh ON e.reportsto = eh.employeeid
)
SELECT REPEAT('    ', level) || fullname AS employee,
       path AS hierarchy_path
FROM employee_hierarchy
ORDER BY sort_path;

Notes importantes et bonnes pratiques

1. Performance

Indexes : Assurez-vous que les colonnes utilisées dans les JOIN et WHERE ont des indexes appropriés
EXPLAIN ANALYZE : Utilisez cette commande pour analyser les plans d’exécution
Matérialized Views : Pour les requêtes complexes fréquemment exécutées

2. Différences entre fonctions de fenêtrage

ROWS : Compte les lignes physiques (utile pour LAG/LEAD)
RANGE : Considère les valeurs (utile pour les agrégations sur des plages de dates)
UNBOUNDED PRECEDING/FOLLOWING : Depuis/jusqu’à l’extrémité de la partition

3. GROUPING SETS vs ROLLUP vs CUBE

ROLLUP(a,b,c) : Génère (a,b,c), (a,b), (a), ()
CUBE(a,b,c) : Génère toutes les combinaisons possibles (2^n groupes)
GROUPING SETS : Contrôle explicite des groupes générés

4. Fonctions utiles

GROUPING() : Renvoie 1 si la colonne est un total agrégé, 0 sinon
COALESCE() : Remplace les NULL par une valeur par défaut
NULLIF() : Renvoie NULL si deux valeurs sont égales

5. Optimisation des CTE récursifs

Limitez toujours la récursion avec une condition d’arrêt claire
Évitez les cycles avec des vérifications appropriées
Utilisez des arrays pour stocker les chemins et détecter les cycles

6. Debugging

Pour déboguer les requêtes complexes :

Décomposez en CTE intermédiaires
Testez chaque CTE individuellement
Utilisez des LIMIT pour tester sur des échantillons
Vérifiez les types de données et les conversions implicites

Corrigé TP - Extensions OLAP et Fenêtres de Partitionnement PostgreSQL

Section 1 : Premiers pas

Exercice 1.1 : Chargement et vérification de la base

Exercice 1.2 : Identification de la table de faits et dimension récursive

Section 2 : Extensions OLAP de la clause GROUP BY

Exercice 2.3.1 : Nombre de clients par pays

Exercice 2.3.2 : Nombres de commandes par pays et ville avec totaux (ROLLUP)

Exercice 2.3.3 : Commandes et quantités par paire (pays Client, pays Fournisseur)

Exercice 2.3.4 : Totaux à tous les niveaux (ROLLUP complet)

Exercice 2.3.5 : Prix total avec fournisseur français (CUBE vs ROLLUP)

Exercice 2.3.6 : Remplacer NULL par ‘whole country’

Section 3 : Fenêtres de partitionnement

Exercice 3.4.1 : Statistiques par commande avec fenêtres multiples

Exercice 3.4.2 : Comptages cumulatifs

Exercice 3.4.3 : Ajout du comptage excluant la date courante

Exercice 3.4.4 : Commandes précédentes avec LAG

Exercice 3.4.5 : Statistiques par pays et ville

Exercice 3.4.6 : Rang des villes par pays (DENSE_RANK)

Exercice 3.4.7 : Ajout du pourcentage

Exercice 3.4.8 : Filtrage basé sur comparaison avec LAG

Exercice 3.4.9 : Produits les plus vendus par année

Section 4 : Fonctions de classement

Exercice 4.1 : RANK avec ex-aequo

Exercice 4.2 : DENSE_RANK sans saut de position

Exercice 4.3 : ROW_NUMBER pour numérotation unique

Exercice 4.4 : NTILE pour segmentation

Exercice 4.5 : LEAD pour valeur suivante

Exercice 4.6 : LAG pour valeur précédente

Exercice 4.7 : FIRST_VALUE

Exercice 4.8 : LAST_VALUE

Section 5 : Requêtes récursives

Exercice 5.1 : Table des entiers de 1 à 60

Exercice 5.2 : Hiérarchie des employés

Notes importantes et bonnes pratiques

1. Performance

2. Différences entre fonctions de fenêtrage

3. GROUPING SETS vs ROLLUP vs CUBE

4. Fonctions utiles

5. Optimisation des CTE récursifs

6. Debugging

Pierre-Henri Paris

Associate Professor in Artificial Intelligence