Corrigé TP - Extensions OLAP et Fenêtres de Partitionnement PostgreSQL

Section 1 : Premiers pas

Exercice 1.1 : Chargement et vérification de la base

-- Vérifier la liste des tables
\dt

-- Vérifier le schéma de quelques tables importantes
\d orders
\d orderdetails
\d customers
\d suppliers

Exercice 1.2 : Identification de la table de faits et dimension récursive

Réponse :

  • Table de faits : orderdetails (elle contient les mesures quantitatives : quantity, unitprice, discount)
  • Dimension récursive : employees (l’attribut reportsto fait référence à employeeid dans la même table, créant une hiérarchie managériale)

Section 2 : Extensions OLAP de la clause GROUP BY

Exercice 2.3.1 : Nombre de clients par pays

SELECT country, COUNT(*) as nb
FROM customers
GROUP BY country
ORDER BY country;

Exercice 2.3.2 : Nombres de commandes par pays et ville avec totaux (ROLLUP)

-- Solution avec ROLLUP
SELECT shipcountry AS ship_country, 
       shipcity AS ship_city,
       COUNT(*) AS nborders
FROM orders
GROUP BY ROLLUP(shipcountry, shipcity)
ORDER BY shipcountry, shipcity;

-- Alternative avec GROUPING SETS (plus explicite)
SELECT shipcountry AS ship_country, 
       shipcity AS ship_city,
       COUNT(*) AS nborders
FROM orders
GROUP BY GROUPING SETS (
    (shipcountry, shipcity),  -- Détail par pays et ville
    (shipcountry),             -- Total par pays
    ()                         -- Total général
)
ORDER BY shipcountry, shipcity;

Exercice 2.3.3 : Commandes et quantités par paire (pays Client, pays Fournisseur)

SELECT c.country AS c_country,
       s.country AS s_country,
       SUM(od.quantity) AS quantity,
       COUNT(DISTINCT o.orderid) AS nborder
FROM orders o
JOIN customers c ON o.customerid = c.customerid
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
JOIN suppliers s ON p.supplierid = s.supplierid
GROUP BY c.country, s.country
ORDER BY c.country, s.country;

Exercice 2.3.4 : Totaux à tous les niveaux (ROLLUP complet)

SELECT c.country AS c_country,
       s.country AS s_country,
       SUM(od.quantity) AS quantity,
       COUNT(DISTINCT o.orderid) AS nborder
FROM orders o
JOIN customers c ON o.customerid = c.customerid
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
JOIN suppliers s ON p.supplierid = s.supplierid
GROUP BY ROLLUP(c.country, s.country)
ORDER BY c.country, s.country;

Exercice 2.3.5 : Prix total avec fournisseur français (CUBE vs ROLLUP)

-- Solution 1 : avec CUBE (tous les croisements possibles)
SELECT shipcountry AS ship_country,
       shipregion AS ship_region,
       shipcity AS ship_city,
       ROUND(SUM(od.unitprice * od.quantity * (1 - od.discount))::numeric, 2) AS price
FROM orders o
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
JOIN suppliers s ON p.supplierid = s.supplierid
WHERE s.country = 'France'
GROUP BY CUBE(shipcountry, shipregion, shipcity)
HAVING GROUPING(shipcountry) = 0  -- Toujours afficher le pays
   AND NOT (GROUPING(shipcountry) = 1 AND GROUPING(shipregion) = 1 AND GROUPING(shipcity) = 1)  -- Exclure le total général
ORDER BY shipcountry, shipregion, shipcity;

-- Solution 2 : avec GROUPING SETS (contrôle précis des niveaux)
SELECT shipcountry AS ship_country,
       shipregion AS ship_region,
       shipcity AS ship_city,
       ROUND(SUM(od.unitprice * od.quantity * (1 - od.discount))::numeric, 2) AS price
FROM orders o
JOIN orderdetails od ON o.orderid = od.orderid
JOIN products p ON od.productid = p.productid
JOIN suppliers s ON p.supplierid = s.supplierid
WHERE s.country = 'France'
GROUP BY GROUPING SETS (
    (shipcountry, shipregion, shipcity),  -- Détail complet
    (shipcountry, shipregion),             -- Par pays et région
    (shipcountry)                          -- Par pays seulement
)
ORDER BY shipcountry, shipregion, shipcity;

Exercice 2.3.6 : Remplacer NULL par ‘whole country’

SELECT shipcountry AS s_country,
       COALESCE(shipcity, 'whole country') AS s_city,
       COUNT(*) AS nborders
FROM orders
GROUP BY ROLLUP(shipcountry, shipcity)
ORDER BY shipcountry, shipcity;

-- Alternative avec CASE et GROUPING()
SELECT shipcountry AS s_country,
       CASE 
           WHEN GROUPING(shipcity) = 1 THEN 'whole country'
           ELSE shipcity 
       END AS s_city,
       COUNT(*) AS nborders
FROM orders
GROUP BY ROLLUP(shipcountry, shipcity)
ORDER BY shipcountry, GROUPING(shipcity), shipcity;

Section 3 : Fenêtres de partitionnement

Exercice 3.4.1 : Statistiques par commande avec fenêtres multiples

SELECT orderid AS order_id,
       orderdate AS order_date,
       shipcity AS ship_city,
       shipcountry AS ship_country,
       freight,
       COUNT(*) OVER (PARTITION BY shipcity) AS n_v,
       COUNT(*) OVER (PARTITION BY shipcountry) AS n_p,
       SUM(freight) OVER (PARTITION BY shipcountry) AS poids_p,
       SUM(freight) OVER (PARTITION BY shipcountry, orderdate) AS poids_p_j
FROM orders
ORDER BY orderid;

Exercice 3.4.2 : Comptages cumulatifs

SELECT orderid AS order_id,
       orderdate AS order_date,
       shipcity AS ship_city,
       shipcountry AS ship_country,
       COUNT(*) OVER (PARTITION BY shipcity ORDER BY orderdate 
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc_v,
       COUNT(*) OVER (PARTITION BY shipcountry ORDER BY orderdate 
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc_p,
       COUNT(*) OVER (ORDER BY orderdate 
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc,
       COUNT(*) OVER (PARTITION BY shipcountry, orderdate) AS n_pj
FROM orders
ORDER BY orderdate, orderid;

Exercice 3.4.3 : Ajout du comptage excluant la date courante

SELECT orderid AS order_id,
       orderdate AS order_date,
       shipcity AS ship_city,
       shipcountry AS ship_country,
       COUNT(*) OVER (PARTITION BY shipcity ORDER BY orderdate 
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc_v,
       COUNT(*) OVER (PARTITION BY shipcountry ORDER BY orderdate 
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc_p,
       COUNT(*) OVER (ORDER BY orderdate 
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS nc,
       COUNT(*) OVER (PARTITION BY shipcountry, orderdate) AS n_pj,
       COUNT(*) OVER (PARTITION BY shipcountry ORDER BY orderdate 
                      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS nc_p2
FROM orders
ORDER BY orderdate, orderid;

Exercice 3.4.4 : Commandes précédentes avec LAG

SELECT orderid AS order_id,
       LAG(orderid) OVER (ORDER BY orderdate, orderid) AS num_p,
       LAG(orderid) OVER (PARTITION BY shipcity ORDER BY orderdate, orderid) AS num_p_v
FROM orders
ORDER BY orderdate, orderid;

Exercice 3.4.5 : Statistiques par pays et ville

SELECT DISTINCT 
       shipcountry AS ship_country,
       shipcity AS ship_city,
       COUNT(*) OVER (PARTITION BY shipcountry, shipcity) AS nborders,
       COUNT(*) OVER (PARTITION BY shipcountry) AS nbordcty,
       MAX(COUNT(*) OVER (PARTITION BY shipcountry, shipcity)) 
           OVER (PARTITION BY shipcountry) AS nbormaxcty
FROM orders
ORDER BY shipcountry, shipcity;

Exercice 3.4.6 : Rang des villes par pays (DENSE_RANK)

WITH city_counts AS (
    SELECT shipcountry, shipcity, COUNT(*) AS nborders
    FROM orders
    GROUP BY shipcountry, shipcity
)
SELECT shipcountry AS ship_country,
       shipcity AS ship_city,
       nborders,
       DENSE_RANK() OVER (PARTITION BY shipcountry ORDER BY nborders) AS rank
FROM city_counts
ORDER BY shipcountry, nborders, shipcity;

Exercice 3.4.7 : Ajout du pourcentage

WITH city_counts AS (
    SELECT shipcountry, shipcity, COUNT(*) AS nborders
    FROM orders
    GROUP BY shipcountry, shipcity
)
SELECT shipcountry AS ship_country,
       shipcity AS ship_city,
       nborders,
       DENSE_RANK() OVER (PARTITION BY shipcountry ORDER BY nborders) AS rank,
       ROUND(nborders::numeric / SUM(nborders) OVER (PARTITION BY shipcountry), 2) AS percentg
FROM city_counts
ORDER BY shipcountry, nborders, shipcity;

Exercice 3.4.8 : Filtrage basé sur comparaison avec LAG

WITH order_prices AS (
    SELECT o.orderid,
           SUM(od.unitprice * od.quantity * (1 - od.discount)) AS price
    FROM orders o
    JOIN orderdetails od ON o.orderid = od.orderid
    GROUP BY o.orderid
),
filtered_orders AS (
    SELECT orderid,
           price,
           LAG(price) OVER (ORDER BY orderid) AS prev_price
    FROM order_prices
)
SELECT orderid AS order_id,
       ROUND(price::numeric, 3) AS price
FROM filtered_orders
WHERE prev_price IS NULL 
   OR price <= prev_price * 1.1
ORDER BY orderid;

Exercice 3.4.9 : Produits les plus vendus par année

-- Solution 1 : avec fenêtrage
WITH yearly_products AS (
    SELECT EXTRACT(YEAR FROM o.orderdate) AS year,
           p.productname,
           SUM(od.quantity) AS qtity
    FROM orders o
    JOIN orderdetails od ON o.orderid = od.orderid
    JOIN products p ON od.productid = p.productid
    GROUP BY EXTRACT(YEAR FROM o.orderdate), p.productname
),
ranked_products AS (
    SELECT year,
           productname,
           qtity,
           ROW_NUMBER() OVER (PARTITION BY year ORDER BY qtity DESC) AS rn
    FROM yearly_products
)
SELECT year,
       productname AS product_name,
       qtity
FROM ranked_products
WHERE rn = 1
ORDER BY year;

-- Solution 2 : sans fenêtrage (avec sous-requête corrélée)
WITH yearly_products AS (
    SELECT EXTRACT(YEAR FROM o.orderdate) AS year,
           p.productname,
           SUM(od.quantity) AS qtity
    FROM orders o
    JOIN orderdetails od ON o.orderid = od.orderid
    JOIN products p ON od.productid = p.productid
    GROUP BY EXTRACT(YEAR FROM o.orderdate), p.productname
)
SELECT yp.year,
       yp.productname AS product_name,
       yp.qtity
FROM yearly_products yp
WHERE yp.qtity = (
    SELECT MAX(qtity)
    FROM yearly_products yp2
    WHERE yp2.year = yp.year
)
ORDER BY year;

Section 4 : Fonctions de classement

Exercice 4.1 : RANK avec ex-aequo

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       RANK() OVER (ORDER BY freight DESC) AS rank
FROM orders
ORDER BY freight DESC, orderid;

Exercice 4.2 : DENSE_RANK sans saut de position

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       DENSE_RANK() OVER (ORDER BY freight DESC) AS rank
FROM orders
ORDER BY freight DESC, orderid;

Exercice 4.3 : ROW_NUMBER pour numérotation unique

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       ROW_NUMBER() OVER (ORDER BY freight DESC) AS row_number
FROM orders
ORDER BY freight DESC, orderid;

Exercice 4.4 : NTILE pour segmentation

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       NTILE(120) OVER (ORDER BY freight DESC) AS tile
FROM orders
ORDER BY orderid;

Exercice 4.5 : LEAD pour valeur suivante

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       LEAD(freight, 2) OVER (ORDER BY freight DESC) AS next_freight
FROM orders
ORDER BY orderid;

Exercice 4.6 : LAG pour valeur précédente

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       COALESCE(LAG(freight) OVER (ORDER BY orderdate, orderid), 0) AS previous_freight
FROM orders
ORDER BY orderid;

Exercice 4.7 : FIRST_VALUE

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       FIRST_VALUE(freight) OVER (ORDER BY orderdate, orderid 
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_freight
FROM orders
ORDER BY orderid;

Exercice 4.8 : LAST_VALUE

SELECT orderid AS order_id,
       orderdate AS order_date,
       freight,
       LAST_VALUE(freight) OVER (ORDER BY orderdate, orderid 
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_freight
FROM orders
ORDER BY orderid;

Section 5 : Requêtes récursives

Exercice 5.1 : Table des entiers de 1 à 60

WITH RECURSIVE numbers AS (
    -- Cas de base
    SELECT 1 AS n
    
    UNION ALL
    
    -- Cas récursif
    SELECT n + 1
    FROM numbers
    WHERE n < 60
)
SELECT n FROM numbers;

Exercice 5.2 : Hiérarchie des employés

WITH RECURSIVE employee_hierarchy AS (
    -- Cas de base : le grand patron (celui qui n'a pas de supérieur)
    SELECT employeeid,
           firstname || ' ' || lastname AS fullname,
           reportsto,
           0 AS level,
           firstname || ' ' || lastname AS path,
           ARRAY[employeeid] AS path_array  -- Pour gérer l'ordre
    FROM employees
    WHERE reportsto IS NULL
    
    UNION ALL
    
    -- Cas récursif : les subordonnés
    SELECT e.employeeid,
           e.firstname || ' ' || e.lastname AS fullname,
           e.reportsto,
           eh.level + 1 AS level,
           eh.path || ' > ' || e.firstname || ' ' || e.lastname AS path,
           eh.path_array || e.employeeid AS path_array
    FROM employees e
    INNER JOIN employee_hierarchy eh ON e.reportsto = eh.employeeid
)
SELECT LPAD('', level * 4, ' ') || fullname AS employee,  -- Indentation
       path AS hierarchy_path
FROM employee_hierarchy
ORDER BY path_array;  -- Ordre préfixe (parcours en profondeur)

-- Alternative avec numérotation pour l'ordre préfixe
WITH RECURSIVE employee_hierarchy AS (
    SELECT employeeid,
           firstname || ' ' || lastname AS fullname,
           reportsto,
           0 AS level,
           firstname || ' ' || lastname AS path,
           '001'::text AS sort_path  -- Numérotation pour tri
    FROM employees
    WHERE reportsto IS NULL
    
    UNION ALL
    
    SELECT e.employeeid,
           e.firstname || ' ' || e.lastname AS fullname,
           e.reportsto,
           eh.level + 1 AS level,
           eh.path || ' > ' || e.firstname || ' ' || e.lastname AS path,
           eh.sort_path || '.' || LPAD(ROW_NUMBER() OVER (PARTITION BY eh.employeeid ORDER BY e.employeeid)::text, 3, '0')
    FROM employees e
    INNER JOIN employee_hierarchy eh ON e.reportsto = eh.employeeid
)
SELECT REPEAT('    ', level) || fullname AS employee,
       path AS hierarchy_path
FROM employee_hierarchy
ORDER BY sort_path;

Notes importantes et bonnes pratiques

1. Performance

  • Indexes : Assurez-vous que les colonnes utilisées dans les JOIN et WHERE ont des indexes appropriés
  • EXPLAIN ANALYZE : Utilisez cette commande pour analyser les plans d’exécution
  • Matérialized Views : Pour les requêtes complexes fréquemment exécutées

2. Différences entre fonctions de fenêtrage

  • ROWS : Compte les lignes physiques (utile pour LAG/LEAD)
  • RANGE : Considère les valeurs (utile pour les agrégations sur des plages de dates)
  • UNBOUNDED PRECEDING/FOLLOWING : Depuis/jusqu’à l’extrémité de la partition

3. GROUPING SETS vs ROLLUP vs CUBE

  • ROLLUP(a,b,c) : Génère (a,b,c), (a,b), (a), ()
  • CUBE(a,b,c) : Génère toutes les combinaisons possibles (2^n groupes)
  • GROUPING SETS : Contrôle explicite des groupes générés

4. Fonctions utiles

  • GROUPING() : Renvoie 1 si la colonne est un total agrégé, 0 sinon
  • COALESCE() : Remplace les NULL par une valeur par défaut
  • NULLIF() : Renvoie NULL si deux valeurs sont égales

5. Optimisation des CTE récursifs

  • Limitez toujours la récursion avec une condition d’arrêt claire
  • Évitez les cycles avec des vérifications appropriées
  • Utilisez des arrays pour stocker les chemins et détecter les cycles

6. Debugging

Pour déboguer les requêtes complexes :

  1. Décomposez en CTE intermédiaires
  2. Testez chaque CTE individuellement
  3. Utilisez des LIMIT pour tester sur des échantillons
  4. Vérifiez les types de données et les conversions implicites
Pierre-Henri Paris
Pierre-Henri Paris
Associate Professor in Artificial Intelligence

My research interests include Knowlegde Graphs, Information Extraction, and NLP.