Find MEDIAN of a column in SQL (Hackerrank SQL: Weather Observation Station 20)

Kazi Zaber
3 min readNov 12, 2023

A median is defined as a number separating the higher half of a data set from the lower half. Query the median of the Northern Latitudes (LAT_N) from STATION and round your answer to 4 decimal places.

Question Source: Hackerrank SQL: Weather Observation Station 20

Input Format

The STATION table is described as follows:

where LAT_N is the northern latitude and LONG_W is the western longitude.

We can solve this in two different way,

Approach A:

For MYSQL 8.0 we can use window function ROW_NUMBER()
We first set a row_number for each row based on column lat_n (order by lat_n)


SELECT lat_n, ROW_NUMBER() OVER (ORDER BY lat_n ASC) AS rn
FROM station

The output of the above query looks like below. The query output has mapped each row with a rank number based on lat_n. We will use the query output as a temporary virtual table and do search on that.

In next step, we find MEDIAN row (the row in the middle of sorted data).
Since the above query output can be considered as a sorted temporary table, we do the following search

SELECT 
*
FROM
(
SELECT lat_n, ROW_NUMBER() OVER (ORDER BY lat_n) AS rn
FROM station
) AS subq
WHERE
rn = (SELECT CEIL((COUNT(rn)+1)/2) FROM station)
OR
rn = (SELECT FLOOR((COUNT(rn)+1)/2) FROM station)

The above query outputs -

Now, we got the middle row(s).

Finally, we can return the average lat_n value.

SELECT 
ROUND(AVG(lat_n), 4) AS median
FROM
(
SELECT lat_n, ROW_NUMBER() OVER (ORDER BY lat_n) AS rn
FROM station
) AS subq
WHERE
rn = (SELECT CEIL((COUNT(rn)+1)/2) FROM station)
OR
rn = (SELECT FLOOR((COUNT(rn)+1)/2) FROM station)

The final output is

Now, Can you think why AVG was necessary?

Approach B:

In older mysql versions, row_number() isn’t available. However, we can simply generate it by looping through all rows and using a SESSION VARIABLE

SET @rn := -1;
SELECT lat_n AS med, (@rn := @rn + 1) AS rn
FROM station
ORDER BY lat_n

The above query yields the following 0 based ranking

We can now use the output of the query as a searchable temporary table.

SET @rn := -1;
SELECT ROUND(AVG(subq.med), 4) AS median
FROM
(
SELECT lat_n AS med, (@rn := @rn + 1) AS rn
FROM station
ORDER BY lat_n
) AS subq
WHERE
subq.rn IN (FLOOR(@rn / 2), CEIL(@rn / 2));

Can you think how can we modify the query to use 1 based ranking instead of 0 based ranking above?

--

--