June 2026
The 2025 VCE General Mathematics exams showed that Data Analysis was difficult, but not because the mathematics was unfamiliar.
It was difficult because the questions demanded precision.
Students needed to read statistical displays carefully, select the correct statistic, interpret calculator outputs, understand association, avoid causation claims, manage rounding and connect results to the context of the data.
Many errors came from small but costly decisions.
Was the graph showing customers or avocados?
Was the variable nominal or numerical?
Was the question asking for range, standard deviation or IQR?
Was the association positive or negative?
Was the prediction interpolation or extrapolation?
Was the residual actual minus predicted?
Should the answer be rounded, or left exact?
These decisions shaped performance.
In General Mathematics, Data Analysis rewards students who are careful with both the numbers and the words.
Histograms required frequency reasoning
Examination 1 began with a histogram showing the number of customers who bought bags of avocados of different sizes.
Question 1 asked for the median bag size.
The report explained that students needed to sum all the frequencies:
4 + 11 + 3 + 9 + 5 + 7 = 39
With 39 customers, the median is the 20th value. The 20th value falls in the fourth column, so the median bag size is 4.
This is a simple calculation, but it tests an important skill.
Students must use the frequencies.
The median is not found by looking at the middle of the horizontal axis. It is found by counting through the data values according to how often they occur.
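The counting process can be sketched in Python. This is a minimal illustration, assuming the frequencies quoted above and bag sizes of 1 to 6 (the sizes used in the Question 2 calculation). The function name `median_from_frequencies` is our own label, not something taken from the exam or the report.

```python
def median_from_frequencies(values, frequencies):
    """Return the median of data supplied as values with frequencies.

    Assumes the values are in ascending order and the total frequency
    is odd, as in the avocado example (n = 39).
    """
    total = sum(frequencies)
    middle_position = (total + 1) // 2  # the 20th value when n = 39
    running_total = 0
    for value, frequency in zip(values, frequencies):
        running_total += frequency  # cumulative frequency so far
        if running_total >= middle_position:
            return value
    raise ValueError("no data supplied")


bag_sizes = [1, 2, 3, 4, 5, 6]
customers = [4, 11, 3, 9, 5, 7]

median_bag_size = median_from_frequencies(bag_sizes, customers)
print(median_bag_size)  # 4
```

The running total reaches 18 after the third column and 27 after the fourth, which is why the 20th value sits in the fourth column.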
A histogram is not just a picture.
It is a frequency display.
Total quantity was not the same as total frequency
Question 2 used the same avocado data but asked for the total number of avocados sold in bags.
This required multiplying each bag size by the number of customers who bought that bag size:
1 × 4 + 2 × 11 + 3 × 3 + 4 × 9 + 5 × 5 + 6 × 7 = 138
The answer was 138.
This question is a classic General Mathematics trap.
Adding the frequencies gives the number of customers. Multiplying each value by its frequency and then adding gives the total number of avocados.
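The difference between the two totals is easy to see side by side. The short sketch below reuses the bag sizes and frequencies from the calculation above. The variable names are our own.

```python
bag_sizes = [1, 2, 3, 4, 5, 6]
customers = [4, 11, 3, 9, 5, 7]

# Adding the frequencies counts customers.
total_customers = sum(customers)

# Multiplying each bag size by its frequency, then adding, counts avocados.
total_avocados = sum(size * count for size, count in zip(bag_sizes, customers))

print(total_customers)  # 39
print(total_avocados)   # 138
```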
Students needed to understand what the graph represented and what the question asked.
The graph showed bag sizes and customers.
The question asked for avocados.
That change mattered.
Boxplots needed statement-by-statement checking
Question 3 in Examination 1 asked students to compare two boxplots showing life expectancy for two samples of countries.
The correct statement was that life expectancy in all Sample T countries exceeded the median life expectancy in Sample H.
The report showed that the other statements were not supported by the graph:
- Sample T did not have a greater interquartile range than Sample H.
- The median for Sample T was not more than 10 years greater than the median for Sample H.
- The third quartile for Sample H was not greater than the first quartile for Sample T.
This is an important skill.
In multiple-choice boxplot questions, students should not choose the first statement that sounds plausible. They need to test each claim against the display.
Boxplot questions often compare medians, quartiles, ranges, interquartile ranges, minimums, maximums and outliers.
Each word matters.
Outliers required the fence calculation
Examination 1 Question 6 asked how many lower outliers would appear in a boxplot for Oceania life expectancy data.
The first quartile was 74.9 and the third quartile was 78.5.
So:
IQR = 78.5 − 74.9 = 3.6
The lower fence was:
74.9 − 1.5 × 3.6 = 69.5
The listed lower values were 68.5, 68.6, 69.0, 70.1 and 74.8. Three values were below 69.5, so there were three lower outliers.
This question rewarded a structured method.
Calculate the IQR.
Calculate the lower fence.
Compare values with the fence.
Students should avoid relying on visual instinct or assuming that the smallest value is always the only outlier.
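The three steps can be checked with a short Python sketch. It assumes only the quartiles and the five listed low values quoted above. The variable names are our own, and the results are rounded for display because floating-point subtraction is not exact.

```python
q1, q3 = 74.9, 78.5
listed_low_values = [68.5, 68.6, 69.0, 70.1, 74.8]

# Step 1: interquartile range
iqr = q3 - q1

# Step 2: lower fence
lower_fence = q1 - 1.5 * iqr

# Step 3: compare each value with the fence
lower_outliers = [value for value in listed_low_values if value < lower_fence]

print(round(iqr, 1))          # 3.6
print(round(lower_fence, 1))  # 69.5
print(lower_outliers)         # [68.5, 68.6, 69.0]
```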
Categorical data required the correct graph
Question 8 in Examination 1 asked for the most appropriate graphical display for car colour preference by gender.
The correct answer was a segmented bar chart.
This is because both variables, gender and preferred car colour, were categorical. A segmented bar chart allows each gender to be represented by a bar, with segments showing the number or percentage of respondents preferring each colour.
A histogram was not appropriate because histograms display numerical data grouped into intervals. Back-to-back stem plots and parallel boxplots also suit numerical data, not categorical preferences.
This is one of the simplest but most important Data Analysis distinctions.
Before choosing a display, identify the variable type.
Numerical data and categorical data are not displayed the same way.
Variable type mattered
In Examination 2 Question 1b, students had to classify the variable type for homes sold as apartment or house.
The answer was nominal.
This is because the categories have names but no inherent order.
Apartment and house are labels. They are not numerical values. They are not ordered categories.
This matters because variable type affects which statistics and displays are appropriate.
Nominal variables can be summarised using counts, percentages and segmented bar charts. They should not be treated like numerical measurements with means or standard deviations.
High-scoring students know the difference between numerical, nominal and ordinal variables.
Standard deviation had to be interpreted using the requested table
Examination 2 Question 1c.i asked students to find the standard deviation of apartment sale prices.
The report gave the value as $346 466 and noted that students needed to enter the data carefully into their calculators.
Question 1c.ii then asked students to comment on the relative spread in sale prices using the information in Table 2.
The report made a very important point: students needed to use the standard deviations in Table 2. It was not appropriate to use other statistics such as range or IQR.
The correct comparison was that house sale prices had a lower spread than apartment sale prices because the standard deviation for houses was lower.
This question shows why reading the instruction matters.
A student may know how to compare spread using range or IQR, but if the question directs them to Table 2, the response must use standard deviation.
The correct statistic is the statistic the question asks for.
Rounding instructions were not optional
Examination 2 Question 3 involved a normally distributed set of home sale prices.
The report stated that students sometimes rounded when they should not have. The correct percentage was 83.85%, not 84%.
This matters because the instructions on the front of Examination 2 told students to round only when a question directed them to.
General Mathematics students often round too early because it feels harmless. But rounding can change final answers, especially in multi-step questions, and unrequested rounding breaches the stated examination instruction.
Students should develop a simple habit:
Do not round unless the question tells you to.
When it does tell you to round, follow the instruction exactly.
This is an easy way to protect marks.
The 68–95–99.7 rule needed careful positioning
Normal distribution questions appeared in both exams.
In Examination 1 Question 5, students were told that:
- 2.5% of heights were greater than 178.9 cm
- 16% of heights were less than 157.6 cm
Using the 68–95–99.7 rule, this meant:
mean + 2 standard deviations = 178.9
mean − 1 standard deviation = 157.6
Solving gave:
mean = 164.7
standard deviation = 7.1
This question tested whether students could map percentages to standard deviation positions.
2.5% in the upper tail corresponds to two standard deviations above the mean.
16% below corresponds to one standard deviation below the mean.
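One way to carry out the solving step is to subtract the two equations, which leaves three standard deviations between the two heights. The sketch below follows that route. It is our own working under the 68–95–99.7 rule, not a method quoted from the report.

```python
height_at_plus_2_sd = 178.9   # mean + 2 standard deviations
height_at_minus_1_sd = 157.6  # mean - 1 standard deviation

# The two positions are 3 standard deviations apart.
standard_deviation = (height_at_plus_2_sd - height_at_minus_1_sd) / 3

# Step back up one standard deviation from the lower position.
mean = height_at_minus_1_sd + standard_deviation

print(round(mean, 1))                # 164.7
print(round(standard_deviation, 1))  # 7.1
```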
Students should draw a quick normal curve or mark the positions mentally.
The rule is not just a memory item.
It is a positioning tool.
Scatterplots required equation and context
Examination 1 Question 9 asked students to choose the equation of a least squares line for games won against goals against.
The report suggested using two points from the scatterplot, such as (27, 12) and (44, 9), to estimate the slope.
The slope was negative:
(9 − 12) ÷ (44 − 27) ≈ −0.176
This supported the equation:
games won = 16.8 − 0.178 × goals against
This question shows how students can use the graph to check options.
They did not need exact regression output. They needed to identify the approximate slope and intercept.
A positive slope option could be eliminated because the scatterplot and context showed that more goals against was associated with fewer wins.
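The two-point estimate can be reproduced in a few lines of Python. The slope calculation follows the report's suggestion. The intercept estimate, found by extending the same line back to zero goals against, is our own extra check and is only approximate because the two points were read from a scatterplot.

```python
# Two points read from the scatterplot: (goals against, games won)
x1, y1 = 27, 12
x2, y2 = 44, 9

# Rise over run between the two points
slope = (y2 - y1) / (x2 - x1)

# Extend the line back to goals against = 0 to estimate the intercept
intercept = y1 - slope * x1

print(round(slope, 3))      # -0.176
print(round(intercept, 1))  # 16.8
```

Both estimates sit close to the coefficients in the correct option, and the negative slope alone rules out any option with a positive coefficient.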
Association was not causation
Examination 1 Question 10 gave r = −0.466 for the association between games won and goals against.
The correct conclusion was that more goals scored against a team is associated with fewer wins.
The report rejected options that claimed causation.
This is a central Data Analysis principle.
Correlation does not prove causation.
Even when a relationship seems logical, the statistical conclusion must be phrased as association unless the study design supports causation.
Students should use language such as:
- is associated with
- tends to be related to
- has a negative association with
They should avoid saying “causes”, “leads to” or “results in” unless the question explicitly supports a causal conclusion.
The sign of r came from the slope
Examination 2 Question 4b asked students to calculate the correlation coefficient from a coefficient of determination of 0.0806.
The square root gives:
√0.0806 ≈ 0.284
But the least squares line had a negative slope:
sale price = 1 765 353 − 35 054 × distance from city centre
Therefore:
r = −0.284
The report noted that a large proportion of students left the answer as positive 0.284.
This is a major trap.
The coefficient of determination gives r², which is always non-negative. To find r, students must choose the sign using the direction of the association.
If the slope is negative, r is negative.
If the slope is positive, r is positive.
The square root alone is not enough.
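The two-part process (take the square root, then attach the sign of the slope) can be sketched as follows. The helper name `correlation_from_r_squared` is our own, and the example uses the coefficient of determination and slope quoted above.

```python
import math


def correlation_from_r_squared(r_squared, slope):
    """Return r, taking its size from r squared and its sign from the slope."""
    magnitude = math.sqrt(r_squared)
    # copysign attaches the sign of the slope to the magnitude
    return math.copysign(magnitude, slope)


r = correlation_from_r_squared(0.0806, -35054)
print(round(r, 3))  # -0.284
```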
Strength and direction needed separate answers
Examination 2 Question 4e asked students to describe the linear association between sale price and distance from the city centre in terms of strength and direction.
The correct description was:
strength: weak
direction: negative
This came from the value of r = −0.284 and the negative slope.
Students should answer these questions in the requested form. If a table is provided, use it. If strength and direction are asked separately, provide both.
A response such as “there is a relationship” is too vague.
Data Analysis expects precise language.
Weak, moderate or strong.
Positive or negative.
Interpolation and extrapolation depended on the x-value
Examination 2 Question 4d asked whether predicting the sale price of a home two kilometres from the city centre was interpolation or extrapolation.
The report made a subtle but important point.
Students needed to recognise that interpolation or extrapolation depends on the explanatory variable, not the response variable.
The explanatory variable was distance from city centre.
Therefore, students needed to check whether 2 km was inside or outside the range of distances in the original data. Since 2 km lay outside the explanatory variable data range, the prediction was extrapolation.
It was not appropriate to refer to the sale price being outside the data range.
Regression predicts response values from explanatory values. That is why the explanatory variable controls interpolation and extrapolation.
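The check itself is a simple range comparison on the explanatory variable. The sketch below is illustrative only: the function name is our own, and the list of distances is invented for the example (the article does not reproduce the original data), apart from 15.5 km, which appears in the residual question.

```python
def classify_prediction(x_new, observed_x_values):
    """Classify a prediction by comparing x_new with the observed x range."""
    if min(observed_x_values) <= x_new <= max(observed_x_values):
        return "interpolation"
    return "extrapolation"


# HYPOTHETICAL distances from the city centre (km), for illustration only
hypothetical_distances_km = [5.0, 8.2, 12.0, 15.5, 20.3]

print(classify_prediction(2, hypothetical_distances_km))   # extrapolation
print(classify_prediction(10, hypothetical_distances_km))  # interpolation
```

Note that the sale prices never enter the comparison. Only the distances do.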
Residuals required actual minus predicted
Examination 2 Question 4f.i asked students to show that a missing residual was 27 984.
The least squares equation was:
sale price = 1 765 353 − 35 054 × distance from city centre
For a home 15.5 km from the city centre:
predicted value = 1 765 353 − 35 054 × 15.5 = 1 222 016
The actual sale price was 1 250 000.
So:
residual = actual − predicted
residual = 1 250 000 − 1 222 016 = 27 984
The report noted that students needed to show all working that led to the given residual value.
This is a key Exam 2 habit.
Even when the answer is supplied in the question, students must show the calculation that verifies it.
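The same verification can be reproduced in Python as a check on the working. This sketch uses the least squares equation and the actual sale price quoted above. The function name `predicted_sale_price` is our own.

```python
def predicted_sale_price(distance_km):
    """Predicted sale price from the least squares line quoted above."""
    return 1_765_353 - 35_054 * distance_km


actual_sale_price = 1_250_000
predicted = predicted_sale_price(15.5)

# Residual is always actual minus predicted, in that order.
residual = actual_sale_price - predicted

print(predicted)  # 1222016.0
print(residual)   # 27984.0
```

Reversing the subtraction would give −27 984, which is the sign error the heading warns about.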
Plotting residuals required grid accuracy
Question 4f.ii then asked students to plot the missing residual on the residual plot.
The report noted that students needed to be precise when marking a point on a grid, taking particular care with the scale.
This is an execution skill.
The x-coordinate was 15.5.
The residual was 27 984, which sits just above zero on the plot's large vertical scale.
A careless plot could place the point too high or at the wrong x-value.
Graph marks are often lost not because students do not know what to plot, but because they ignore the scale.
Students should always read the axes before placing a point.
Calculator output needed interpretation
Several questions required calculator use, but the reports show that calculator use was not enough.
Students needed to:
- enter data correctly
- choose the correct statistic
- interpret exponent notation
- preserve signs
- avoid premature rounding
- connect output to context
For example, in Examination 2 Question 5a, the report noted that many students struggled with significant figures and calculator outputs such as 1.05E6.
This kind of issue is preventable.
Students should practise reading CAS outputs until they are completely comfortable with scientific notation, regression coefficients and large financial or statistical values.
A calculator can produce the right number.
The student still needs to understand it.
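Exponent notation such as 1.05E6 means 1.05 × 10^6. Python happens to read the same notation, so a short sketch can confirm the full value. This is only an illustration of how to read the output, not a reconstruction of Question 5a.

```python
cas_output = "1.05E6"  # the notation as it appears on a calculator screen

# E6 means "multiply by 10 to the power of 6"
value = float(cas_output)

# Show the value in full, with spaces as thousands separators
formatted = f"{value:,.0f}".replace(",", " ")

print(value)      # 1050000.0
print(formatted)  # 1 050 000
```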
Data Analysis was contextual
The 2025 Data Analysis questions were not abstract.
They involved avocados, life expectancy, population density, car colour, football teams, doctors per 1000 people, home sale prices and distance from the city centre.
The context determined the meaning of the numbers.
For example:
A median bag size is not the same as total avocados sold.
A variable such as home type is not numerical just because it appears in a table.
A lower standard deviation means lower spread in that sample.
A negative residual means actual value below predicted value.
A negative association means as one variable increases, the other tends to decrease.
Students should always translate the mathematics back into the scenario.
That is what makes the answer complete.
Why Data Analysis errors happen
Data Analysis errors often happen because students rush.
They read the graph but not the question.
They calculate a statistic but do not use the one requested.
They take the square root of r² but forget the sign.
They describe association as causation.
They round when not instructed.
They use the response variable to judge extrapolation.
They plot a residual without checking the scale.
They misread calculator notation.
These are not usually gaps in the course content.
They are execution problems.
The 2025 exams rewarded students who slowed down and checked the meaning of each step.
What future General Mathematics students should learn from 2025
The 2025 VCE General Mathematics exams show that Data Analysis preparation needs to focus on accuracy, interpretation and context.
Students should practise:
- reading frequencies from histograms
- distinguishing total frequency from weighted totals
- comparing boxplot features carefully
- calculating fences for outliers
- choosing graph types based on variable type
- identifying nominal, ordinal and numerical variables
- comparing spread using the statistic requested
- applying the 68–95–99.7 rule
- avoiding unnecessary rounding
- interpreting calculator outputs correctly
- finding the sign of r from slope
- distinguishing association from causation
- identifying interpolation and extrapolation from the explanatory variable
- calculating residuals as actual minus predicted
- plotting points accurately using grid scales
These skills deserve deliberate practice because Data Analysis questions often appear accessible.
That accessibility can be deceptive.
The marks depend on precision.
How ATAR STAR approaches Data Analysis in General Mathematics
At ATAR STAR, Data Analysis is taught as interpretation first and calculation second.
Students learn to read statistical displays carefully, identify the variable type, choose the correct statistic, use CAS efficiently and explain results in context. They practise the common VCAA traps: rounding, residuals, association versus causation, interpolation versus extrapolation, and selecting the statistic specified by the question.
The 2025 Examination Reports confirm why this matters. High-scoring students did not simply calculate.
They interpreted accurately.