This post uses data from Motor Trend magazine. The data ships with R's datasets package under the name mtcars. Here is the scenario:
You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and miles per gallon (MPG) (outcome). They are particularly interested in the following two questions:
* Is an automatic or manual transmission better for MPG?
* Quantify the MPG difference between automatic and manual transmissions
Description of data
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
The data set mtcars is a data frame with 32 observations on 11 variables.
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs Engine shape (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
Source
Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
Data Processing
library(datasets)
data("mtcars")
head(mtcars)
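Because am is stored as 0/1, it can help to attach readable labels before analysing it. The following is a minimal sketch, not part of the original analysis; the copy named cars and the column named transmission are illustrative names chosen here.

```r
library(datasets)

# Work on a copy so the built-in mtcars stays untouched
# ("cars" and "transmission" are illustrative names, not from the post).
cars <- mtcars

# Label the 0/1 coding described in the data dictionary above.
cars$transmission <- factor(cars$am,
                            levels = c(0, 1),
                            labels = c("Automatic", "Manual"))

# How many cars fall in each group?
table(cars$transmission)

# Inspect column types and the first few values.
str(cars)
```

The rest of the post keeps using the numeric am column, so this step is optional.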
Which is better for MPG, manual or automatic?
We compute the mean MPG for manual and automatic cars from the data set and compare them.
mnManual <- mean(mtcars[mtcars$am == 1, "mpg"])
mnAutomatic <- mean(mtcars[mtcars$am == 0, "mpg"])
mnManual
mnAutomatic
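The same two means, and the raw gap between them, can also be obtained in one step. This is a small sketch using base R's tapply; the names group_means and mpg_diff are introduced here for illustration only.

```r
library(datasets)

# Mean mpg per transmission code (names of the result are "0" and "1").
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
group_means

# Unadjusted difference: manual (1) minus automatic (0).
mpg_diff <- as.numeric(group_means["1"] - group_means["0"])
mpg_diff
```

This difference is unadjusted: it ignores weight, horsepower, and every other variable in the data.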
The mean MPG for manual cars is greater than for automatic cars. As an initial conclusion, manual vehicles have better MPG than automatic ones, but by how much?
Let's investigate further by quantifying the difference.
Quantify the MPG difference between automatic and manual transmissions.
Let us visualize the data using a boxplot.
boxplot(mpg ~ am, data = mtcars, xlab = "Transmission", ylab = "Miles per gallon", main = "Miles per gallon by Transmission Type")
From the plot, manual cars (represented by 1) have a higher median mpg than automatic cars (represented by 0).
The mean mpg of manual cars is greater than that of automatic cars, but we cannot draw a conclusion yet: other variables in the data may account for part of the difference and have not been considered. More testing is needed.
Hypothesis testing.
The null hypothesis is that the mean MPG is the same for manual and automatic cars.
aggregate(mpg~am, data = mtcars, mean)
manual <- mtcars[mtcars$am==1,]
automatic <- mtcars[mtcars$am==0,]
alpha <- 0.05
t.test(manual$mpg, automatic$mpg)
Compare the p-value with alpha: is the p-value less than alpha?
pvalue <- t.test(manual$mpg, automatic$mpg)$p.value
pvalue < alpha
Since the p-value of 0.001374 is less than alpha = 0.05, we reject the null hypothesis. There is a statistically significant difference between the mean mpg of manual and automatic transmissions.
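The test object also carries a confidence interval for the difference, which speaks more directly to "by how much" than the p-value does. Below is a sketch using the formula interface of t.test, which is equivalent to the two-vector call above; tt is an illustrative name. Note that the formula interface reports group 0 minus group 1 (automatic minus manual), so the interval comes out negative.

```r
library(datasets)

# Welch two-sample t-test (unequal variances is the default in t.test).
tt <- t.test(mpg ~ am, data = mtcars)

# Group means as estimated by the test.
tt$estimate

# 95% confidence interval for (automatic - manual) mean mpg.
tt$conf.int

# Same decision rule as in the text.
alpha <- 0.05
tt$p.value < alpha
```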
Next, let's fit a linear regression to the data and see what the model says.
m1 <- lm(mpg ~ am, data = mtcars)
summary(m1)
From the summary of model m1, the intercept (am = 0, automatic) is 17.147 and the coefficient of am (manual) is 7.245. This means the mean mpg for manual cars is 7.245 higher than that of automatic cars. However, R-squared for this model is 0.3598, which means the model explains only about 36% of the variance.
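With a single 0/1 predictor, the regression simply reproduces the two group means: the intercept is the automatic mean and intercept plus slope is the manual mean. The sketch below checks this; b is an illustrative name.

```r
library(datasets)

m1 <- lm(mpg ~ am, data = mtcars)
b <- coef(m1)

# Intercept equals the mean mpg of automatic cars (am == 0).
all.equal(unname(b["(Intercept)"]), mean(mtcars$mpg[mtcars$am == 0]))

# Intercept + slope equals the mean mpg of manual cars (am == 1).
all.equal(unname(b["(Intercept)"] + b["am"]), mean(mtcars$mpg[mtcars$am == 1]))

# Share of variance explained by transmission type alone.
summary(m1)$r.squared
```

So m1 adds no information beyond the comparison of means; its value is as a baseline for larger models.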
Let's consider multiple linear regression with additional variables from the data set that can affect MPG.
m2 <- lm(mpg ~ am + wt + hp + cyl, data = mtcars)
summary(m2)
Model m2 estimates that manual cars (am = 1) have a mean MPG 1.47805 higher than automatic cars with the same weight, horsepower, and number of cylinders.
The R-squared value for this model is 0.849, which means the model explains about 85% of the variance.
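A point estimate alone does not say how precisely the adjusted difference is pinned down. As a hedged sketch (this check is not in the original analysis), the standard error and a 95% confidence interval for the am coefficient can be read off the fitted model. If the interval includes zero, the adjusted difference cannot be distinguished from zero at the 5% level.

```r
library(datasets)

m2 <- lm(mpg ~ am + wt + hp + cyl, data = mtcars)

# Estimate, standard error, t value and p-value for the am coefficient.
coef(summary(m2))["am", ]

# 95% confidence interval for the adjusted manual-vs-automatic difference.
confint(m2, "am", level = 0.95)
```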
Let us run an analysis of variance on the two models and see what it looks like.
anova(m1, m2)
From the anova output, we see that including the other variables improves the fit significantly, so we choose model m2 to represent our data.
Now let's look at the diagnostic plots of the model and see what the residuals say about the fit.
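For nested models, anova performs an F-test on the drop in residual sum of squares. The sketch below recomputes that F statistic by hand to show where it comes from; the names rss1, rss2, df1, df2, and f_manual are illustrative.

```r
library(datasets)

m1 <- lm(mpg ~ am, data = mtcars)
m2 <- lm(mpg ~ am + wt + hp + cyl, data = mtcars)

cmp <- anova(m1, m2)

# Residual sums of squares and residual degrees of freedom of each model.
rss1 <- deviance(m1); df1 <- df.residual(m1)
rss2 <- deviance(m2); df2 <- df.residual(m2)

# F = (drop in RSS per added parameter) / (residual variance of larger model).
f_manual <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)

f_manual                 # hand-computed F statistic
cmp$F[2]                 # F statistic reported by anova()
cmp$`Pr(>F)`[2]          # corresponding p-value
```

The test only says that wt, hp, and cyl jointly improve on m1; it does not test the am coefficient within m2.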
par(mfrow = c(2,2))
plot(m2)
Looking at the residual plots, the residuals appear approximately normally distributed and homoskedastic, meaning the variance of the errors is similar across the range of fitted values.
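Reading diagnostic plots is somewhat subjective, so a numeric check can complement them. The following is a rough sketch, not part of the original analysis: a Shapiro-Wilk test on the residuals and a rank correlation between absolute residuals and fitted values as an informal look at changing spread. With only 32 observations, neither check is conclusive.

```r
library(datasets)

m2 <- lm(mpg ~ am + wt + hp + cyl, data = mtcars)
res <- residuals(m2)

# Shapiro-Wilk test: a small p-value would suggest non-normal residuals.
sw <- shapiro.test(res)
sw

# Informal heteroskedasticity check: do absolute residuals grow or
# shrink with the fitted values? Values near 0 suggest stable spread.
spread_cor <- cor(abs(res), fitted(m2), method = "spearman")
spread_cor
```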
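We conclude that, after adjusting for weight, horsepower, and number of cylinders, manual cars have a greater MPG than automatic cars by an estimated 1.47805 miles per gallon, and we choose to model the data as m2 <- lm(mpg ~ am + wt + hp + cyl, data = mtcars).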
So let me get this: your analysis of the mtcars dataset with 11 variables is saying that an 8 cyl, 460 displacement, 215 hp Lincoln Continental gets worse miles per gallon than a 4 cyl, 95.1 displacement, 113 hp Lotus Europa because the Continental is an automatic? I think I see a flaw in your analysis.
Hi Russel,
The analysis is performed over a range of cars, using data from a particular period (the 1970s).
Since the analysis is based solely on the data and aims to give the overall performance difference between the two transmission types, the conclusion is this: between a manual car and an automatic car with the same values for the other specified variables, the manual car is expected to have the higher MPG.
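To make the "same specified variables" point concrete, here is a sketch that first shows how different the two cars from the question are, then predicts mpg for two hypothetical cars that differ only in transmission. The values wt = 3, hp = 150, and cyl = 6 are arbitrary illustrative choices, and newcars and pred are illustrative names.

```r
library(datasets)

m2 <- lm(mpg ~ am + wt + hp + cyl, data = mtcars)

# The two cars from the question differ in weight, power and cylinders,
# not just in transmission.
mtcars[c("Lincoln Continental", "Lotus Europa"),
       c("mpg", "am", "wt", "hp", "cyl")]

# Two hypothetical cars, identical except for transmission
# (wt, hp and cyl values are arbitrary, chosen only for illustration).
newcars <- data.frame(am = c(0, 1), wt = 3, hp = 150, cyl = 6)
pred <- predict(m2, newdata = newcars)
pred

# The gap between the two predictions is exactly the am coefficient.
diff(pred)
coef(m2)["am"]
```

Whatever values are chosen for wt, hp, and cyl, the predicted gap is the same, because m2 has no interaction terms.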