This analysis shows the raw distribution of base earnings, distance driven and delivery time for each delivery run. Each one is shown alongside a simulated normal distribution that has the same mean and standard deviation.
summary(uberdata[c('delivery_base_earning','distance','delivery_time_min')])
##  delivery_base_earning    distance        delivery_time_min
##  Min.   : 4.62         Min.   : 0.100   Min.   :  6.50
##  1st Qu.: 7.34         1st Qu.: 2.800   1st Qu.: 16.60
##  Median :10.36         Median : 5.100   Median : 24.47
##  Mean   :11.53         Mean   : 5.834   Mean   : 27.94
##  3rd Qu.:14.56         3rd Qu.: 7.900   3rd Qu.: 36.30
##  Max.   :36.87         Max.   :27.900   Max.   :103.00
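The summary already suggests right skew, because each variable's mean is larger than its median. A moment-based skewness coefficient puts a single number on that. The sketch below uses a hypothetical helper, `sample_skewness`, which implements the plain Fisher-Pearson coefficient with no small-sample bias correction. Its demo runs on a synthetic exponential sample that stands in for `uberdata$distance`. That sample is an assumption for illustration and is not the real data.

```r
# Hypothetical helper: Fisher-Pearson sample skewness (no bias correction).
# Positive values indicate a long right tail.
sample_skewness <- function(x) {
  x  <- x[!is.na(x)]            # drop missing values, as na.rm = TRUE does above
  m  <- mean(x)
  m2 <- mean((x - m)^2)         # second central moment
  m3 <- mean((x - m)^3)         # third central moment
  m3 / m2^1.5
}

# Synthetic, right-skewed stand-in for uberdata$distance (assumption, not real data)
set.seed(42)
x_skewed <- rexp(1000, rate = 1 / 5.8)
sample_skewness(x_skewed)       # positive for a right-skewed sample
```

On the real data, `sapply(uberdata[c('delivery_base_earning','distance','delivery_time_min')], sample_skewness)` would give one coefficient per variable.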
# Data Preparation
dis<-uberdata$distance
time<-uberdata$delivery_time_min
earn<-uberdata$delivery_base_earning
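# Simulated normal reference samples with the same mean and sd as the raw data
# (these are random draws for comparison, not a transformation of the data)
set.seed(123)  # make the random draws reproducible
dis_norm  <- rnorm(1300, mean = mean(dis, na.rm = TRUE),  sd = sd(dis, na.rm = TRUE))
time_norm <- rnorm(1300, mean = mean(time, na.rm = TRUE), sd = sd(time, na.rm = TRUE))
earn_norm <- rnorm(1300, mean = mean(earn, na.rm = TRUE), sd = sd(earn, na.rm = TRUE))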
# Visualisation: raw data next to its simulated normal reference
boxplot(dis, dis_norm, time, time_norm, earn, earn_norm,
        main = "Multiple boxplots for comparison",
        at = c(1, 2, 3, 4, 5, 6),
        names = c("distance", "normal", "time", "normal", "earning", "normal"),
        las = 2,
        col = c("blue", "yellow"),  # recycled: raw = blue, normal = yellow
        border = "brown",
        horizontal = TRUE,
        notch = TRUE)
All three raw variables are positively skewed. In every case the mean exceeds the median, and the maximum lies far above the third quartile. When the raw data are compared with the simulated normal samples of the same mean and standard deviation, the quartiles shift only slightly, so we proceed with the raw distribution.
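The quartile comparison can also be done without random draws. The idea is to set the empirical quartiles beside the exact quartiles of a normal distribution with the same mean and standard deviation, which `qnorm` gives directly. The helper `compare_quartiles` below is hypothetical. The demo again uses a synthetic exponential sample, which is an assumption and not the delivery data.

```r
# Hypothetical helper: empirical quartiles vs. quartiles of a matching normal
compare_quartiles <- function(x, probs = c(0.25, 0.5, 0.75)) {
  x <- x[!is.na(x)]
  data.frame(
    prob      = probs,
    empirical = unname(quantile(x, probs)),                 # observed quartiles
    normal    = qnorm(probs, mean = mean(x), sd = sd(x))    # theoretical quartiles
  )
}

set.seed(1)
x_demo <- rexp(1300, rate = 1 / 5.8)   # synthetic right-skewed sample (assumption)
q <- compare_quartiles(x_demo)
q$shift <- q$empirical - q$normal      # negative median shift signals right skew
print(q)
```

Using `qnorm` rather than `rnorm` removes simulation noise from the comparison, so any shift reflects the data's shape alone.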
This analysis examines the relationship between distance driven (independent variable) and base earnings (dependent variable).
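library(ggplot2)
library(ggpubr)  # provides stat_cor()

ggplot(uberdata, aes(x = distance, y = delivery_base_earning)) +
  geom_point(color = "purple") +
  geom_smooth(method = "lm") +          # least-squares regression line
  stat_cor(method = "pearson") +        # annotates Pearson's R and p-value
  labs(title = "Distance vs. Base Earnings")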
The r and p values show a significant, strong positive correlation between the two variables. However, R² is 0.536. This means distance alone explains only about 54% of the variance in base earnings, which is a moderate fit.
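`stat_cor` annotates Pearson's R, while the fit quality above is quoted as R². For a simple linear regression with an intercept, R² is exactly the square of Pearson's r. An R² of 0.536 therefore corresponds to |r| ≈ 0.73. The sketch below checks that identity on synthetic data, and the data-generating process in it is invented for illustration.

```r
# Synthetic stand-in for uberdata (assumption: invented linear relationship)
set.seed(7)
n <- 500
distance <- rexp(n, rate = 1 / 5.8)
demo <- data.frame(
  distance = distance,
  delivery_base_earning = 5 + 1.0 * distance + rnorm(n, sd = 3)
)

fit <- lm(delivery_base_earning ~ distance, data = demo)        # simple regression
r   <- cor(demo$distance, demo$delivery_base_earning, method = "pearson")
r2  <- summary(fit)$r.squared

c(r = r, r_squared = r2, r_sq_check = r^2)   # r_squared and r_sq_check agree
```

On the real data, `summary(lm(delivery_base_earning ~ distance, data = uberdata))$r.squared` reports R² directly.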
This analysis examines a second pair of variables, with delivery time as the independent variable and base earnings as the dependent variable.
ggplot(uberdata, aes(x = delivery_time_min, y = delivery_base_earning)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "orange") +   # least-squares regression line
  stat_cor(method = "pearson") +                   # Pearson's R and p-value
  labs(title = "Delivery Time vs. Base Earnings")
The r and p values show that delivery time has a stronger association with base earnings than distance does. The correlation between the two is significant, positive and very strong. Time also explains about 78% (R²) of the variance in earnings, which is a strong fit.
Both distance and time are significantly and positively correlated with base earnings. Runs that are longer, whether in distance or in time, tend to earn more money. Despite the strong correlation for both, time explains more of the variation in earnings than distance does (R² of about 0.78 versus 0.536). Earnings therefore track the time spent on a single delivery more closely than the distance driven.
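The comparison above fits one simple regression per predictor and ranks the predictors by R². The sketch below does this in one step with a hypothetical helper, `r2_by_predictor`, built on `reformulate` and `lm`. The demo data are synthetic, and time is made the dominant driver purely by assumption. The two predictors are also generated independently, which real delivery data (where longer trips take longer) would not satisfy.

```r
# Hypothetical helper: R^2 of a separate simple linear regression per predictor
r2_by_predictor <- function(data, response, predictors) {
  sapply(predictors, function(p) {
    f <- reformulate(p, response = response)   # e.g. y ~ distance
    summary(lm(f, data = data))$r.squared
  })
}

# Synthetic stand-in for uberdata (assumption: invented data-generating process)
set.seed(11)
n <- 500
demo <- data.frame(
  distance          = rexp(n, rate = 1 / 5.8),
  delivery_time_min = rexp(n, rate = 1 / 28)
)
demo$delivery_base_earning <- 3 + 0.1 * demo$distance +
  0.3 * demo$delivery_time_min + rnorm(n, sd = 2)

r2 <- r2_by_predictor(demo, "delivery_base_earning",
                      c("distance", "delivery_time_min"))
r2                      # named vector of R^2 values
names(which.max(r2))    # predictor with the best single-variable fit
```

Ranking by single-predictor R² says which variable tracks earnings more closely. It does not separate their joint effect when the two predictors are correlated.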