1.2.1 Exercise 3
Calculate the square root of 1369 using the
1.2.2 Exercise 4
Square the number 13 using the
1.2.3 Exercise 5
What is the result of summing all numbers from 1 to 100?
# sequence of numbers from 1 to 100 in steps of 1
numbers_1_to_100 <- seq(from = 1, to = 100, by = 1)
# sum over the vector
result <- sum(numbers_1_to_100)
# print the result
result
The result is 5050.
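As a cross-check on this result, the closed form \(n(n+1)/2\) for the sum of the first \(n\) integers should give the same value. The following is a small sketch; the names n, closed.form and direct.sum are our own and not part of the exercise.

```r
# number of terms
n <- 100
# closed-form sum of the integers 1..n
closed.form <- n * (n + 1) / 2
# direct sum for comparison
direct.sum <- sum(seq(from = 1, to = n, by = 1))
# TRUE if both approaches agree
closed.form == direct.sum
```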
1.2.4 Exercise 6
Create the variable income with the values from our Berlin sample in R.
# create the income variable using the c() function
income <- c(
  19395, 22698, 40587, 25705, 26292, 42150, 29609, 12349, 18131,
  20543, 37240, 28598, 29007, 26106, 19441, 42869, 29978, 5333,
  32013, 20272, 14321, 22820, 14739, 17711, 18749
)
1.2.5 Exercise 7
Describe Berlin income using the appropriate measures of central tendency and dispersion.
We use the mean for the central tendency of income. The variable is interval scaled, and the mean is the appropriate measure of central tendency for interval scaled variables. Our income variable is also normally distributed. In practice, however, income distributions in most countries are right skewed, which is why the central tendency of income is often described using the median.
When asked, e.g., in an exam, to describe the central tendency of an interval scaled variable, use the mean. You can also use the median if you tell us why.
# central tendency of income
mean(income)
# dispersion
sd(income)
Average income in our Berlin sample is 24666.24. The standard deviation, i.e. the average difference from that value, is 9467.38.
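Since the median was mentioned as an alternative for skewed income data, the sketch below compares mean and median on the same 25 values. The income vector is repeated so that the block runs on its own; the name center.gap is an illustrative choice of ours.

```r
# the Berlin income sample from Exercise 6
income <- c(
  19395, 22698, 40587, 25705, 26292, 42150, 29609, 12349, 18131,
  20543, 37240, 28598, 29007, 26106, 19441, 42869, 29978, 5333,
  32013, 20272, 14321, 22820, 14739, 17711, 18749
)
# mean and median of income
mean(income)
median(income)
# difference between the two measures of central tendency
center.gap <- mean(income) - median(income)
center.gap
```

In this sample the mean lies somewhat above the median.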
1.2.6 Exercise 8
Compute the average deviation without using the sd() function.
We do this in several steps. First, we compute the mean.
mean.income <- sum(income) / length(income)
# let's print the mean
mean.income
Second, we take the differences between each individual realisation of income and the mean of income. The result must be a vector with the same number of elements as the income vector.
# individual differences between each realisation of income and the mean of income
diffs.from.mean <- income - mean.income
# let's print the vector of differences
diffs.from.mean
 -5271.24 -1968.24 15920.76 1038.76 1625.76 17483.76 4942.76  -12317.24 -6535.24 -4123.24 12573.76 3931.76 4340.76 1439.76  -5225.24 18202.76 5311.76 -19333.24 7346.76 -4394.24 -10345.24  -1846.24 -9927.24 -6955.24 -5917.24
You may be surprised that this works. After all, income is a vector with 25 elements and mean.income is a scalar (only one value). R, however, treats a single value as a vector of length one. It notices that mean.income is shorter than income: the former has 1 element and the latter 25. R therefore recycles mean.income until it has the same length as income, with every element equal to the mean of income. If you did not understand this, don’t worry. The important thing is that it works.
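Recycling is easier to see on a tiny example. The sketch below uses made-up values (x and m are hypothetical names, not part of the exercise) and writes the repetition out explicitly with rep() for comparison.

```r
# a short vector and a single value (a vector of length one)
x <- c(10, 20, 30)
m <- 5
# R recycles m to the length of x before subtracting
recycled <- x - m
# the same computation with the repetition written out explicitly
explicit <- x - rep(m, times = length(x))
# print the result and check that both versions agree
recycled
identical(recycled, explicit)
```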
Our next step is to square the differences from the mean.
# square each element in the diffs.from.mean vector
squared.diffs.from.mean <- diffs.from.mean^2
# print the squared vector
squared.diffs.from.mean
 27785971 3873969 253470599 1079022 2643096 305681864 24430876  151714401 42709362 17001108 158099441 15458737 18842197 2072909  27303133 331340472 28214794 373774169 53974882 19309345 107023991  3408602 98550094 48375363 35013729
We squared each individual element in the vector. Therefore, our new variable squared.diffs.from.mean still has 25 elements.
Squaring a value does two things. First, all values in our vector become positive. Second, the marginal increase grows with distance, i.e., differences close to zero yield small squares, whereas differences far from zero yield much larger ones. To see this, let's plot the square (we haven’t shown you the plot function yet, but we will do this next seminar).
# a vector of x values from negative 100 to positive 100
a <- seq(from = -100, to = 100, length.out = 200)
# the square of that vector
b <- a^2
# we plot the input vector a against b, where b is on the y-axis
plot(
  x = a,                               # x-axis values
  y = b,                               # y-axis values
  bty = "n",                           # no border around plot
  type = "l",                          # connect individual dots to a line
  xlab = "input values from vector a", # x-axis label
  ylab = "b = a^2"                     # y-axis label
)
In this plot, you should see that the line becomes steeper the further we are from 0. We are taking individual differences from the mean. Hence, if a value is exactly at the mean, the difference is zero. The further the value is from the mean (in either direction), the larger the output value.
We will sum over the individual elements in the next step. Hence, values that are further from the mean have a larger impact on the sum than values that are closer to the mean.
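A small numerical sketch may help here. The four deviations below are hypothetical toy values (not taken from the income data), chosen so that two lie close to the mean and two lie far away.

```r
# hypothetical deviations from a mean: two small, two large
toy.diffs <- c(-10, -1, 1, 10)
# squaring removes the sign and inflates the large deviations
toy.squares <- toy.diffs^2
toy.squares
# share of the sum of squares contributed by each deviation
toy.share <- toy.squares / sum(toy.squares)
round(toy.share, 3)
```

The two large deviations account for almost the entire sum of squares, although they are only ten times larger than the small ones.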
In the next step, we take the sum over our squared deviations from the mean.
# sum over squared deviations vector
sum.of.squared.deviations <- sum(squared.diffs.from.mean)
# print the sum
sum.of.squared.deviations
By summing over all elements of a vector, we end up with a scalar. The sum is 2151152126.56.
We divide the sum of squared deviations by \(n-1\). Recall that \(n\) is the number of observations (elements in the vector) and \(-1\) is our sample adjustment.
# get the variance
var.income <- sum.of.squared.deviations / ( length(income) - 1 )
# print the variance
var.income
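Written as a formula, the steps so far amount to the sample variance. The notation is our own assumption: \(x_i\) stands for the individual income values and \(\bar{x}\) for their mean, neither of which is named this way in the exercise.

```latex
\[
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2
\]
```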
The variance, i.e. the average squared deviation from mean income, is 89631338.61.
In the last step, we take the square root of the variance to return to our original units of income.
# get the standard deviation
sqrt(var.income)
The average deviation from mean income in Berlin (24666.24) is 9467.38.
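As a final check, the individual steps can be collapsed into one expression and compared with R's built-in sd() function. This is a compact sketch of the same computation, not a replacement for the step-by-step solution; the names n and manual.sd are ours, and the income vector is repeated so that the block runs on its own.

```r
# the Berlin income sample from Exercise 6
income <- c(
  19395, 22698, 40587, 25705, 26292, 42150, 29609, 12349, 18131,
  20543, 37240, 28598, 29007, 26106, 19441, 42869, 29978, 5333,
  32013, 20272, 14321, 22820, 14739, 17711, 18749
)
# number of observations
n <- length(income)
# square root of the sum of squared deviations divided by n - 1
manual.sd <- sqrt(sum((income - mean(income))^2) / (n - 1))
manual.sd
# compare with the built-in function (allows for tiny floating point differences)
all.equal(manual.sd, sd(income))
```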
1.2.7 Exercise 9
What is the level of measurement of the variable in the Sunday Question?
The variable measures vote choice. The answers are categories, the parties, without any specific ordering. The level of measurement is called categorical or nominal.
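In R, a nominal variable of this kind can be stored as an unordered factor. The sketch below uses six made-up answers with placeholder party labels (hypothetical data, not taken from any poll) to show how the modal category and the proportions per category could be obtained.

```r
# hypothetical answers from six respondents (party labels are made up)
answers <- c("Party A", "Party B", "Party A", "Party C", "Party A", "Party B")
# store the answers as an unordered factor
vote <- factor(answers)
# nominal variables have no ordering
is.ordered(vote)
# frequency of each category
counts <- table(vote)
# the mode is the category with the highest count
modal.category <- names(counts)[which.max(counts)]
modal.category
# proportion of answers in each category
shares <- prop.table(counts)
shares
```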
1.2.8 Exercise 10
Take the most recent poll and describe what you see in terms of central tendency and dispersion.
The most recent poll was carried out by Infratest/dimap on Thursday, 6 September. The most common value, the mode, is the appropriate measure of central tendency. Christian Democrat (CDU/CSU) is the modal category. Dispersion of a categorical variable is the proportion in each category which we see displayed on the website: