In our previous post we explored the complex relationship between the Open Knowledge Index and wealth for a sample of countries. The basic linear model looked like this:

OKI = -2.34 + 0.28*Log GDP + error,

which we interpreted as meaning that a 10% increase in GDP per capita is associated, on average, with an increase of roughly 0.028 in the (normalized) Open Knowledge Index. A closer look at the data, however, showed that large gains in the Open Knowledge Index come more easily to richer countries than to poorer ones. This raised the question of whether we can identify other factors that help us build theories on *why* richer countries show larger gains in the Open Knowledge Index than poorer ones.
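Because the model uses the natural logarithm of GDP per capita (R's `log()`), the effect of a proportional change in GDP is the coefficient times the log of the growth factor. A minimal sketch of that arithmetic, using only the 0.28 coefficient reported above:

```r
# Coefficient on log(GDP per capita) from the basic model above
b_gdp <- 0.28

# A 10% increase multiplies GDP by 1.1, so log(GDP) rises by log(1.1)
delta_log_gdp <- log(1.1)

# Exact change in the (normalized) OKI implied by the linear model
effect_exact <- b_gdp * delta_log_gdp

# Rule of thumb: log(1 + x) is close to x for small x, so use 0.10
effect_approx <- b_gdp * 0.10

round(c(exact = effect_exact, approx = effect_approx), 4)
```

The rule of thumb gives the 0.028 quoted in the text; the exact figure is slightly smaller, about 0.027, because log(1.1) is about 0.095 rather than 0.1.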

Let’s begin by looking at the data again. A closer look at the graph in the previous post shows that, apart from a low GDP per capita, most of the countries with lower values in *both* GDP per capita and OKI are also highly populated. In fact, with the exception of Chile, the poorest countries in the sample are among the 20 most populated in the world, while the countries with the highest scores in the Open Knowledge Index (Iceland, Norway, Sweden, Luxembourg, and Estonia) are among the least populated.

To assess the effect of population size on the relationship between wealth and open knowledge, we add (log) population size to the original linear model shown in the figure of our previous post. The results are as follows (standard errors in parentheses):

| | Estimate (std. error) |
|---|---|
| (Intercept) | -1.13 (0.685) |
| log(GDP) | 0.22 (0.048) |
| log(population) | -0.034 (0.016) |

The new model is, therefore:

OKI = -1.13 + 0.22*Log GDP - 0.034*Log Population + error
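To make the equation concrete, here is a small R function that evaluates it with the rounded coefficients reported above. The function name `oki_hat` is hypothetical, and the GDP and population values in the example are made up for illustration. Because both regressors enter in logs and the model is additive, comparing two countries with the same population, one of them 10% richer, isolates the GDP term regardless of the population level chosen:

```r
# Predicted (normalized) OKI from the model with the population control,
# using the rounded coefficients reported in the text.
oki_hat <- function(gdp, population) {
  -1.13 + 0.22 * log(gdp) - 0.034 * log(population)
}

# Hypothetical country: GDP per capita of 20,000 and 10 million inhabitants
base   <- oki_hat(gdp = 20000, population = 10e6)
richer <- oki_hat(gdp = 20000 * 1.1, population = 10e6)
bigger <- oki_hat(gdp = 20000, population = 20e6)

# Holding population fixed, a 10% richer country scores 0.22 * log(1.1) higher
gdp_effect <- richer - base

# Holding GDP fixed, doubling population lowers the score by 0.034 * log(2)
pop_effect <- bigger - base

round(c(gdp_effect = gdp_effect, pop_effect = pop_effect), 4)
```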

With the new model, the marginal effect of wealth (GDP per capita), controlling for population size (i.e., holding population at its sample mean), is shown in the figure below.

The dotted line represents the original relationship between the two variables (wealth and OKI), while the blue line represents the relationship once we take population size into account. The blue line shows that a positive relationship between wealth and open knowledge still exists, but it is weaker (the GDP coefficient falls from 0.28 to 0.22) once we account for population size.
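The blue line is built by predicting from the two-variable model while every country is assigned the same (mean) population. The sketch below reproduces that step on synthetic data: the 30 countries and their values are randomly generated and are NOT the OKI data set. It shows that, with population held constant, the predictions lie on a straight line in log(GDP) whose slope is exactly the log(GDP) coefficient:

```r
set.seed(1)

# Synthetic data for illustration only (NOT the OKI data set):
# 30 hypothetical countries with made-up GDP, population and index values.
n <- 30
sim <- data.frame(
  GDP = exp(runif(n, log(5000), log(60000))),
  population = exp(runif(n, log(3e5), log(1.3e9)))
)
sim$oki <- -1 + 0.2 * log(sim$GDP) - 0.03 * log(sim$population) +
  rnorm(n, sd = 0.05)

fit <- lm(oki ~ log(GDP) + log(population), data = sim)

# Counterfactual data: each country keeps its own GDP, but population is
# fixed at the sample mean, mirroring the post's within() call.
held <- within(sim, { population <- mean(population) })
pred <- predict(fit, newdata = held, interval = "confidence")

# Slope of the held-population predictions against log(GDP)
slope <- unname(coef(lm(pred[, "fit"] ~ log(held$GDP)))[2])
c(slope = slope, coef = unname(coef(fit)["log(GDP)"]))
```

The confidence band columns (`lwr`, `upr`) in `pred` are what one would use to shade uncertainty around the blue line; the post's code only plots the `fit` column.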

The fact that the poorer countries are also the more populated ones could, however, distort our analysis. Recall that the sample used to build the Open Knowledge Index consisted of OECD countries plus the BRIC countries: Brazil, Russia, India, and China. Since these countries may be driving the relationship between population and OKI, how different would our model be if we dropped the BRICs from the sample?
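The reproduction code at the end of this post drops the BRICs by row position after sorting by GDP. A sketch of a sturdier alternative is to drop rows by country code and refit. It assumes, hypothetically, an `iso` column holding ISO 3166-1 alpha-3 codes such as `"BRA"` (the real CSV may code countries differently), and it uses randomly generated values for twelve countries, with placeholder codes `X1` to `X8` for the non-BRIC ones:

```r
set.seed(2)

# Synthetic illustration only (NOT the OKI data): 4 BRIC rows plus
# 8 placeholder countries X1..X8 with randomly generated values.
# Assumption: 'iso' holds ISO alpha-3 codes for the BRIC countries.
bric <- c("BRA", "RUS", "IND", "CHN")
toy <- data.frame(
  iso = c(bric, paste0("X", 1:8)),
  GDP = exp(runif(12, log(3000), log(60000))),
  population = exp(runif(12, log(3e5), log(1.3e9))),
  stringsAsFactors = FALSE
)
toy$oki <- -1 + 0.2 * log(toy$GDP) - 0.03 * log(toy$population) +
  rnorm(12, sd = 0.05)

full <- lm(oki ~ log(GDP) + log(population), data = toy)

# Select rows by country code rather than by position, so the subset
# stays correct even if the data frame is re-sorted.
toy_no_bric <- subset(toy, !iso %in% bric)
no_bric <- update(full, data = toy_no_bric)

rbind(full = coef(full), no_bric = coef(no_bric))
```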

The following table shows the results once we drop the BRIC countries from the sample (standard errors in parentheses):

| | Estimate (std. error) |
|---|---|
| (Intercept) | -1.764 (0.724) |
| log(GDP) | 0.293 (0.058) |
| log(population) | -0.040 (0.016) |

The new model is as follows:

OKI = -1.76 + 0.29*Log GDP - 0.04*Log Population + error

At first sight, removing the BRIC countries does not change the model dramatically: the population coefficient moves only from -0.034 to -0.040. The GDP coefficient, however, rises from 0.22 to 0.29, and once we plot the predicted effect of wealth on the Open Knowledge Index, again holding population at its mean, we can see that these highly populated countries had a large effect on the previous model.

The new model is represented here by the red dotted line, which shows that once we remove the highly populated countries from the sample, the regression line is essentially parallel to the one we obtained without controlling for population size.
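A quick way to see why the red dotted line and the original dotted line look parallel is to compare the GDP coefficients reported in this post. The snippet below only restates the published (rounded) numbers and the implied effect of a 10% rise in GDP per capita; it does not re-estimate anything:

```r
# GDP coefficients reported in this post (normalized OKI scale)
slopes <- c(
  original        = 0.28,   # OKI ~ log(GDP), all countries
  with_pop        = 0.22,   # + log(population), all countries
  with_pop_nobric = 0.293   # + log(population), BRIC excluded
)

# Difference of each slope from the original, uncontrolled model
slope_gap <- round(slopes - slopes["original"], 3)
slope_gap

# Implied change in the normalized OKI for a 10% rise in GDP per capita
round(slopes * log(1.1), 4)
```

The gap between the no-BRIC slope and the original is about a fifth of the gap between the full-sample controlled slope and the original, which is what makes the red and original lines nearly parallel.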

Below is the R source code to freely reproduce the analyses and graphics of this post:

    ##################################################
    # AUTHOR: JOAN-JOSEP VALLBE
    # THEME: OPEN KNOWLEDGE, WEALTH, and POPULATION
    # DATA SOURCE: http://openeconomics.net/open-knowledge-indicator/
    # SOFTWARE: R version 2.15.0 (2012-03-30)
    # MACHINE OS: LINUX UBUNTU 12.04
    ##################################################

    library(plotrix)  # provides rescale()

    #################################
    # LOAD THE DATA
    #################################
    data <- read.csv("data/open_knowledge_indicator_0.1.csv", header = TRUE, sep = ",")
    # Sort countries by GDP per capita so fitted lines are drawn left to right
    data <- data[order(data$GDP), ]

    #####################
    # RESCALE
    #####################
    # Normalize the Open Knowledge Index to the [0, 1] range
    data$oki.res <- rescale(data$Open_Knowledge_Index, c(0, 1))

    ######################
    # LINEAR MODELS
    ######################
    # Row positions of the BRIC countries after sorting by GDP
    bric <- c(1, 2, 3, 7)

    mod.1 <- lm(oki.res ~ log(GDP), data = data)
    mod.2 <- update(mod.1, . ~ . + log(population))

    # Without BRIC countries
    mod.1.b <- lm(oki.res ~ log(GDP), data = data[-bric, ])
    mod.2.b <- update(mod.1.b, . ~ . + log(population))

    ########################
    # GENERATE FITTED VALUES
    ########################
    # Original model
    fit.1 <- predict(mod.1, interval = "confidence")
    # Population-controlled model, population held at its sample mean
    fit.2 <- predict(mod.2, interval = "confidence",
                     newdata = within(data, { population <- mean(population) }))
    # Same, estimated without the BRIC countries
    fit.2.b <- predict(mod.2.b, interval = "confidence",
                       newdata = within(data[-bric, ], { population <- mean(population) }))

    ######################################
    # PLOTTING THE DATA AND FITTED VALUES
    ######################################
    plot(data$GDP, data$oki.res, pch = 20, type = "p", col = "black", log = "x",
         xlab = "Log GDP per capita (PWT 7.0)",
         ylab = "Open Knowledge Index 2009-10 (normalized)")
    # Country labels below each point; row 15's label goes to the left (pos = 2)
    text(data$GDP[-15], data$oki.res[-15], as.character(data$iso[-15]), cex = 0.6, pos = 1)
    text(data$GDP[15], data$oki.res[15], as.character(data$iso[15]), cex = 0.6, pos = 2)
    lines(data$GDP, fit.1[, 1], col = "black", lty = 2)        # original model
    lines(data$GDP, fit.2[, 1], col = "blue")                   # controlling for population
    lines(data$GDP[-bric], fit.2.b[, 1], col = "red", lty = 2)  # without the BRICs