Education, class, and turnout: is Spain so special?

Introduction

A few months ago, political scientist Aina Gallego published this excellent post explaining why, according to her analysis, and contrary to conventional wisdom and past empirical research, the better off in Spain do not turn out at higher rates than those with lower incomes. She has studied unequal turnout in industrial democracies thoroughly, with a special focus on the interaction between individual and contextual factors.

According to the traditional argument, political participation is costly: it requires the mobilization of individual resources such as time, money, and information. Typically, richer people are better educated and thus incur relatively lower information costs. Therefore, those with higher levels of education will be more prone to vote than less educated citizens. Since in most democracies the better educated are also richer, richer people will usually vote at higher rates than poorer people. In particular, the level of education has long been found to be a strong predictor of both political participation and social status, to the extent that “[e]ducational measures can be better compared across countries than other factors, such as income” (Gallego 2010). Therefore, measuring the effect of education on turnout, it is argued, is a way to measure unequal turnout, i.e., the effect of social status on political participation.

I will not dispute Gallego’s general findings, either in her paper or in the blog post: among Spanish citizens, the level of education does not seem to be a good predictor of electoral turnout. But is this finding sufficient to state that there is no turnout inequality in Spain, that is, that richer people do not vote more than poorer people?

I argue that unlike other industrial democracies, in Spain:

  • the level of education is not a straightforward way to compare between income levels;
  • turnout inequality is empirically strong when we use a measure of social class instead of the level of education.

To carry out the analysis I shall use official survey data from the Spanish Center of Sociological Research (CIS 2923), from December 2011.

Education and electoral turnout

We first focus on the relationship between level of education and turnout, which was Gallego’s main concern.

Table 1 presents this relationship in percentages: the data do not show a clear, positive relationship between education and turnout. Were the relationship strong, the least educated would vote at a much lower rate than the table shows. Indeed, turnout among the least educated is higher than among those with primary or vocational education. Is this, though, sufficient to conclude that in Spain the relationship between education (and therefore income) and turnout is nonexistent or extremely weak? I argue it is not.

Table 1. Relationship between the level of education and voting.

The first reason is demographic. In Spain, educational attainment among citizens older than 55 is below the OECD average. Figure 1 shows the evolution of the percentage of people with less than primary education and with primary education among the working population.

Figure 1. Distribution of the lower levels of education among the Spanish working population, 1977-2000 (EAP Survey)

We see that while the percentage of people with only primary education has decreased dramatically in recent years, the percentage of non-educated workers has decreased at a much slower pace. Spanish non-educated workers form a whole generational group.

Table 2 below shows the distribution of each level of education across age groups. Focusing on the first row of the table, 56.4 percent of non-educated people are between 56 and 76 years old, while there is not a single person younger than 37 without at least a primary education certificate. Overall, non-educated people represent 6.5 percent of the sample, only a bit less than those with higher education.

Table 2. Relationship between level of education and age in Spain

In order to test the relationship between education and turnout, I fit a simple logistic regression model with binary response (vote=1, non-vote=0). Results are presented in Figure 2 below, which plots the predicted probability of voting among the different levels of education.

Figure 2. Predicted effect of level of education on electoral turnout (logistic regression)

It shows, indeed, that it is precisely the non-educated category that breaks an otherwise classic pattern: those with primary education tend to vote less than those better educated.
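As a rough illustration of the kind of model behind Figure 2, the sketch below fits a logistic regression of a binary vote indicator on a categorical education variable and extracts one predicted probability per level. The data are simulated, and the names `survey`, `education`, and `vote` are assumptions made for this example; they do not come from the CIS 2923 file.

```r
# Hypothetical data: variable names and simulated values are illustrative
# assumptions, not the CIS 2923 survey.
set.seed(1)
levels.edu <- c("none", "primary", "secondary", "vocational", "college")
n <- 500
survey <- data.frame(
  education = factor(sample(levels.edu, n, replace = TRUE), levels = levels.edu),
  vote = rbinom(n, size = 1, prob = 0.8)  # 1 = voted, 0 = did not vote
)

# Logistic regression with binary response and education as a factor
mod.edu <- glm(vote ~ education, family = binomial(link = "logit"), data = survey)

# One predicted probability of voting per level of education
new.edu <- data.frame(education = factor(levels.edu, levels = levels.edu))
pred.edu <- predict(mod.edu, newdata = new.edu, type = "response")
names(pred.edu) <- levels.edu
print(round(pred.edu, 3))
```

If education is the only predictor, as in this sketch, the predicted probabilities simply reproduce the observed turnout share within each group, so a plot of this kind is a model-based view of the same percentages a cross-tabulation would show.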

Why do I believe that Spanish non-educated citizens tend to vote more than expected? I would point to at least two main reasons. First, in terms of age, Table 2 above tells us that more than half of non-educated citizens belong to the age groups with the highest overall turnout (middle age and early retirement). Were they much older (>80) or poorer (e.g., marginal social groups), their likelihood of voting would be much lower. In addition to this age effect, this is also the generation of the Spanish democratic transition, which might show a specific level of mobilization. This, though, I cannot measure. The second reason is that non-educated people in Spain are not necessarily the poorest.

Table 3 below shows the relationship between level of education and social class, in column percentages. Obviously there is a general relationship between level of education and social class. None of the non-educated are upper/upper-middle class. Most people with college education belong to the upper/middle classes, and just a few of those with lower educational attainment belong to the upper classes. This is indisputable.

Yet, what’s the matter with non-educated citizens? Were there a strong relationship between class and education, we would expect most people in this group to be unskilled workers. The data, though, show that 40 percent of non-educated people belong to the old middle class, presumably shopkeepers and self-employed owners of small workshops, bars, etc. Forty percent is not a small amount: notice that among those with primary education only 20 percent are middle class. An additional 40 percent of non-educated people are skilled workers. They are typically insiders

Therefore, the statement that in Spain the level of education is a good predictor of social class should be taken with care. Tables 3 and 4 present this relationship in column and row percentages, and show that while things are much clearer among people with college and higher education, the other categories present a fuzzier profile.

Table 3. Relationship between social class and level of education (column %)

Table 4. Relationship between class and level of education (row %)
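The difference between the column percentages of Table 3 and the row percentages of Table 4 can be sketched in R with `prop.table()`. The counts below are invented for illustration and are not the CIS figures; the layout (class in rows, education in columns) is also an assumption.

```r
# Hypothetical counts (rows: social class, columns: level of education).
# The numbers are invented for illustration and are not the CIS figures.
counts <- matrix(c(10, 40, 50,
                   30, 50, 20,
                   60, 30, 10),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(class = c("upper", "middle", "working"),
                                 education = c("primary", "secondary", "college")))

# Column percentages: within each level of education, how are people
# distributed across classes? Each column sums to 100.
col.pct <- 100 * prop.table(counts, margin = 2)

# Row percentages: within each class, how are people distributed
# across levels of education? Each row sums to 100.
row.pct <- 100 * prop.table(counts, margin = 1)

print(round(col.pct, 1))
print(round(row.pct, 1))
```

The two tables answer different questions from the same counts, which is why a category can look concentrated in one reading and diffuse in the other.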

Social class and electoral turnout

What happens when we explore turnout inequality in Spain through social class instead of the level of education? The objective social class variable in CIS is the result of combining several variables. Table 5 below shows the relationship between social class and electoral turnout.

Table 5. Relationship between social class and electoral turnout.

If richer people did not tend to turn out more than poorer citizens, we would observe similar turnout percentages across classes. That is not the case: unskilled workers abstain at a higher rate than upper-middle class citizens. We have modeled this relationship, again, through a logistic regression model with binary response (vote=1, non-vote=0). The results, in Figure 3, show that citizens in upper classes (i.e., richer people) are more likely to turn out in elections than those of lower social class.

Figure 3. Predicted effect of social class on electoral turnout (logistic regression)

Conclusions

Educational attainment in Spain is a complex matter with deep roots in history (the Franco dictatorship, the role of the Catholic church, etc.), which poses problems for the use of education as a predictor when exploring participation inequality and political behavior more generally. We have shown that social class might be a more straightforward way to capture the relationship between wealth and participation, and that when we use it, we observe a clear pattern of inequality: in Spain, as in most industrialized countries, those with higher income levels tend to vote more than poorer citizens.


Class and vote in Catalonia (part 2)

I recently posted on the relationship between class and vote in Catalonia [please check the error correction on the first table of the old post]. The main point of the post was to discuss one of the most widespread political beliefs in our country, namely that the two main parties of the Catalan party system (center-right CiU, center-left PSC) have well-defined and distinct class-dependent vote bases. Yet the data tell us that, classwise, both parties have only slightly different voters, that both may be defined as inter-class parties, and that explanations of voting preferences should be sought elsewhere.

Before digging into voter profiles, though, a further inquiry into the class distribution of Catalan voters should be carried out in order to test the second most widespread belief about the Catalan party system: that class differences among voters of both parties really show up in Spanish legislative elections but not in Catalan elections, due to differences in voter mobilization and self-selection, or what has been called dual voting and differential abstention.

In other words, the point of the differential abstention argument is that there’s a significant set of Spanish-origin working-class voters (i.e., likely to vote for the left) who never turn out in Catalan legislative elections but only in Spanish legislative elections to vote for the left (mainly PSC). This way, the argument goes, the class structure of vote gets blurred in Catalan elections due to the absence of a significant portion of the potential voting population, which in turn is very homogeneous in its socioeconomic status and cultural (or national) identity.

The discussion in our previous post was based upon 2010 panel data, and in particular on voter preferences in the 2006 Catalan elections (CIS, 2660). Vote preferences in the 2008 Spanish legislative election are also part of the questionnaire. Let’s see.

The table below shows how social groups voted in the 2008 Spanish legislative elections (to be read row wise). This was Zapatero’s second election and, as can be seen in the table, a majority of voters in Catalonia voted for PSC (center-left), this party being dominant among all classes, high/upper-middle class included. In fact, 65% of the high and upper-middle class voted for a left party in that election (PSC + ICV + ERC), as did a similar percentage of each socioeconomic group (with a peak of 70% of unqualified workers voting for the left).

Vote in Catalonia by socioeconomic status in 2008 Spanish legislative elections.

Regarding the conservative, Catalan nationalist party CiU, the table shows that even in Spanish legislative elections (in which it traditionally has lower support) it keeps receiving solid support within all economic groups, unqualified workers included. Similarly to what we saw for the 2006 Catalan legislative election in the previous post, then, PSC is clearly dominant among the working class (even more so in Spanish elections), but it is not the case that CiU is the preferred party of the high and upper-middle class, nor is it the most voted party among the old, rural middle class. Summing up, compared to the 2006 Catalan election, in the 2008 Spanish legislative election PSC received more votes among all classes, and CiU received fewer votes among all classes, too. The plot below shows how much vote share each party gained or lost between both elections.

% vote difference in 2008 compared to 2006 elections

The PSC gained on average 21.6 percentage points of vote share within each socioeconomic group, while CiU lost on average 3 percentage points in each class. Since the change in support received by each party between the Catalan and the Spanish election was quite homogeneous among social classes, we expect the socioeconomic distribution of each party’s voters to remain unchanged compared to the Catalan election. The table below shows exactly that (to be read column wise).

Socioeconomic distribution within each Catalan party’s voters in the 2008 Spanish legislative election.

It is striking how little some things changed between those two elections when you look into the social distribution of the vote. Again, PSC and CiU present only slight differences in their social base even in Spanish elections (since they gained or lost votes very uniformly across all social groups). If anything, the one pattern common to all parties is that the weight of high/upper-middle class voters in each party is higher in the 2008 Spanish election than in the Catalan election, reaching a noteworthy peak in the ex-communist, green ICV, where almost half of its votes come from the better off.

[There will be a third part of this series of posts. In it I will explore inter-party vote transfers between Catalan and Spanish elections in order to test the dual voting and differential abstention argument.]


Class and vote in Catalonia (part 1)

ESADE sociologist José Luis Álvarez recently published this opinion piece in the Spanish newspaper El País, in which he makes some statements about class and vote in the Catalan party system. His main remarks on this [translated from Spanish, emphasis mine] are:

If Catalanism can afford this crescendo of demands, it is because it has left behind its great historical danger: that the working classes, mostly of non-Catalan culture, would oppose its project. This threat was pressing because CiU has been unable to expand its electoral space beyond the upper class and the middle classes of Catalan origin; it has never exceeded their demographic share, a little over 30% of the population. Catalanism is the platform for the hegemony of the bourgeoisie of Catalan origin, and CiU is its party. […]

But if there is one party that has facilitated the advance of Catalanism, it has been the Socialist Party of Catalonia. In its role as a party of government since the years of President Pujol, when nationalists and socialists divided up the administration of the country (the Generalitat for CiU, the town councils for the left), the PSC conceived of itself as an inter-class party. But the PSC’s cross-class reach was uneven: while its electoral base, always loyal, was the working-class neighborhoods and cities of Spanish migrants, it made only soft gains among the most cosmopolitan professional segments of the middle class.

To summarize, Álvarez states that, in Catalonia, the center-right, Catalan nationalist party Convergència i Unió (CiU) is the political platform of the Catalan bourgeoisie, since it receives most of its votes from Catalan high and upper-middle class voters, while the main opposition party, the social democrats of the Partit dels Socialistes de Catalunya (PSC), gathers most of its votes from the Spanish-origin working class and a few votes from the urban middle class.

This, actually, has been a very common and popular political belief in Catalonia and Spain, reproduced over and over again by all kinds of pundits in newspapers and other media.

Here I use postelectoral panel data from the Spanish government’s Centro de Investigaciones Sociológicas (CIS) (CIS survey no. 2857) to test the relationship between vote and sociodemographic status in Catalonia (n=2,523; sample error = 2%) for the 2006 elections to the Catalan Parliament. My analysis is based upon two simple contingency tables.

[Edit Sep. 1, 2012: the original table in this post has been changed for a new one due to errors. In the old version, all cells were 22.4% higher than they should have been due to an error when running the R code to produce it. Since the change is linear and applies to the whole table, though, the interpretation is unchanged.]

The first table shows (row wise) how votes are distributed within each social class. In effect, about 23% of the high/upper middle class votes for the Catalan nationalist center-right (CiU), but another 22% votes for the social democrats (PSC), 12.5% votes for the far-left green party ICV, and 18% votes for the Catalan left independentists of ERC. Only a tiny part of the Catalan high and upper-middle class votes for the Spanish right nationalists (PP). So, about 52.5% of Catalonia’s high and upper-middle class voted for left parties (PSC, ERC, ICV) in the 2006 election.

What the data also show is that CiU has a strong position within all the other classes (unqualified workers included). In fact, in the 2010 elections CiU was dominant among all classes, unqualified workers included (which might be due to a lower mobilization of PSC voters from the working class in that election).

Although the relative weight of all the other parties changes in each class (e.g., PSC has a higher vote share among qualified and unqualified workers), the relevant point here is that CiU never gets less than 18% of the vote share within any class, and it is more dominant among, say, the middle classes than among the richer. Of course, PSC is still the dominant party among the working class (with a difference of around 13-15 percentage points in vote share within each group), but the substantial part of workers who vote for CiU should not be neglected.

The following dotplot shows the whole picture of the distribution of each class’ political preferences, e.g., that CiU and PSC are almost equally preferred among the high/upper middle and new middle classes, which counters Álvarez’s statement that PSC has achieved only a very limited penetration in urban (he calls them ‘cosmopolitan’) middle classes. In fact, almost 37% of them voted for the PSC.

However, Álvarez’s words could be interpreted in a different way. He might mean that CiU’s voters come mainly from the higher and rural middle classes while PSC’s voters are to be found mainly in the working class and (to a lesser degree) the urban middle classes. The next table shows the data on the social composition of each party (to be read column wise).

Again, the data show interesting patterns. Actually, only around 22% of CiU’s voters come from the high/upper middle class, and only 17% from the rural middle classes. In fact, 40% of its voters come from some kind of middle class, and almost 29% from qualified workers. Combining the two categories for working class, 37% of CiU’s voters are working class. The social distribution of CiU’s voters, then, is something like 20 (top)-40 (middle)-40 (lower).

Let’s focus on the social democrats (PSC). As the table shows, they too have their share of high class voters (17%, only 5 percentage points less than the right-wing CiU). Moreover, the weight of the middle classes among their voters (31%) is smaller than CiU’s, though still substantial, and this party has a larger share of qualified (39%) and unqualified (12%) workers. The overall distribution of PSC’s voters would be, then, 15 (top)-30 (middle)-55 (lower).

No extreme differences, then, between the social composition of both parties, which seem to be more or less equally inter-class parties with slight socioeconomic differences among their supporters in the middle and working classes. These differences, on the other hand, may experience some changes in different elections.

What is most striking about this table, though, is that if we are looking for the parties with the highest share of high class voters, those are, first, the far-left, Green ex-communists of ICV (44.5% of their voters are high/upper middle class), and then the left Catalan independentists of ERC (34% of their voters are high/upper middle class).

The plot below compares the social distribution of each Catalan party’s voters with that of CiU’s voters. The data show, therefore, that both CiU and PSC have a very similar vote base in terms of socioeconomic position, and have a substantial presence among almost all social classes.

Policy positions between both parties differ in substantial issues, so other variables should be used to explain voters’ preferences for one party or the other, such as (parental) national origin, linguistic behavior, and national identification.

Álvarez points out some of these predictors, and I may pursue this in my future work. Yet, the main problem with his view is that such hot issues as national identification and linguistic questions are seen as mere tools used by the high class to perpetuate its dominant position against the ‘true’ preferences of the working class. False consciousness… that rings some bells, quite old bells. Moreover, his prejudices show up when he contrasts Catalan vs. cosmopolitan middle classes in Catalonia, thus identifying Catalan people with parochialism and Spanish people with open-mindedness. This is an unfounded prejudice.

This whole line of argumentation is still popular among some Spanish and Catalan political observers who are puzzled by the possibility that the Catalan middle class and the working class of Spanish origin may be starting to consider future Catalan independence as a legitimate and desirable political objective. This, however, remains to be tested with further data.


Open knowledge and wealth (part 2)

In our previous post we explored the complex relationship between the Open Knowledge Index and wealth for a sample of countries. The basic linear model looked like this:

OKI = -2.34 + 0.28*Log GDP + error,

which we interpreted as meaning that a 10% increase in GDP/capita would yield, on average, a 0.028 increase in the Open Knowledge Index. Observation of the data, though, showed that large increases in the Open Knowledge Index are easier for richer countries than for poorer ones, which raised a couple of questions about whether we can identify other factors that help build theories on why richer countries present higher increases in the index than poorer ones.
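The arithmetic behind that interpretation can be checked in a couple of lines. The sketch below assumes, as in the R code of this post, that Log means the natural logarithm; the object names are made up for the example.

```r
# Slope of log(GDP per capita) in the bivariate model quoted above
beta.gdp <- 0.28

# A 10% increase multiplies GDP per capita by 1.10,
# so log(GDP) grows by log(1.10)
exact.change <- beta.gdp * log(1.10)

# Common shortcut: log(1 + r) is close to r when r is small
approx.change <- beta.gdp * 0.10

print(round(c(exact = exact.change, approx = approx.change), 4))
```

The 0.028 figure is the shortcut version; the exact change implied by the log specification is slightly smaller, roughly 0.027, and the gap between the two widens for larger percentage increases.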

Let’s begin by observing the data again. A closer look at the graphic of the previous post shows that, apart from having a low GDP/capita, most of the countries with lower values in both GDP/capita and OKI are also highly populated. In fact, except for Chile, the poorest countries in the sample are among the 20 most populated in the world, while the countries with the highest scores in the open knowledge index (Iceland, Norway, Sweden, Luxembourg, and Estonia) are among the least populated ones.

In order to assess the effect of population size on the relationship between wealth and open knowledge, we add population size to the original linear model represented in our original figure. The results are as follows (standard errors in parentheses):

(Intercept)       -1.13  (0.685)
log(GDP)           0.22  (0.048)
log(population)   -0.034 (0.016)

The new model is, therefore:

OKI = -1.13 + 0.22*Log GDP - 0.034*Log Population + error
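To get a feel for the size of the population term, the fitted equation can be evaluated by hand. The sketch below plugs the rounded coefficients reported above into a small helper; the function name `oki.hat` and the GDP and population values are hypothetical and serve only to illustrate the calculation.

```r
# Coefficients as reported in the text (rounded)
b0 <- -1.13
b.gdp <- 0.22
b.pop <- -0.034

# Predicted (rescaled) Open Knowledge Index for given GDP per capita and population
oki.hat <- function(gdp, population) {
  b0 + b.gdp * log(gdp) + b.pop * log(population)
}

# Hypothetical country: these GDP per capita and population values are made up
base <- oki.hat(gdp = 30000, population = 1e7)
doubled <- oki.hat(gdp = 30000, population = 2e7)

# Holding GDP fixed, doubling population shifts the prediction by b.pop * log(2)
print(doubled - base)
```

Under these coefficients, doubling population at constant GDP per capita lowers the predicted index by a little over 0.02 on the 0 to 1 scale, which is of the same order as the effect of a 10% change in GDP per capita.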

With the new model, the marginal effect of wealth (GDP/capita) controlling for population size (setting population size to its mean) is represented in the figure below.

Relationship between wealth and OKI, controlling for population size.

The dotted line represents the original relationship between both variables (wealth and OKI), and the blue line, on the other hand, represents this relationship once we take population size into account. This line shows that there still exists a positive relationship between wealth and open knowledge, but the relationship is mitigated once we account for population size.

It seems, though, that the fact that the poorer countries are also more populated could affect our analysis. Let’s recall that the sample used to generate the Open Knowledge Index was made up of countries from the OECD plus the BRIC countries: Brazil, Russia, India, and China. The question is: since these countries are clearly biasing the relationship between population and OKI, how different would our model be if we dropped the BRICs from the sample?

The following table shows the results once we drop the BRIC countries from the sample (standard errors in parentheses):

(Intercept)       -1.764 (0.724)
log(GDP)           0.293 (0.058)
log(population)   -0.040 (0.016)

The new model is as follows:

OKI = -1.76 + 0.29*Log GDP - 0.04*Log Population + error

From the coefficients of the new model we observe that removing the BRIC countries barely changes the population coefficient (from -0.034 to -0.040), but the GDP coefficient rises from 0.22 to 0.29, close to the 0.28 of the original bivariate model. Once we plot the predicted values of the effect of wealth on the Open Knowledge Index, again keeping population at its mean, we can see that those highly populated countries had a large effect on the previous model.

Relationship between wealth and Open Knowledge Index, controlling for population (and with BRIC countries removed).

The new model is represented here by the red dotted line, which shows that once we remove the highly populated countries from the sample, the slope of the regression line is basically parallel to the one that we had when we didn’t control for population size.

Below is the R source code to freely reproduce the analyses and graphics of this post:

##################################################
#AUTHOR: JOAN-JOSEP VALLBE
#THEME: OPEN KNOWLEDGE, WEALTH, and POPULATION
#DATA SOURCE: <http://openeconomics.net/open-knowledge-indicator/>
#SOFTWARE: R version 2.15.0 (2012-03-30)
#MACHINE OS: LINUX UBUNTU 12.04
##################################################
#################################
#LOAD THE DATA
#################################

data <- read.csv("data/open_knowledge_indicator_0.1.csv",
                 header=TRUE,
                 sep=",")

ord.gdp <- order(data$GDP)
data <- data[ord.gdp,]

#####################
#RESCALE
####################

library(plotrix) #provides rescale(); the package must be installed

#Rescale the index to the range [0,1]
oki.res <- rescale(data$Open_Knowledge_Index,c(0,1))

data <- data.frame(data,oki.res)


######################
#LINEAR MODEL
######################
mod.1 <- lm(oki.res~log(GDP),data=data)
mod.2 <- update(mod.1,.~. + log(population))

#Without BRIC countries (rows 1, 2, 3 and 7 once the data are sorted by GDP)
mod.1.b <- lm(oki.res~log(GDP),data=data[-c(1,2,3,7),])
mod.2.b <- update(mod.1.b,.~. + log(population))

########################
#GENERATE FITTED VALUES
########################

a <- predict(mod.1,interval="confidence")

b <- predict(mod.2,interval="confidence",
             newdata=within(data,{population <- mean(population)}))

c <- predict(mod.2.b,interval="confidence",
             newdata=within(data[-c(1,2,3,7),],{population <- mean(population)}))


######################################
#PLOTTING THE DATA AND FITTING VALUES
######################################

plot(data$GDP,data$oki.res,
     pch=20,
     type="p",
     col="black",
     log="x",
     xlab="Log GDP per capita (PWT 7.0)",
     ylab="Open Knowledge Index 2009-10 (normalized)")
text(data$GDP[-15],data$oki.res[-15],
     as.character(data$iso[-15]),
     cex=0.6,
     pos=1)
text(data$GDP[15],data$oki.res[15],
     as.character(data$iso[15]),
     cex=0.6,
     pos=2)
lines(data$GDP,a[,1],col="black",lty=2)
lines(data$GDP,b[,1],col="blue")
lines(data$GDP[-c(1,2,3,7)],c[,1],col="red",lty=2)



Open knowledge and wealth (part 1)

Is there a relationship between open knowledge and wealth? Data from the Open Economics group show that there is a positive relationship between their open knowledge index and GDP per capita [see our previous post on this index]. So, richer countries have higher scores in the open knowledge index, as shown in this plot produced using their data.

Relationship between (log) GDP/capita and Open Knowledge Index

In effect, we can see that countries with lower GDP per capita such as India, China, Turkey, Mexico, Brazil, Chile and Russia (bottom-left part of the graphic) present also lower scores in the Open Knowledge Index compared to richer countries like Germany, South Korea, Norway or Luxembourg (upper-right part).

The linear relationship is represented by a regression coefficient of 0.28, which means that, for example, a 10 percent increase in GDP per capita would imply, on average, a 0.028 increase in the open knowledge index. This is not bad at all, considering that the index has been rescaled to range between 0 and 1. Moreover, the model explains 55 percent of the variation in the index (according to the adjusted R-squared).

But is the relationship between wealth and open knowledge really linear? Does a 10 percent increase in GDP per capita always produce, at all levels of wealth, a 0.028 increase in the open knowledge index? Or will the index rather increase by different amounts at different levels of wealth?

A simple and eye-catching way to attack this problem is to fit a local regression model (LOESS) to the data and plot a LOESS curve against the data. I’ve done it using the loess.smooth R function, and the results can be seen in the figure below.

LOESS curve representing the relationship between GDP/capita and OKI

The curve in the figure shows that, indeed, the relationship between wealth and open knowledge is not linear, i.e., we cannot expect that a similar increase in GDP per capita (say, 10 percent) will produce the same increase in the open knowledge index for, say, India as for the Netherlands.
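One way to make "different increases at different levels of wealth" concrete is to compare the local slope of a LOESS fit at two points. The sketch below uses `loess()` with `predict()` rather than the `loess.smooth` call used for the figure, works on log GDP directly, and runs on simulated data; the data, the variable names, and the helper `local.slope` are all assumptions made for illustration, not the Open Economics data.

```r
# Simulated data: a convex relationship between log GDP per capita and a 0-1 index.
# These values are invented for illustration only.
set.seed(42)
log.gdp <- seq(7, 11, length.out = 40)
index <- (log.gdp - 7)^2 / 16 + rnorm(40, sd = 0.01)
sim <- data.frame(log.gdp, index)

# Local regression with the same span used in the post
fit <- loess(index ~ log.gdp, data = sim, span = 0.75)

# Approximate local slope: change in the fitted curve over a small step
local.slope <- function(x, step = 0.1) {
  p <- predict(fit, newdata = data.frame(log.gdp = c(x, x + step)))
  unname((p[2] - p[1]) / step)
}

slope.poor <- local.slope(7.5)   # low end of the simulated range
slope.rich <- local.slope(10.5)  # high end of the simulated range
print(c(poor = slope.poor, rich = slope.rich))
```

In the simulated example the slope at the rich end is several times the slope at the poor end, which is the kind of pattern a single regression coefficient cannot express.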

Although this analysis should be reproduced using a larger sample of countries, the figure suggests that relatively small increases in GDP per capita really make a difference in the open knowledge index, but only among the richer countries. In contrast, the poorer countries would need relatively large increases in their GDP per capita in order to achieve decent increases in the open knowledge index.

Does this mean that the open knowledge question is just a matter of wealth? Do rich countries spend money on open knowledge because they are rich? Is the open knowledge culture some kind of post-materialist phenomenon and therefore just a matter for rich countries that have other needs (health, education, money) covered?

Finally, considering that our simple model using just the GDP per capita explains only 55 percent of the variation, can other factors be found that help better explain the relationship between wealth and open knowledge?

We will try to tackle some of these questions in our next post.

Below is the R code to freely reproduce the analyses and graphics of this post:

##################################################
#AUTHOR: JOAN-JOSEP VALLBE
#THEME: OPEN KNOWLEDGE AND WEALTH
#DATA SOURCE: <http://openeconomics.net/open-knowledge-indicator/>
#SOFTWARE: R version 2.13.2 (2011-09-30)
#MACHINE OS: LINUX UBUNTU 11.10 (Oneiric Ocelot)
##################################################

#################################
#LOAD THE DATA
#################################

data <- read.csv("data/open_knowledge_indicator_0.1.csv",
                 header=TRUE,
                 sep=",")
#This is to order the countries according to GDP per capita
ord.gdp <- order(data$GDP)
data <- data[ord.gdp,]

#####################
#RESCALE the open knowledge index to range [0,1]
####################

library(plotrix)#You should have this R package installed

#rescale() comes from plotrix and maps the index linearly onto [0,1]
oki.res <- rescale(data$Open_Knowledge_Index,c(0,1))

data <- data.frame(data,oki.res)

######################
#LINEAR MODEL
######################
mod.1 <- lm(oki.res~log(GDP),data=data)

########################
#GENERATE FITTED VALUES
########################

a <- predict(mod.1,interval="confidence")

######################################
#PLOTTING THE DATA AND FITTED VALUES (linear model)
######################################

plot(data$GDP,data$oki.res,
     pch=20,
     type="p",
     col="black",
     log="x",
     xlab="Log GDP per capita (PWT 7.0)",
     ylab="Open Knowledge Index 2009-10 (normalized)")
text(data$GDP,data$oki.res,
     as.character(data$iso),
     cex=0.6,
     pos=1)
lines(data$GDP,a[,1],col="black",lty=2)

######################################
#PLOTTING THE LOESS CURVE AGAINST THE DATA
######################################

plot(data$GDP,data$oki.res,
     pch=20,
     type="p",
     col="black",
     log="x",
     xlab="Log GDP per capita (PWT 7.0)",
     ylab="Open Knowledge Index 2009-10 (normalized)")
text(data$GDP,data$oki.res,
     as.character(data$iso),
     cex=0.6,
     pos=1)
lines(loess.smooth(data$GDP,data$oki.res,
                   span=0.75))
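
A note on the rescaling step: rescale() from plotrix maps a vector linearly onto a new range. If installing plotrix is not an option, the same [0,1] normalization can be sketched in base R. The helper name rescale01 and the toy scores below are assumptions made for illustration.

```r
# Min-max rescaling to [0, 1] using base R only.
# rescale01 is a hypothetical helper name, not part of any package.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy scores, for illustration only
scores <- c(12, 30, 45, 57)
rescaled <- rescale01(scores)
print(rescaled)
```

The smallest value maps to 0 and the largest to 1, with everything else placed proportionally in between.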

The dimensions of open knowledge

Recently, the OpenEcon working group from the Open Knowledge Foundation released its first Open Knowledge Index, which “has been designed to measure and track progress in opening up information, data and knowledge in a broader sense to the public” (data may be downloaded here).

This is very good news: the index offers a way to compare performance across countries, as in the graph below, and an excellent opportunity to address important theoretical and empirical questions about the role of knowledge and information in democracy and democratization.

Open Knowledge Index score of the countries in the sample.

The Open Knowledge Index is a composite indicator that “captures three dimensions of knowledge” (see technical details here): capability (access to knowledge), legislation (availability of knowledge), and open society (effective use of knowledge and feedback). Each of these dimensions, in turn, is a sub-index created through the combination of different variables.

The authors acknowledge that the indicators are “in an early testing stage”, and I guess they will be refined in the near future. I am unaware of the discussions behind the design process of these indicators, and probably what I’m going to say has already been discussed. I will focus here on the “open society” index, which I think deserves some discussion.

This sub-index, which tries to capture “the capacity to use the data and feed it back into the open data ecosystem”, is created through the combination of three different variables:

Of these variables, I think that using the number of Wikipedia edits as a measure of the openness of knowledge in society raises several potential problems.

A first problem is that, as Jakob Nielsen noted some years ago, participation inequality (a common problem in political science) takes an extremely skewed form in online communities. Hence the well-known “90-9-1 rule”, by which 90% of users don’t contribute content at all, 9% contribute only from time to time, and 1% produce most of the content, yielding a “long-tailed” distribution. According to Wikipedia’s own data, 82,800 active contributors are working on more than 19,800,000 articles. So what does a value of this variable really tell us about a society, when it presumably reflects the behavior of a very small fraction of that society?
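
A small sketch can illustrate why an aggregate count of edits says little about the typical member of a community. The edit counts below are invented to follow a stylized 90-9-1 pattern; they are not Wikipedia data, and the variable names are assumptions made for this example.

```r
# Hypothetical community of 1,000 users following a stylized 90-9-1 pattern.
# The edit counts are invented for illustration; they are NOT Wikipedia data.
edits <- c(rep(0, 900),    # 90% never contribute
           rep(2, 90),     # 9% contribute occasionally
           rep(100, 10))   # 1% contribute heavily

# Share of all edits made by the most active 1% of users
n_top      <- ceiling(length(edits) / 100)
top_edits  <- head(sort(edits, decreasing = TRUE), n_top)
share_top1 <- sum(top_edits) / sum(edits)

# The total looks substantial, yet the typical (median) user makes no edits
print(c(total = sum(edits), median = median(edits), share_top1 = share_top1))
```

In this toy community, roughly 85 percent of all edits come from 1 percent of the users, while the median user contributes nothing.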

A second problem is that Wikipedia editing behavior also suffers from a clear cultural and geographical bias: 43% of the active contributors make their contributions in English, and more than 50% of contributions in English come from the United States.

A third problem is that, while the other two variables in this indicator somehow capture the institutional context of the “openness of knowledge” (especially the World Bank’s governance indicators (AGI)), this third variable represents a behavioral dimension, i.e., how (a small group of) people actually behave in this context. I’m not sure what this behavioral component contributes to the indicator.

In conclusion, since this variable is part of an indicator of the openness of knowledge in society, these systematic sources of error should, in my opinion, be addressed when discussing the validity of the measurement, i.e., what it is that we really want to measure with this variable.

Below is the R code to freely reproduce the graphic in this post:


######################################
#DOTPLOT OF THE OPEN KNOWLEDGE INDEX
######################################
#LOAD THE DATA
data <- read.csv("data/open_knowledge_indicator_0.1.csv",
                 header=TRUE,
                 sep=",")
#SORT DATA
order.index <- data[,c(1,3)]
order.index <- order.index[order(order.index[,2]),]

#PLOT IT IN A DOTCHART
dotchart(order.index[,2],
         pch=20,
         labels=order.index[,1],
         cex=0.8,
         xlab="Open Knowledge Index")
