Centrality

Vertex Centrality

Vertex centrality measures can be used to determine the importance of a vertex in a graph. There are many different vertex centrality measures, each with their own advantages and disadvantages. In a communication network a vertex with high centrality is an actor that is important for the communication in the network, hence they are also often called actor centrality measures. An actor with high centrality can control the flow of information in the network for good or bad. They can also be used to determine key actors in a network, for example in a power grid it is important to know which vertices are key actors, because if they fail, the whole network fails.

Degree Centrality

The degree centrality is the simplest centrality measure. It is simply the number of edges connected to a vertex. The degree centrality is a local measure, because it only takes into account the direct neighbors of a vertex. It can be calculated using the $\text{deg()}$ function. Or alternatively using the $\text{indeg()}$ and $\text{outdeg()}$ depending on whether the graph is directed or not and the use-case.

The degree centrality can be normalized by dividing it by the maximum possible degree in the graph. This is rarely done in practice, because a lot of values will be small, and we are most often interested in the actual degree of a vertex.

The interpretation of the degree centrality is pretty self-explanatory. And is closely related to the prestige of a vertex.

Closeness Centrality

Unlike the degree centrality, the closeness centrality is a global measure, because it takes into account the whole graph, however the consequence of this is that it is more expensive to calculate.

The idea of the closeness centrality is that a vertex is important if it is close to the center of the graph. So a vertex is important if it is close to all other vertices in the graph, i.e. it is close to the center of the graph.

This also means that a vertex can be important even if it only has one edge. As seen by the green vertex in the following graph.

The closeness centrality for a vertex $v$ is calculated by taking the inverse distance of all shortest paths from the vertex $v$ to all other vertices in the graph. This can be interpreted as how efficiently can all the other vertices be reached from $v$ . The formula for the closeness centrality is as follows:

\text{closenessCentrality}(v) = \sum_{u \in V \setminus \{v\}}{d(v,u)^{-1}} = \sum_{u \in V \setminus \{v\}}{\frac{1}{d(v,u)}}

Where $d(v,u)$ is the length of the shortest path from $v$ to $u$ . Let us calculate the closeness centrality for the green vertex in the graph above.

\begin{align*} 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{3} + \frac{1}{3} + \frac{1}{3} + \frac{1}{3} &= \frac{10}{3} \\ \frac{10}{3} \cdot \frac{1}{8-1} &= \frac{10}{21} \approx 0.476 \end{align*}

To normalize the closeness centrality, it can be divided by $|V| - 1$ .

Info

This gives different values to the formula from wikipedia and networkx. They use the following formula:

\text{closenessCentrality}(v) = \frac{1}{\sum_{u \in V \setminus \{v\}}{d(v,u)}}

and for the normalized closeness centrality:

\text{closenessCentrality}(v) = \frac{|V| - 1}{\sum_{u \in V \setminus \{v\}}{d(v,u)}}

where $d(v,u)$ is the length of the shortest path from $v$ to $u$ .

The issue with the above formula is that if no path exists between $v$ and $u$ then the distance is $\infty$ which would lead to the closeness centrality being $0$ . This could be solved by just using 0 instead of $\infty$ which would lead to the same result as the formula above because 1 divided by $\infty$ is $0$ , i.e. 0 is added to the sum.

Betweenness Centrality

In the above example using the degree centrality we saw that the green ones are the most important. However, we can clearly visually see that the vertex inbetween them is the most important one as it connects the two communities. Because of this we could say that that vertex is in Brokerage position or is a Broker/Gatekeeper of information.

The betweenness centrality is a global measure that takes into account the whole graph and tries to solve the above issue.

The idea of the betweenness centrality is that a vertex is important if a lot of shortest paths go through it, i.e. it is between a lot of vertices.

To calculate the betweenness centrality we need to calculate the number of shortest paths that go through a vertex $v$ . So for every pair of vertices $u$ and $w$ we need to calculate the shortest paths and then count how many of them go through $v$ . The formula for the betweenness centrality is as follows:

\text{betweennessCentrality}(v) = \sum_{u \neq v \neq w}{\frac{\sigma_{uw}(v)}{\sigma_{uw}}}

Where $\sigma_{uw}$ is the number of shortest paths from $u$ to $w$ and $\sigma_{uw}(v)$ is the number of shortest paths from $u$ to $w$ that go through $v$ .

Info

The fraction in the formula leads to the weight being split if there are multiple shortest paths between $u$ and $w$ .

Because the calculations for the betweenness centrality are quite complex and take a while to calculate, we will use a smaller graph to calculate the betweenness centrality.

Todo

make this more algorithmic and use the pictures from the script.

We start with all betweenness centralities being $0$ . We start with the first vertex on the left and mark it green.

We then calculate the shortest path to the next one in a BFS manner. The vertex to the right is the next one so we mark it green as the target vertex. Because it is directly connected to the other green one nothing changes. Now that we have visited it we mark it gray.

We take the next vertex, the one above and mark it green. We then calculate the shortest path between the two green vertices. There is only one shortest path going over the previously visited gray vertex. So we add $1$ to that gray vertexes betweenness centrality.

We continue this process until we have visited all vertices once. We then mark the initial vertex on the left as red. All shortest paths that start at this vertex have been calculated. We then pick a new start vertex in a BFS manner. Repeat the process until all shortest paths have been calculated.

To normalize the betweenness centrality, you devide by the centrality by following:

For an undirected graph: $\frac{(n-1)(n-2)}{2}$
For a directed graph: $(n-1)(n-2)$

The Image below summarizes all the centrality measures we have seen so far and compares the most central vertices.

Eigenvector Centrality

Before I start explaining the eigenvector centrality, I describe what an eigenvector is. An eigenvector is a vector that does not change its direction when multiplied by a square matrix, only its magnitude changes, i.e. it is only scaled. Because a matrix can have multiple eigenvectors, the solution is to allow for only eigenvectors with a magnitude of 1, i.e. $||\boldsymbol{v}||_2 = 1$ , i.e. the normalized eigenvector. The scaling factor is then called the eigenvalue, denoted by $\lambda$ . The formula for the eigenvector is as follows:

\boldsymbol{Av}=\lambda \boldsymbol{v}

The eigenvector centrality is the eigenvector corresponding to the largest eigenvalue of the adjacency matrix of the graph. The eigenvector corresponding to the largest eigenvalue is also commonly called the dominant eigenvalue/vector. This can just be calculated but is most often calculated using the power iteration method.

The eigenvector centrality is an interesting centrality measure.

The idea is that a node is important if its neighbors are important.

What makes a vertex important could be any attribute of the vertex, for example if we have a network of people, their salary. However, the simplest and most commonly used approach is to use the degree centrality as the importance measure. In an undirected graph most commonly the in-degree centrality.

To show the idea that the eigenvector centrality is based on the importance of the neighbors, I will use the following graph and calculate the eigenvector centrality using the degree centrality as the importance measure with the power iteration method.

The adjacency matrix of the graph, mirrored because it is undirected.

The initial importance measure, the degree of each vertex.

Power Iteration Method

The power iteration method is a simple iterative method to calculate the eigenvector corresponding to the largest eigenvalue.

The idea is to start with an initial vector $\boldsymbol{b_0}$ and then multiply it with the adjacency matrix $\boldsymbol{A}$ . Then we normalize the resulting vector $\boldsymbol{b_1}$ and repeat the process until the vector converges. Most often to check for convergence we calculate the difference between the two vectors and check if it is smaller than a threshold.

\boldsymbol{b_{i+1}} = \frac{\boldsymbol{Ab_i}}{||\boldsymbol{Ab_i}||_2}

Info

The initial vector $b_0$ in the power iteration method is the importance measure, in this case the degree centrality. However, the initial vector can be any non-zero vector and the method will still converge to the same eigenvector. You could interpret this as the eigenvector centrality being the "true underlying importance" of the vertices.

Katz Centrality

PageRank

Todo

Do this

Prestige

In a directed Graph it is possible to analyze the prestige of a vertex, i.e the stature or reputation associated with a vertex. The vertices relationships however need to resemble this. For example, if a person has a lot of followers but doesn't follow a lot of people, then that person has a high prestige and stature, for example a celebrity.

Popularity

The simplest way to measure prestige is to count the number of incoming edges, i.e using the $\text{indeg()}$ function. This is called popularity.

Proximity Prestige

The proximity prestige measure does not just account for the number of directly incoming edges, but also the number of indirectly incoming edges, i.e. the number of paths that lead to the vertex. However, the longer the path, the lower prestige from that path is weighted.

Simply put the proximity prestige is the sum of all paths that lead to the vertex weighted by the length of the path.

The formula for the proximity prestige can be summarized pretty simply:

The proximity prestige of a vertex is the number of vertices that have a path to the vertex divided by the average shortest path length leading to the vertex.

More formally:

\text{proximityPrestige}(v) = \frac{\frac{|I|}{n-1}}{\frac{\sum_{i \in I}{d(i,v)}}{|I|}}

Where $I$ is the set of all vertices that have a path to $v$ and $d(u,v)$ is the length of the shortest path from $u$ to $v$ .

Example

The input graph for the proximity prestige.

The resulting proximity prestige for each vertex.

\begin{align*} \text{proximityPrestige}(2) &= \frac{\frac{1}{(8-1)}}{\frac{1}{1}} = 0.14 \\ \text{proximityPrestige}(4) &= \frac{\frac{2}{(8-1)}}{\frac{2}{2}} = 0.29 \\ \text{proximityPrestige}(6) &= \frac{\frac{7}{(8-1)}}{\frac{10}{7}} = 0.7 \\ \end{align*}

Group Centrality

The goal of group centrality measures is to determine the importance of a group of vertices in a graph. These measures are based on the vertex centrality measures, but they are more complex and expensive to calculate.

Degree Group Centrality

The degree group centrality is the simplest group centrality measure. It is simply the fraction of the number of vertices outside the group that are directly connected to the group. So in the following graph with the group $G$ being defined as $G={v_6,v_7,v_8}$ the degree group centrality would be $\frac{3}{10}$ so $0.3$ .

Closeness Group Centrality

The closeness group centrality measures how close the group is to the other vertices in the graph. It is calculted by adding up all inverse distances from the vertices outside the group to the closest vertex in the group. So in the same graph and group $G={v_6,v_7,v_8}$ as above the closeness group centrality would be:

1+1+1+\frac{1}{2}+\frac{1}{2}+\frac{1}{2}+\frac{1}{2}+\frac{1}{2}+\frac{1}{2}+\frac{1}{3} = 6.333

It can be simply normalized by dividing it by the number of vertices outside the group, which would lead to $0.6333.$

Betweenness Group Centrality

The betweenness group centrality measures how many shortest paths go through the group. It is calculated by counting how many shortest paths between all the vertices outside the group go through the group.

Example

If we define our group to contain the vertices $C,E$ from the graph below we can calculate the betweenness group centrality simply by calculating all the shortest paths between the vertices outside the group and counting how many of them go through the group.

Our graph where the group we are inspecting contains the vertices C and E.

We have the following shortest paths between the vertices outside the group:

$A \rightarrow B$
$A \rightarrow C \rightarrow D$ goes through the group via $C$ .
$A \rightarrow C \rightarrow D \rightarrow E \rightarrow G$ goes through the group via $C$ and $E$ .
$A \rightarrow C \rightarrow D \rightarrow E \rightarrow F$ goes through the group via $C$ and $E$ .
$B \rightarrow C \rightarrow D$ , goes through the group via $C$ .
$B \rightarrow C \rightarrow D \rightarrow E \rightarrow F$ goes through the group via $C$ and $E$ .
$B \rightarrow C \rightarrow D \rightarrow E \rightarrow G$ goes through the group via $C$ and $E$ .
$D \rightarrow E \rightarrow G$ goes through the group via $E$ .
$D \rightarrow E \rightarrow F$ goes through the group via $E$ .
$F \rightarrow G$

Therefore 8 of the 10 shortest paths go through the group, so the betweenness group centrality is $\frac{8}{10} = 0.8$ .

Network Centrality

The idea of network centrality is to measure the centrality of the entire network, i.e. to compare the difference in centrality between the vertices in the network. The goal is then to show how different the key vertices are from the rest of the network.

General Network Centrality

To calculate the network centrality the vertex centrality measures are used. For this Linton Freeman defined a general formula that returns a value between $0$ and $1$ with the following meanings:

$0$ means that all vertices have the same centrality, i.e. the network is a ring network.
$1$ means that one vertex has all the centrality, i.e. the network is a star network.

Star Network

Ring Network

The formula is as follows:

\text{networkCentrality}(G) = \frac{\sum_{v \in V}{C_{max} - C(v)}}{Star_n}

Where:

$C(v)$ is the centrality function for a vertex $v$ .
$C_{max}$ is the maximum centrality of all vertices in the graph, i.e $C_{max}= \argmax_{v \in V}{C(v)}$ .
The denominator $Star_n$ is the maximal sum of differences between the centrality of a vertex and the maximum centrality of all vertices in the graph, i.e. if the graph was a star graph with the same amount of vertices as the graph $G$ , so $n=|V|$ (Is this always the case, no matter the centrality measure?).

With the definition above it is now logical why the value is $1$ when the graph is a star graph because the numerator and denominator are the same. Whereas if the graph is a ring graph, i.e. all vertices have the same centrality, then the sum of differences in the numerator is $0$ and the denominator is the maximum sum of differences, which leads to the value being $0$ .

Warning

Depending on the definition of the general formula the Sum in the nominator skips the vertex with the maximum centrality since the difference would be $0$ . I find the definition above more intuitive, but it is important to know that there are different definitions.

Degree Network Centrality

For the degree network centrality the denominator is pretty simple, because for a star graph the key vertex will have a degree of $n-1$ and the other vertices will have a degree of $1$ . So the denominator is simply $(n-1)(n-2)$ for an undirected Graph, if it is a directed Graph then the nominator can just be doubled.

If you are working with the normalized degree centrality, then the denominator can be even further simplified to just $n-2$ .

Closeness Network Centrality

When using the normalized closeness centrality, the denominator is simply $\frac{n-2}{2}$ . I will save you the details just trust me bro.

Betweenness Network Centrality

When using the normalized betweenness centrality, the denominator is simply $n-1$ , just like with the degree centrality.

Network Reduction Communities