๐Ÿ•ธ๏ธNetworked Life

Key Social Network Analysis Metrics

Why This Matters

Understanding network metrics isn't just about memorizing formulas; it's about grasping why certain nodes become powerful, how information spreads, and what makes some networks resilient while others fragment. You're being tested on your ability to identify which metric answers which question: Who's the most connected? Who controls information flow? How tightly clustered is a community? These distinctions matter because they reveal fundamentally different types of influence and network behavior.

The metrics in this guide fall into distinct conceptual categories: centrality measures (who matters and why), structural properties (how the network is organized), and efficiency measures (how well information travels). Don't just memorize that betweenness centrality involves shortest paths; know that it identifies brokers who can make or break communication across groups. When you can explain what each metric reveals about network dynamics, you're thinking like a network scientist.


Centrality Measures: Who Holds Power?

Centrality metrics answer the fundamental question: which nodes are most important? But "important" means different things depending on context. A node can be central because it has many friends, because it bridges communities, or because its friends are themselves influential. Understanding these distinctions is essential.

Degree Centrality

  • Counts direct connections: the simplest measure of how "popular" or connected a node is in the network
  • Hub identification relies on high degree centrality; because hubs reach many neighbors directly, they are well placed to spark viral spread
  • Limitation: treats all connections equally, missing whether those connections are themselves influential or peripheral
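
To make the counting concrete, here is a minimal Python sketch of degree centrality. The five-person friendship network and the function name `degree_centrality` are made up for illustration, and dividing by n - 1 is one common normalization that this guide does not itself specify.

```python
# Hypothetical toy friendship network (undirected), stored as adjacency sets.
graph = {
    "Ana": {"Ben", "Cat", "Dan"},
    "Ben": {"Ana", "Cat"},
    "Cat": {"Ana", "Ben"},
    "Dan": {"Ana", "Eve"},
    "Eve": {"Dan"},
}


def degree_centrality(adj):
    """Return each node's degree divided by the largest possible degree (n - 1)."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


scores = degree_centrality(graph)
hub = max(scores, key=scores.get)  # node with the most direct connections
print(hub, scores[hub])
```

In this toy network Ana is the hub, yet the score says nothing about whether her three contacts are themselves well connected, which is the limitation noted above.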

Betweenness Centrality

  • Measures brokerage power: how often a node sits on the shortest path between other pairs of nodes
  • Bridge nodes with high betweenness control information flow; removing them can disconnect communities entirely
  • Calculated as $C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the total number of shortest paths between $s$ and $t$, and $\sigma_{st}(v)$ counts those passing through $v$
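
The formula can be evaluated directly on a small graph. The sketch below is a brute-force illustration, not an efficient algorithm. It assumes an undirected, unweighted graph and counts each unordered pair (s, t) once, which is a convention choice the formula above leaves open. The two-triangle graph and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): shortest-path lengths and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Sum sigma_st(v) / sigma_st over unordered pairs {s, t} that exclude v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in adj:
            if v == s or v == t:
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path iff the distances add up exactly.
            if v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Two triangles (A, B, C) and (E, F, G) joined only through node D.
graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D", "F", "G"}, "F": {"E", "G"}, "G": {"E", "F"},
}
scores = betweenness(graph)
print(scores["D"], scores["C"], scores["A"])
```

Node D has only two connections, yet every path between the two triangles runs through it, so it receives the highest score in this example.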

Closeness Centrality

  • Measures reachability: the inverse of a node's average distance to all other nodes in the network
  • Low average distance means information originating from this node spreads quickly; ideal for broadcasting or rapid dissemination
  • Expressed as $C_C(v) = \frac{n-1}{\sum_{u \neq v} d(v,u)}$, where $d(v,u)$ is the shortest path length between nodes $v$ and $u$
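
As a rough illustration of the closeness formula, the sketch below uses breadth-first search to get shortest path lengths in an unweighted graph. It assumes the graph is connected (otherwise some distances are undefined), and the five-node chain and the function name `closeness` are hypothetical.

```python
from collections import deque


def closeness(adj, v):
    """Return (n - 1) divided by the sum of shortest-path distances from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:  # breadth-first search from v
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    n = len(adj)
    return (n - 1) / sum(dist.values())


# A simple chain: a - b - c - d - e
chain = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"d"},
}
middle = closeness(chain, "c")  # distances 2, 1, 1, 2
end = closeness(chain, "a")     # distances 1, 2, 3, 4
print(middle, end)
```

The middle node scores higher than the end node because its total distance to everyone else is smaller.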

Eigenvector Centrality

  • Quality over quantity: a node's importance depends on being connected to other important nodes, not just having many connections
  • Recursive definition means your centrality score is proportional to the sum of your neighbors' scores
  • Captures elite networks where being connected to the "right" people matters more than knowing many people
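
One way to resolve the recursive definition numerically is power iteration: start with equal scores and repeatedly replace each score with a sum over neighbors. The sketch below is a minimal version under stated assumptions. The graph is undirected and connected, each node also keeps its own score at every step (a shift that helps convergence without changing the final ranking), and scores are rescaled so the largest is 1. The graph and function name are hypothetical.

```python
def eigenvector_centrality(adj, iterations=200):
    """Approximate eigenvector centrality by shifted power iteration."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # New score: own score plus the sum of the neighbors' scores.
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(new.values())
        x = {v: value / top for v, value in new.items()}  # rescale so max is 1
    return x


# a, b, c, d form a tightly connected core.
# e has only 2 links, both into the core; f has 3 links, mostly to leaves g and h.
graph = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d", "e"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "f"},
    "e": {"a", "b"},
    "f": {"d", "g", "h"},
    "g": {"f"},
    "h": {"f"},
}
scores = eigenvector_centrality(graph)
print(len(graph["e"]), len(graph["f"]))  # degrees of e and f
print(scores["e"] > scores["f"])
```

Here e has fewer connections than f but a higher eigenvector score, because both of e's neighbors sit in the well-connected core.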

Compare: Degree vs. Eigenvector Centrality. Both count connections, but eigenvector weights them by neighbor importance. A node with few connections to highly central nodes can outrank a node with many peripheral connections. If asked to identify "influential" nodes, clarify which type of influence the question targets.


Algorithmic Centrality: Computational Approaches

Some centrality measures emerged from computational problems rather than pure sociology. PageRank famously powered Google's search engine by treating the web as a network where link structure reveals importance.

PageRank

  • Random surfer model: imagines someone clicking links randomly; PageRank is the probability of landing on each node at equilibrium
  • Damping factor (typically $d = 0.85$) accounts for users jumping to random pages rather than always following links
  • Beyond web search, PageRank identifies influential accounts in social networks, important papers in citation networks, and key species in food webs
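
The random surfer model can be simulated with a short iterative calculation. The following is a minimal sketch, not the production algorithm used by any search engine. It assumes a directed graph stored as lists of outgoing links, a fixed number of iterations, and that a page with no outgoing links spreads its score evenly over all pages (one common convention). The four-page web and the function name are hypothetical.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iteratively estimate the random surfer's long-run visit probabilities."""
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        # With probability 1 - d the surfer jumps to a uniformly random page.
        new = {page: (1.0 - d) / n for page in links}
        for page, targets in links.items():
            if targets:
                share = d * rank[page] / len(targets)  # vote split across out-links
                for target in targets:
                    new[target] += share
            else:
                # Assumed convention: a dead-end page spreads its score evenly.
                for target in links:
                    new[target] += d * rank[page] / n
        rank = new
    return rank


# Hypothetical four-page web: each key links to the pages in its list.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
best = max(rank, key=rank.get)
print(best, round(sum(rank.values()), 6))
```

Page A splits its vote between B and C, while B and D give C their whole vote, which is the dilution effect of having many outgoing links. Page D has no incoming links, so its score comes entirely from random jumps.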

Compare: PageRank vs. Eigenvector Centrality. Both consider connection quality, but PageRank adds the damping factor and handles directed networks naturally. PageRank also divides a node's influence among its outgoing links, so linking to many pages dilutes your "vote."


Local Structure: Clustering and Community

Not all network properties focus on individual nodes. Clustering and modularity reveal how nodes organize into groups, which is essential for understanding social cohesion, echo chambers, and community dynamics.

Clustering Coefficient

  • Measures triangle density: the fraction of pairs of a node's neighbors that are themselves connected to each other
  • High clustering indicates tightly-knit groups where "friends of friends are friends"; common in social networks but rare in random graphs
  • Local vs. global: individual nodes have clustering coefficients; the network average reveals overall tendency toward clique formation
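
A short sketch can show the local and average versions side by side. It assumes an undirected graph and assigns a coefficient of 0 to nodes with fewer than two neighbors (a common convention, not something this guide specifies). The four-node graph and function names are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are directly connected."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pairs means coefficient 0
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Triangle A-B-C with one extra friend D attached to A.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
local_a = local_clustering(graph, "A")  # 1 of 3 neighbor pairs is linked
local_b = local_clustering(graph, "B")  # the single neighbor pair is linked
overall = average_clustering(graph)
print(local_a, local_b, overall)
```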

Modularity

  • Quantifies community structure: compares actual within-group connections to what you'd expect in a random network with the same degree sequence
  • Modularity score $Q$ ranges from $-0.5$ to $1$; values above $0.3$ typically indicate significant community structure
  • Community detection algorithms optimize modularity to partition networks into meaningful groups
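
To see how a modularity score is obtained for a given partition, the sketch below uses the standard Newman-Girvan form, in which each community contributes (internal edges / m) minus (total degree / 2m) squared. This guide does not state a formula, so treat that as an assumption. The graph is undirected and unweighted, and the example graph and function name are hypothetical.

```python
def modularity(adj, communities):
    """Modularity Q of a partition, given as a list of node sets."""
    m = sum(len(neighbors) for neighbors in adj.values()) / 2  # number of edges
    q = 0.0
    for group in communities:
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(1 for u in group for w in adj[u] if w in group) / 2
        degree_sum = sum(len(adj[u]) for u in group)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Two triangles joined by the single edge C-D.
graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
split = modularity(graph, [{"A", "B", "C"}, {"D", "E", "F"}])
lumped = modularity(graph, [{"A", "B", "C", "D", "E", "F"}])
print(round(split, 4), round(lumped, 4))
```

Splitting along the bridge gives a score above the 0.3 rule of thumb, while putting every node in one community gives a score of zero. A community detection algorithm would search over partitions for the highest score.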

Compare: Clustering Coefficient vs. Modularity. Clustering measures local "cliquishness" around individual nodes, while modularity assesses global division into distinct communities. A network can have high clustering but low modularity if triangles don't organize into separable groups.


Global Properties: Network-Wide Measures

These metrics describe the network as a whole rather than individual nodes. They answer questions about overall connectivity, efficiency, and structure.

Network Density

  • Proportion of possible edges that exist: calculated as $\frac{2m}{n(n-1)}$ for undirected networks with $m$ edges and $n$ nodes
  • Dense networks facilitate rapid information spread but require more maintenance; sparse networks are cheaper but may fragment easily
  • Real social networks are typically sparse; even Facebook's friendship network has density far below 1%
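
The density formula is simple enough to check by hand, but a small sketch shows why large social networks end up so sparse. The node and edge counts below are hypothetical round numbers chosen for illustration, not measurements of any real platform.

```python
def density(n, m):
    """Density of an undirected network with n nodes and m edges."""
    return 2 * m / (n * (n - 1))


# Small group: 5 people, 5 friendships out of 10 possible.
small = density(5, 5)

# Hypothetical large network: 1,000,000 users averaging 200 friends each,
# which gives 1,000,000 * 200 / 2 = 100,000,000 edges.
large = density(1_000_000, 100_000_000)

print(small)
print(f"{large:.6%}")
```

Possible edges grow roughly with the square of the number of nodes, while each person's friend count stays bounded, so density falls as the network grows.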

Average Path Length

  • Mean shortest distance between all pairs of reachable nodes in the network
  • Small-world property emerges when average path length is low despite high clustering (the "six degrees of separation" phenomenon)
  • Efficiency indicator: lower values mean faster information diffusion and easier coordination across the network
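
A minimal sketch of the calculation: run breadth-first search from every node and average the distances over all reachable pairs. It assumes an undirected, unweighted graph. The six-node ring and the added shortcut edge are hypothetical, chosen to show how one long-range tie lowers the average.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path length from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_path_length(adj):
    """Mean shortest distance over all ordered pairs of distinct reachable nodes."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# Ring of six nodes: 0 - 1 - 2 - 3 - 4 - 5 - back to 0.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
apl_ring = average_path_length(ring)

# Same ring plus one shortcut across the middle (0 - 3).
shortcut = {v: set(neighbors) for v, neighbors in ring.items()}
shortcut[0].add(3)
shortcut[3].add(0)
apl_shortcut = average_path_length(shortcut)

print(apl_ring, apl_shortcut)
```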

Diameter

  • Maximum shortest path: the greatest shortest-path distance between any two mutually reachable nodes in the network
  • Worst-case scenario for information travel; reveals how "stretched out" a network can be
  • Sensitive to outliers: a single long chain of nodes can dramatically increase diameter without affecting most node pairs
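
The outlier effect can be reproduced with a small experiment: a tightly connected group with a chain of nodes hanging off one member. The sketch below assumes an undirected, unweighted graph. The helper names and the sizes used (20 core nodes, a 5-node chain) are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path length from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def path_stats(adj):
    """Return (average path length, diameter) over reachable pairs."""
    total, pairs, longest = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                longest = max(longest, d)
    return total / pairs, longest


def core_with_chain(core_size, chain_length):
    """Fully connected core with a chain attached to its last node."""
    adj = {v: set() for v in range(core_size + chain_length)}
    for u in range(core_size):
        for w in range(u + 1, core_size):
            adj[u].add(w)
            adj[w].add(u)
    for v in range(core_size - 1, core_size + chain_length - 1):
        adj[v].add(v + 1)
        adj[v + 1].add(v)
    return adj


apl_core, diameter_core = path_stats(core_with_chain(20, 0))
apl_chain, diameter_chain = path_stats(core_with_chain(20, 5))
print(apl_core, diameter_core)
print(round(apl_chain, 3), diameter_chain)
```

Adding the chain multiplies the diameter several times over while the average path length rises far less, because most node pairs still sit inside the core.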

Compare: Average Path Length vs. Diameter. Average path length gives typical separation, while diameter captures the extreme. A network with low average path length but high diameter has most nodes close together but a few long peripheral chains. Both matter for understanding information spread dynamics.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Direct connectivity | Degree Centrality, Network Density |
| Brokerage and control | Betweenness Centrality |
| Reach and efficiency | Closeness Centrality, Average Path Length |
| Influence through connections | Eigenvector Centrality, PageRank |
| Local clustering | Clustering Coefficient |
| Community structure | Modularity |
| Network extremes | Diameter |

Self-Check Questions

  1. A node has relatively few connections but extremely high betweenness centrality. What role does this node likely play in the network, and why might removing it be particularly damaging?

  2. Compare degree centrality and eigenvector centrality: under what circumstances would these two metrics rank the same node very differently?

  3. A social network has high clustering coefficient but low modularity. What does this suggest about its structure? Describe what such a network might look like.

  4. You're analyzing a network and find that average path length is very low (around 4) while diameter is very high (around 20). What does this tell you about the network's topology?

  5. If you needed to identify the best node for rapidly spreading a message to the entire network, which centrality measure would you prioritize and why? How would your answer change if you instead wanted to identify nodes whose removal would most disrupt communication?