Wednesday, November 26, 2025

HITS for Authority and Hub identification

The HITS (Hyperlink-Induced Topic Search) algorithm exploits the mutually reinforcing relationship between authorities and hubs to evaluate and rank a set of linked entities. It assigns ranking scores to the vertices, intended to assess the quality of information and references in linked structures. Intuitively, a node with a large in-degree is viewed as an authority, while a node that points to a considerable number of authoritative nodes is referred to as a hub.
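The mutual reinforcement can be sketched as a simple power iteration that alternates the two update rules until the scores stabilize. Below is a plain-Python illustration on a small subset of the follower edges used later in this post; it is a sketch of the idea, not the Graph Server implementation.

```python
from collections import defaultdict

def hits(edges, iters=50):
    """HITS power iteration; edges are (follower, followee) pairs.
    Returns (authority, hub) score dicts, each L2-normalized."""
    nodes = {n for e in edges for n in e}
    in_links, out_links = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
        in_links[dst].append(src)
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # A node's authority is the sum of the hub scores pointing at it.
        auth = {n: sum(hub[s] for s in in_links[n]) for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5
        auth = {n: v / norm for n, v in auth.items()}
        # A node's hub score is the sum of the authority scores it points to.
        hub = {n: sum(auth[d] for d in out_links[n]) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5
        hub = {n: v / norm for n, v in hub.items()}
    return auth, hub

# A small subset of the follower edges used below.
edges = [('z', 'governor'), ('journalist_1', 'governor'),
         ('journalist_2', 'governor'), ('journalist_1', 'mayor'),
         ('journalist_2', 'mayor'), ('u', 'mayor')]
auth, hub = hits(edges)
```

Even on this tiny subset, the governor and mayor emerge as authorities (followers but no out-links), while the journalists emerge as the strongest hubs.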

We will use the following Social Network to demonstrate the HITS algorithm.

Here we have State officials, citizens, and journalists in a State.

As illustrated in the graph above, State Officials represent good authorities, while Journalists represent good hubs. Observe that Hubs and Authorities exhibit a mutually reinforcing relationship: a good hub points to many good authorities; a good authority is pointed to by many good hubs.


Let's start by creating sample data:



create table users (username varchar(20));
create table follows (follower varchar(20), followee varchar(20));

insert into users values 
  ('governor')
  , ('mayor')
  , ('senator')
  , ('journalist_1')
  , ('journalist_2')
  , ('journalist_3')
  , ('g')
  , ('h')
  , ('i')
  , ('j')
  , ('k')
  , ('l')
  , ('m')
  , ('n')
  , ('o')
  , ('p')
  , ('q')
  , ('r')
  , ('s')
  , ('t')
  , ('u')
  , ('v')
  , ('w')
  , ('x')
  , ('y')
  , ('z');


insert into follows values 
  ('z', 'governor')
  , ('y', 'governor')  
  , ('x', 'governor')  
  , ('w', 'governor')    
  , ('v', 'governor')  
  , ('journalist_1', 'governor')  
  , ('journalist_2', 'governor')  

  , ('u', 'mayor')  
  , ('t', 'mayor')  
  , ('s', 'mayor')        
  , ('r', 'mayor')
  , ('q', 'mayor')
  , ('journalist_1', 'mayor')        
  , ('journalist_2', 'mayor')        
  , ('journalist_3', 'mayor')        

  , ('p', 'senator')        
  , ('o', 'senator')
  , ('n', 'senator')
  , ('m', 'senator')
  , ('l', 'senator')
  , ('journalist_2', 'senator')
  , ('journalist_3', 'senator');                 


Next create a PROPERTY GRAPH to represent the Following relationship in this Social Graph

CREATE OR REPLACE PROPERTY GRAPH social_network
  VERTEX TABLES (
    users KEY (username) PROPERTIES ARE ALL COLUMNS
  )
  EDGE TABLES ( 
    follows
      key (follower, followee) 
      SOURCE KEY (follower) REFERENCES users(username)
      DESTINATION KEY(followee) REFERENCES users(username)
      PROPERTIES ARE ALL COLUMNS
    
);
We can query this PROPERTY GRAPH using the following SQL/PGQ

SELECT distinct 
   follower
   , followee
   , v1
   , v2
   , e1
FROM GRAPH_TABLE (
       social_network
       MATCH   (follower  IS users) - [e is follows] -> (followee is users)
       COLUMNS (
               follower.username  as follower
               , followee.username  as followee
               , vertex_id(follower) as v1
               , vertex_id(followee) as v2
               , edge_id(e) as e1

        )
);
Next we will run the HITS algorithm on this PROPERTY GRAPH

%python-pgx
graph = session.read_graph_by_name("SOCIAL_NETWORK", "pg_sql")

hits = analyst.hits(graph, auth='authorities', hubs='hubs')

result_set = graph.query_pgql(
    "SELECT x.USERNAME, x.authorities, x.hubs "
    "MATCH (x) ORDER BY x.authorities DESC, x.hubs DESC")

result_set.print()


+---------------------------------------------------------+
| x.USERNAME   | x.authorities      | x.hubs              |
+---------------------------------------------------------+
| mayor        | 0.7071067811865475 | 0.0                 |
| senator      | 0.5000000000000001 | 0.0                 |
| governor     | 0.5000000000000001 | 0.0                 |
| journalist_2 | 0.0                | 0.5187737569752051  |
| journalist_1 | 0.0                | 0.3668284414587895  |
| journalist_3 | 0.0                | 0.3668284414587895  |
| u            | 0.0                | 0.21488312594237396 |
| t            | 0.0                | 0.21488312594237396 |
| s            | 0.0                | 0.21488312594237396 |
| r            | 0.0                | 0.21488312594237396 |
| q            | 0.0                | 0.21488312594237396 |
| l            | 0.0                | 0.1519453155164156  |
| m            | 0.0                | 0.1519453155164156  |
| n            | 0.0                | 0.1519453155164156  |
| o            | 0.0                | 0.1519453155164156  |
| p            | 0.0                | 0.1519453155164156  |
| v            | 0.0                | 0.1519453155164156  |
| w            | 0.0                | 0.1519453155164156  |
| x            | 0.0                | 0.1519453155164156  |
| y            | 0.0                | 0.1519453155164156  |
| z            | 0.0                | 0.1519453155164156  |
| k            | 0.0                | 0.0                 |
| j            | 0.0                | 0.0                 |
| i            | 0.0                | 0.0                 |
| h            | 0.0                | 0.0                 |
| g            | 0.0                | 0.0                 |
+---------------------------------------------------------+
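One detail worth noticing before interpreting the numbers: the score columns appear to be L2-normalized, i.e. each score vector has unit Euclidean length. We can check this directly from the table above (a quick sanity check, assuming that normalization):

```python
# Non-zero scores copied from the result table above.
auth = [0.7071067811865475, 0.5000000000000001, 0.5000000000000001]
hubs = ([0.5187737569752051]
        + [0.3668284414587895] * 2
        + [0.21488312594237396] * 5
        + [0.1519453155164156] * 10)

# Sum of squares of each score vector (all other scores are 0).
auth_norm_sq = sum(a * a for a in auth)
hub_norm_sq = sum(h * h for h in hubs)
print(auth_norm_sq, hub_norm_sq)
```

Both sums come out at essentially 1.0, so the scores are comparable within each column but are relative, not absolute, measures.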
 
Note: Nodes with no in-links are assigned an authority weight of 0, while nodes with no out-links are assigned a hub weight of 0. State Officials get high Authority rankings because many Hubs point to them, while Journalists get high Hub rankings because they point to many Authorities.

Eigenvector Centrality Algorithm

Eigenvector centrality is an established measure of global connectivity, from which the importance and influence of nodes can be inferred. Eigenvector centrality quantifies a node's influence within a graph. A node's importance is determined by its neighbors: it is both influenced by them and exerts influence on them. However, not all connections are equal; a node's centrality increases if it is connected to other highly influential nodes. In short, eigenvector centrality derives each vertex's score from the scores of its neighbors, making it possible to identify well-connected vertices.

  


 Let's start by creating some sample data:

 


    create table website (website_url varchar(50));
    create table links(link_from varchar(50), link_to varchar(50), weight float);

    insert into website values ('web1')
    , ('web2')
    , ('web3')
    , ('web4')
    , ('web5')
    , ('web6');

    insert into links values ('web1', 'web1', .2),
        ('web1', 'web2', .3),
        ('web2', 'web3', .4),
        ('web3', 'web1', .4),
        ('web3', 'web2', .4),
        ('web3', 'web4', .4),
        ('web3', 'web5', .4),
        ('web5', 'web3', .4),
        ('web6', 'web6', .4);

Next create a PROPERTY GRAPH to represent the Web Graph


CREATE OR REPLACE PROPERTY GRAPH web_graph
  VERTEX TABLES (
    website KEY (website_url) PROPERTIES ARE ALL COLUMNS
  )
  EDGE TABLES ( 
    links
      key (link_from, link_to) 
      SOURCE KEY (link_from) REFERENCES website(website_url)
      DESTINATION KEY(link_to) REFERENCES website(website_url)
      PROPERTIES ARE ALL COLUMNS
    
);
You can use the following SQL/PGQ to query the above PROPERTY GRAPH

SELECT distinct 
   website_a
   , website_b
   , v1
   , v2
   , e1
FROM GRAPH_TABLE (
       web_graph
       MATCH   (link_from  IS website) - [e is links] -> (link_to is website)
       COLUMNS (
               link_from.website_url  as website_a
               , link_to.website_url  as website_b
               , vertex_id(link_from) as v1
               , vertex_id(link_to) as v2
               , edge_id(e) as e1

        )
);
Finally, run the eigenvector_centrality algorithm on the above PROPERTY GRAPH

%python-pgx


print(session)

graph = session.read_graph_by_name("WEB_GRAPH", "pg_sql")
print(graph)

# Compute several metrics; the query below reads the
# eigenvector and label_propagation properties.
analyst.pagerank(graph)
analyst.degree_centrality(graph)
analyst.in_degree_centrality(graph)
analyst.out_degree_centrality(graph)
analyst.vertex_betweenness_centrality(graph)
analyst.local_clustering_coefficient(graph)
analyst.eigenvector_centrality(graph)
analyst.communities_label_propagation(graph, 100, 'label_propagation')
graph.query_pgql("""
SELECT 
  a.website_url
  , a.eigenvector
  , a.label_propagation
FROM MATCH (a)
""").print()

Output:

PgxSession(id: 5ef30734-4ebb-4f22-87fe-457c3e5ccc84, name: ADMIN)
PgxGraph(name: WEB_GRAPH, v: 6, e: 9, directed: True, memory(Mb): 0)
+---------------------------------------------------------+
| website_url | eigenvector           | label_propagation |
+---------------------------------------------------------+
| web1        | 0.24693143869692635   | 0                 |
| web2        | 0.19814795883595723   | 1                 |
| web3        | 0.3567709398214408    | 1                 |
| web4        | 0.0                   | 1                 |
| web5        | 0.19814795883595723   | 1                 |
| web6        | 1.7038097185306344E-6 | 2                 |
+---------------------------------------------------------+
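The eigenvector column above can be cross-checked with a plain power iteration, x ← Ax with normalization after each step. The sketch below propagates score along out-edges and L1-normalizes each iteration, which appears to reproduce the output above to about three decimals (note how web4, which links to nothing, scores 0); it is an illustration, not the Graph Server implementation, and the edge weights are ignored here.

```python
def eigenvector_centrality(edges, iters=200):
    """Power iteration; edges are (link_from, link_to) pairs.
    Score propagates along out-edges; scores are L1-normalized."""
    nodes = sorted({n for e in edges for n in e})
    score = {n: 1.0 for n in nodes}
    for _ in range(iters):
        new = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new[src] += score[dst]   # a page inherits importance from pages it links to
        norm = sum(new.values())     # scores stay non-negative, so this L1-normalizes
        score = {n: v / norm for n, v in new.items()}
    return score

# The same 'links' edges as in the sample data (weights omitted).
edges = [('web1', 'web1'), ('web1', 'web2'), ('web2', 'web3'),
         ('web3', 'web1'), ('web3', 'web2'), ('web3', 'web4'),
         ('web3', 'web5'), ('web5', 'web3'), ('web6', 'web6')]
ec = eigenvector_centrality(edges)
```

web3 comes out on top, web2 and web5 tie (each links only to web3), and the disconnected web6 decays toward zero under normalization.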

Tuesday, November 25, 2025

Louvain for Community detection

The Louvain algorithm is a widely used and well-regarded method for community detection in graphs. The algorithm begins with a singleton partition, where each node belongs to its own community. It then proceeds iteratively through multiple passes, each consisting of two distinct phases: a local-moving phase that reassigns nodes to neighboring communities when doing so improves modularity, and an aggregation phase that collapses each community into a single node.
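The quantity Louvain greedily maximizes is modularity: the fraction of edge weight that falls inside communities, minus the fraction expected if edges were placed at random. It can be computed directly; the plain-Python sketch below treats the weighted graph as undirected, and the function and toy data are my own illustration, not the Graph Server implementation.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of a partition. edges: (u, v, weight) triples treated
    as undirected; community: dict mapping node -> community id."""
    m = sum(w for _, _, w in edges)              # total edge weight
    degree = defaultdict(float)                  # weighted degree per node
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
    internal = defaultdict(float)                # edge weight inside each community
    for u, v, w in edges:
        if community[u] == community[v]:
            internal[community[u]] += w
    deg_sum = defaultdict(float)                 # summed degree per community
    for n, d in degree.items():
        deg_sum[community[n]] += d
    return sum(internal[c] / m - (deg_sum[c] / (2 * m)) ** 2
               for c in deg_sum)

# Grouping connected nodes together scores higher than splitting them apart.
edges = [('a', 'b', 1.0), ('c', 'd', 1.0)]
good = modularity(edges, {'a': 0, 'b': 0, 'c': 1, 'd': 1})
bad = modularity(edges, {'a': 0, 'b': 1, 'c': 0, 'd': 1})
```

Each Louvain pass keeps a node move only if it increases this score, which is why the communities it reports are internally dense relative to their total degree.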

We will use the Louvain Algorithm in Oracle Graph Server to identify communities in the following Mentoring relationship graph

 

Let's start by generating some sample data:

 

 



create table users (username varchar(10));
create table mentors (mentor varchar(10), mentee varchar(10), weight float);

insert into users values 
  ('a')
  , ('b')
  , ('c')
  , ('d')
  , ('e')
  , ('f')
  , ('g')
  , ('h')
  , ('i')
  , ('j')
  , ('k')
  , ('l')
  , ('m')
  , ('n')

  ;


insert into mentors values 
  ('a', 'b', 0.3)
  , ('a', 'c', 0.4)  
  , ('a', 'd', 0.1)  
  , ('a', 'e', 0.4)    
  , ('b', 'g', 0.5)  
  , ('g', 'f', 0.2)  
  , ('f', 'a', 0.5)  
  , ('f', 'h', 0.2)  
  , ('f', 'i', 0.1)  
  , ('f', 'j', 0.5)        
  , ('f', 'k', 0.9)
  , ('k', 'a', 0.1)        
  , ('k', 'm', 0.2)        
  , ('k', 'n', 0.6)
  , ('k', 'l', 0.3);                 


Next create a PROPERTY GRAPH to reflect the Mentor -> Mentee relationships
CREATE OR REPLACE PROPERTY GRAPH social_network
  VERTEX TABLES (
    users KEY (username) PROPERTIES ARE ALL COLUMNS
  )
  EDGE TABLES ( 
    mentors
      key (mentor, mentee) 
      SOURCE KEY (mentor) REFERENCES users(username)
      DESTINATION KEY(mentee) REFERENCES users(username)
      PROPERTIES ARE ALL COLUMNS
    
);

Now let's run the Louvain algorithm on this Property Graph 

%python-pgx

graph = session.read_graph_by_name("SOCIAL_NETWORK", "pg_sql")

weight = graph.get_edge_property(name="WEIGHT")

print(weight)

partition = analyst.louvain(graph, weight,  max_iter=100, nbr_pass=3, tol=0.0001, community='community')

result_set = graph.query_pgql(

    "SELECT x.USERNAME, x.community MATCH (x) ORDER BY x.community DESC")

result_set.print()
print(partition)

session.close()

Output:

EdgeProperty(name: WEIGHT, type: double, graph: SOCIAL_NETWORK)
+--------------------------+
| x.USERNAME | x.community |
+--------------------------+
| i          | 10          |
| j          | 10          |
| k          | 10          |
| l          | 10          |
| m          | 10          |
| n          | 10          |
| f          | 10          |
| h          | 10          |
| a          | 4           |
| c          | 4           |
| d          | 4           |
| e          | 4           |
| b          | 1           |
| g          | 1           |
+--------------------------+

PgxPartition(graph: SOCIAL_NETWORK, components: 14)
 
 
 

Wednesday, November 12, 2025

Betweenness Centrality in Oracle Graph (SQL/PGQ)

Betweenness centrality measures the likelihood of a node being on the shortest paths between any two other nodes. This metric effectively identifies "bridge" nodes that facilitate connectivity between different parts of a graph. The algorithm assigns a score to each node, with higher scores indicating nodes that exert greater influence over the flow and connectivity of the network.

In this blogpost we will show examples of running the Betweenness Centrality algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide on how to make use of the algorithm in a real setting. We will do this on a small social network graph of a handful of nodes connected in a particular pattern. The example graph looks like this:


 

We will run the Betweenness Centrality algorithm on this graph and compute each node's score. Can you guess which node will score the highest? Yes, in this case it is easy to inspect the graph visually and make a good guess. However, in a large graph this is not possible without the use of an algorithm.

Let's start by generating some sample data:

 



create table users (user_id int, user_name varchar(256));
create table knows (user_id_from int, user_id_to int, relationship varchar(256));

insert into users values (1, 'Fatima')
  , (2, 'Uroosa')
  , (3, 'Hannah')
  , (4, 'Maryam')
  , (5, 'Zainab')
  , (6, 'Urwa')
  , (7, 'Zoha');


insert into knows values (1, 2, 'knows')
  , (1, 5, 'knows')
  , (3, 1, 'knows')
  , (4, 3, 'knows')
  , (4, 6, 'knows')
  , (6, 5, 'knows')
;


Next create a Property Graph to define the relationships in the Social Network


CREATE OR REPLACE PROPERTY GRAPH social_graph
  VERTEX TABLES (
    users KEY (user_id) PROPERTIES ARE ALL COLUMNS
  )
  EDGE TABLES ( 
    knows
      key (user_id_from, user_id_to) 
      SOURCE KEY (user_id_from) REFERENCES users(user_id)
      DESTINATION KEY(user_id_to) REFERENCES users(user_id)
      PROPERTIES ARE ALL COLUMNS
    
);


We can query the Property Graph as follows:



SELECT distinct 
   *
FROM GRAPH_TABLE (
       social_graph
       MATCH   (user_id_from  IS users) - [e_path is knows] -> (user_id_to is users)
       COLUMNS (
               user_id_from.user_name  as user_name_a
               , user_id_to.user_name  as user_name_b
               , vertex_id(user_id_from) as v1
               , vertex_id(user_id_to) as v2
               , edge_id(e_path) as e1

        )
)
;



   Next let's run the Betweenness Centrality on this Network Graph


%python-pgx
print(session)
graph = session.read_graph_by_name("SOCIAL_GRAPH", "pg_sql")
print(graph)
analyst.vertex_betweenness_centrality(graph)
graph.query_pgql("""
SELECT 
  a.user_name
  , a.betweenness
FROM MATCH (a)
""").print()


+-------------------------+
| user_name | betweenness |
+-------------------------+
| Fatima    | 3.0         |
| Uroosa    | 0.0         |
| Hannah    | 2.0         |
| Maryam    | 0.0         |
| Zainab    | 0.0         |
| Urwa      | 1.0         |
| Zoha      | 0.0         |
+-------------------------+
 

We note that the 'Fatima' node has the highest score, followed by the 'Hannah' node. Studying the example graph, we can see that these nodes are in bottleneck positions. The 'Fatima' node connects the 'Uroosa' node to all other nodes, which increases its score. In particular, every shortest path that reaches 'Uroosa' from another node passes through 'Fatima'.

Conversely, no shortest paths pass through any of the nodes 'Uroosa', 'Maryam', 'Zainab', or 'Zoha', which causes their betweenness centrality scores to be zero.
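For readers who want to verify the table by hand, the scores can be reproduced with a compact plain-Python sketch of Brandes' betweenness algorithm for directed, unweighted graphs, run on the same 'knows' edges (an illustration only, not the Graph Server implementation):

```python
from collections import defaultdict, deque

def betweenness(nodes, edges):
    """Brandes' algorithm for a directed, unweighted graph."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    bc = {n: 0.0 for n in nodes}
    for s in nodes:
        # Forward phase: BFS from s, counting shortest paths (sigma)
        # and recording shortest-path predecessors.
        sigma = {n: 0 for n in nodes}
        dist = {n: -1 for n in nodes}
        preds = {n: [] for n in nodes}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Backward phase: accumulate pair dependencies in reverse BFS order.
        delta = {n: 0.0 for n in nodes}
        for v in reversed(order):
            for u in preds[v]:
                delta[u] += sigma[u] / sigma[v] * (1 + delta[v])
            if v != s:
                bc[v] += delta[v]
    return bc

# The 'knows' edges from the sample data, expressed by user_name.
nodes = ['Fatima', 'Uroosa', 'Hannah', 'Maryam', 'Zainab', 'Urwa', 'Zoha']
edges = [('Fatima', 'Uroosa'), ('Fatima', 'Zainab'), ('Hannah', 'Fatima'),
         ('Maryam', 'Hannah'), ('Maryam', 'Urwa'), ('Urwa', 'Zainab')]
bc = betweenness(nodes, edges)
```

Running this reproduces the scores in the table: Fatima 3.0, Hannah 2.0, Urwa 1.0, and zero for the rest.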