Wednesday, November 26, 2025

HITS for Authority and Hub identification

The HITS (Hyperlink-Induced Topic Search) algorithm uses the mutually reinforcing relationship between authorities and hubs to evaluate and rank a set of linked entities. It assigns ranking scores to the vertices in order to assess the quality of information and references in linked structures. Intuitively, a node with a large in-degree is viewed as an authority. A node that points to a considerable number of authoritative nodes is referred to as a hub.

We will use the following social network to demonstrate the HITS algorithm.

Here we have State officials, citizens, and journalists in a State.

As illustrated in the graph above, State Officials represent good authorities, while Journalists represent good hubs. Observe that Hubs and Authorities exhibit a mutually reinforcing relationship: a good hub points to many good authorities; a good authority is pointed to by many good hubs.
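
The mutual reinforcement can be sketched in a few lines of plain Python. This is a minimal illustration of the textbook HITS iteration, not the implementation used by the graph server. The helper names `hits_scores` and `_normalize`, the toy edge list, the fixed 50 iterations, and the unit-length (L2) scaling are all assumptions made for the example.

```python
import math

def _normalize(scores):
    # Scale the scores to unit Euclidean length (assumed normalization).
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}

def hits_scores(edges, iterations=50):
    """Textbook HITS power iteration over (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    auth = {node: 1.0 for node in nodes}
    hubs = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: add up the hub scores of everyone pointing at you.
        new_auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_auth[dst] += hubs[src]
        # Hub update: add up the fresh authority scores of everyone you point at.
        new_hubs = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hubs[src] += new_auth[dst]
        auth = _normalize(new_auth)
        hubs = _normalize(new_hubs)
    return auth, hubs

# Toy graph with hypothetical names: two journalists follow two officials.
toy_edges = [
    ("journalist_a", "official_x"),
    ("journalist_a", "official_y"),
    ("journalist_b", "official_x"),
]
auth, hubs = hits_scores(toy_edges)
```

In this toy graph, `official_x` ends up with the higher authority score because both journalists follow it, and `journalist_a` ends up as the stronger hub because it follows both officials.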


Let's start by creating sample data:



create table users (username varchar(20));
create table follows (follower varchar(20), followee varchar(20));

insert into users values 
  ('governor')
  , ('mayor')
  , ('senator')
  , ('journalist_1')
  , ('journalist_2')
  , ('journalist_3')
  , ('g')
  , ('h')
  , ('i')
  , ('j')
  , ('k')
  , ('l')
  , ('m')
  , ('n')
  , ('o')
  , ('p')
  , ('q')
  , ('r')
  , ('s')
  , ('t')
  , ('u')
  , ('v')
  , ('w')
  , ('x')
  , ('y')
  , ('z');


insert into follows values 
  ('z', 'governor')
  , ('y', 'governor')  
  , ('x', 'governor')  
  , ('w', 'governor')    
  , ('v', 'governor')  
  , ('journalist_1', 'governor')  
  , ('journalist_2', 'governor')  

  , ('u', 'mayor')  
  , ('t', 'mayor')  
  , ('s', 'mayor')        
  , ('r', 'mayor')
  , ('q', 'mayor')
  , ('journalist_1', 'mayor')        
  , ('journalist_2', 'mayor')        
  , ('journalist_3', 'mayor')        

  , ('p', 'senator')        
  , ('o', 'senator')
  , ('n', 'senator')
  , ('m', 'senator')
  , ('l', 'senator')
  , ('journalist_2', 'senator')
  , ('journalist_3', 'senator');                 


Next, create a PROPERTY GRAPH to represent the follows relationship in this social graph:

CREATE OR REPLACE PROPERTY GRAPH social_network
  VERTEX TABLES (
    users KEY (username) PROPERTIES ARE ALL COLUMNS
  )
  EDGE TABLES (
    follows
      KEY (follower, followee)
      SOURCE KEY (follower) REFERENCES users(username)
      DESTINATION KEY (followee) REFERENCES users(username)
      PROPERTIES ARE ALL COLUMNS
  )
);
We can query this PROPERTY GRAPH using the following SQL/PGQ:

SELECT distinct 
   follower
   , followee
   , v1
   , v2
   , e1
FROM GRAPH_TABLE (
       social_network
       MATCH   (follower  IS users) - [e is follows] -> (followee is users)
       COLUMNS (
               follower.username  as follower
               , followee.username  as followee
               , vertex_id(follower) as v1
               , vertex_id(followee) as v2
               , edge_id(e) as e1

        )
);
Next, we will run the HITS algorithm on this PROPERTY GRAPH:

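%python-pgx
# Load the SQL property graph created above into the graph server (PGX).
graph = session.read_graph_by_name("SOCIAL_NETWORK", "pg_sql")

# Run HITS; the scores are stored in the vertex properties 'authorities' and 'hubs'.
hits = analyst.hits(graph, auth='authorities', hubs='hubs')

# List every user with both scores: strongest authorities first, then strongest hubs.
result_set = graph.query_pgql(
    "SELECT x.USERNAME, x.authorities, x.hubs "
    "MATCH (x) ORDER BY x.authorities DESC, x.hubs DESC")

result_set.print()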


+---------------------------------------------------------+
| x.USERNAME   | x.authorities      | x.hubs              |
+---------------------------------------------------------+
| mayor        | 0.7071067811865475 | 0.0                 |
| senator      | 0.5000000000000001 | 0.0                 |
| governor     | 0.5000000000000001 | 0.0                 |
| journalist_2 | 0.0                | 0.5187737569752051  |
| journalist_1 | 0.0                | 0.3668284414587895  |
| journalist_3 | 0.0                | 0.3668284414587895  |
| u            | 0.0                | 0.21488312594237396 |
| t            | 0.0                | 0.21488312594237396 |
| s            | 0.0                | 0.21488312594237396 |
| r            | 0.0                | 0.21488312594237396 |
| q            | 0.0                | 0.21488312594237396 |
| l            | 0.0                | 0.1519453155164156  |
| m            | 0.0                | 0.1519453155164156  |
| n            | 0.0                | 0.1519453155164156  |
| o            | 0.0                | 0.1519453155164156  |
| p            | 0.0                | 0.1519453155164156  |
| v            | 0.0                | 0.1519453155164156  |
| w            | 0.0                | 0.1519453155164156  |
| x            | 0.0                | 0.1519453155164156  |
| y            | 0.0                | 0.1519453155164156  |
| z            | 0.0                | 0.1519453155164156  |
| k            | 0.0                | 0.0                 |
| j            | 0.0                | 0.0                 |
| i            | 0.0                | 0.0                 |
| h            | 0.0                | 0.0                 |
| g            | 0.0                | 0.0                 |
+---------------------------------------------------------+
 
Note: Nodes with no in-links are assigned an authority weight of 0, while nodes with no out-links are assigned a hub weight of 0. State officials get high authority rankings because many hubs point to them, whereas journalists get high hub rankings because they point to many authorities.
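
The hub column can be checked by hand from the authority column. The sketch below assumes that the scores are scaled to unit Euclidean length; that normalization is inferred from the printed values, not taken from the product documentation. Each follower's raw hub score is the sum of the authority scores of the officials it follows, and dividing by the overall length gives the figures in the table. The variable names are illustrative only.

```python
import math

# Authority scores copied from the result table above.
authority = {
    "mayor": 0.7071067811865475,
    "senator": 0.5000000000000001,
    "governor": 0.5000000000000001,
}

# Followed officials per kind of follower (from the sample data), and how
# many users of that kind exist. Users g to k follow nobody and are left out.
follower_groups = {
    "journalist_2": (["governor", "mayor", "senator"], 1),
    "journalist_1": (["governor", "mayor"], 1),
    "journalist_3": (["mayor", "senator"], 1),
    "mayor_only_citizen": (["mayor"], 5),        # q, r, s, t, u
    "senator_only_citizen": (["senator"], 5),    # l, m, n, o, p
    "governor_only_citizen": (["governor"], 5),  # v, w, x, y, z
}

# Raw hub score: sum of the authority scores of the officials followed.
raw_hub = {
    name: sum(authority[official] for official in officials)
    for name, (officials, _count) in follower_groups.items()
}

# Euclidean length over all followers, counting each group member once.
length = math.sqrt(
    sum(count * raw_hub[name] ** 2
        for name, (_officials, count) in follower_groups.items())
)

hub = {name: score / length for name, score in raw_hub.items()}

for name, score in hub.items():
    print(f"{name:24s} {score:.6f}")
```

Under that assumption, the computed values agree with the hubs column: journalist_2 scores highest because it follows all three officials, and citizens who follow only the mayor outrank those who follow only the governor or the senator because the mayor carries the largest authority score.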
