The HITS (Hyperlink-Induced Topic Search) algorithm uses the mutually reinforcing relationship between authorities and hubs to evaluate and rank a set of linked entities. It assigns ranking scores to the vertices to assess the quality of information and references in linked structures. Intuitively, a node with a large in-degree is viewed as an authority, and a node that points to a considerable number of authoritative nodes is referred to as a hub.
We will use the following social network to demonstrate the HITS algorithm.
Here we have state officials, citizens, and journalists in a state. As illustrated in the graph above, state officials represent good authorities, while journalists represent good hubs. Observe that hubs and authorities exhibit a mutually reinforcing relationship: a good hub points to many good authorities, and a good authority is pointed to by many good hubs.
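The mutual reinforcement described above can be sketched as a simple iteration in plain Python. This is a minimal illustration, not the implementation used by the graph server: the function names `hits` and `normalize`, the fixed iteration count, the choice of Euclidean normalization, and the toy edge list are all assumptions made for the example.

```python
import math

def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: (v / norm if norm else 0.0) for k, v in scores.items()}

def hits(edges, iterations=50):
    """Return (authority, hub) score dicts for a list of (source, target) edges."""
    nodes = {n for edge in edges for n in edge}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: add up the hub scores of every node pointing at you.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: add up the new authority scores of every node you point to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub

# Hypothetical toy graph: two readers following two officials.
toy_edges = [
    ("reader_1", "official_a"),
    ("reader_1", "official_b"),
    ("reader_2", "official_a"),
]
auth, hub = hits(toy_edges)
for name in sorted(auth):
    print(name, round(auth[name], 4), round(hub[name], 4))
```

In this toy graph, official_a ends up with the higher authority score because both readers point to it, and reader_1 ends up with the higher hub score because it points to both officials.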
Let's start by creating sample data:
create table users (username varchar(20));
create table follows (follower varchar(20), followee varchar(20));
insert into users values
('governor')
, ('mayor')
, ('senator')
, ('journalist_1')
, ('journalist_2')
, ('journalist_3')
, ('g')
, ('h')
, ('i')
, ('j')
, ('k')
, ('l')
, ('m')
, ('n')
, ('o')
, ('p')
, ('q')
, ('r')
, ('s')
, ('t')
, ('u')
, ('v')
, ('w')
, ('x')
, ('y')
, ('z');
insert into follows values
('z', 'governor')
, ('y', 'governor')
, ('x', 'governor')
, ('w', 'governor')
, ('v', 'governor')
, ('journalist_1', 'governor')
, ('journalist_2', 'governor')
, ('u', 'mayor')
, ('t', 'mayor')
, ('s', 'mayor')
, ('r', 'mayor')
, ('q', 'mayor')
, ('journalist_1', 'mayor')
, ('journalist_2', 'mayor')
, ('journalist_3', 'mayor')
, ('p', 'senator')
, ('o', 'senator')
, ('n', 'senator')
, ('m', 'senator')
, ('l', 'senator')
, ('journalist_2', 'senator')
, ('journalist_3', 'senator');
Next, create a PROPERTY GRAPH to represent the follow relationships in this social network:
CREATE OR REPLACE PROPERTY GRAPH social_network
  VERTEX TABLES (
    users KEY (username) PROPERTIES ARE ALL COLUMNS
  )
  EDGE TABLES (
    follows
      KEY (follower, followee)
      SOURCE KEY (follower) REFERENCES users (username)
      DESTINATION KEY (followee) REFERENCES users (username)
      PROPERTIES ARE ALL COLUMNS
  );
We can query this PROPERTY GRAPH using the following SQL/PGQ statement:
SELECT DISTINCT
    follower
  , followee
  , v1
  , v2
  , e1
FROM GRAPH_TABLE (
  social_network
  MATCH (follower IS users) -[e IS follows]-> (followee IS users)
  COLUMNS (
      follower.username AS follower
    , followee.username AS followee
    , vertex_id(follower) AS v1
    , vertex_id(followee) AS v2
    , edge_id(e) AS e1
  )
);
Next, we will run the HITS algorithm on this PROPERTY GRAPH:
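%python-pgx
# Load the property graph created above into the graph server
graph = session.read_graph_by_name("SOCIAL_NETWORK", "pg_sql")
# Run HITS; the scores are stored in the vertex properties "authorities" and "hubs"
hits = analyst.hits(graph, auth='authorities', hubs='hubs')
# List every user, strongest authorities first, then strongest hubs
result_set = graph.query_pgql(
    "SELECT x.USERNAME, x.authorities, x.hubs "
    "MATCH (x) ORDER BY x.authorities DESC, x.hubs DESC")
result_set.print()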
+---------------------------------------------------------+
| x.USERNAME   | x.authorities      | x.hubs              |
+---------------------------------------------------------+
| mayor        | 0.7071067811865475 | 0.0                 |
| senator      | 0.5000000000000001 | 0.0                 |
| governor     | 0.5000000000000001 | 0.0                 |
| journalist_2 | 0.0                | 0.5187737569752051  |
| journalist_1 | 0.0                | 0.3668284414587895  |
| journalist_3 | 0.0                | 0.3668284414587895  |
| u            | 0.0                | 0.21488312594237396 |
| t            | 0.0                | 0.21488312594237396 |
| s            | 0.0                | 0.21488312594237396 |
| r            | 0.0                | 0.21488312594237396 |
| q            | 0.0                | 0.21488312594237396 |
| l            | 0.0                | 0.1519453155164156  |
| m            | 0.0                | 0.1519453155164156  |
| n            | 0.0                | 0.1519453155164156  |
| o            | 0.0                | 0.1519453155164156  |
| p            | 0.0                | 0.1519453155164156  |
| v            | 0.0                | 0.1519453155164156  |
| w            | 0.0                | 0.1519453155164156  |
| x            | 0.0                | 0.1519453155164156  |
| y            | 0.0                | 0.1519453155164156  |
| z            | 0.0                | 0.1519453155164156  |
| k            | 0.0                | 0.0                 |
| j            | 0.0                | 0.0                 |
| i            | 0.0                | 0.0                 |
| h            | 0.0                | 0.0                 |
| g            | 0.0                | 0.0                 |
+---------------------------------------------------------+
Note: Nodes with no in-links are assigned an authority weight of 0, while nodes with no out-links are assigned a hub weight of 0. Users g through k neither follow anyone nor are followed, so both of their scores are 0. State officials receive high authority scores because many hubs point to them, whereas journalists receive high hub scores because they point to many authorities.
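As a quick sanity check, the hub column can be rederived from the authority column in a single step: each user's raw hub score is the sum of the authority scores of the officials they follow, rescaled so that the squared hub scores sum to 1. The sketch below assumes Euclidean (unit-length) normalization; that is inferred from the printed values, whose squared authority scores sum to 1, rather than stated by the tool. The variable names are illustrative only.

```python
import math

# Authority scores copied from the result table above.
authority = {
    "mayor": 0.7071067811865475,
    "senator": 0.5000000000000001,
    "governor": 0.5000000000000001,
}

# Who follows which officials, taken from the rows inserted into `follows`.
follows = {
    "journalist_1": ["governor", "mayor"],
    "journalist_2": ["governor", "mayor", "senator"],
    "journalist_3": ["mayor", "senator"],
}
for citizen in "vwxyz":
    follows[citizen] = ["governor"]
for citizen in "qrstu":
    follows[citizen] = ["mayor"]
for citizen in "lmnop":
    follows[citizen] = ["senator"]

# Raw hub score: sum of the authority scores of everyone the user follows.
raw_hub = {
    user: sum(authority[official] for official in officials)
    for user, officials in follows.items()
}

# Rescale to unit Euclidean length (assumed normalization, see above).
norm = math.sqrt(sum(score * score for score in raw_hub.values()))
hub = {user: score / norm for user, score in raw_hub.items()}

for user in ("journalist_2", "journalist_1", "u", "z"):
    print(user, hub[user])
```

The printed values should agree with the x.hubs column to within floating-point rounding. journalist_2 ranks first because it is the only user following all three officials, and followers of the mayor outrank followers of the governor or senator because the mayor carries the largest authority score.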




