WordPress Recommendations with Neo4j – Part 3: Collaborative Filtering

This post is part of a series on building a recommendation engine with WordPress. If you haven’t already done so, check out the posts below:

  1. Part 1: Data Modelling
  2. Part 2: Content Based Recommendations
  3. Part 3: Collaborative Filtering
  4. TL;DR – View The Repository

Collaborative Filtering

In its simplest terms, Collaborative Filtering is a method of making automated predictions for a user based on the behaviour and preferences of other users. By tracking the behaviour of users through the website, we can provide new users with a contextual recommendation.

In order to provide these recommendations, we first need to start tracking the User’s path through the website. The easiest way to do this is to create a cookie holding a unique identifier for the user. By setting this cookie to expire in 30 days, we can identify the user when they return. PHP’s session_id() lets us track what the User does within a single session, giving us further insights.

First we need to create a function that will start the session and identify the user.

index.php
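/** @var string User ID */
private static $_user;

/**
 * Make sure a session has been started so we have a unique Session ID
 *
 * @return void
 */
public static function session() {
    // Start the session, unless something else has already started one
    if ( session_status() === PHP_SESSION_NONE ) {
        session_start();
    }

    // Identify User
    static::identify();
}

/**
 * Get the current User ID. Session::log() calls Neopress::user();
 * this getter is assumed here, as its body is not shown in the post.
 *
 * @return string
 */
public static function user() {
    return static::$_user;
}

/**
 * Identify the current User or create a new ID
 *
 * @return void
 */
private static function identify() {
    if ( array_key_exists('neopress', $_COOKIE) ) {
        // Returning visitor: reuse the ID stored in the cookie
        static::$_user = $_COOKIE['neopress'];
    }
    else {
        // New visitor: generate a new ID
        static::$_user = uniqid();
    }

    // Set (or refresh) the cookie so that it expires in 30 days
    $expires = time() + 60 * 60 * 24 * 30;
    $path = '/';

    setcookie('neopress', static::$_user, $expires, $path);
}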

Then add this function to the init action so it will run as WordPress loads.

index.php
add_action('init', Neopress::class .'::session');

Now that we know who the User is, we need to track their path through the site. Let’s create a Session class to hold our logic. On each page load, we want to make sure the User and Session records exist, and then create a new Pageview node linked to the post that they are visiting. At this stage, we can also create a :NEXT relationship between consecutive Pageviews so we can see the order in which the content of the site is consumed.

Session.php
<?php
namespace Neopress;

class Session {

    /**
     * Log a Pageview for the current Post and attribute it to the User's Session
     *
     * @return void
     */
    public static function log() {
        // Without a Session ID there is nothing to attribute the Pageview to
        $session_id = session_id();

        if ( ! $session_id ) {
            return;
        }

        // Merge Page
        $cypher = 'MERGE (p:Post {ID: {page_id}})';
        $params = [
            'page_id'    => get_the_ID(),
            'id'         => Neopress::user(),
            'session_id' => $session_id,
        ];

        // Set User's WordPress ID if logged in
        if ($user_id = get_current_user_id()) {
            $cypher .= ' MERGE (u:User {user_id:{user_id}})';
            $cypher .= ' SET u.id = {id}';

            $params['user_id'] = $user_id;
        }
        else {
            $cypher .= ' MERGE (u:User {id: {id}})';
        }

        // Create Session
        $cypher .= ' MERGE (s:Session {session_id: {session_id}})';

        // Attribute Session to User
        $cypher .= ' MERGE (u)-[:HAS_SESSION]->(s)';

        // Create new Pageview
        $cypher .= ' CREATE (s)-[:HAS_PAGEVIEW]->(v:Pageview {created_at:timestamp()})';

        // Relate Pageview to Page
        $cypher .= ' CREATE (v)-[:VISITED]->(p)';

        // Create :NEXT relationship from last pageview
        if (array_key_exists('neopress_last_pageview', $_SESSION)) {
            $cypher .= ' WITH v';
            $cypher .= ' MATCH (last:Pageview) WHERE id(last) = {last_pageview}';
            $cypher .= ' CREATE (last)-[:NEXT]->(v)';

            $params['last_pageview'] = $_SESSION['neopress_last_pageview'];
        }

        // Return Pageview ID (note the leading space between clauses)
        $cypher .= ' RETURN id(v) as id';

        // Run Query
        $result = Neopress::client()->run($cypher, $params);

        // Store Last Pageview in Session
        $_SESSION['neopress_last_pageview'] = $result->getRecord()->get('id');
    }

}

Now, we can use the shutdown listener to run our code once a page has finished loading.

index.php
class Neopress {
    // ...

    /**
     * Register Shutdown Hook
     *
     * @return void
     */
    public static function shutdown() {
        if (is_single()) {
            Session::log();
        }
    }
}

add_action('shutdown', Neopress::class .'::shutdown');

After a few clicks around the site, we can see a rich graph of information developing.
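
To make the shape of that graph concrete, here is a small in-memory sketch of what the logging query creates on each page load. It does not talk to Neo4j. The PageviewGraph class, its method names and the sample IDs are all hypothetical, and it only mirrors the MERGE and CREATE steps from Session::log().

```php
<?php
// Hypothetical in-memory stand-in for the graph built by Session::log().
// Nothing here talks to Neo4j; it only mirrors the MERGE/CREATE steps.
final class PageviewGraph {
    /** @var array User nodes, keyed by cookie ID */
    public $users = [];
    /** @var array Session nodes: session ID => user ID */
    public $sessions = [];
    /** @var array Pageview nodes: pageview ID => [session, post] */
    public $pageviews = [];
    /** @var array :NEXT relationships as [from, to] pageview IDs */
    public $next = [];
    /** @var array Last pageview ID per session */
    private $last = [];

    public function log(string $userId, string $sessionId, int $postId): int {
        // MERGE: the User and Session are only created if they are missing
        $this->users[$userId] = true;
        $this->sessions[$sessionId] = $userId;

        // CREATE: every page load gets a brand new Pageview node
        $id = count($this->pageviews);
        $this->pageviews[$id] = ['session' => $sessionId, 'post' => $postId];

        // CREATE (last)-[:NEXT]->(v) when the session has an earlier pageview
        if (isset($this->last[$sessionId])) {
            $this->next[] = [$this->last[$sessionId], $id];
        }
        $this->last[$sessionId] = $id;

        return $id;
    }

    /** Post IDs in the order a session visited them */
    public function path(string $sessionId): array {
        $posts = [];
        foreach ($this->pageviews as $view) {
            if ($view['session'] === $sessionId) {
                $posts[] = $view['post'];
            }
        }
        return $posts;
    }
}

// Three page loads in one session: post 110, then 108, then back to 110
$graph = new PageviewGraph();
foreach ([110, 108, 110] as $postId) {
    $graph->log('user-a', 'session-1', $postId);
}
print_r($graph->path('session-1'));
```

Three page loads leave a single User and Session, three Pageviews and two :NEXT relationships, which is why a repeat visit to the same post still shows up as a separate step in the path.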

Recommend Unread Posts

Now that we have some information in the database, we can start to build more intelligent recommendations. Building on the Cypher from the previous post, we can use the session information we have collected to filter out posts that this user has already visited during their session or during previous visits to the site.

MATCH (s:Session) WHERE s.session_id = '3ch9ng6amor3m9a9rao91ikn51'
MATCH (p:Post)-[:HAS_TAXONOMY|AUTHORED]-(target)-[:HAS_TAXONOMY|AUTHORED]-(recommended:Post)
WHERE p.ID = 110
AND recommended.status = "publish"
AND NOT ((s)-[:HAS_PAGEVIEW|VISITED*2]->(recommended))
WITH labels(target) as labels, recommended, case when "User" in labels(target) then 10 else 5 end as weight
RETURN id(recommended) as ID, sum(weight) as weighting
ORDER BY weighting DESC LIMIT 5

We can even take it a step further and exclude every post that the current user has read during previous visits by adding a single line of Cypher.

AND NOT ((s)<-[:HAS_SESSION]-(:User)-[:HAS_SESSION|HAS_PAGEVIEW|VISITED*3]->(recommended))
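
In the plugin, the session ID and post ID would be passed in as parameters rather than hard-coded. Below is a minimal sketch of how that might look. The function name neopress_unread_query and its arguments are assumptions of mine, not part of the repository. It uses the :VISITED relationship created in Session::log() and applies the filters to the recommended post. The {name} placeholders match the style used in Session.php; more recent Neo4j versions expect $name instead.

```php
<?php
/**
 * Build the "unread posts" recommendation query and its parameters.
 * Hypothetical helper: the name and signature are assumptions.
 *
 * @param int    $postId         WordPress ID of the post being read
 * @param string $sessionId      Value of session_id()
 * @param bool   $excludeHistory Also exclude posts read in earlier sessions
 * @return array [string $cypher, array $params]
 */
function neopress_unread_query(int $postId, string $sessionId, bool $excludeHistory = false): array {
    $lines = [
        'MATCH (s:Session) WHERE s.session_id = {session_id}',
        'MATCH (p:Post)-[:HAS_TAXONOMY|AUTHORED]-(target)-[:HAS_TAXONOMY|AUTHORED]-(recommended:Post)',
        'WHERE p.ID = {post_id}',
        'AND recommended.status = "publish"',
        // Session -> Pageview -> Post is two hops
        'AND NOT ((s)-[:HAS_PAGEVIEW|VISITED*2]->(recommended))',
    ];

    if ($excludeHistory) {
        // User -> Session -> Pageview -> Post is three hops
        $lines[] = 'AND NOT ((s)<-[:HAS_SESSION]-(:User)-[:HAS_SESSION|HAS_PAGEVIEW|VISITED*3]->(recommended))';
    }

    $lines[] = 'WITH labels(target) as labels, recommended, case when "User" in labels(target) then 10 else 5 end as weight';
    $lines[] = 'RETURN id(recommended) as ID, sum(weight) as weighting';
    $lines[] = 'ORDER BY weighting DESC LIMIT 5';

    $params = ['session_id' => $sessionId, 'post_id' => $postId];

    return [implode("\n", $lines), $params];
}

// Print the query that would be sent for post 110
list($cypher, $params) = neopress_unread_query(110, '3ch9ng6amor3m9a9rao91ikn51', true);
echo $cypher, "\n";
```

The returned pair can be passed straight to Neopress::client()->run($cypher, $params), in the same way as in Session::log().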

Social Recommendations

Social proof is a powerful tool. By connecting users, using information collected either from the website or from a third party (for example, Facebook friends), we can provide valuable context about why a post has been recommended. In the following query, we use the connections between people to recommend posts that a user’s connections have read but the user has not. By using Cypher’s COLLECT function, we can return a list of the friends to display to the user.

MATCH (u:User) WHERE id(u) = 169
OPTIONAL MATCH (u)-[:CONNECTED_TO]-(friend:User)-[:HAS_SESSION|HAS_PAGEVIEW|VISITED*3]->(p:Post)
WHERE NOT( (u)-[:HAS_SESSION|HAS_PAGEVIEW|VISITED*3]->(p) )
WITH id(p) AS post_id, COLLECT(friend.name) AS friends
RETURN post_id, friends, SIZE(friends) AS count
ORDER BY count DESC LIMIT 5

post_id  friends           count
110      [Adam, Joe, Jon]  3
108      [Adam, Jon]       2
113      [Joe, Jon]        2
120      [Matt]            1
135      [Adam]            1
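
Once the rows are back in PHP, the friends list can be turned into a short line of social proof next to each recommendation. This is only a sketch: the function name neopress_social_proof and the wording of the sentence are my own assumptions, and the rows are hard-coded to match the shape of the result above.

```php
<?php
/**
 * Turn a list of friend names into a short sentence.
 * Hypothetical helper; the name and the wording are assumptions.
 *
 * @param string[] $friends
 * @return string
 */
function neopress_social_proof(array $friends): string {
    $count = count($friends);

    if ($count === 0) {
        return '';
    }

    if ($count === 1) {
        return $friends[0] . ' read this';
    }

    // Join all but the last name with commas, then add "and <last>"
    $last = array_pop($friends);

    return implode(', ', $friends) . ' and ' . $last . ' read this';
}

// Rows shaped like the query result above
$rows = [
    ['post_id' => 110, 'friends' => ['Adam', 'Joe', 'Jon']],
    ['post_id' => 120, 'friends' => ['Matt']],
];

foreach ($rows as $row) {
    echo $row['post_id'] . ': ' . neopress_social_proof($row['friends']) . "\n";
}
```

In a theme template, the names come from user data, so the output should be escaped (for example with WordPress’s esc_html()) before it is printed.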

Unearthing Hidden Gems

Sometimes, it may be appropriate to provide the user with something completely different. As humans, we first look to belong and then to differentiate ourselves from the group. Nothing brings more value than a recommendation out of left field. Take music, for example: you may like rock music but have shown no interest in Blink 182. That doesn’t necessarily mean that Blink 182 deserves a recommendation. I hate Blink 182. At this point, there is more value in recommending things that your friends aren’t listening to: the hidden gems in the database. The power of Cypher means that with a simple tweak of the query, you can identify a completely different subgraph.

If we take our :CONNECTED_TO relationship, we can narrow the recommendations down to posts that share the same taxonomies but have not been read by any connected Users. As we want to find connections regardless of who initiated the friendship, I have omitted the direction of the relationship in the query.

AND NOT ((s)<-[:HAS_SESSION]-(:User)-[:CONNECTED_TO]-(:User)-[:HAS_SESSION|HAS_PAGEVIEW|VISITED*3]->(recommended))
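
To make the logic of that filter explicit, here is the same idea expressed over plain PHP arrays instead of Cypher. This is an illustration only: the function name neopress_hidden_gems, the weightings and the reading histories are made-up sample data, and in the plugin the filtering would happen inside the query.

```php
<?php
/**
 * Keep only the candidate posts that no connection has read.
 * Hypothetical in-memory version of the Cypher filter above.
 *
 * @param array $candidates   post ID => weighting
 * @param array $readByFriend friend name => list of post IDs read
 * @return array post ID => weighting, highest weighting first
 */
function neopress_hidden_gems(array $candidates, array $readByFriend): array {
    // Flatten every friend's reading history into one lookup set
    $seen = [];
    foreach ($readByFriend as $postIds) {
        foreach ($postIds as $postId) {
            $seen[$postId] = true;
        }
    }

    // Drop any candidate that at least one friend has read
    $gems = array_diff_key($candidates, $seen);

    // Sort by weighting, descending, keeping post IDs as keys
    arsort($gems);

    return $gems;
}

// Made-up sample data: weightings by post ID, and what each friend has read
$candidates = [110 => 25, 108 => 15, 142 => 10, 151 => 5];
$readByFriend = ['Adam' => [110, 108], 'Jon' => [110]];

print_r(neopress_hidden_gems($candidates, $readByFriend));
```

Posts 110 and 108 score highest on taxonomy alone, but because friends have already read them, only 142 and 151 survive as hidden gems.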

Conclusion

Throughout this series, we’ve learnt how to use Neo4j to provide better recommendations, from creating WordPress hooks that synchronise our data with Neo4j to running Cypher queries that pull out recommendations. These recommendations should provide users with a better experience and allow you to promote your quality content.

Are you trying this? Is there anything you would do differently? Leave a comment below and let me know how you get on.