WordPress Recommendations with Neo4j – Part 1: Data Modelling

WordPress is arguably the world’s most popular CMS, with reportedly almost a quarter of the internet running on a WordPress install. When done correctly, WordPress can be a great tool for building a website quickly and cost-effectively.

Part of the attraction is arguably the wealth of themes and plugins built around WordPress’ easy-to-use API. The function you are looking for can be as simple as the_title(), which prints the post’s title, or get_the_content(), which returns the post’s content.

The WordPress API also comes with a huge list of Actions out of the box which allow you to quickly extend functionality.  In this post, I will be using WordPress hooks to sync information with Neo4j in order to provide real time recommendations.

TL;DR:

The accompanying code for this blog post is available on github at github.com/adam-cowley/neo4j-wordpress.

Installing Neo4j

Graphs are everywhere, and they’re great for real-time recommendations. Neo4j is the world’s leading graph database, with a great community and many use cases online that are beyond the scope of this post. If you haven’t yet, head over to neo4j.com and follow the installation instructions.

Our Model

The key to setting up our Recommendation Engine is to add only the information that we need to provide useful recommendations. Out of the box, WordPress comes with Posts, Pages and Taxonomies, which give us enough information to start providing content-based recommendations. Along with this, we can track User and Session behaviour, which will allow us to provide Collaborative Filtering. Combining the two creates what is known as a Hybrid Recommendation Engine.

To start with, we should be looking at the graph model below.

[Figure: WordPress recommendation graph model]

A User will author a Post, which will be categorised with at least one WordPress Taxonomy. This will be a good start to provide Content Based Recommendations. We can also use Neo4j to store the User’s behaviour and use that information to build better recommendations using Collaborative Filtering.

In the first instance, we should look to bring in a Post with its related taxonomies and its author. Later on, we will look at how we can use WordPress to track user behaviour to build better recommendations.
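The code in this post only syncs posts and taxonomies, not authors. Here is a minimal sketch of how the author could be brought in. The hypothetical helper below builds a MERGE query and parameters that link a User node to its Post. The AUTHORED relationship type, the User.ID key and the display_name property are my assumptions for illustration. The `{param}` placeholders follow the Neo4j 3.x style used in the rest of this post.

```php
<?php
/**
 * Hypothetical helper: build Cypher + parameters attaching a post's author.
 * Assumptions: AUTHORED relationship type, User nodes keyed on ID.
 *
 * @return array [string $cypher, array $params]
 */
function neopress_author_query(int $post_id, int $user_id, string $display_name): array
{
    // MERGE both ends so this works whether or not the nodes exist yet
    $cypher = '
        MERGE (u:User {ID: {user_id}})
        SET u.display_name = {display_name}
        MERGE (p:Post {ID: {post_id}})
        MERGE (u)-[:AUTHORED]->(p)
    ';

    $params = [
        'user_id'      => $user_id,
        'display_name' => $display_name,
        'post_id'      => $post_id,
    ];

    return [$cypher, $params];
}

// Inside WordPress the values could come from (int) get_post_field('post_author', $post_id)
// and get_the_author_meta('display_name', $user_id); here they are hard-coded.
list($cypher, $params) = neopress_author_query(42, 1, 'Adam');
echo $cypher, PHP_EOL;
print_r($params);
```

The returned pair could be pushed onto the same transaction as the other queries with `$tx->push($cypher, $params)`.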

As Neo4j doesn’t require a fixed schema up front, the key is to get the information into the database as quickly as possible. Once the information is in there, we can use Cypher to update the model and create more relationships as necessary.

Installing Dependencies

Neo4j comes with officially supported drivers for the major languages, from Java and Python to JavaScript (Node) and C#. PHP isn’t among them, so we will use the community-maintained GraphAware PHP Client to connect to Neo4j. The quickest route is to install the client using Composer. Let’s set up a new directory inside our wp-content/plugins folder and run composer init.


cd wp-content/plugins
mkdir neopress
cd neopress
composer init

This should run the Composer config generator.  After following the steps you should now have a composer.json file.  Install the Client by running the following command:

composer require graphaware/neo4j-php-client:^4.0

Plugin Configuration

By default, Neo4j comes with authentication enabled, so we will need a page in the WordPress admin panel where we can enter the connection details. I’m going to rush through this bit, but there are many great posts out there on how to create a WordPress plugin. First, let’s create our plugin file.

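index.php
<?php
/**
Plugin Name: Neopress
Description: Neo4j Recommendation Engine for WordPress
Version: 1.0
Author: Adam Cowley
Author URI: http://wecommit.co
License: GPLv2 or later
Text Domain: neopress
*/

namespace Neopress;

use GraphAware\Neo4j\Client\ClientBuilder;

// No Hackers
defined( 'ABSPATH' ) or die( 'No dice.' );

// Load Composer's autoloader so the GraphAware client classes can be found
require_once __DIR__ . '/vendor/autoload.php';

class Neopress {
    // Shared Neo4j client, created lazily by client()
    private static $_client;

    // Our code will go here...
}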

If all has gone well, we should now see a Neopress plugin ready to activate in the plugin section of our WP admin. Head over and click Activate. All good? Cool. Let’s create our admin page.

To connect to Neo4j with the PHP client, we will need to configure a Host, Port, Username and Password. For brevity, I have skipped the legwork of generating this form. The WordPress documentation on creating Options pages has full instructions. Once set up, we should see a form like the one below. Filling out the fields will save the configuration to the wp_options table.

You can see the full code to create the settings page in the repository.

[Screenshot: the Neopress settings page in the WordPress admin]

Now, we need to set up a connection to Neo4j. As my plugin functions are all called statically, I have chosen to hold the client in a static property, singleton-style. Open up index.php and add the following method to the Neopress class to return a single shared instance.

index.php
/**
 * Get Neo4j Client Instance
 *
 * Requires `use GraphAware\Neo4j\Client\ClientBuilder;` at the top of index.php
 * and a `private static $_client;` property on the Neopress class.
 *
 * @return \GraphAware\Neo4j\Client\Client
 */
public static function client() {
    if ( !static::$_client ) {
        // Build the shared "://user:pass@host:" part of both connection URIs
        $connection_string = sprintf('://%s:%s@%s:',
            get_option('neopress_username', 'neo4j'),
            get_option('neopress_password', 'neo'),
            get_option('neopress_host', 'localhost')
        );

        // Register an HTTP and a Bolt connection (Bolt's default port is 7687)
        static::$_client = ClientBuilder::create()
            ->addConnection('default', 'http'. $connection_string .get_option('neopress_port', 7474))
            ->addConnection('bolt', 'bolt'. $connection_string .get_option('neopress_bolt_port', 7687))
            ->build();
    }

    return static::$_client;
}

Now that this method is available, you can use the static client() method to run a Cypher query:

Neopress::client()->run('MATCH (n) RETURN COUNT(n)');
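client() reads everything through get_option(), so it only runs inside WordPress. If you want to test the URI-building step on its own, here is a minimal sketch. The hypothetical function below takes a plain array of option values in place of get_option() and returns the two URIs that client() passes to addConnection(). Unlike get_option(), it also treats empty strings as unset.

```php
<?php
/**
 * Hypothetical helper mirroring the URI building in Neopress::client().
 * $options stands in for the values get_option() would return.
 */
function neopress_connection_uris(array $options): array
{
    // Same defaults as client(); 7687 is Neo4j's default Bolt port
    $defaults = [
        'neopress_username'  => 'neo4j',
        'neopress_password'  => 'neo',
        'neopress_host'      => 'localhost',
        'neopress_port'      => 7474,
        'neopress_bolt_port' => 7687,
    ];

    // Unlike get_option(), treat empty strings as "not set" as well
    $set = array_filter($options, function ($value) {
        return $value !== '' && $value !== null;
    });
    $o = array_merge($defaults, $set);

    $auth = $o['neopress_username'] . ':' . $o['neopress_password'];

    return [
        'default' => sprintf('http://%s@%s:%d', $auth, $o['neopress_host'], $o['neopress_port']),
        'bolt'    => sprintf('bolt://%s@%s:%d', $auth, $o['neopress_host'], $o['neopress_bolt_port']),
    ];
}

print_r(neopress_connection_uris(['neopress_host' => 'neo4j.internal']));
```

Note that a password containing characters such as @ or : would confuse this URI format.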

Now we’re ready to get our hands dirty with WordPress hooks.

Hooks

As I mentioned earlier on, there are hundreds of WordPress hooks, and each of them makes WordPress easy to extend. Actions within the WordPress core are triggered by the do_action() and do_action_ref_array() functions at various stages. They span a wide range of action types, from framework initialisation and admin actions to printing footer scripts and shutdown.
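To make the idea of hooks concrete, here is a toy action registry loosely modelled on add_action()/do_action(). It is not WordPress’s implementation. The real functions also handle accepted_args, removal and more, while this sketch just passes every argument through. It shows the core mechanism: callbacks are stored per hook name and priority, then called in priority order when the action fires. The Demo namespace and class names are hypothetical.

```php
<?php
namespace Demo;

/**
 * Toy action registry, loosely modelled on add_action()/do_action().
 * Illustration only; this is NOT how WordPress implements hooks.
 */
final class Hooks
{
    /** @var array callbacks grouped by hook name, then by priority */
    private static $actions = [];

    public static function add(string $hook, callable $callback, int $priority = 10): void
    {
        self::$actions[$hook][$priority][] = $callback;
    }

    public static function run(string $hook, ...$args): void
    {
        if (empty(self::$actions[$hook])) {
            return; // nobody is listening to this hook
        }
        ksort(self::$actions[$hook]); // lower priority numbers run first
        foreach (self::$actions[$hook] as $callbacks) {
            foreach ($callbacks as $callback) {
                call_user_func_array($callback, $args);
            }
        }
    }
}

class Post
{
    public static $log = [];

    public static function merge($post_id): void
    {
        self::$log[] = "merge {$post_id}";
    }
}

// Same "Class::method" string-callable style used with add_action() later on
Hooks::add('save_post', Post::class . '::merge');
Hooks::run('save_post', 42);
print_r(Post::$log);
```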

The hooks that we’re particularly interested in are the ones that either create, update or delete a post or category. A quick CMD+F on the Action Reference page shows us that we could listen for the following actions to keep our graph up to date.

  • save_post – Run when a post or page is created or updated.
  • updated_postmeta – Run when meta data has been updated.
  • trashed_post – Run when a post has been trashed.
  • untrash_post – Run before a post is restored from the trash.
  • deleted_post – Run when a post has been deleted from the database.
  • create_category – Run after a category has been created.
  • edit_category – Run when a category is updated.
  • delete_category – Run when a category is deleted.

save_post

When the save_post action is called, our callback will be passed the $post_ID as a single argument. With only the ID to go on, we will not know whether this is a newly created post or an update to an existing post. Luckily, Neo4j’s MERGE clause lets us run an upsert. It matches an existing node on the given properties or creates one if none exists, and ON CREATE SET and ON MATCH SET let us set properties for each case. Once we have checked that the post isn’t a revision, we can run a query to make sure our post is up to date in Neo4j.
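MERGE’s create-or-match behaviour is what lets one query serve both new and updated posts. To make it concrete, here is a plain-PHP simulation of what the Post query later in this post does for a single ID. It illustrates the semantics only, with a hypothetical in-memory array standing in for the graph, and says nothing about how Neo4j implements MERGE.

```php
<?php
/**
 * Plain-PHP illustration of MERGE ... ON CREATE SET ... ON MATCH SET ... SET
 * for one key. Mimics the behaviour only, not Neo4j's implementation.
 *
 * @param array $store nodes keyed by ID, passed by reference (a tiny "graph")
 */
function merge_post(array &$store, int $id, array $props, int $now): array
{
    if (!isset($store[$id])) {
        // ON CREATE: no node with this ID yet, so create it
        $store[$id] = ['ID' => $id, 'created_at' => $now];
    } else {
        // ON MATCH: the node already exists, so only stamp updated_at
        $store[$id]['updated_at'] = $now;
    }

    // Plain SET: applied in both cases
    $store[$id] = array_merge($store[$id], $props);

    return $store[$id];
}

$store = [];
merge_post($store, 1, ['title' => 'Hello'], 1000);        // first save creates the node
merge_post($store, 1, ['title' => 'Hello, world'], 2000); // second save matches it
print_r($store);
```

Saving the same post twice leaves one node, with created_at from the first save and updated_at and title from the second.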

Our unique identifier for a Post will be its ID, so let’s go ahead and create a unique constraint on this property.

CREATE CONSTRAINT ON (p:Post) ASSERT p.ID IS UNIQUE

Next, we need somewhere for the hook logic to live. Let’s create a new Post.php file in our plugin folder to hold it. Once it exists, we can use the add_action function to hook in our code.

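Post.php
<?php
namespace Neopress;

class Post {
    public static function merge($post_id) {
        // Revisions and autosaves also fire save_post; skip them
        if ( wp_is_post_revision( $post_id ) || wp_is_post_autosave( $post_id ) ) {
            return;
        }

        // Our code will go here...
    }
}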

Now that we have our class, add the following code to index.php to load it and register the hook.

index.php
require_once __DIR__ . '/Post.php';
add_action('save_post', Post::class.'::merge');

This will make sure that when the save_post hook is called, the static merge method in our Post class will be called.

In our hook, we need to do three things. First, we need to make sure our categories and tags exist in the database. Then, we need to make sure that our post has the right properties set against it. Finally, we need to make sure that the post has the correct taxonomies attributed to it.

Let’s start with the Cypher query to persist our Categories and Tags. As both are considered taxonomy terms, we should store them with a common Taxonomy label. Neo4j allows a node to hold several labels, so we can run queries against both, or use the specific Category and Tag labels to differentiate between them. First, let’s create a unique constraint on term_id for the Taxonomy label.

CREATE CONSTRAINT ON (t:Taxonomy) ASSERT t.term_id IS UNIQUE

Now, let’s create a new class with a method that will add a merge query to our transaction.

Category.php
<?php
namespace Neopress;

use GraphAware\Common\Transaction\TransactionInterface;
use WP_Term;

class Category {
    /**
     * Add a MERGE query for a Category to the transaction
     *
     * @param TransactionInterface $tx       Transaction to push the query onto
     * @param WP_Term              $category Category term to persist
     * @return void
     */
    public static function merge(TransactionInterface $tx, WP_Term $category) {
        // Merge on the unique term_id, then copy every term field onto the node
        $cypher = '
            MERGE (t:Taxonomy:Category {term_id: {term_id}})
            SET t += {category}
        ';

        $tx->push($cypher, ['term_id' => $category->term_id, 'category' => (array) $category]);
    }
}

Here we’ve got a simple query that will merge a Category on its term_id and then bulk set the properties based on what we provide in the category parameter. We can do the same for Tags. I’ve omitted this for brevity, but you can view the file in the repository. Now that we have these queries, let’s add the logic to our Post::merge method.
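The Category and Tag queries differ only in their second label, so one option is a single helper for both. Here is a minimal sketch that assumes terms arrive as arrays such as `(array) $wp_term`. The hypothetical function below takes the label from a whitelist, because Cypher does not allow labels to be passed as parameters. It returns the query and parameters ready for `$tx->push()`.

```php
<?php
/**
 * Hypothetical helper: build the MERGE query for either kind of term.
 * Both share the Taxonomy label; the second label depends on the WP taxonomy.
 *
 * @param array $term term fields, e.g. the result of (array) $wp_term
 * @return array [string $cypher, array $params]
 */
function taxonomy_merge_query(array $term): array
{
    // Map WordPress taxonomy names to the extra label we want on the node
    $labels = ['category' => 'Category', 'post_tag' => 'Tag'];
    $taxonomy = $term['taxonomy'] ?? '';
    if (!isset($labels[$taxonomy])) {
        throw new InvalidArgumentException("Unsupported taxonomy: {$taxonomy}");
    }

    // Keep term_id an integer so SET t += {props} can't overwrite it with a string
    $term['term_id'] = (int) $term['term_id'];

    // Labels can't be parameterised in Cypher, hence the whitelist above
    $cypher = sprintf(
        'MERGE (t:Taxonomy:%s {term_id: {term_id}}) SET t += {props}',
        $labels[$taxonomy]
    );

    return [$cypher, ['term_id' => $term['term_id'], 'props' => $term]];
}

list($cypher, $params) = taxonomy_merge_query([
    'term_id' => 3, 'name' => 'Graphs', 'slug' => 'graphs', 'taxonomy' => 'post_tag',
]);
echo $cypher, PHP_EOL;
```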

As Neo4j is a transactional database, we can use the PHP client’s transaction() method to create a transaction and run these queries in the same batch. Combining our queries into a single transaction will allow us to roll back our changes should anything go wrong.

Post.php
// Inside Post::merge(). Category.php and Tag.php must already be loaded.

// Create a new Transaction
$tx = Neopress::client()->transaction();

// Collect the term IDs so we can relate them to the post later
$terms = [];

// For each category, add a MERGE query to our transaction
$categories = get_the_category($post_id);
foreach ($categories as $category) {
    $terms[] = $category->term_id;
    Category::merge($tx, $category);
}

// ...and the same for tags. get_the_tags() returns false (not an empty
// array) when a post has no tags, so guard the loop
$tags = get_the_tags($post_id);
if ( is_array($tags) ) {
    foreach ($tags as $tag) {
        $terms[] = $tag->term_id;
        Tag::merge($tx, $tag);
    }
}
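get_the_tags() returns false rather than an empty array when a post has no tags, and it can return a WP_Error, so it pays to be defensive when collecting term IDs. Here is a minimal sketch using a hypothetical collect_term_ids() helper. It skips anything that isn’t an array and returns a de-duplicated, re-indexed list of integer IDs ready to pass as the `terms` parameter.

```php
<?php
/**
 * Hypothetical helper: gather term IDs from the return values of
 * get_the_category() and get_the_tags(). Anything that isn't an array
 * (false for "no terms", or a WP_Error) is skipped.
 */
function collect_term_ids(...$term_lists): array
{
    $ids = [];
    foreach ($term_lists as $terms) {
        if (!is_array($terms)) {
            continue; // false or WP_Error: nothing to add
        }
        foreach ($terms as $term) {
            $ids[] = (int) $term->term_id;
        }
    }

    // Drop duplicates and re-index so the value is a plain list, not a map
    return array_values(array_unique($ids));
}

// Stand-ins for WP_Term objects; only term_id is read here
$categories = [(object) ['term_id' => 1], (object) ['term_id' => 4]];
$tags = false; // what get_the_tags() returns for an untagged post
print_r(collect_term_ids($categories, $tags));
```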

The next step is to update the post details. At this stage we don’t need to go overboard with meta data for the post. Let’s just add the permalink, title and status to the post.

Post.php
// Write Cypher MERGE query: match the post on its unique ID, or create it
$cypher = '
    MERGE (p:Post {ID: {post_id}})
    ON CREATE SET p.created_at = timestamp()
    ON MATCH SET p.updated_at = timestamp()
    SET p.permalink = {permalink},
        p.title = {title},
        p.status = {status}
';

// Set Parameters
$params = [
    'post_id' => $post_id,
    'permalink' => get_permalink( $post_id ),
    'title' => get_the_title( $post_id ),
    'status' => get_post_status( $post_id ),
];

// Add to Transaction
$tx->push($cypher, $params);

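The last step is to create the relationships between the post and its taxonomies. We first remove any existing ones, so that categories or tags removed in WordPress also disappear from the graph.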

Post.php
// Detach Taxonomies
$cypher = 'MATCH (p:Post {ID: {post_id}})-[r]->(:Taxonomy) DELETE r';
$params = ['post_id' => $post_id];

$tx->push($cypher, $params);

// Relate to new Taxonomies
$cypher = '
MATCH (p:Post {ID: {post_id}})
WITH p, {terms} as terms
UNWIND terms AS term_id
MATCH (t:Taxonomy) where t.term_id = term_id
MERGE (p)-[:HAS_TAXONOMY]->(t)
';

$params = [
'post_id' => $post_id,
'terms' => $terms
];

$tx->push($cypher, $params);
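Deleting every taxonomy relationship and re-creating it keeps the queries simple, and inside one transaction it is safe. If you’d rather touch only what changed, you can diff the two sets first. This might matter if you later store data such as a timestamp on the relationship. Here is a minimal sketch of that alternative using a hypothetical taxonomy_changes() helper. Fetching the existing IDs from Neo4j is left out.

```php
<?php
/**
 * Hypothetical alternative to "delete all, then re-create": work out which
 * HAS_TAXONOMY relationships actually need to change.
 *
 * @param int[] $existing term IDs currently linked to the post in Neo4j
 * @param int[] $wanted   term IDs WordPress says the post has now
 */
function taxonomy_changes(array $existing, array $wanted): array
{
    return [
        // In WordPress but not yet in the graph: create these relationships
        'add'    => array_values(array_diff($wanted, $existing)),
        // In the graph but no longer in WordPress: delete these relationships
        'remove' => array_values(array_diff($existing, $wanted)),
    ];
}

print_r(taxonomy_changes([1, 2, 3], [2, 3, 5]));
```

The resulting add and remove lists could then be passed as parameters to UNWIND queries similar to the one above.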

Then we just need to make sure that we commit the transaction to save our node and relationships.


// Run it
$tx->commit();

Once everything has been set up, log into the WordPress control panel and create a new post. If all has gone well, you should see your graph starting to populate as you save and update your posts.

Conclusion

We’ve used WordPress actions to synchronise our WordPress database with Neo4j. In Part 2, we will look at how we can use the information in this database to create a content-based Recommendation Engine.

Continue reading in WordPress Recommendations with Neo4j Part 2: Content Based Recommendations.