The architecture behind hpfriends

10 May 2013

Abstract: In this post, readers will gain some insight into the architecture of the hpfriends platform. Users who want to learn how to use hpfriends should have a look at the guide on using hpfriends.

Introduction

hpfeeds is a simple publish/subscribe data sharing model. It was initially created by Mark ‘rep’ Schloesser as a way to carry high-volume real-time data from different pieces of honeypot software between members of the Honeynet Project. hpfriends is an evolution of the hpfeeds data sharing model. It uses the same wire protocol and thus maintains backwards compatibility with all existing data sources and sinks.

However, instead of relying on access authorization based on channels, hpfriends uses a social graph in order to make sharing data even easier and more natural. This also circumvents some policy issues about which people should be able to use hpfeeds and who should be allowed onto which channels.

Architecture

Channels in hpfeeds are commonly used to group data of the same origin and structure. A channel dedicated to data from the dionaea honeypot software can be expected to carry data adhering to a specific format, suitable for automated processing. Before hpfriends, channel names were unique and global within the hpfeeds system. This meant that once a channel was established by a user (such as dionaea.captures), that user had to grant permission each time another user wanted to publish or subscribe to the channel dionaea.captures.

hpfriends

hpfriends uses a social sharing graph as its backend database. That means that users and groups are represented as nodes, while the sharing relationships are modelled as edges between those nodes. Sharing is not a binary attribute, which is why each edge carries a number of attributes, such as the type of the relationship and the channel name.
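To make the node and edge model more concrete, here is a minimal in-memory sketch in Python. It is not the hpfriends implementation: the class names, the kind labels "user" and "group", and the rel_type value "own" are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # hypothetical labels: "user" or "group"


@dataclass(frozen=True)
class ShareEdge:
    source: Node    # node that shares its messages
    target: Node    # node that receives them
    channel: str    # channel name, e.g. "dionaea.captures"
    rel_type: str   # type of the relationship (values are assumptions)


@dataclass
class SharingGraph:
    edges: list = field(default_factory=list)

    def share(self, source, target, channel, rel_type="own"):
        """Add a sharing relationship as an attributed edge."""
        edge = ShareEdge(source, target, channel, rel_type)
        self.edges.append(edge)
        return edge

    def edges_from(self, source, channel):
        """Return all edges leaving `source` for the given channel."""
        return [e for e in self.edges
                if e.source == source and e.channel == channel]


jojo = Node("jojo", "user")
mark = Node("mark", "user")
graph = SharingGraph()
graph.share(jojo, mark, "dionaea.captures")
```

Because the channel name lives on the edge rather than in a global registry, two unrelated pairs of users can use the same channel name without ever seeing each other's data.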

Figure: A (very small) subset of the hpfriends sharing graph.

Channels are no longer global broadcast rooms but rather local decisions. For each message a user publishes on a channel, the system decides whether other users may access it, based on whether a sharing relationship exists between the publishing user and those other users or groups. Since a channel name can be used by anyone to publish messages, channels have de facto become exclusive mandatory tags for messages.

If user jojo has a sharing relationship on channel dionaea.captures with user mark, then mark will be able to read jojo's messages on that channel. Other groups of users might also use the channel name dionaea.captures, yet have no sharing relationships with jojo or mark. User mark may in turn decide to also share his dionaea.captures messages with jojo, or even redistribute the messages he received from jojo to third parties, i.e. other users and groups. The type attribute of a relationship indicates whether it covers only messages generated by the user or also incoming messages from other users.
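The jojo and mark example can be sketched as a small traversal. This is only one possible reading of the rules described above: the type values "own" (share only self-generated messages) and "all" (also redistribute incoming messages), the tuple layout, and the extra users alice and bob are all assumptions made for this sketch.

```python
from collections import deque

# (source, target, channel, rel_type); rel_type values are hypothetical
EDGES = [
    ("jojo", "mark", "dionaea.captures", "own"),
    ("mark", "alice", "dionaea.captures", "all"),
    ("mark", "bob", "dionaea.captures", "own"),
]


def recipients(publisher, channel, edges):
    """Return every user who may read `publisher`'s messages on `channel`."""
    seen = {publisher}
    result = set()
    queue = deque([publisher])
    while queue:
        holder = queue.popleft()
        for source, target, chan, rel_type in edges:
            if source != holder or chan != channel:
                continue
            # An "own" edge only carries messages its source published itself,
            # so an intermediate holder needs an "all" edge to pass them on.
            if holder != publisher and rel_type != "all":
                continue
            if target not in seen:
                seen.add(target)
                result.add(target)
                queue.append(target)
    return result


print(sorted(recipients("jojo", "dionaea.captures", EDGES)))  # ['alice', 'mark']
print(sorted(recipients("mark", "dionaea.captures", EDGES)))  # ['alice', 'bob']
```

In this sketch bob never sees jojo's messages, because mark's relationship with bob covers only what mark publishes himself, while alice receives both.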

Groups

Groups are used to address a number of people without having to establish sharing relationships with each individual member. In our database, groups are represented as nodes which have an edge to each member.
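Resolving a group to its members could look roughly like the following. The group name "honeynet-de", the dictionary layout, and the assumption that groups do not contain other groups are all made up for this sketch.

```python
# Hypothetical membership edges: group node -> member user nodes
MEMBERS = {
    "honeynet-de": {"mark", "jojo"},
}


def expand(targets, members):
    """Replace each group in `targets` by its members; keep users as they are."""
    users = set()
    for target in targets:
        # Anything without membership edges is treated as a plain user.
        users |= members.get(target, {target})
    return users


print(sorted(expand({"honeynet-de", "alice"}, MEMBERS)))  # ['alice', 'jojo', 'mark']
```

A single sharing relationship with the group node is thus enough to reach every member, and adding a member later requires no change to the existing sharing edges.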

Authkeys

hpfriends uses Authkeys in the same fashion as the hpfeeds system. In fact, the hpfriends message broker is backwards-compatible with the hpfeeds tools. Authkeys are single-purpose tokens for the different pieces of backend software that need to publish and subscribe to data channels. Each Authkey has a list of channels it is allowed to publish to and a list of channels it is allowed to subscribe to.
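A per-key permission check along these lines might be modelled as below. The field and method names, the sample identifier, and the sample secret are assumptions for illustration; they are not taken from the hpfriends broker.

```python
from dataclasses import dataclass, field


@dataclass
class Authkey:
    ident: str    # identifier the client presents (name is an assumption)
    secret: str   # shared secret belonging to that identifier
    publish: set = field(default_factory=set)    # channels it may publish to
    subscribe: set = field(default_factory=set)  # channels it may subscribe to

    def may_publish(self, channel):
        return channel in self.publish

    def may_subscribe(self, channel):
        return channel in self.subscribe


# A key for a single honeypot sensor: it can publish captures, nothing else.
sensor_key = Authkey("sensor-1", "not-a-real-secret",
                     publish={"dionaea.captures"})
```

Keeping one narrowly scoped key per piece of software means that a leaked sensor key cannot be used to read anyone's data, and it can be revoked without touching other sensors.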

Implementation

The hpfriends system is still very much a work in progress, which is why the implementation details are subject to change.

The sharing graph was realized with the Neo4j database, an open-source graph database implemented in Java. Although Neo4j enables very efficient graph storage and operations, it does have a few quirks. For hpfeeds in particular, it is impractical and unnecessary to do a lookup on the sharing graph for each new message on a channel. Instead, we traverse the graph whenever the sharing relationships change and save the resulting sharing attributes in a flat format afterwards.
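The idea of flattening on change rather than traversing per message can be illustrated with a small in-memory stand-in. This sketch does not use Neo4j, only considers direct sharing edges (no groups, no type attribute), and all names in it are hypothetical.

```python
from collections import defaultdict


class FlatShares:
    """Keeps sharing edges plus a flat lookup table derived from them."""

    def __init__(self):
        self._edges = set()   # (source, target, channel)
        self._table = {}      # (source, channel) -> frozenset of targets

    def add_share(self, source, target, channel):
        self._edges.add((source, target, channel))
        self._rebuild()       # the expensive step happens only on change

    def remove_share(self, source, target, channel):
        self._edges.discard((source, target, channel))
        self._rebuild()

    def _rebuild(self):
        table = defaultdict(set)
        for source, target, channel in self._edges:
            table[(source, channel)].add(target)
        self._table = {key: frozenset(value) for key, value in table.items()}

    def recipients(self, publisher, channel):
        """Per-message lookup: one dictionary access, no graph traversal."""
        return self._table.get((publisher, channel), frozenset())


shares = FlatShares()
shares.add_share("jojo", "mark", "dionaea.captures")
```

Sharing relationships change rarely compared to the rate of incoming messages, so paying the traversal cost on each change keeps the hot path of the broker down to a constant-time lookup.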

For the web frontend we decided to go with the Meteor JavaScript framework. Meteor is still under active development and changes frequently and significantly, but it enabled us to create a relatively simple real-time web application without having to worry about many of the underlying details. We chose to stick with the Twitter Bootstrap HTML/CSS framework for the visual elements.

The reference implementation for hpfeeds was done in Python, but there are projects to bring hpfeeds support to Go and Ruby. If you are using these projects, please consider contributing patches or feedback.