Note: this has been renamed from PURL to PURR to avoid confusion with existing acronyms.
A challenge of building for the long web is optimizing systems for ease of maintenance, inspectability and portability. Databases provide a convenient API for manipulating structured data. They make it easier to write featureful applications, but fall short on all three of those concerns.
Flat file storage schemes improve maintainability and inspectability. They are a way to avoid the database tax, but suffer from lack of portability due to differing formats.
Rather than create separate flat files in markdown or JSON, we can use the published HTML + microformats as its own content store.
First we need an MF2 parser to read semantic objects from the web. Such parsers are already used today to read webmentions. Then we close the loop by rendering the same objects back to HTML, to create a read-write system.
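
To give a sense of what an MF2 parser hands back, here is a hand-written sample in the shape of the canonical MF2 JSON, where every property value is a list. The sample data and the two helper functions (`first_entry`, `first_value`) are illustrative assumptions, not part of any parser library.

```python
# Hand-written sample in the shape of canonical MF2 JSON (not real parser output).
parsed = {
    "items": [
        {
            "type": ["h-entry"],
            "properties": {
                "name": ["Hello world"],
                # e-* properties carry both the markup and its plain-text value
                "content": [{"html": "<p>First post</p>", "value": "First post"}],
                "url": ["https://example.com/hello"],
            },
        }
    ],
    "rels": {},
    "rel-urls": {},
}


def first_entry(parsed):
    """Return the first top-level h-entry item, or None (hypothetical helper)."""
    for item in parsed["items"]:
        if "h-entry" in item["type"]:
            return item
    return None


def first_value(item, prop, default=None):
    """Return the first value of a property, or a default (hypothetical helper)."""
    values = item["properties"].get(prop, [])
    return values[0] if values else default


entry = first_entry(parsed)
title = first_value(entry, "name")
body = first_value(entry, "content")["value"]
```

Because even single values arrive wrapped in lists, small helpers like these tend to show up in any code that consumes the raw output.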
I call this PURR: parse-update-render-repeat. With PURR, your published HTML becomes the content store. You use the same code to access the whole web: both your content (read-write), and everyone else's (read-only).
Since it's just HTML, your data can be versioned in source control, backed up in the cloud, and copied to a flash drive. Unlike non-HTML flat files, you can read it offline in a browser.
Another benefit is that microformat correctness is enforced by validation checks (more on this below). You can be more confident that your microformats are compatible with reader apps.
But the killer feature is that it's trivial to import your indieweb site into a PURR system by copying the HTML.
Let's look at how PURR works when updating a post. First we parse a document to an object model. The raw output of MF2 parsers is inconvenient to work with, so an object model or helper library can make this easier. Next we update the model. Maybe we received a webmention, so we add it to the comments. Then we render the object back to HTML using templates.
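
The cycle described above can be sketched in a few lines of Python. This is a toy, not how Neonblog or Skein do it: `ToyParser` only understands flat `p-*` properties and stands in for a real MF2 parser, and the post fields (`name`, `summary`, `comments`) and the template in `render` are made up for illustration.

```python
from html import escape
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # elements with no end tag


class ToyParser(HTMLParser):
    """Collects the text of elements with a p-* class (tiny stand-in for MF2)."""

    def __init__(self):
        super().__init__()
        self.props = {}   # property name -> list of text values
        self._open = []   # one slot per open element: property name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c[2:] for c in classes if c.startswith("p-")), None)
        self._open.append(prop)
        if prop:
            self.props.setdefault(prop, []).append("")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to the innermost open property element, if any.
        for prop in reversed(self._open):
            if prop:
                self.props[prop][-1] += data
                break


def parse(html):
    """Parse: published HTML -> simple post object."""
    parser = ToyParser()
    parser.feed(html)
    parser.close()
    props = parser.props
    return {
        "name": props.get("name", [""])[0],
        "summary": props.get("summary", [""])[0],
        "comments": props.get("comment", []),
    }


def render(post):
    """Render: post object -> HTML with microformats baked in."""
    comments = "".join(
        f'<li class="p-comment">{escape(c)}</li>' for c in post["comments"]
    )
    return (
        f'<article class="h-entry"><h1 class="p-name">{escape(post["name"])}</h1>'
        f'<p class="p-summary">{escape(post["summary"])}</p>'
        f"<ul>{comments}</ul></article>"
    )


# A new post skips parsing: create the object, then render it.
published = render({"name": "Hello", "summary": "First post", "comments": []})

post = parse(published)                  # parse
post["comments"].append("Nice post!")    # update (say, a webmention arrived)
published = render(post)                 # render, ready for the next round
```

Note that the only persistent state here is the `published` string; the post object lives just long enough to be updated.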
To publish new content we can skip the parsing stage and jump directly to creating a new object, then render it to HTML.
The site theme is separate from the semantic content, but during rendering it's all baked into one file. Parsing separates the content from styling again. To update the styling, we can just change the templates, then refresh the system without applying updates to the content.
What happens if you update the site templates but introduce errors in the MF2? We can add a validate function that acts as a safeguard. To validate, instead of publishing the content, we parse it a second time and make sure the original and regenerated objects are identical. It's like having a regression test, but without the effort of creating a test suite.
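
A minimal sketch of such a validate function, assuming the parser and template are passed in as plain functions. The regex-based `parse` is a crude stand-in for a real MF2 parser, and both templates are invented for the example.

```python
import re
from html import escape, unescape


def parse(html):
    """Crude stand-in for an MF2 parser: extracts only the p-name text."""
    match = re.search(r'class="p-name"[^>]*>(.*?)<', html)
    return {"name": unescape(match.group(1))} if match else {}


def validate(obj, render, parse):
    """Render the object, parse the result, and compare with the original."""
    return parse(render(obj)) == obj


def good_template(post):
    return f'<h1 class="p-name">{escape(post["name"])}</h1>'


def broken_template(post):
    # A theme edit that accidentally dropped the p-name class.
    return f'<h1 class="title">{escape(post["name"])}</h1>'


post = {"name": "Hello & welcome"}
ok = validate(post, good_template, parse)
broken = validate(post, broken_template, parse)
```

Here `ok` is true and `broken` is false, so a publish step could refuse to write any page whose round trip fails.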
Importing data into PURR is as trivial as copying the HTML and refreshing the system (similar to updating the template).
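
The refresh step, which serves both template updates and imports, might look like the sketch below. It assumes pages are held in a dict of path to HTML; `refresh`, the regex-based `parse` and the template in `render` are all hypothetical stand-ins.

```python
import re
from html import escape, unescape


def parse(html):
    """Crude stand-in for an MF2 parser: extracts only the p-name text."""
    match = re.search(r'class="p-name"[^>]*>(.*?)<', html)
    return {"name": unescape(match.group(1)) if match else ""}


def render(post):
    """The current site template (made up for this example)."""
    return (
        f'<article class="h-entry"><h1 class="p-name">'
        f'{escape(post["name"])}</h1></article>'
    )


def refresh(pages, parse, render):
    """Re-render every page: parse its HTML, then apply the current template."""
    return {path: render(parse(html)) for path, html in pages.items()}


# HTML copied from another site: different markup, same microformats.
imported = {"hello.html": '<div class="h-entry"><span class="p-name">Hello</span></div>'}
site = refresh(imported, parse, render)
```

The imported page comes out in the local theme because only the microformat properties survive the parse; the old site's markup is discarded.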
I've implemented PURR in two indieweb projects. Both of these hosted my blog over the past two years, and migrating from one to the other was very easy.
Neonblog was written in PHP and designed for a shared hosting environment. It uses the php-mf2 parser, with PHP itself as the template mechanism. I created an object model to simplify the logic. Post permalinks are stored as static content served by Apache.
Skein was written in Node.js and designed for hosting on AWS S3. I wanted to try out cloud technologies and learn Node.js, so I switched over to Skein a year ago. It uses the microformat-node parser and Jade templates. I created a new object model, mf-obj, based on the previous work with Neonblog.
In a future post, I'll go into more detail about Skein and mf-obj.