(Redirected from databases-antipattern)
The database antipattern is the use of a database for primary long-term storage of posts and other personal content (like on an indieweb site), and is an anti-pattern due to the additional maintenance costs, uninspectability, platform-dependence, and long-term fragility of databases and their storage files, as documented with specific examples below.
Is this your first time on IndieWebCamp.com?
Do you have a personal site or want one? start here:
The indieweb community is nearly evenly split about the database-antipattern, as measured by what projects we use on our live sites:
Split experience too: while there are significant experiences (documented below) with the database-antipattern, it is by no means a universally agreed upon aspect of indieweb sites or the indiewebcamp community. A few have even noted their complete lack of first-hand negative experience.
databases for storage
Using databases for the primary storage of content on your personal site is considered by some in the community to be an antipattern.
Databases are all a pain to maintain (i.e. highly human-time inefficient - see DBA tax below), and more fragile than the file system.
For any content you care about, don't put the primary copy in a database.
(this likely deserves its own canonical definitional article)
DBA tax (a form of admin tax) refers to the extra overhead you as a person incur by depending on any system that uses a database for primary storage. If you maintain any such system, you have to spend some amount of your time being a database administrator (DBA), hence that time spent is referred to as the DBA tax that you are incurring.
In particular, a database is yet another space of things.
Everyone already deals with a file system. Why deal with yet another space?
The following are aspects of the DBA tax - hassles of:
All extra crap (thus tax) compared to "just" using the file system.
All databases have a DBA tax.
Experience and evidence shows DBA tax exists as well. People lose things (and have them corrupted) in databases all the time, far more often than in file systems. This is because databases always require extra "magic" for backups etc.
E.g. search the web for people complaining about having to backup their WordPress databases before upgrading their WordPress install.
Even in databases that use only a single file (e.g. SQLite?), that "one file" is still uninspectable, as opposed to simple HTML you can always look at in a browser.
There are many reasons why an uninspectable/random/binary magic file format is fragile, whereas HTML is an inspectable file format that many tools can read/write, including a simple text editor. In particular:
Most databases tend to be biased/tied to a particular programming language (or operating system) as well - more unnecessary constraints - trapping you into a particular platform (language or OS).
In contrast, every language / OS has flat file APIs. Nearly all have DOM Document APIs as well.
MySQL hard reset
... Bad news: MySQL barfed all over the table holding my posts. ... 
And I'm pretty sure that happened when Rackspace hard reset my machine due to swapping.
Note: apparently Postgres doesn't have that problem.
DB connection lossOften (but not always) database server software runs on a separate machine and/or process than the web server, and thus it's possible for the connection between the two to go down, for who knows what reasons, e.g. 2014-10-02 Keybase DB connection lost:
@aaronpk @KeybaseIO Yeah we lost the DB connection in our Web process so all queries were failing. Weird, first time we've seen that bug.
Maintaining / checking / restoring this "connection" is yet another source of DBA tax.
This is apparently a frequent occurrence:
I was getting 'mysql crashed' so I did 'repair' on the tables, and then the homepage and /travel were blank. I imported a backup from 25th but it wouldn't import, said there was a primary key collision. ... when I did the mysql repair it screwed up some keys in the database to point the object of the published triple to the id of another post, because ARC2 has this complicated database structure to turn relational into a graph that I haven't bothered to understand properly
Main article: database-cache
Databases are useful for caching/performance needs for high volume sites. Examples:
In both search use-cases the DBs should be a cache/query store, the real data should still be in flat files (e.g. .md, HTML, etc.) to ensure long-term durability.
Even though databases are fine for rebuildable caches, that's still solving an optimization problem.
Any design that depends on databases is optimizing prematurely - before you have any idea what the performance characteristics of the system are.
Rather than using a database for storage, use open well established formats like markdown or HTML + microformats, and if you really like how databases let you access them, instead build a database-like API on top (of said markdown or HTML + microformats files).
With HTML+microformats2 + a microformats2 parser, we already have a database-like API that gives you data back from HTML.
It's likely that eventually we'll come up with something resembling a database / storage API on top of an HTML+microformats2 file.
Why discuss storage plumbing
Q: Whenever people advocate or deep dive into specific backend tools here, we often say, "that's plumbing, we prefer to focus on end-user use-cases" (and maybe interop) here, but why do we make an exception for database vs flat files, which we dive into, even though it's still plumbing?
A: Summary: when the choice of plumbing strongly impacts the UX, then it is worth documenting problems, issues, advantages, disadvantages, and tradeoffs among choices in plumbing. By the same measure, any criticism not tied to UX should be moved to the specific technology impacted, rather than technology area or approach.
Negative UX impacts from this use of databases tend to be in two categories:
Should I never use a database
Q: What, you mean, never use a database ever? That's crazy! How on earth could you build (some big service) without a database?
This page is specifically about the problems and undue burdens that database usage causes people that maintain their own indieweb sites, and documentation their real-world personal frustrations and issues with databases.
This doesn't mean that you should never use a database: judge technology in a sensible way.
Some popular database systems are overkill for some use cases around personal publishing. If you are trying to build a large scale, multi-user system, this page might not apply to you.
If you are building a personal publishing platform, or archiving content you’ve spent hours/days/years creating, then there are actual tradeoffs (in terms of complexity, management etc.) that you need to think about.
SQLite tax avoidance
Q: Is SQLite a way to avoid the DBA tax?
How are database file formats random
Q: How are database file formats random?
Q: Isn't MySQL pretty portable?
Follow-up: Q: Isn't MySQL trivial to port via: mysqldump ... | scp | mysql ?
A:Yes. If you can't easily scp/rsync files to/from your host or PaaS, then any files created there are "trapped" as well.
Database with export
Q: Would using a database with an export to HTML w/ microformats (or other durable format) files mitigate these concerns?
A: It could mitigate those concerns to the extent that you trust your code to be able to import from those files as well as export to those files any time you create/save user content in your databases. And at that point (where you have dependable code to import/export from/to flat files), consider directly using the flat files, and only using an intermediate database for caching purposes (if you need that, the section on "databases for caching").
Still not convinced
Q: Still not convinced that databases for storage are an anti-pattern or about the DBA tax.
Isn't flat-files a form of database?
Q: This page seems to be advocating flat file storage. But that is just a form of database. This is so hypocritical.
You may also feel free to consider this page to be titled "the big, overly complicated databases with non-trivial maintenance antipattern".
Finally, this wiki is itself hosted on MediaWiki, which stores data in MySQL. So there's that - as an issue of practicality, it may be less costly (time etc.) for some use-cases to re-use existing solutions with a database and bear the DBA tax, than attempt to write your own solution (see wiki-page for work in progress on that use-case in particular).
Q: I don't understand how this anti pattern is related to the indieweb?
Does a dynamic site need a databases
"Q: Does this mean I can’t have a dynamic site without a database?"
A: Not at all! Not using a database as the primary storage for your content doesn’t mean you have to use static hosting. Your dynamic web application can read data out of the filesystem just as it would an opaque database, without all the disadvantages mentioned above. Falcon is one real world indieweb example of flat file storage based dynamic post content server.