Relational Database Management System (RDBMS) Thread

I thought we could use a relational database thread.

Post any and all database-related content and questions here. I swear I'll try to answer.

I'm trying to do proper version control on my database, any recommendations? I guess the basic solution is to take periodic SQL dumps of the database, and then do version control (git, on a local GitLab server, was what I was thinking) on the generated flat text files containing the DDL statements and such. I was wondering if there were any major drawbacks to this method, and/or whether there was a better solution I wasn't seeing.

VC of the database is a dumb idea.
VC is for your application code, which initializes or migrates the DB.
Backups of the DB are a separate thing, and version-controlling them is dumb and won't scale.

For backups, simply run SQL dumps (from a cron job or something), pipe them through `xz`, and copy them somewhere. Each one should be independent, so that if one of your backups gets fucked, you won't lose the others.

what?


isn't that going to be super redundant?
that's why I was thinking version control was needed for the SQL dumps.
It'd have the added benefit of making it fairly easy to see what changed between versions, and when.

VC=version control

no, it'd be just redundant enough.
for changes, you can still do that if you have proper backups; just run diff on two dumps.

Just do the stupid thing and store an SQL structure dump in git. If you ever need to change a table from then on, put the ALTER TABLE/whatever commands in a numbered file and update your main dump.

Ignore the autistic screeching; you don't need migration-framework bullshit unless people are paying you to write code with proper version numbers on it.
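
For illustration, one of those numbered files might look like this (the file name and the sage column are made up for the example, not anything from this thread):

-- 0002_add_sage_to_posts.sql
ALTER TABLE posts ADD COLUMN sage boolean NOT NULL DEFAULT false;
-- after applying, regenerate the main structure dump so a fresh setup doesn't have to replay this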

What is a good way to dynamically create counters in Postgres?
For instance, if you're building a Holla Forums clone, you need to keep track of post numbers separately for each board, and boards can be created at any time. Creating sequences dynamically at runtime seems very dangerous. Is something as simple as a table with a board ID and a counter value enough?

...

It's called chunking and deduplication, dipshit.
en.wikipedia.org/wiki/Rolling_hash
And further, 20 TB can be had for under $1,000.
Storage is cheaper than a starving Filipino whore.

I know VC is version control; I was expressing my confusion at the sentiment that version control isn't necessary/desirable for databases.


Postgres, like almost any SQL implementation, has a way to auto-increment columns. But it sounds like you're asking how to associate a post number, unique within each board, with the board name/ID.
I guess my first thought was to create a separate table for each board, so that the post # could be the primary key of each, but that feels like such a silly solution. If you ever need to build a "top posts" ranking or something, you'd have to union like 50000 tables together. Very messy.
my intuition is no, I think there could be concurrency issues. If two posts arrive at the same time for the same board, each will read the counter table for the current value, and if both reads see the same value, you end up with two posts carrying the same number.
I think you might be able to pull it off with a trigger, like a BEFORE INSERT one.
I'm assuming you'll have a table relating board name to post #? A table with just those two values wouldn't be enough, though: their combination would be the primary key, and you'd keep the post content, timestamp, and so on in the same table, since at minimum you need the board name and the post # to uniquely identify a post. We could also store the user ID here as a foreign key to the users table (maybe that's an unforgivable sin for imageboards).
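
Something like this is what I'm picturing — a rough sketch, with all the table and column names (boards, posts, next_post_id) invented for the example. The UPDATE inside the trigger takes a row lock on the board, which is what serializes two simultaneous posts:

CREATE TABLE boards (
    board_id     serial PRIMARY KEY,
    name         text   NOT NULL UNIQUE,
    next_post_id bigint NOT NULL DEFAULT 0  -- last number handed out
);

CREATE TABLE posts (
    board_id  integer NOT NULL REFERENCES boards,
    post_id   bigint  NOT NULL,
    body      text    NOT NULL,
    posted_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (board_id, post_id)
);

CREATE FUNCTION assign_post_id() RETURNS trigger AS $$
BEGIN
    -- claim the next number; concurrent inserts for the same board queue on the row lock
    UPDATE boards SET next_post_id = next_post_id + 1
        WHERE board_id = NEW.board_id
        RETURNING next_post_id INTO NEW.post_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_number BEFORE INSERT ON posts
    FOR EACH ROW EXECUTE FUNCTION assign_post_id();  -- EXECUTE PROCEDURE on Postgres < 11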

err, isn't that exactly what git does?
It tracks changes and keeps a record of what actions are needed to change from one version to the next.

You mean snapshots.
This >>761070 retard to whom I was replying wasn't talking about that at all.
Running full dumps is Arch-tier backup policy. In a production environment such a dump can run for days if not weeks, and a restore would take just as long. The only sane approach is to run several databases with master-master replication. If you are doing dumps or relying on write-ahead logs, consider suicide.

If you buy the cheapest LTO tapes, probably. Production-ready hardware starts at $2K/TB.

I'm not dealing with that kind of scenario; my SQL dumps are like 350 MB. Is running the dumps through git a viable solution, then?

why don't you just try it, and measure the shit

UPDATE board SET next_post_id = next_post_id + 1 WHERE board_id = $1 RETURNING next_post_id;
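
i.e., in context it'd go something like this (the posts table and the literal values are made up; the point is that the row lock the UPDATE takes is held until COMMIT):

BEGIN;
UPDATE board SET next_post_id = next_post_id + 1 WHERE board_id = 1 RETURNING next_post_id;
-- say that returns 42; board row 1 is now locked until COMMIT
INSERT INTO posts (board_id, post_id, body) VALUES (1, 42, 'hello');
COMMIT;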

What is the best way to test queries for concurrency issues?

howfuckedismydatabase.com/

...

Better to keep it in the application, in a map of atomically incremented integers; that would be faster.

And at startup time, populate it with something like SELECT max(post_id) + 1 FROM posts GROUP BY board_id.
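
Written out (with the posts table assumed as above), that startup query would also need the board id alongside the counter, something like:

SELECT board_id, max(post_id) + 1 AS next_id
FROM posts
GROUP BY board_id;
-- note: boards with no posts yet won't show up here, so they'd need a default of 1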

If I wanted to mess with data in the application, then I'd choose a NoSQL solution and get the extra speed with no downside.

(((WEBSCALE)))

...

...

Unless your application is going to be single-threaded on a single machine, that's a bad idea, because then you'll have to write the synchronization code yourself, and it's pretty much guaranteed to be a buggy mess. PostgreSQL already handles that, so let it do it.
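
Concretely, with the counter UPDATE from earlier (same assumed table names), two concurrent sessions just queue on the row lock instead of racing:

-- session A:
BEGIN;
UPDATE board SET next_post_id = next_post_id + 1 WHERE board_id = 1 RETURNING next_post_id;
-- returns, say, 42; session A now holds a row lock on board 1

-- session B, at the same time:
BEGIN;
UPDATE board SET next_post_id = next_post_id + 1 WHERE board_id = 1 RETURNING next_post_id;
-- blocks on A's row lock...

-- session A:
COMMIT;
-- ...and B's UPDATE now proceeds and returns 43. No duplicate numbers.
COMMIT;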