Relational Database Management System (RDBMS) Thread

I thought we could use a relational database thread.

Post any and all database-related content and questions here. I swear I'll try to answer.

I'm trying to do proper version control on my database, any recommendations? I guess the basic solution is to take periodic SQL dumps of the database, and then do version control (git, on a local GitLab server, was what I was thinking) on the generated flat text files containing the DDL statements and such. I was wondering if there were any major drawbacks to this method, and/or whether there was a better solution I wasn't seeing.

VC of the database is a dumb idea.
VC is for your application code, which initializes or migrates the DB.
Backups of the DB are a separate thing, and version-controlling them is dumb and won't scale.

For backups, simply run SQL dumps (from a cron job or something), pipe them through `xz`, and copy them somewhere. Each one should be independent, so that if one of your backups gets fucked, you won't lose the others.

what?


isn't that going to be super redundant?
that's why I was thinking version control was needed for the SQL dumps.
It'd have the added benefit of making it fairly easy to see what changed between versions, and when.

VC=version control

no, it'd be just redundant enough.
for changes, you can still do that if you have proper backups; just run diff on two dumps.

Just do the stupid thing and store an SQL structure dump in git. If you ever need to change a table from then on, put the ALTER TABLE/whatever commands in a numbered file and update your main dump.

Ignore the autistic screeching; you don't need migration-framework bullshit unless people are paying you to write code with proper version numbers on it.
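
For illustration, one of those numbered files might look like this (the file name and the sage column are made up for the example, not anything from this thread):

-- 0002_add_sage_to_posts.sql
ALTER TABLE posts ADD COLUMN sage boolean NOT NULL DEFAULT false;
-- after applying, regenerate the main structure dump so a fresh setup doesn't have to replay this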

What is a good way to dynamically create counters in Postgres?
For instance, if you're building a Holla Forums clone, you need to keep track of post numbers separately for each board, and boards can be created at any time. Creating sequences dynamically at runtime seems very dangerous. Is something as simple as a table with a board ID and a counter value enough?

...

It's called chunking and deduplication, dipshit.
en.wikipedia.org/wiki/Rolling_hash
And further, 20 TB can be had for under $1,000.
Storage is cheaper than a starving Filipino whore.

I know VC is version control; I was expressing my confusion at the sentiment that version control isn't necessary/desirable for databases.


Postgres, like almost any SQL implementation, has a way to auto-increment columns. But it sounds like you're asking how to associate a post number, unique within each board, with the board name/ID.
I guess my first thought was to create a separate table for each board, so that the post # could be the primary key of each, but that feels like such a silly solution. If you ever need to build a "top posts" ranking or something, you'd have to union like 50000 tables together. Very messy.
my intuition is no, I think there could be concurrency issues. If two posts arrive at the same time for the same board, each will read the counter table for the current value, and if both reads see the same value, you end up with two posts carrying the same number.
I think you might be able to pull it off with a trigger, like a BEFORE INSERT one.
I'm assuming you'll have a table relating board name to post #? A table with just those two values wouldn't be enough, though: their combination would be the primary key, and you'd keep the post content, timestamp, and so on in the same table, since at minimum you need the board name and the post # to uniquely identify a post. We could also store the user ID here as a foreign key to the users table (maybe that's an unforgivable sin for imageboards).
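
Something like this is what I'm picturing — a rough sketch, with all the table and column names (boards, posts, next_post_id) invented for the example. The UPDATE inside the trigger takes a row lock on the board, which is what serializes two simultaneous posts:

CREATE TABLE boards (
    board_id     serial PRIMARY KEY,
    name         text   NOT NULL UNIQUE,
    next_post_id bigint NOT NULL DEFAULT 0  -- last number handed out
);

CREATE TABLE posts (
    board_id  integer NOT NULL REFERENCES boards,
    post_id   bigint  NOT NULL,
    body      text    NOT NULL,
    posted_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (board_id, post_id)
);

CREATE FUNCTION assign_post_id() RETURNS trigger AS $$
BEGIN
    -- claim the next number; concurrent inserts for the same board queue on the row lock
    UPDATE boards SET next_post_id = next_post_id + 1
        WHERE board_id = NEW.board_id
        RETURNING next_post_id INTO NEW.post_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_number BEFORE INSERT ON posts
    FOR EACH ROW EXECUTE FUNCTION assign_post_id();  -- EXECUTE PROCEDURE on Postgres < 11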

err, isn't that exactly what git does?
It tracks changes and keeps a record of what actions are needed to change from one version to the next.

You mean snapshots.
This >>761070 retard to whom I was replying wasn't talking about that at all.
Running full dumps is Arch-tier backup policy. In a production environment such a dump can run for days if not weeks, and a restore would take just as long. The only sane approach is to run several databases with master-master replication. If you are doing dumps or relying on write-ahead logs, consider suicide.

If you buy the cheapest LTO tapes, probably. Production-ready hardware starts at $2K/TB.

I'm not dealing with that kind of scenario; my SQL dumps are like 350 MB. Is running the dumps through git a viable solution, then?

why don't you just try it, and measure the shit

UPDATE board SET next_post_id = next_post_id + 1 WHERE board_id = $1 RETURNING next_post_id;
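
i.e., in context it'd go something like this (the posts table and the literal values are made up; the point is that the row lock the UPDATE takes is held until COMMIT):

BEGIN;
UPDATE board SET next_post_id = next_post_id + 1 WHERE board_id = 1 RETURNING next_post_id;
-- say that returns 42; board row 1 is now locked until COMMIT
INSERT INTO posts (board_id, post_id, body) VALUES (1, 42, 'hello');
COMMIT;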

What is the best way to test queries for concurrency issues?

howfuckedismydatabase.com/

...

Better to keep it in the application, in a map of atomically incremented integers; that would be faster.

And at startup time, populate it with something like SELECT max(post_id) + 1 FROM posts GROUP BY board_id.
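
Written out (with the posts table assumed as above), that startup query would also need the board id alongside the counter, something like:

SELECT board_id, max(post_id) + 1 AS next_id
FROM posts
GROUP BY board_id;
-- note: boards with no posts yet won't show up here, so they'd need a default of 1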

If I wanted to mess with data in the application, then I'd choose a NoSQL solution and get the extra speed with no downside.

(((WEBSCALE)))

...

...

Unless your application is going to be single-threaded on a single machine, that's a bad idea, because then you'll have to write the synchronization code yourself, and it's pretty much guaranteed to be a buggy mess. PostgreSQL already handles that, so let it do it.
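
Concretely, with the counter UPDATE from earlier (same assumed table names), two concurrent sessions just queue on the row lock instead of racing:

-- session A:
BEGIN;
UPDATE board SET next_post_id = next_post_id + 1 WHERE board_id = 1 RETURNING next_post_id;
-- returns, say, 42; session A now holds a row lock on board 1

-- session B, at the same time:
BEGIN;
UPDATE board SET next_post_id = next_post_id + 1 WHERE board_id = 1 RETURNING next_post_id;
-- blocks on A's row lock...

-- session A:
COMMIT;
-- ...and B's UPDATE now proceeds and returns 43. No duplicate numbers.
COMMIT;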