Fucking hell.
while (true) System.err.println("node.js sucks, and you're a moron for even considering using it.");
Worst language runtime in the world. You'll never get a meaningful stack trace for anything, because everything is a short-duration callback run from the event loop. Meanwhile, the language designers have piled enough syntactic sugar on top of callbacks to make them look like linear flow control, basically reifying threads (albeit cooperative threads). If the new calling convention is "everything is an async function", you just trashed the alleged performance benefits of async IO.
All of this could have been designed so much better by competent people, without even abandoning JavaScript.
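To be concrete about the stack-trace gripe, here's a throwaway sketch (nothing from a real codebase, just an illustration): once a plain callback runs from the event loop, whatever scheduled it returned long ago, so the stack you get starts at the callback.

    function scheduleWork(): void {
      setTimeout(() => {
        // By the time this fires, the code that called scheduleWork() has
        // already returned; the reported stack starts at this callback and
        // the original call chain is nowhere in it.
        throw new Error("who asked for this?");
      }, 0);
    }

    scheduleWork();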
Does copying data to a data warehouse via a cron job make sense? Shouldn't that
happen INSIDE the database instead?
2019-02-11 11:53 from Ragnar Danneskjold
Does copying data to a data warehouse via a cron job make sense?
Shouldn't that happen INSIDE the database instead?
We did a lot of this a couple of jobs ago; maybe it wasn't exactly cron, the ETL tool might have managed the scheduling. But either way, it's not an uncommon pattern.
We had it set up with a little bit of both. Inside the database there were audit tables (which track insert/update/deletes for each associated table) and these were populated by triggers. Then the ETL job comes along periodically and pulls the new audit rows.
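Roughly, the pull side looked like the sketch below, though ours lived inside the ETL tool rather than hand-rolled code; the table and column names here (orders_audit, audit_id, and so on) are made up for illustration.

    import { Client } from "pg";

    // Pull audit rows added since the last run, keyed off a high-water mark.
    async function pullNewAuditRows(source: Client, lastSeenId: number) {
      const { rows } = await source.query(
        "SELECT audit_id, op, changed_at, payload " +
        "FROM orders_audit WHERE audit_id > $1 ORDER BY audit_id",
        [lastSeenId]
      );
      // Hand the rows to the warehouse loader here, then persist the new
      // high-water mark so the next run only picks up fresh changes.
      return rows.length ? rows[rows.length - 1].audit_id : lastSeenId;
    }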
Ragnar - Our backups happen via transaction log shipping, with the logs then
being reapplied to backup databases. But that's not really what you want, so
you'll need to have something outside the database as well to pull the data,
I would imagine.
A data warehouse is really for reporting purposes over time, not for the real
time work being done. It's not a backup. Although that's something we need to
consider as well.....
Correct. Usually when grabbing data for the data warehouse you're doing some
sort of aggregation before you carry it over. Well, in the old days that's how
we would do it. Now we might bring over the details and aggregate on the other
machine.
Riddle me this. Why the eff does mongodb need 1gb of heap cache to serve a database that has about 10 rows?
I get that the heap limit is a tunable with the new storage backend since 3.2, but something seems wrong here.
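For what it's worth, the WiredTiger cache can be clamped in mongod.conf (or with --wiredTigerCacheSizeGB on the command line); something like the snippet below, though I believe the floor is around 256MB, so check your version's docs before copying it.

    storage:
      wiredTiger:
        engineConfig:
          cacheSizeGB: 0.25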
Correct. Usually when grabbing data for the data warehouse you're
doing some sort of aggregation before you carry it over. Well, in
the old days that's how we would do it. Now we might bring over the
details and aggregate on the other machine.
Yes. Production OLTP should be assumed to be too overloaded (even if it isn't) to be bothered doing the cube rollups. Unless perhaps they're fairly simple and can be done in triggers on every insert--but that's only the case for the simplest of data warehouses.
Read replicas (when set up with popular toolsets such as RDS) tend to be read-only. So, it doesn't happen there either.
There *are* some alternatives; I remember seeing a closed-source product that would parse your mysql binlogs and help push all that data to your warehouse. It's not something I've ever tried, but it's out there.
Sorry I haven't been around much lately. I have to redesign our product at work, and it has me very distracted.
It's good, though. Our original design isn't cutting it, but now that we know how folks want to work with the product, we can build something better.
Besides, some of the design involved efforts from someone who doesn't really know how to engineer this kind of thing, so reworking it from another perspective should give us more flexibility for all the peculiar ways someone might want to work with this.
But... it's making me think about stuff like 'Do we want to use a NoSQL database, or stick to a relational database with a kind of object-orientedness grafted onto it?', or 'These guys communicate with each other using HTTPS, but they're mechanical (all API-driven), so authentication should be handled through HTTP signatures instead of the usual authentication schemes, since we can share keys', and so on.
It's liberating, but a lot of work.
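To make the HTTP-signature idea a bit more concrete, here's a rough sketch of the kind of thing I mean. The header names and the exact string-to-sign are illustrative only; the real thing would follow the HTTP Signatures draft rather than this hand-rolled HMAC.

    import { createHmac } from "crypto";

    // One service signs its request with a key shared with the other service.
    function signRequest(method: string, path: string, body: string, sharedKey: string) {
      const date = new Date().toUTCString();
      const stringToSign = [method.toUpperCase(), path, date, body].join("\n");
      const signature = createHmac("sha256", sharedKey)
        .update(stringToSign)
        .digest("base64");
      return { Date: date, "X-Signature": signature };
    }

    // The receiver recomputes the HMAC over the same fields and rejects the
    // request if the signature (or a stale Date header) doesn't check out.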
Isn't that always how we spend way too much of our time? Cleaning up other
people's messes?
Seems like there are very few people who understand how to build clean, loosely-coupled components with well-defined interfaces between them.
Yes, quite a bit of time is spent cleaning up messes.
Although, sometimes, the mess is our own, heh.
Most of the interfaces we need to clean up involve stuff upstream of the stuff I wrote.
But I have some cleanup of my own I need to do.
What makes this current effort kind of awkward is the need to work from both the bottom and the top at the same time. I prefer to work from the bottom up (whatever is closest to the wire, or whatever originates the data), but I have to work with others now, and that means ensuring there's enough design in place to keep everyone busy so we aren't wasting money or time.
Since the other guy works on front-end development, to keep him going, I have to work on both ends at the same time, at least as far as design goes.
Then, to avoid some communication issues caused by this remote workplace thing we have going on, I'm making a lot of UML drawings to explain things that mere English isn't quite conveying.
Which is funny, 'cause I'm using PlantUML to do it, so I'm literally writing stuff down to make it generate pretty graphs.
I've literally built an infrastructure just to help describe the design of this reworked product, using TiddlyWiki for documentation, and our own PlantUML server to provide UML diagrams within the TiddlyWiki to help explain bits that English can't quite convey well enough. It works well enough that I'd like to figure out if I could maybe integrate a TiddlyWiki into our product to provide user-extensible documentation.
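For anyone who hasn't played with it, PlantUML really is just text in, diagram out. A throwaway example (the participants are made up, not our actual design):

    @startuml
    actor Frontend
    participant API
    database Store
    Frontend -> API : POST /widgets
    API -> Store : INSERT widget
    API --> Frontend : 201 Created
    @enduml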
Huh.
-Wl,--version-script=your_script.map can do downright magical things to ensure you don't export symbols from your shared object that you do not wish to export.
It'll also make the shared objects load faster, since it exports far fewer symbols.
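A version script is just a little map file you hand to the linker; the symbol names below are made up, but the shape is about all there is to it. Everything not listed under global: stays hidden, and you pass the file with -Wl,--version-script=mylib.map (or whatever you call it).

    MYLIB_1.0 {
        global:
            mylib_open;
            mylib_do_thing;
            mylib_close;
        local:
            *;   /* everything else stays out of the dynamic symbol table */
    };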
That's pretty cool. It sounds like if you pay enough attention to detail
you can end up with a binary that doesn't try to bring along Like A Gig, Man
of unnecessary libraries.
I am a fan of -Wl for using "this copy of this library, not the system one" when needed. Alas, this practice has not become popular; instead, we are now expected to containerize our applications to achieve that effect.
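For the "this copy, not the system one" trick, the shape I mean is roughly this (paths made up): link against the bundled copy and set an rpath so the loader finds that copy at runtime instead of whatever the system has.

    gcc -o myapp main.o -L./vendor/foo/lib -lfoo \
        -Wl,-rpath,'$ORIGIN/vendor/foo/lib'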
And now, the "this person needs cluebat therapy" story of the day. I just read the following email:
It was an email regarding "...the naming of the [storage] nodes. It does not follow the standards that we use for all the remaining storage technologies.
We've been working on automation and configuration management for the various storage flavors we support, these break when devices don't follow standards already in place."
Arrgh. Ok, well, naming standards are good and we should use them, and kudos to you for working on automation, but if your scripts depend on specific node names, your data model is brittle and it's going to break. Guaranteed. I should be able to call my storage devices "array1", "array2", "array3", and "big_fat_funky_booty" and the script shouldn't care.
Deriving anything from a name other than the name itself is for humans, not computers.
Data modeling is my superpower. And apparently it's a rare superpower.
I've always hated intelligent keys where the ID is meant to mean something
to the system. It's begging for problems down the line.
Yeah, I don't think systems should rely upon computer names in that sense.
They ought to be able to create a file that holds these names and read from that instead. And maybe they can use some other process to validate that the names in the file actually are the systems they think they are.
Eh, but what would I know?
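Something as dumb as a little inventory file would do it; the fields here are invented, but the point is that the attributes live in data, not in the hostname:

    {
      "storage_nodes": [
        { "name": "array1",              "role": "primary", "site": "dc1" },
        { "name": "big_fat_funky_booty", "role": "backup",  "site": "dc2" }
      ]
    }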
But, yeah, that trick is going to help keep me from having naming collisions in this library I'm making available to customers. I'm already including all the other libraries they need to use it, so they should be okay.
This is how it works by default in Windows. You make a DLL, you have to specify the exports, and only those exports are, er, exported. It doesn't seem to work that smoothly in Linux, where you have to set about three different compiler or linker switches while also using that DLLEXPORT macro you made for Windows.
Meh.
Chuffed I got this library to work in Node.js properly (while building an Electron app to drive it, mostly for testing purposes). Next, I get to try using it for real.
Well not quite everything.
Last few days, I've been subverting bash, so I've been delving into c/c++ again.
"Did you give birth to a baby?" asked my co-worker.
"Yes, and it's evil!" I replied.
Got it to work in 4.x of bash, and am now porting my brand of wrong to 5.x.
Huh.
Stumbled into this recently:
https://json-editor.github.io/json-editor/
It gives you a dynamic way to store form definitions and display them to people later.
So you build a schema to fit some new blob of data you want to store as JSON somewhere, chuck the schema into a database, and you can pull it from the database to use when generating the form later. At least, that's how I'm gonna wind up using it.
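A bare-bones version of that, assuming the json-editor script is already on the page (the schema and element id are just placeholders):

    declare const JSONEditor: any;  // provided by the json-editor script tag

    const editor = new JSONEditor(document.getElementById("form_holder"), {
      schema: {
        type: "object",
        title: "Contact",
        properties: {
          name:  { type: "string" },
          email: { type: "string", format: "email" }
        }
      }
    });

    // Later, editor.getValue() hands back the filled-in form as plain JSON,
    // ready to stash wherever the blob is supposed to go.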
There's something just batshit weird about this stuff, but it's damned useful for what I need.