2019-08-19 14:13 from IGnatius T Foobar
What base image are you all using for your containers?
Alpine? Minideb? Something else?
ubuntu:18.04 or alpine or whatever is a transitive dependency of the packages we're using, which in practice always means Debian, Ubuntu, or Alpine.
It is disturbing to me that people trust Alpine so much. As far as I know, Alpine is a group of hackers with no budget whatsoever, so why should anybody trust them to maintain security-critical software?
2019-08-19 14:13 from IGnatius T Foobar
What base image are you all using for your containers?
Alpine? Minideb? Something else?
I don't use "containers" myself but I have heard Alpine is very very popular for those.
Are you looking for a platform for building a container image for distributing Citadel?
I tried Alpine but it just got too frustrating. I'm assuming for now that I don't have to worry about sustainability, since the whole point of containers is that I can just retool under a new base image and ship an upgrade, as long as I make the plugs fit (ports, volumes, etc).
As for what workload ... I have a project to distribute some software out to our data centers, and decided that we're overdue for getting on the container bandwagon. It directly solves a lot of our problems. I was inspired to do this after installing a third-party application that shipped as a container.
We're starting slowly with a few single-host Docker servers, but I plan to become the expert on container hosting that I should have already been a year or three ago.
But yes, there's also a parallel effort to containerize Citadel. That's why I've been working so hard to eliminate every bit of storage that isn't in the main database: so that the database can be attached to the container as a Volume and everything else is ephemeral.
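In rough terms, the end state I'm picturing is a one-liner like this (the image name, paths, and ports are placeholders, not anything that ships today):

  docker run -d \
      -p 504:504 \
      -v citadel-data:/citadel/data \
      citadel/citserver:experimental

Everything outside that one volume would be baked into the image and thrown away on upgrade.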
This is interesting. To make Citadel a zero-persistence cloud app is within reach: it's already using a NoSQL kind of database (unless something happened while I was asleep upstate for the last 20 years), so it's pretty trivial to develop plugins that move that to something that's hosted as a service in the relevant cloud platforms -- DynamoDB or perhaps MongoDB.
It's nice to not have to worry about backups, replication, volume size, etc, when the IaaS gnomes can do that all for you.
I've got some stuff that's deployed kinda the way you describe: an app that ships as a prepackaged Docker image and requires one or two volume mounts to persist its state. It's deployed onto a mostly stateless and automated ECS infrastructure, and the volume mount is done via Amazon EFS, which is a managed NFS service -- you don't have to specify volume sizes, it's all pay-for-the-capacity-you-use. No more capacity planning. We know damn well that NFS is suboptimal, but this particular app is very lightly loaded.
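For what it's worth, getting EFS onto the container hosts is basically a one-liner with Amazon's EFS mount helper (the filesystem ID and mount point here are placeholders):

  # requires the amazon-efs-utils package on the instance
  sudo mount -t efs fs-12345678:/ /mnt/app-data

and then the Docker side is just a bind mount or a local volume pointed at that path.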
Does Berkeley DB play well over NFS?
I haven't tried running Berkeley DB over NFS. The whole FOSS world seems
to be wanting to move away from Berkeley DB, but I don't think the time for
us to do that has come *yet*. LMDB looks very cool, but to use it at the
scale we need, it requires a 64-bit machine. In 2019 that's not a big deal
in the x86 world, but we've got a lot of people running mini-nodes on Raspberry
Pi.
Docker's volume plugins look promising, but it seems you have to be running under Kubernetes to use any of the really interesting ones. At this stage of the game, a plugin to move the persistent storage to something hosted as a service might be plugged into the host rather than into the application.
Obviously the application would be better, but when you're dragging a 30+ year legacy path behind the application with millions of installed seats to upgrade, that muddies the water a bit...
Crawl, walk, run. I'm heading to VMworld next week and have loaded up on container-centric sessions. Hoping to come home a bit smarter about this stuff. (I wish they'd sent me last year when it was in Vegas ... this year it's in SF. Ugh.) I doubt we'll go all-in on Virtzilla's container ecosystem because the level of lock-in they're starting to push is getting uncomfortable.
As for Citadel, my experimental build still requires three persistent volumes: one for the database, one for the upload/download library, and one for SSL keys. A year ago it would have required at least twice as many, but I've spent considerable effort moving everything else into the database. I'd like to get it down to one mount, two at the most, with the ability to convert an existing non-containerized site to a containerized site.
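Concretely, today's experimental invocation looks roughly like this (image name and container paths are just placeholders for whatever the build ends up using):

  docker run -d \
      -v citadel-db:/citadel/data \
      -v citadel-files:/citadel/files \
      -v citadel-keys:/citadel/keys \
      citadel/citserver:experimental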
There seem to be a zillion people attempting to build a volume plugin that speaks S3, but none seem to be gaining any traction. I wonder if there's a performance problem or some other constraint they're all hitting. It would be great to just plug in to an existing Ceph cluster, or any of the zillion other services that speak S3.
Docker's volume plugins look promising, but it seems you have to be
running under Kubernetes to use any of the really interesting ones. At
Not to my knowledge! Really depends on your cloud infrastructure, but the stuff for EBS/EFS is a breeze and works fine on ECS (which is basically just bare docker with a very minimal management daemon)
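For example, with one of the REX-Ray plugins it's roughly this (the names and the size option are just how that particular driver does it, and it assumes the instance role already has the right EBS permissions):

  docker plugin install rexray/ebs
  docker volume create -d rexray/ebs --opt size=20 ebs-data
  docker run -v ebs-data:/data some-image

No Kubernetes anywhere in sight.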
There seem to be a zillion people attempting to build a volume plugin
that speaks S3, but none seem to be gaining any traction. I wonder if
You'd have to build a kernel FS driver or Fuse driver that speaks S3. I'm not sure what the point is; this is not S3's targeted use case. That's what EFS is for.
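(To be fair, FUSE clients that speak S3 already exist -- s3fs-fuse is the usual one -- and a mount is roughly this, with the bucket, endpoint, and credentials file as placeholders:

  s3fs my-bucket /mnt/bucket -o url=https://s3.example.com -o passwd_file=${HOME}/.passwd-s3fs

but the latency and consistency characteristics are exactly why I wouldn't put a busy volume on it.)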
S3 seems intended for near-line object storage rather than primary storage
... but one benefit of S3 is that the protocol is spoken by dozens of different
providers/technologies. I have my own datacenters and am building for a decidedly
non-Amazon environment. Thankfully it seems that from the container's point
of view, a volume is a volume, regardless of what backing store the system
administrator has given it.
Not sure exactly what you mean by near-line, but yeah, it's not "hot" storage, it's a bit more of an archival thing, although it can be used for simple lower-volume static file hosting because of its simplicity and its web APIs. (Not a good fit for anything that needs a CDN.)
Basically, it's optimized for cost and durability rather than performance/availability, and it's only accessible through slower HTTP APIs.
What you just described is exactly what nearline storage is. Not online (like
high speed block storage) and not offline (like tape or removable). Accessible
without requiring any work to mount it, but not fast enough to use as primary
storage. As with a lot of these things, it's ambiguous what counts as nearline.
In a world of enterprise-grade SSD and 15K RPM SAS, some people now refer to slow but very large disks (like 8TB 7200 RPM SATA) as nearline, even though you can mount it directly. As an old mainframe hand, my idea of nearline storage is when your program is suspended while DFHSM uncompresses files out of an archive before they can be accessed.
Nothing in the open systems world has ever matched the transparency of DFHSM.
It could take infrequently-used files (datasets) and migrate them to compressed archives on a slow disk, or even tape, while keeping references to them in the catalog. The moment a program accessed the file, it would block the program and bring it back out, even signaling an operator to mount a tape if necessary (which is decidedly offline storage, but it made it look the same). Really great stuff.
I remember going to Linux Expo at Javits back in the early aughts and looking at some of those big tape robot things, so yeah, I know what you mean. All going the way of the dodo with cloud archive-optimized storage services (S3 Glacier in particular, or even something like the sc1 "Cold HDD" volume type on EBS).
Yup. When I was doing mainframe work at Waldenbooks in the early '90s, we
were just getting started with robotic tape libraries. Most of the tapes
were mounted manually. And we still had a lot of data on reel-to-reel tapes,
which always had to be mounted manually. It was pretty cool, though ... there
were big digital displays over each tape unit that displayed to the operators
which tape number they needed to mount. There was something amusing about
the fact that they'd just go and mount the tapes without knowing what job
it was for, or who ran it.
At the Big Blue X we had tape robots, but it was never for HSM ... strictly backups. But as you correctly point out, tape backup is in a big decline.
At some point it became more valuable to just send that data to cheap-and-deep disk storage. Even for offsite backups, we just send it to disk at another data center. The last holdouts for tape are the people who need to have their backups taken offsite to an underground vault somewhere, but that's such a small minority now that we don't even do it locally anymore. We just send the backups to a data center that has a tape library. Most people look at the cost of offsite transport and archival, and decide that simply having a second copy of the data in another time zone is enough.
I miss having Linux Expo locally. :(
Heh. These guys might have wanted to reconsider their tape archive strategy: https://www.nytimes.com/2019/06/11/magazine/universal-fire-master-recordings.html
Sad story.
Harkening back to an old question: I use a 10-year-old Debian, and I build all my own dependencies.
My results work on Debian- and Red Hat-derived OSes, and would probably work on pretty much any Linux with a reasonable kernel within 10 years of today's kernel.
But it's a lot of work to put something like that together, especially if you have a lot of dependencies.
But, my results don't need to work in a container.
Still, I use a container to help provide a cross compiler.
And the container uses the 10-year old Debian distribution as a base.
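The whole trick boils down to one command (the image tag is just whatever I called my toolchain image):

  docker run --rm -v "$PWD":/src -w /src old-debian-toolchain make

The toolchain lives in the image; the artifacts land back in the source tree on the host.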
Yeah, containers are particularly useful in a heterogeneous build environment to keep the build env(s) separate from the build host (Jenkins or whatever.)