This suggests that a call to sbrk() will increase the size of the heap
but does not actually allocate physical memory, and when the process
tries to access an address in that space, a page fault is raised and
then you get actual capacitors mapped to those pages.
Yes. And of course this next part depends on the libc implementation, but glibc has a threshold (which again is tunable) that determines whether your malloc will be served from the sbrk area or via mmap. There's also a glibc function called malloc_trim that you can use to ask glibc to shrink the program break if it's able to, but of course it can only release down to the highest still-in-use address. I think there may also be conditions under which glibc will auto-trim, but I don't remember the details.
Also, tangentially related, there is a madvise syscall that allows you to notify the kernel that you're no longer using pages that you still have mapped, which means it's allowed to free those physical pages until you fault them again.
I may have mentioned it before, but I've nearly made half my career out of being the solutions guy for a company of "embedded" developers who think that memory is infinite. That's why I know far more about this subject than I ever cared to. :)
Memory allocation is insanely naive: it keeps track of the number of
items actually present in the array, and the number of items for which
it has allocated memory. Whenever an append operation would cause the
size of the array to exceed the size of the allocation, it simply
doubles the allocation. This avoids the need to call realloc() for
*every* append operation.
Coincidentally I've been recently working on a hash map implementation in Forth and I've decided to factor out the internal dynamic array into its own module.
The way you've written it is how other languages solve the same problem too. I've read that a growth factor of 2 or higher is considered quite aggressive and can cause problems (fragmentation) with naive allocation algorithms. I did a bit of testing but I wasn't able to show that this is actually a problem on Linux, probably because whatever allocation algorithm glibc uses is decent. It seems like the ideal growth factor lies somewhere between 1 and 2. Facebook uses 1.5 for their array library, Java uses 1.5, G++ and Clang use 2, and I think C# also uses 2.
My implementation is currently at about 300 lines of dense Forth + 150 lines of tests, but I'm seriously considering trashing the majority of the features because they add a lot of complexity without much gain. For example, I've added the ability to override the way the array internally retrieves and stores its elements. So you can define type-specific arrays, which work internally a bit like OOP but with function pointers:
12345 myarray array-add \ C pseudocode: array_add(myarray, 12345)
or
x y z myarray array-add \ C pseudocode: array_add(myarray, POINT(x, y, z))
To make this work I've had to make it possible to set different set/get functions for operations to/from the stack and memory, and this is done per array with an options structure. In hindsight it may just be better to redefine these words and then include the same Forth file multiple times to compile different dynamic array behaviors, using namespaces to differentiate them (in Forth these are called word lists). This has the added benefit of being more efficient, because you don't need function pointers (in direct-threaded Forths).
Looks like I'll be rewriting it a third time :) ... but I think I now know what I actually want.
I may have mentioned this to you (I have in general, I forget the message thread now), but I help run a FB group for Forth. We even have people like Leo Brodie on it..
Wed Jun 14 2023 07:04:38 AM EDT from x4th
2023-06-14 07:29 from Nurb432
Yes I remember, I do watch the videos you guys post on youtube every now and then.
:)
Wed Jun 14 2023 09:18:48 AM EDT from x4th
Am I insane for doing this? That style of allocation has never let me
down in the past. I'm debugging a recurring segfault problem right
now in some code that does three nested levels of linked lists, and
it's abusing the allocator to the point where the allocator just tells
me to go shit in my hat and aborts the program. I'm thinking of
replacing all three linked lists with calls to my array class.
With linked lists I've made the mistake of pooling items in a block of memory because it's really efficient, but this has the unintended consequence that a realloc can move the block elsewhere and invalidate all pointers.
It's really hard to find these types of bugs unless you know what you're looking for. If you do, you can replace malloc and realloc to track invalidated regions, and when you get the segfault you check whether the invalid access was to an invalidated region.
There are probably tools for this (valgrind?), but with Forth I just redefine allocate and resize in my application.
By coincidence, I came across a blog post [ https://nullprogram.com/blog/2019/06/30/ ] in which I learned that what I had built is now called "fat pointers" -- a data type that contains a pointer to the data itself, along with metadata indicating the size of the contained data type, the number of items stored, and the number of items for which memory has been allocated.
I'm sorry, that term is offensive to someone on this planet, so you now owe me 10 million USD.
or something like that :)
Fatass pointer. Obese pointer. Ate too many cheezburgerz pointer.
Tue Jun 20 2023 17:54:19 EDT from IGnatius T Foobar
Riiiiiiiiiiight. If I ever start being politically correct you have my permission to kill me.
2023-06-20 18:07 from nonservator <nonservator@uncensored.citadel.org>
Call it a Chubby Chaser.
That was my nickname at uni
I had one in high school. Won't repeat it here, else people from back then might recognize me. Much like our resident cat, I have things in my past I'd rather not deal with and did a few things I regret...
*coughs up a catnip-laced hairball*
Wed Jun 21 2023 20:02:45 EDT from Nurb432
High school ... makes *me* want to cough up a hairball, and I'm not even a cat. Good riddance. My daughter graduated today and somehow I didn't kill myself.
Anyway, I was today years old when I learned that C11 has atomic operations.
Does the `_Atomic` keyword guarantee a lock of some sort? And if so, is it worth converting a multithreaded program to use `_Atomic` locks instead of `pthread_mutex_*` locks?
2023-06-25 04:27 from IGnatius T Foobar <ajc@citadel.org>
Yes, and I actually don't like it. Multithreading is a property of the platform; it belongs in a library and not in the language, in my opinion.
If it's anything like GCC's __atomic_* builtins, then it emits atomic assembly when the ISA supports it, and falls back to mutex locking when it doesn't. This can pose a really tricky trap in some situations, because POSIX mutexes have a glaring design flaw: they don't behave well in conjunction with fork(). So you have a mechanism that in some situations warrants special care, and now you have a feature in the language that on some platforms may emit that mechanism invisibly. I see this as nothing but a recipe for headache.
Reading through as many examples as I can, and I'm still having some difficulty understanding how `_Atomic` is supposed to be applied. There don't seem to be any explicit lock/unlock operations.
2023-06-26 21:22 from IGnatius T Foobar <ajc@citadel.org>
I didn't think POSIX mutexes were supposed to be usable *at all* across
a fork(). So if you're trying to do that, I wouldn't be surprised if
the results are inconsistent, because (for example) Linux and FreeBSD
implement fork() in *very* different ways.
The problem is that libraries and applications don't know about each other. A library could use _Atomic which produces mutexing, and then an application that links with it could call fork without knowing any better.
Hilariously, it seems the POSIX guys tried to address this once by adding the pthread_atfork function. If you read the Linux man page for it, there's a note that basically says they added it to fix deadlocks when the program forks with a mutex locked, and then they realized it doesn't fix them! Like I said before, huge oversight here. What they need is for fork to inherently take all mutexes before it forks and then reinitialize them in the child.
And I don't think there are explicit lock/unlock operations. It's a type qualifier, so the compiler knows that any access to such a variable should use atomic assembly or (I'm just guessing) possibly libatomic (which may use a pthread_mutex underneath).
That sounds like a good thing, not a problem.
Mon Jun 26 2023 08:45:05 PM EDT from zelgomer