only use memcpy realloc to shrink if an exact-sized free chunk exists otherwise, shrink in-place. as explained in the description of commit 3e16313f8fe2ed143ae0267fd79d63014c24779f, the split here is valid without holding split_merge_lock because all chunks involved are in the in-use state.
fix memset overflow in oldmalloc race fix overhaul commit 3e16313f8fe2ed143ae0267fd79d63014c24779f introduced this bug by making the copy case reachable with n (new size) smaller than n0 (original size). this was left as the only way of shrinking an allocation because it reduces fragmentation if a free chunk of the appropriate size is available. when that's not the case, another approach may be better, but any such improvement would be independent of fixing this bug.
fix invalid use of access function in nftw access always computes result with real ids not effective ones, so it is not a valid means of determining whether the directory is readable. instead, attempt to open it before reporting whether it's readable, and then use fdopendir rather than opendir to open and read the entries. effort is made here to keep fd_limit behavior the same as before even if it was not correct.
add fallback a_clz_32 implementation some archs already have a_clz_32, used to provide a_ctz_32, but it hasn't been mandatory because it's not used anywhere yet. mallocng will need it, however, so add it now. it should probably be optimized better, but doesn't seem to make a difference at present.
only disable aligned_alloc if malloc was replaced but it wasn't it both malloc and aligned_alloc have been replaced but the internal aligned_alloc still gets called, the replacement is a wrapper of some sort. it's not clear if this usage should be officially supported, but it's at least a plausibly interesting debugging usage, and easy to do. it should not be relied upon unless it's documented as supported at some later time.
reintroduce calloc elison of memset for direct-mmapped allocations a new weak predicate function replacable by the malloc implementation, __malloc_allzerop, is introduced. by default it's always false; the default version will be used when static linking if the bump allocator was used (in which case performance doesn't matter) or if malloc was replaced by the application. only if the real internal malloc is linked (always the case with dynamic linking) does the real version get used. if malloc was replaced dynamically, as indicated by __malloc_replaced, the predicate function is ignored and conditional-memset is always performed.
switch to a common calloc implementation abstractly, calloc is completely malloc-implementation-independent; it's malloc followed by memset, or as we do it, a "conditional memset" that avoids touching fresh zero pages. previously, calloc was kept separate for the bump allocator, which can always skip memset, and the version of calloc provided with the full malloc conditionally skipped the clearing for large direct-mmapped allocations. the latter is a moderately attractive optimization, and can be added back if needed. however, further consideration to make it correct under malloc replacement would be needed. commit b4b1e10364c8737a632be61582e05a8d3acf5690 documented the contract for malloc replacement as allowing omission of calloc, and indeed that worked for dynamic linking, but for static linking it was possible to get the non-clearing definition from the bump allocator; if not for that, it would have been a link error trying to pull in malloc.o. the conditional-clearing code for the new common calloc is taken from mal0_clear in oldmalloc, but drops the need to access actual page size and just uses a fixed value of 4096. this avoids potentially needing access to global data for the sake of an optimization that at best marginally helps archs with offensively-large page sizes.
move oldmalloc to its own directory under src/malloc this sets the stage for replacement, and makes it practical to keep oldmalloc around as a build option for a while if that ends up being useful. only the files which are actually part of the implementation are moved. memalign and posix_memalign are entirely generic. in theory calloc could be pulled out too, but it's useful to have it tied to the implementation so as to optimize out unnecessary memset when implementation details make it possible to know the memory is already clear.
reverse dependency order of memalign and aligned_alloc this change eliminates the internal __memalign function and makes the memalign and posix_memalign functions completely independent of the malloc implementation, written portably in terms of aligned_alloc.
rewrite bump allocator to fix corner cases, decouple from expand_heap this affects the bump allocator used when static linking in programs that don't need allocation metadata due to not using realloc, free, etc. commit e3bc22f1eff87b8f029a6ab31f1a269d69e4b053 refactored the bump allocator to share code with __expand_heap, used by malloc, for the purpose of fixing the case (mainly nommu) where brk doesn't work. however, the geometric growth behavior of __expand_heap is not actually well-suited to the bump allocator, and can produce significant excessive memory usage. in particular, by repeatedly requesting just over the remaining free space in the current mmap-allocated area, the total mapped memory will be roughly double the nominal usage. and since the main user of the no-brk mmap fallback in the bump allocator is nommu, this excessive usage is not just virtual address space but physical memory. in addition, even on systems with brk, having a unified size request to __expand_heap without knowing whether the brk or mmap backend would get used made it so the brk could be expanded twice as far as needed. for example, with malloc(n) and n-1 bytes available before the current brk, the brk would be expanded by n bytes rounded up to page size, when expansion by just one page would have sufficed. the new implementation computes request size separately for the cases where brk expansion is being attempted vs using mmap, and also performs individual mmap of large allocations without moving to a new bump area and throwing away the rest of the old one. this greatly reduces the need for geometric area size growth and limits the extent to which free space at the end of one bump area might be unusable for future allocations. as a bonus, the resulting code size is somewhat smaller than the combined old version plus __expand_heap.