Commit message | Author | Age | Files | Lines
* Document that the pipefail option is unsupportedKerin Millar2024-08-221-3/+3
| | | | | | | | | This is worth mentioning because POSIX-1.2024 (Issue 8) introduces pipefail as a standard feature. https://austingroupbugs.net/view.php?id=789 Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Warn upon sourcing if errexit or nounset be enabledKerin Millar2024-08-221-3/+15
| | | | | | | | | | | | | | | I am unwilling to pander to those that elect to enable either of the errexit or nounset options. Rather than gloss over the matter, comment as to the behaviour of gentoo-functions being unspecified in that event. Further, display a warning for each of those options found to be enabled at the time of sourcing functions.sh. It is worth noting that the behaviour of nounset can be selectively employed with the ${parameter:?} form of parameter expansion. Such is occasionally useful and does not require for library authors to acquiesce to the cult of the "unofficial strict mode". Signed-off-by: Kerin Millar <kfm@plushkava.net>
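The remark about the ${parameter:?} form can be illustrated with a minimal sketch. The function name sketch_greet and its message are hypothetical and are not part of gentoo-functions; the point is only that a single expansion can insist on a non-empty value without the nounset option being enabled.

```shell
# Hypothetical helper, for illustration only.
sketch_greet() {
    # ${1:?message} aborts the expansion if $1 is unset or empty.
    # In a non-interactive shell this makes the (sub)shell exit non-zero.
    printf 'Hello, %s\n' "${1:?sketch_greet: a name is required}"
}

sketch_greet world

# Run the failing case in a subshell so that the caller survives.
if ! ( sketch_greet ) 2>/dev/null; then
    echo "sketch_greet refused to run without a name"
fi
```

Running the failing call in a subshell is a deliberate choice here, since the failed expansion would otherwise terminate a non-interactive script.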
* Double quote a ${parameter+word} expansion in defer()Kerin Millar2024-08-221-1/+1
| | | | | | | | | | | Doing so protects against the following scenario. $ IFS=e word=1 $ set -x; test ${word+set} + test s t dash: 2: test: s: unexpected operator Signed-off-by: Kerin Millar <kfm@plushkava.net>
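The scenario above can be reproduced without altering IFS in the calling shell. The following is a small sketch; count_fields is a hypothetical helper that merely reports how many words it received.

```shell
# Hypothetical helper: print the number of arguments received.
count_fields() { echo "$#"; }

# Each command substitution runs in a subshell, so IFS is only changed there.
unquoted=$(IFS=e; word=1; count_fields ${word+set})
quoted=$(IFS=e; word=1; count_fields "${word+set}")

# The unquoted form is split on "e" into "s" and "t"; the quoted form is not.
printf 'unquoted: %s field(s), quoted: %s field(s)\n' "$unquoted" "$quoted"
```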
* Optimise trim() for bash where processing the positional parametersKerin Millar2024-08-221-5/+14
| | | | | | | | Render trim() faster in bash for cases where only the positional parameters are to be processed e.g. var=$(trim "$var") or var=${ trim "$var"; }. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Move an SC2317 exemption closer to where it is neededKerin Millar2024-08-221-1/+1
| | | | Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Have whenceforth() work around a word splitting bug in OpenBSD shKerin Millar2024-08-221-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | Consider the case where IFS consists of a single character whose value is neither <space>, <tab> nor <newline>. The following example employs the colon, since it is the character that the whenceforth() function relies upon during word splitting. $ bash -c 'IFS=":"; path=":"; set -- $path; echo "$# ${1@Q}"' 1 '' The result is very much as expected because the colon in path serves as a terminator for an empty field. Now, let's consider how many fields are produced in OpenBSD sh as a consequence of word splitting. $ sh -c 'IFS=":"; path=":"; set -- $path; echo "$#"' 0 For the time being, work around it by having whenceforth() repeat the field terminator for the affected edge cases, which are two in number. With this change, the test suite is now able to pass for: - loksh 7.5 - oksh 7.5 - sh (OpenBSD 7.5) Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Print a diagnostic message if no modules can be foundKerin Millar2024-08-221-1/+4
| | | | | | | | | The ability to locate and source the modules depends on the genfun_basedir variable being set correctly. In the case that no modules can be found, print a useful diagnostic message and ensure that the return value is non-zero. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Disable shellcheck SC2153Kerin Millar2024-08-221-1/+1
| | | | | | | | | | SC2153 is informational in nature and triggers only for environment variables (all uppercase variables) whose names are similar to others and for which no explicit assignment can be observed. In the case of gentoo-functions, it was being raised as a result of KSH_VERSION and YASH_VERSION being expanded. In other words, it is a nuisance. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Put braces around the expansion of the path variableKerin Millar2024-08-221-4/+4
| | | | | | In accordance with the Gentoo style. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Check for EPOCHREALTIME support in a safer mannerKerin Millar2024-08-221-1/+1
| | | | | | | | Given that the EPOCHREALTIME variable loses its special properties if unset, to compare two expansions of it to one another ought to be more robust. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Check for SRANDOM support in a safer mannerKerin Millar2024-08-211-1/+15
| | | | | | | | | | | | | Given that the SRANDOM variable loses its special properties if unset, to compare two expansions of it to one another ought to be more robust. Do so up to three times, so as not to be foiled by the unlikely event of the RNG repeating the same number. Further, the prior check was defective because it incorrectly presumed the minimum required version of bash to be 5.0 rather than 5.1. Fixes: 5ee035a364bea8d12bc8abfe769014e230a212a6 Signed-off-by: Kerin Millar <kfm@plushkava.net>
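The idea of comparing two expansions can be sketched as follows. This is not the code from functions.sh; sketch_has_srandom is a hypothetical name, and the sketch is written so that it also runs in shells that have no SRANDOM variable at all, in which case both expansions are identical and the check fails.

```shell
# Hypothetical check: succeed only if SRANDOM yields differing values.
sketch_has_srandom() {
    i=0
    # Try up to three times, in case the RNG repeats a number.
    while [ "$((i += 1))" -le 3 ]; do
        # An ordinary (or unset) variable expands identically both times.
        if [ "${SRANDOM-}" != "${SRANDOM-}" ]; then
            return 0
        fi
    done
    return 1
}

if sketch_has_srandom; then
    echo "SRANDOM appears to be supported"
else
    echo "SRANDOM is not supported, or has lost its special properties"
fi
```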
* Reduce the two non-bash srandom() implementations to just oneKerin Millar2024-08-211-41/+12
The implementation of srandom() that was written with mksh first and foremost in mind is no longer as slow as it was. I decided to benchmark 30,000 iterations of both of the non-bash implementations with varying maximal pool sizes. The results are beneath. Note that both "dash/1" and "mksh/1" refer to the mksh-targeting implementation.

    Pool Size  dash/1  dash/2  mksh/1
    48 B       6.67s   5.57s   58.84s
    64 B       5.39s   4.78s   58.20s
    96 B       5.49s   4.36s   58.13s
    128 B      5.87s   4.63s   59.94s
    160 B      5.93s   5.46s   64.64s

These figures demonstrate that the optimal pool size is roughly between 64 and 96 bytes, and that the performance of both implementations is now comparable. In addition to testing Linux (6.6) on x86_64 hardware, I experimented with the pool size on macOS Sonoma (using an Apple M1 CPU) and found a value of 64 to be close to optimal.

In view of these findings, have _collect_entropy() collect 64 bytes at a time and remove the marginally faster implementation. That is, the one that depended on being able to perform arithmetic on a number as high as 2^32-1 without overflowing.

Additionally, increase the maximum number of times that the remaining implementation tries to find a suitable sequence of hex digits from 2 to 3.

Finally, remove the overflow check, for it is no longer required.

Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Use an entropy pool in srandom(), even if the shell has forkedKerin Millar2024-08-211-21/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Presently, there are two srandom() implementations that do not require bash, one of which is intended for use with mksh and the other of which is intended for the various other implementations of sh(1). Both of these implementations are capable of maintaining an entropy pool, which markedly enhances performance for repeated invocations of the function. However, the pool cannot be effectively utilised in cases where the shell has forked. $ srandom # initialises the pool $ srandom # reads from the now-initialised pool $ ( srandom ) # may fork, rendering the pool rather ineffective $ ( srandom; srandom ) # ditto, despite the consecutive calls This commit addresses the discrepancy by keeping track of whether the pool has been populated on a per-PID basis. Consider the following benchmark, in which the loop is forced to execute within a subshell environment. ( i=0 while [ $((i+=1)) -le 30000 ]; do srandom; done >/dev/null /bin/true ) As conducted with mksh 59c on a system with a 2nd generation Intel Xeon, I obtained the following figures. BEFORE real 3m8.857s user 2m57.276s sys 0m59.511s AFTER real 1m24.047s user 1m6.435s sys 0m19.565s As conducted with dash on the same system, I obtained the following figures. BEFORE real 0m52.056s user 1m2.913s sys 0m18.143s AFTER real 0m12.887s user 0m12.521s sys 0m1.016s Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Use an entropy pool for the mksh-targeting srandom() implementationKerin Millar2024-08-171-26/+45
The slowest of the three srandom() implementations is presently selected for shells that overflow numbers at the 2^31 mark. A prominent shell which does so is mksh (even for LP64 architectures).

Recently, one of the other srandom() implementations was accelerated by having the shell maintain its own entropy pool of up to 512 hex digits in size. Make it so that the mksh-targeting implementation employs a similar technique. Consider the following benchmark.

    i=0; while [ $((i += 1)) -le 30000 ]; do srandom; done >/dev/null

As conducted with mksh 59c on a system with a 2nd generation Intel Xeon, I obtained the following figures.

    BEFORE
    real 0m56.414s
    user 0m47.043s
    sys  0m24.751s

    AFTER
    real 0m28.900s
    user 0m22.795s
    sys  0m6.802s

Note that the performance increase cannot be applied in all situations. For further details regarding the constraints, refer to commit 866af9c.

Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Ensure that LC_ALL is exported in srandom(); be safer for macOSKerin Millar2024-08-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The slowest implementation of srandom() runs od(1) and awk(1) within a command substitution. There, both LC_ALL and LC_CTYPE are overridden but they should also be exported. For now, export LC_ALL=C exclusively, even though it overrides LC_MESSAGES, potentially affecting the user's preferred language for diagnostics. The reason for choosing this course of action is as follows. $ uname Darwin $ echo "$BASH_VERSION" 5.2.26(1)-release $ f() { nonexistent; }; $ ( export LC_ALL=; f ) objc[29971]: +[__SwiftNativeNSStringBase initialize] may have been in progress in another thread when fork() was called. objc[29971]: +[__SwiftNativeNSStringBase initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug. A fix for this is present in the devel branch: - https://git.savannah.gnu.org/cgit/bash.git/commit/?h=devel&id=b3d8c8a See, also: - https://trac.macports.org/ticket/68638 - https://lists.gnu.org/archive/html/bug-bash/2024-05/msg00088.html Of course, the fix hasn't been backported to an actual release. As such, I would prefer to play it safe for the time being. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Explicitly initialise a local variable in _update_pid()Kerin Millar2024-08-171-0/+1
| | | | | | | | I normally always do this for local variables that may immediately be checked for emptiness or non-emptiness, owing to the formally unspecified behaviour of the local command. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Abort sourcing for ksh93Kerin Millar2024-08-171-0/+7
| | | | | | | | | | In the case of ksh93, the commonly implemented behaviour of "local" can be approximated with "typeset". However, to use typeset in this way requires the use of the function f { ...; } syntax instead of the POSIX-compatible f() compound-command syntax. As things stand, there is no sense in allowing for functions.sh to be sourced by ksh93. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Abort sourcing for yash in posixlycorrect modeKerin Millar2024-08-161-8/+15
| | | | | | | | | | | | The yash shell takes conformance so seriously that it goes as far as to disable the local builtin in its posixlycorrect mode. https://magicant.github.io/yash/doc/posix.html $ yash -o posixlycorrect -c 'f() { local var; }; f' yash: local: non-portable built-in is not supported in the POSIXly-correct mode Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Render the non-bash srandom() implementation fasterKerin Millar2024-08-111-3/+66
Presently, there are three implementations of srandom(), one of which is the preferred implementation for shells other than bash. It is a little on the slow side as it has to fork and execute both od(1) and tr(1) every time, just to read 4 bytes. Accelerate it by having the shell maintain its own entropy pool of up to 512 hex digits in size. Consider the following benchmark.

    i=0; while [ $((i += 1)) -le 30000 ]; do srandom; done >/dev/null

As conducted with dash on a system with a 2nd generation Intel Xeon, I obtained the following figures.

    BEFORE
    real 0m49.878s
    user 1m1.985s
    sys  0m17.035s

    AFTER
    real 0m12.866s
    user 0m12.559s
    sys  0m0.962s

It should be noted that the optimised routine will only be utilised in cases where the kernel is Linux and the shell has not forked itself.

    $ uname
    Linux
    $ srandom                        # uses the fast path
    $ number=$(srandom)              # subshell; probably uses the slow path
    $ srandom | { read -r number; }  # ditto

Still, there are conceivable use cases for which this optimisation may prove useful. Below is an example in which it is known in advance that up to 100 random numbers are required, and where writing them to temporary storage is not considered to be a risk.

    i=0
    tmpfile=${TMPDIR:-/tmp}/random-numbers.$$.$(srandom)
    while [ $((i += 1)) -le 100 ]; do
        srandom
    done > "$tmpfile"
    while read -r number; do
        do_something_with "$number"
    done < "$tmpfile"

Signed-off-by: Kerin Millar <kfm@plushkava.net>
Signed-off-by: Sam James <sam@gentoo.org>
* Remedy false positives in categories SC2034 and SC2154Kerin Millar2024-08-111-5/+0
| | | | | Signed-off-by: Kerin Millar <kfm@plushkava.net> Signed-off-by: Sam James <sam@gentoo.org>
* Exempt _should_throttle() from shellcheck SC2317Kerin Millar2024-08-111-0/+1
| | | | | | | | The _should_throttle() function gets the best of shellcheck, which incorrectly reports that there is unreachable code. Signed-off-by: Kerin Millar <kfm@plushkava.net> Signed-off-by: Sam James <sam@gentoo.org>
* Use the -nt and -ot test primaries again rather than depend on GNU findKerin Millar2024-08-111-64/+93
As regards the test(1) utility, the POSIX.1-2024 specification defines the -nt and -ot primaries as standard features. Given that the specification in question was only recently published, this would not normally be an adequate reason for using them in gentoo-functions, in and of itself. However, I was already aware that these primaries are commonly implemented and have been so for years. So, I decided to evaluate a number of shells and see how things stand now. Here is a list of the ones that I tested:

- ash (busybox 1.36.1)
- dash 0.5.12
- bash 5.2.26
- ksh 93u+
- loksh 7.5
- mksh 59c
- oksh 7.5
- sh (FreeBSD 14.1)
- sh (NetBSD 10.0)
- sh (OpenBSD 7.5)
- yash 2.56.1

Of these, bash, ksh93, loksh, mksh, oksh, OpenBSD sh and yash appear to conform with the POSIX.1-2024 specification. The remaining four fail to conform in one particular respect, which is as follows.

    $ touch existent
    $ set -- existent nonexistent
    $ [ "$1" -nt "$2" ]; echo "$?" # should be 0
    1
    $ [ "$2" -ot "$1" ]; echo "$?" # should be 0
    1

To address this, I discerned a reasonably straightforward workaround that involves testing both whether the file under consideration exists and whether the variable keeping track of the newest/oldest file has yet been assigned to.

As far as I am concerned, the coverage is more than adequate for both primaries to be used by gentoo-functions. As such, this commit adjusts the following three functions so as to do exactly that.

- is_older_than()
- newest()
- oldest()

It also removes the following functions, since they are no longer used.

- _find0()
- _select_by_mtime()

With this, GNU findutils is no longer a required runtime dependency. Of course, should a newly introduced feature of gentoo-functions benefit from the presence of findutils in the future, there is no reason that it cannot be brought back in that capacity.

Signed-off-by: Kerin Millar <kfm@plushkava.net>
Signed-off-by: Sam James <sam@gentoo.org>
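The workaround described in this commit, namely checking both that the candidate exists and whether a newest file has been recorded yet, might look something like the sketch below. It is not the code of newest(); the name sketch_newest and its interface (pathnames as positional parameters only) are assumptions made for the sake of illustration.

```shell
# Hypothetical sketch: print the most recently modified of the given paths.
sketch_newest() {
    newest=
    for file in "$@"; do
        # Skip nonexistent files rather than trust -nt to handle them.
        [ -e "$file" ] || continue
        # Only consult -nt once there is an existing file to compare against.
        if [ -z "$newest" ] || [ "$file" -nt "$newest" ]; then
            newest=$file
        fi
    done
    # Fail if none of the paths exist.
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

Because -nt is only ever evaluated with two existing files, the divergent behaviour for nonexistent operands shown above never comes into play.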
* Render _update_time() a no-op for the yash shellKerin Millar2024-08-111-1/+2
| | | | | | | | | | | | | | | | | | When integer overflow occurs in a non-interactive yash shell, it prints "yash: arithmetic: overflow" as a diagnostic message before proceeding to exit. That makes it extremely difficult for the arithmetic in the _should_throttle() function to be implemented safely for it. For now, ensure that _update_time() does nothing for yash but return a non-zero status code. In turn, this disables the rate limiting feature for yash. Additionally, refrain from running test_update_time() and test_should_throttle() for yash in test-functions. The former would only amount to a waste of time and the latter would be guaranteed to fail. For the record, my testing was performed with yash 2.56.1. Signed-off-by: Kerin Millar <kfm@plushkava.net> Signed-off-by: Sam James <sam@gentoo.org>
* Handle integer overflow as a special case in _should_throttle()Kerin Millar2024-08-111-7/+15
At the point that the genfun_time variable overflows, guarantee that the _should_throttle() function behaves as if no throttling should occur rather than proceed to perform arithmetic based on the result of deducting genfun_last_time from genfun_time.

Further, guarantee that the _should_throttle() function behaves as if no throttling should occur upon the very first occasion that it is called, provided that the call to _update_time() succeeds.

Finally, add a test case.

Signed-off-by: Kerin Millar <kfm@plushkava.net>
Signed-off-by: Sam James <sam@gentoo.org>
* Rename quote_args_bash() to _quote_args_bash()Kerin Millar2024-08-111-21/+24
| | | | | | | For it need not be in the public name space. Signed-off-by: Kerin Millar <kfm@plushkava.net> Signed-off-by: Sam James <sam@gentoo.org>
* Implement a variant of quote_args() optimised for bashKerin Millar2024-08-111-0/+26
| | | | | | | | | | | | | Add the quote_args_bash() function, which will be called from quote_args() under the appropriate circumstances. It is faster than the sh implementation, not merely because it takes advantage of the ${parameter@Q} form of parameter expansion, but also because executing external utilities exacts a greater performance toll for bash than it does for, say, dash. The difference is appreciable if running the test suite. Signed-off-by: Kerin Millar <kfm@plushkava.net> Signed-off-by: Sam James <sam@gentoo.org>
* Have srandom() employ an upper bound of 2^31-1Kerin Millar2024-08-111-5/+37
| | | | | | | | | | | | | | | | | | In the case of some shells - mksh, at least - the maximum value of an integer is 2147483647. Such is a consequence of implementing integers as signed int rather than signed long, even though doing so contravenes the specification. Reduce the output range of srandom() so as to be between 0 and 2147483647, rather than 0 and 4294967295. A change of this scope would normally justify incrementing GENFUN_API_LEVEL but I shall not do so on this occasion. My rationale is that >=gentoo-functions-1.7 has not yet had enough exposure for srandom() to be in use by other projects. Additionally, have test-functions test srandom() 10 times instead of 5. Signed-off-by: Kerin Millar <kfm@plushkava.net> Signed-off-by: Sam James <sam@gentoo.org>
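One way of keeping a random number within the range 0 to 2^31-1 is sketched below. Note that it clears the top bit by masking, which is a different technique from the one that srandom() uses; the name sketch_random31 is hypothetical, and the sketch assumes that /dev/urandom, od(1) and tr(1) are available and that the shell accepts hexadecimal constants in arithmetic expansion.

```shell
# Hypothetical sketch: print a random integer between 0 and 2147483647.
sketch_random31() {
    # Read 4 bytes and render them as 8 contiguous hex digits.
    hex=$(LC_ALL=C od -An -N4 -tx1 /dev/urandom | LC_ALL=C tr -d ' \n') || return
    [ "${#hex}" -eq 8 ] || return
    first=${hex%???????}   # the most significant hex digit
    rest=${hex#?}          # the remaining seven hex digits
    # Clear the top bit of the first digit so that the value fits in 31 bits.
    first=$(( 0x$first & 7 ))
    printf '%s\n' "$(( 0x$first$rest ))"
}

sketch_random31
```

Masking the digit before the full number is assembled means that no intermediate value exceeds 2^31-1, which matters for shells that implement integers as signed int.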
* Avoid a subshell for is_identifier()Kerin Millar2024-08-111-4/+4
| | | | | | | Also, extend the coverage of the test suite a little further. Signed-off-by: Kerin Millar <kfm@plushkava.net> Signed-off-by: Sam James <sam@gentoo.org>
* Re-wrap a comment in get_nprocs()Kerin Millar2024-08-111-1/+2
| | | | | Signed-off-by: Kerin Millar <kfm@plushkava.net> Signed-off-by: Sam James <sam@gentoo.org>
* Document POSIXLY_CORRECT as an influential variableKerin Millar2024-08-111-0/+1
| | | | | Signed-off-by: Kerin Millar <kfm@plushkava.net> Signed-off-by: Sam James <sam@gentoo.org>
* Make _select_by_mtime() work correctly for paths read from STDINKerin Millar2024-08-111-1/+1
The _select_by_mtime() function is called by both newest() and oldest(). Pathnames may be specified as positional parameters or as NUL-separated records to be read from the standard input. Unfortunately, the latter interface does not work at all. Rectify this by checking whether the number of parameters is greater than 0, rather than greater than or equal to 0.

Also, extend the existing test case in such a way that the interface in question is tested.

Signed-off-by: Kerin Millar <kfm@plushkava.net>
Signed-off-by: Sam James <sam@gentoo.org>
* Ensure a radix character of U+2E in _update_time()Kerin Millar2024-08-051-2/+4
I overlooked that bash respects the radix character defined by the locale in the course of synthesizing the value of the EPOCHREALTIME variable. Set LC_NUMERIC as C to guarantee that the radix character is considered as U+2E (FULL STOP) within the scope of the bash-specific function. Doing so also addresses a distinct issue whereby the invocation of printf was sensitive to the implied value of LC_NUMERIC.

Another way to address this would have been to set LC_ALL as C. I decided not to because it would decrease the likelihood of the relevant diagnostic messages being rendered in the user's native language.

Additionally, add a test case.

Closes: https://bugs.gentoo.org/937376
Reported-by: Christian Bricart <christian@bricart.de>
Signed-off-by: Kerin Millar <kfm@plushkava.net>
Signed-off-by: Sam James <sam@gentoo.org>
* Add the assign() and deref() functionsKerin Millar2024-08-051-0/+44
These two functions are primarily intended to mitigate the appalling use of eval in projects such as netifrc and openrc. Consider the following code.

    net/iproute2.sh:29: eval netns="\$netns_${IFVAR}"

This could instead be written as:

    deref "netns_${IFVAR}" netns

Alternatively, it could be written so as to use a command substitution:

    netns=$(deref "netns_${IFVAR}")

Either method would protect against illegal identifier names and code injection. Consider, also, the following code.

    net/iproute2.sh:185: eval "$x=$1" ; shift ;;

This could instead be written as:

    assign "$x" "$1"

As with deref, it would protect against illegal identifier names and code injection.

Signed-off-by: Kerin Millar <kfm@plushkava.net>
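To show why validating the identifier first makes eval safe in this context, here is a minimal sketch. It is not the implementation found in functions.sh; the names sketch_is_ident, sketch_assign and sketch_deref are hypothetical, though the calling conventions mirror the examples given in the commit message.

```shell
# Hypothetical: succeed only if $1 looks like a valid variable name.
sketch_is_ident() {
    case $1 in
        ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) return 1 ;;
    esac
}

# Hypothetical: assign the value $2 to the variable named by $1.
sketch_assign() {
    sketch_is_ident "$1" || return 1
    # Only the validated name is interpolated; the value stays as \$2.
    eval "$1=\$2"
}

# Hypothetical: print the value of the variable named by $1 or,
# if $2 is given, assign that value to the variable named by $2.
sketch_deref() {
    sketch_is_ident "$1" || return 1
    if [ "$#" -ge 2 ]; then
        sketch_is_ident "$2" || return 1
        eval "$2=\$$1"
    else
        eval "printf '%s\\n' \"\$$1\""
    fi
}

IFVAR=eth0
netns_eth0=blue
sketch_deref "netns_${IFVAR}" netns && echo "netns is $netns"
sketch_assign 'x;echo injected' value || echo "rejected an illegal identifier"
```

The value being assigned is never interpolated into the string given to eval, so only the identifier needs to be checked.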
* Alter a variable name in quote_args()Kerin Millar2024-08-031-2/+2
| | | | | | | Now that POSIX-1.2024 has been ratified, strictly_posix no longer makes sense as a variable name. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Have chdir() enforce POSIX interpretation 1047Kerin Millar2024-08-031-3/+10
| | | | | | | | | | | | POSIX-1.2024 (Issue 8) requires for the cd builtin to raise an error where given an empty directory operand. However, various implementations have yet to catch up. Given that it is a sensible change, let's have the chdir() function behave accordingly. Further, since doing so renders the test_chdir_noop test useless, get rid of it. The purpose that the test served is now subsumed by test_chdir. Closes: https://bugs.gentoo.org/937157 Signed-off-by: Kerin Millar <kfm@plushkava.net>
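The behaviour described can be approximated with a short wrapper. This is a sketch under the assumption that a single directory operand is given; sketch_chdir is a hypothetical name, the diagnostic wording is invented, and the real chdir() function may handle further cases.

```shell
# Hypothetical sketch: refuse an empty operand before delegating to cd.
sketch_chdir() {
    if [ "$#" -eq 1 ] && [ -z "$1" ]; then
        printf '%s\n' "sketch_chdir: the directory operand is empty" >&2
        return 1
    fi
    cd -- "$1"
}

# Shells that have yet to catch up may treat cd "" as a successful no-op.
sketch_chdir "" 2>/dev/null || echo "refused an empty directory operand"
```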
* Have hr() employ a divide-by-16 strategyKerin Millar2024-08-021-6/+6
A factor of 16 was shown to be faster on average by timing how long it takes for bash to print a rule 5000 times for all lengths between 40 and 132, inclusive.

    Factor  Time       StdDev
    8       87.004000  3.961607
    16      82.893000  3.971257

Further, 16 remains a factor of 80, which is often the number of columns that a terminal emulator is initialised with.

Signed-off-by: Kerin Millar <kfm@plushkava.net>
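A divide-by-16 strategy can be sketched as follows: append 16 characters per iteration for as long as possible, then finish with single characters. The function name sketch_hr and its parameters (a character, followed by a length) are assumptions made for illustration and do not necessarily match the interface of hr().

```shell
# Hypothetical sketch: print a rule of ${2:-80} copies of ${1:--}.
sketch_hr() {
    char=${1:--}
    len=${2:-80}
    # Build a chunk of 16 characters by doubling twice over.
    chunk=$char$char$char$char
    chunk=$chunk$chunk$chunk$chunk
    rule=
    # Append whole chunks first so as to reduce the number of iterations.
    i=$(( len / 16 ))
    while [ "$i" -gt 0 ]; do
        rule=$rule$chunk
        i=$(( i - 1 ))
    done
    # Append the remainder one character at a time.
    i=$(( len % 16 ))
    while [ "$i" -gt 0 ]; do
        rule=$rule$char
        i=$(( i - 1 ))
    done
    printf '%s\n' "$rule"
}

sketch_hr = 40
```

For a length of 80, the first loop runs exactly five times and the second loop not at all.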
* Jettison the bash-specific hr() implementationKerin Millar2024-08-021-15/+9
| | | | | | | | | | | | | | | | | Testing the BASH variable for non-emptiness is an inadequate pretext for activating the bash-optimised code path. Instead, the test would have to be implemented like so ... if ! case ${BASH_COMPAT} in 3?|4[012]) false ;; esac && _has_bash 4 3 then ... fi Given that hr() is not expected to be called often, and that the sh code was already improved by employing a divide-by-8 strategy, I don't consider it to be worth the trouble. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Adhere to the Allman style for _select_by_mtime()Kerin Millar2024-08-021-1/+2
| | | | Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Explain that get_nprocs() is called by parallel_run()Kerin Millar2024-08-021-4/+4
| | | | Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Move is_subset() to experimentalKerin Millar2024-08-021-40/+0
| | | | | | | I'm not yet ready to commit to it being among the core functions for the inaugural API level. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Render hr() faster still for shells other than bashKerin Millar2024-08-021-0/+4
| | | | | | | Reduce the number of loop iterations by initially trying to append characters 8 at a time. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Render hr() fasterKerin Millar2024-08-021-14/+20
| | | | | | | | | | Render hr() faster by eliminating the requirement to fork and execute any external utilities after having established the intended length of the rule. Also, use printf -v and string-replacing parameter expansion where the shell is found to be bash. Doing so helps considerably because bash is very slow at looping. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Render contains_all() and contains_any() fasterKerin Millar2024-08-021-95/+43
Re-implement the contains_all() and contains_any() functions in such a way that they are faster than their forebears by an order of magnitude. In order to achieve this level of performance, the value of IFS is no longer taken into account. Instead, words are always presumed to be separated by characters matching the [[:space:]] character class.

Consider a scenario in which the FEATURES variable is comprised of 33 words.

    $ FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs buildpkg buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync merge-wait multilib-strict network-sandbox news parallel-fetch pid-sandbox pkgdir-index-trusted preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"

Let's say that the contains_any function is used to search for 10 words, where only the 10th can be matched and where FEATURES must be scanned in its entirety exactly 10 times.

    $ contains_any "$FEATURES" the quick brown fox jumped over the lazy hen xattr

The following benchmarks show how long it took to call the function 50,000 times consecutively on a system with an Apple M1 CPU for both the original and new implementations. This is with the dash shell.

    contains_any (BEFORE)
    real 0m19.135s
    user 0m16.781s
    sys  0m2.258s

    contains_any (AFTER)
    real 0m1.571s
    user 0m1.497s
    sys  0m0.063s

Now let's say that the contains_all function is used to search for 3 words, where all can be matched while requiring for FEATURES to be scanned in its entirety at least once.

    $ contains_all "$FEATURES" assume-digests news xattr

Again, the following benchmarks show how long it took to call the function 50,000 times consecutively.

    contains_all (BEFORE)
    real 1m8.052s
    user 0m19.363s
    sys  0m42.742s

    contains_all (AFTER)
    real 0m0.689s
    user 0m0.627s
    sys  0m0.057s

The performance improvements are similarly impressive if using bash.

Signed-off-by: Kerin Millar <kfm@plushkava.net>
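The commit message does not show how the new implementations work. One way of matching whitespace-separated words without forking, and without regard to IFS, is to use case patterns, as in the sketch below. The name sketch_contains_any is hypothetical and the technique is an assumption, not necessarily the one employed by contains_any().

```shell
# Hypothetical sketch: succeed if any of the words following $1 occurs in $1.
sketch_contains_any() {
    # Pad the haystack so that the first and last words are also delimited.
    haystack=" $1 "
    shift
    for needle in "$@"; do
        # An empty needle cannot be a word.
        [ -n "$needle" ] || continue
        case $haystack in
            *[[:space:]]"$needle"[[:space:]]*) return 0 ;;
        esac
    done
    return 1
}

FEATURES="sandbox userpriv usersandbox xattr"
if sketch_contains_any "$FEATURES" the quick brown fox xattr; then
    echo "at least one word was matched"
fi
```

Quoting the needle within the pattern ensures that it is matched literally, and the delimiters on either side prevent "sandbox" from being matched as a part of "usersandbox".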
* Render quote_args() robust and implement a test caseKerin Millar2024-08-021-25/+40
Coerce the effective character set as being C (US-ASCII) in the course of executing awk(1). Some implementations are strict and will otherwise fail in situations where the bytes cannot be decoded.

    $ uname -o
    Darwin
    $ echo "$LC_ALL"
    en_GB.UTF-8
    $ printf '\200' | awk '/[\001-\037\177-\377]/'
    awk: towc: multibyte conversion failure on: ''

In the above case, awk aborts because it has a need to decode the input, which turns out not to be valid UTF-8. Now, it is rather beyond the purview of quote_args() to guarantee that its parameters adhere to any particular character encoding. Fortunately, for it to contend with strings on a byte-by-byte basis is acceptable.

Refactor the code somewhat. The behaviour has been adjusted so as to be virtually identical to that of the "${*@Q}" expansion in bash, with the exception that the ESC character is rendered as $'\e' instead of $'\E'. Such an exception is necessary for POSIX-1.2024 conformance, wherein dollar-single-quotes are now a standard feature (see section 2.2.4 of the Shell Command Language).

Revise the comment preceding the function so as to accurately document its behaviour.

Finally, add a test case. It works by calling quote_args for every possible single-byte string before calculating a CRC checksum for the cumulative output and comparing it against a pre-determined value.

Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Add a comment regarding POSIX XCU compatibilityKerin Millar2024-07-111-0/+5
| | | | Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Mention that _find0() requires findutils >=4.9Kerin Millar2024-07-111-1/+2
| | | | Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Replace "Issue 8" with "POSIX-1.2024"Kerin Millar2024-07-101-5/+5
| | | | | | The POSIX-1.2024 specification was published on 2024/06/14. Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Mention that _SC_NPROCESSORS_ONLN is now standardKerin Millar2024-07-091-2/+2
| | | | | | https://austingroupbugs.net/view.php?id=339 Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Have _update_time() measure in centisecondsKerin Millar2024-07-091-31/+22
| | | | | | | Doing so simplifies the case where /proc/uptime is read. Having one more digit's worth of accuracy is no bad thing either. Signed-off-by: Kerin Millar <kfm@plushkava.net>
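The reason that centiseconds simplify the /proc/uptime case is that the first field of that file has exactly two decimal places, so removing the radix character yields centiseconds directly. The following sketch demonstrates as much; sketch_centiseconds is a hypothetical name and the code is not taken from _update_time().

```shell
# Hypothetical sketch: convert a value such as 12345.67 to centiseconds.
sketch_centiseconds() {
    # Require exactly two digits after the radix character.
    case $1 in
        *[!0-9.]*|.*|*.*.*) return 1 ;;
        *.[0-9][0-9]) ;;
        *) return 1 ;;
    esac
    # Join the integer and fractional parts.
    cs=${1%.*}${1#*.}
    # Strip leading zeros so that the result is never taken to be octal.
    cs=${cs#"${cs%%[!0]*}"}
    printf '%s\n' "${cs:-0}"
}

# On Linux, the first field of /proc/uptime is the uptime in seconds.
if [ -r /proc/uptime ] && read -r uptime _ < /proc/uptime; then
    sketch_centiseconds "$uptime"
fi
```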
* Initialise the genfun_bin_true variable lazilyKerin Millar2024-07-081-26/+32
| | | | | | | Also, require for true(1) to be executable in order for it to be deemed usable. Signed-off-by: Kerin Millar <kfm@plushkava.net>