Thank you, Mark and Gordon. The hostnames I needed to collect are always in the same field, at least in the lines of the file that matter, so I ended up using suggestions from both of you. The code looks like this now; the egrep is there to make sure whatever is in the 9th field looks like a domain name.

    for host in $(awk '{ print $9 }' ${TMPDIR}/* | egrep "[-\.0-9a-z][-\.0-9a-z]*.com" | sort -u); do
        HOSTS+=("$host")
    done

Original script:

    real    28m11.488s
    user    26m57.043s
    sys     0m30.634s

Using awk instead of grepping the entire batch:

    real    6m14.949s
    user    5m0.629s
    sys     0m26.914s

Using awk and with export LANG=C:

    real    2m50.611s
    user    1m20.849s
    sys     0m27.366s

Awesome, thanks for the tips!

> For one, do the sort in one step: sort -u. For another, are the hostnames
> always the same field? For example, if they're all /var/log/messages, I'd
> do awk '{print $4;}' | sort -u

> You have two major performance problems in this script. First, UTF-8
> processing is slow. Second, wildcards are EXTREMELY SLOW!
> You'll get a HUGE performance boost from prefixing your search with some
> known prefix to your regex.
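
In case it helps anyone who finds this thread later, here is a minimal sketch of the same idea as a self-contained script. It is not the exact code I timed above. The function name unique_hosts, the LOGDIR directory, and the sample log lines are all made up for illustration. I have also assumed a slightly tighter pattern (anchored at both ends, with the dot before "com" escaped) and moved the match into awk so the extra egrep process goes away. It sets LC_ALL rather than LANG because LC_ALL, if it is already set in the environment, overrides LANG.

```shell
#!/bin/sh
# Sketch only: print the unique hostnames found in field 9 of every file
# in a directory. Names and sample data below are hypothetical.

# Force the C locale so awk and sort work on plain bytes instead of UTF-8.
LC_ALL=C
export LC_ALL

# unique_hosts DIR: one awk pass selects and filters field 9, sort -u dedupes.
unique_hosts() {
    awk '$9 ~ /^[-.0-9a-z]+\.com$/ { print $9 }' "$1"/* | sort -u
}

# Build a throwaway directory with a few sample lines (hostname in field 9).
LOGDIR=$(mktemp -d)
printf '%s\n' \
    'a b c d e f g h www.example.com rest' \
    'a b c d e f g h mail.example.com rest' \
    'a b c d e f g h www.example.com rest' \
    'a b c d e f g h not-a-host rest' > "$LOGDIR/sample.log"

hosts=$(unique_hosts "$LOGDIR")
printf '%s\n' "$hosts"   # prints mail.example.com, then www.example.com

rm -rf "$LOGDIR"
```

In bash 4 or later, the result could go straight into the array with something like mapfile -t HOSTS < <(unique_hosts "$TMPDIR"), which avoids the for loop. I have not timed this variant against the numbers above, so treat any speed difference as unverified.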