Bash Script Examples - Processing Lists of Files

not-a-bird (58) in #bash • 7 years ago (edited)

A flexible technique for getting a list of input files and simple, but useful, manipulations on file names are discussed here.

Sorry for the weirdly worded preceding sentence, I'm still trying to figure out how to get a useful summary to show up on the post list. Anyway, on to the article!

I do a lot of shell scripting. I mean a lot. And a very common thing I need to do in a lot of different scripts is some kind of processing over a list of files. So I'm going to cover two different ways to generate a list of files, one with a while loop and the other with globbing in a for loop. The while loop has a couple of different variants, one of which has side effects to watch out for. The task that I'll complete with this scriptery will be creating some thumbnails for some input images.

The basic loop

I'm going to explain the basic loop ahead of time because it looks confusing for beginners.

while read FILE ; do
    # processing work on "${FILE}" goes here.
done < <(find folders/ -type f)

This creates a while loop that reads the output from a find command. If you recall from my previous tutorial the while loop is of the general format while condition ; do body ; done.

In the case of the loop above, the condition is the read command. So as long as the read command is able to read input it will return zero (the successful exit code) and the body between the do and the done will execute.

The read command is a bash built-in that reads a line from standard input and saves whatever it reads into whatever variable name it is given. So this read command will take a line from standard input and save it in a variable called FILE.

The body of this loop is a comment that's a placeholder for real work. In reality, running this command would explode with a syntax error because there needs to be at least one command, not just a comment, for the body.
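For reference, here's a minimal version of the loop that actually runs because the body contains real commands. Treat it as a sketch: the scratch directory and the file names in it are made up for the demonstration, and the IFS= and -r additions (which keep read from trimming whitespace or eating backslashes) are not part of the loop shown above.

```shell
#!/usr/bin/env bash
# Build a scratch directory with a couple of files (names are made up).
SCRATCH=$(mktemp -d)
mkdir -p "${SCRATCH}/folders/sub"
touch "${SCRATCH}/folders/a.png" "${SCRATCH}/folders/sub/b.png"

SEEN=0
while IFS= read -r FILE ; do
    # A real body: print the name and count it.
    printf 'found: %s\n' "${FILE}"
    SEEN=$((SEEN+1))
done < <(find "${SCRATCH}/folders" -type f)

echo "files seen: ${SEEN}"
rm -rf "${SCRATCH}"
```

The order of the "found:" lines depends on how find walks the directory, so don't rely on it.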

Then a whole lot of magic is happening after the done keyword.

This <(command) is the format for running a command and then making the output of that command available as a file descriptor. Bash calls this process substitution. Basically the output of the command is able to be treated like a file.

The <file is a way to take the content from a file and feed it as input on standard input to the command directly left of it.

The find folders/ -type f is a find command that will locate all files (type f means file, as opposed to directory, for example) that exist under the folders directory.
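If the "treated like a file" part sounds too magical, here's a small sketch that pokes at it. The printf commands are stand-ins for find, and the exact path that gets printed is system dependent (on Linux it usually looks like /dev/fd/63), so I'm assuming a system where bash can hand out such paths.

```shell
#!/usr/bin/env bash
# The substitution expands to a path that the command receives as an argument.
echo "path handed to the command:" <(true)

# Because it is a path, it can be used anywhere an input file is expected.
LINES_READ=$(wc -l < <(printf 'one\ntwo\nthree\n'))
echo "lines read: ${LINES_READ}"

# Here cat gets the path as an ordinary file name argument.
CONTENT=$(cat <(printf 'hello\n'))
echo "content: ${CONTENT}"
```

Note the space in `< <(command)`: the first < is the plain redirection and the second belongs to the <(command) wrapper.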

So, in reverse order, because it makes more sense:

The find folders/ -type f command locates a list of input files.
The <(command) wrapper around the find command converts that find command output to a file descriptor.
The < feeds the file descriptor just like you would an input file name to the while loop.
The read command reads that "file" containing file names one line at a time.
The read command stores the file name it just got in a variable called FILE.
The body would be able to do something meaningful with that FILE, but this example doesn't show any meaningful work.
When the find command exits, it will return an end-of-file and the read command will fail to read more content, so it will exit with non-zero.
When the read command returns a non-zero exit status, the loop will end.
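The last two steps are easy to see in isolation. This is just a sketch with two lines of made-up input and three attempts to read them; the variable names are mine. The third read runs out of input and reports a non-zero status, which is exactly what ends the while loop.

```shell
#!/usr/bin/env bash
# Two lines of input, three attempts to read.
{
    read -r FIRST  && STATUS_FIRST=0  || STATUS_FIRST=$?
    read -r SECOND && STATUS_SECOND=0 || STATUS_SECOND=$?
    read -r THIRD  && STATUS_THIRD=0  || STATUS_THIRD=$?
} < <(printf 'one\ntwo\n')

echo "first:  '${FIRST}' (status ${STATUS_FIRST})"
echo "second: '${SECOND}' (status ${STATUS_SECOND})"
# The third read hit end-of-file, so its status is non-zero and THIRD is empty.
echo "third:  '${THIRD}' (status ${STATUS_THIRD})"
```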

Wow, does that ever sound complicated! But it's actually pretty straightforward once you understand all the pieces.

A better basic loop

There's a tiny flaw in that recipe, and it shows up when the script needs to prompt the user for some sort of input while it's running, or when an inner loop needs to process some additional input before the outer loop continues. Anything in the body that reads standard input will end up consuming the list of file names instead. In order to keep the input for the loops straight, an additional layer of redirection is needed.

while read -u 3 FILE ; do
    # processing work on "${FILE}" goes here.
done 3< <(find folders/ -type f)

This is very much like the previous loop, except it will be able to make use of another read inside the loop's body, or even make use of another while loop.

Here's how it works:

The read command's -u option takes a file descriptor number to read from; without it, read defaults to standard input. Here, I've arbitrarily chosen file descriptor 3.
The < redirection operator can also take a file descriptor number in front of it, and it also defaults to standard input. Instead of connecting the file on its right to standard input, 3< connects it to file descriptor 3. That way when the loop runs the read command, it will get the output from the provided find command.

So, all together:

The find folders/ -type f command locates a list of input files.
The <(command) wrapper around the find command converts that find command output to a file descriptor.
The 3< feeds the file descriptor just like you would an input file name to the while loop using file descriptor 3, NOT standard input.
The read command reads that file descriptor one line at a time.
The read command stores the file name it just got in a variable called FILE.
The body would be able to do something meaningful with that FILE, but this example doesn't show any meaningful work.
When the find command exits, it will return an end-of-file and the read command will fail to read more content, so it will exit with non-zero.
When the read command returns a non-zero exit status, the loop will end.
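Here's a sketch of why the extra descriptor matters. The outer loop takes file names from descriptor 3 while an inner read asks a question on standard input. To keep it runnable without a person at the keyboard, printf stands in for find and a here-string stands in for the user typing answers; both are assumptions for the demo, as are the variable names.

```shell
#!/usr/bin/env bash
ANSWERS=()
while IFS= read -r -u 3 FILE ; do
    # This inner read uses standard input, so it does not eat the file list.
    read -r -p "Process ${FILE}? " ANSWER
    ANSWERS+=("${FILE}=${ANSWER}")
done 3< <(printf 'a.png\nb.png\n') <<< $'yes\nno'

printf '%s\n' "${ANSWERS[@]}"
```

If both reads shared standard input, the inner read would swallow every other file name instead of the answers.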

Let's do some work

Okay, let's put the loop to work. We'll use a find command to locate every file in a top level images directory and we'll generate a thumbnail that can live right next to that original input file.

Image Magick

Generically, this is how you create a 128x128 pixel thumbnail of an input PNG image using ImageMagick's convert command:

    convert input.png -scale 128x128 output_thumbnail.png

Okay, I lied a little bit. ImageMagick will actually preserve the aspect ratio of the image by default, so if the image won't scale to exactly 128x128, it is scaled to fit within 128x128 instead: the longer side becomes 128 and the other side shrinks in proportion so that the image still "looks good".
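To make the aspect ratio arithmetic concrete, here's a rough sketch of the "fit inside a box" calculation in plain bash. The fit_within function is a hypothetical helper of mine, not anything ImageMagick provides, and it uses truncating integer division, so the real tool's rounding may differ by a pixel.

```shell
#!/usr/bin/env bash
# fit_within WIDTH HEIGHT BOX: print the largest WxH that fits in BOX x BOX
# while keeping the aspect ratio. Hypothetical helper for illustration only.
fit_within() {
    local width=$1 height=$2 box=$3
    if [ "${width}" -ge "${height}" ]; then
        # Landscape or square: the width becomes the box size.
        echo "${box}x$(( height * box / width ))"
    else
        # Portrait: the height becomes the box size.
        echo "$(( width * box / height ))x${box}"
    fi
}

LANDSCAPE=$(fit_within 640 480 128)
PORTRAIT=$(fit_within 480 640 128)
echo "640x480 -> ${LANDSCAPE}"
echo "480x640 -> ${PORTRAIT}"
```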

Filename Conversions

In order to generate an output file name, we will need to do some magic on the input file name. We ultimately want to take input.png and turn it into input_scaled.png.

To do this, we are going to use something called Parameter Expansion. See the man page for BASH and search for Parameter Expansion, it's awesome. Here's the tiny piece of it that we will be using.

You can use these to strip off a prefix pattern:

"${parameter##pattern}"
"${parameter#pattern}"

The first one will greedily strip off a prefix and the second one will, er, not greedily strip off the pattern.

For example, assuming the file name is something like this:

FILE=/foo/bar/baz.quux

Then we can calculate the file part of the name like this:

echo ${FILE##*/}
baz.quux

But if we only used this

echo ${FILE#*/}
foo/bar/baz.quux

See the difference? The greedy ## approach stripped off as much of the file name as it could match to the pattern. The non-greedy # only stripped off the very minimum needed to satisfy the match.

We won't be using it like that; that was just to demonstrate greedy vs non-greedy. We will use it, instead, to get the extension, for example:

echo ${FILE##*.}
quux

That will give us the extension, but we will want to strip off the extension from the original name in order to build a new file name. For that we use this form:

"${parameter%pattern}"

This will non-greedily strip off a suffix. It has a greedy form using %% but we won't need it. You can probably guess how the greedy form works.

For example:

FILE=baz.quux
echo ${FILE%.*}

And the result:

baz
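In case guessing isn't your thing, here's a sketch that runs all four forms against one made-up file name with two dots in it, which is where the greedy and non-greedy suffix forms part ways. The variable names are just for the demo.

```shell
#!/usr/bin/env bash
FILE=/foo/bar/archive.tar.gz   # made-up name with two dots

SHORT_PREFIX=${FILE#*/}    # shortest prefix match removed
LONG_PREFIX=${FILE##*/}    # longest prefix match removed
SHORT_SUFFIX=${FILE%.*}    # shortest suffix match removed
LONG_SUFFIX=${FILE%%.*}    # longest suffix match removed

printf '%-8s %s\n' '#*/'  "${SHORT_PREFIX}"   # foo/bar/archive.tar.gz
printf '%-8s %s\n' '##*/' "${LONG_PREFIX}"    # archive.tar.gz
printf '%-8s %s\n' '%.*'  "${SHORT_SUFFIX}"   # /foo/bar/archive.tar
printf '%-8s %s\n' '%%.*' "${LONG_SUFFIX}"    # /foo/bar/archive
```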

So, to convert our file names, we will need to do something like the following:

FULL_NAME=/foo/bar/baz.png
FILE=${FULL_NAME%.*}
SUFFIX=${FULL_NAME##*.}
convert -scale 128x128 "${FULL_NAME}" "${FILE}_scaled.${SUFFIX}"

The resulting convert command will look like this:

convert -scale 128x128 "/foo/bar/baz.png" "/foo/bar/baz_scaled.png"
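If you'd rather keep the name juggling in one place, it can be wrapped in a function. This is only a sketch: scaled_name is a hypothetical helper, and the branch for names without an extension is an assumption about what you'd want in that case (both expansions leave such a name untouched, which would otherwise produce a nonsense result).

```shell
#!/usr/bin/env bash
# scaled_name PATH: print PATH with _scaled inserted before the extension.
# Hypothetical helper, not part of the script in this article.
scaled_name() {
    local full_name=$1
    local base=${full_name##*/}   # file part, used only for the dot check
    if [[ ${base} != *.* ]]; then
        # No extension to preserve, so just append the marker.
        printf '%s_scaled\n' "${full_name}"
        return
    fi
    printf '%s_scaled.%s\n' "${full_name%.*}" "${full_name##*.}"
}

WITH_EXT=$(scaled_name /foo/bar/baz.png)
NO_EXT=$(scaled_name /foo/bar.d/README)
echo "${WITH_EXT}"
echo "${NO_EXT}"
```

The second call is there because a dot in a directory name would otherwise be mistaken for the start of an extension.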

Putting it together

Okay, using the above loop and the conversion logic above, here's a script that will create thumbnails of every image in a given directory.

Here's the body to what I'll refer to as thumbnailer.sh:

    #!/bin/bash
    # Feed this script a directory name and it will create thumbnails of all images
    # found within.
    TARGET=$(readlink -f "${1}") # the directory to search is the only argument to the script

    while read -u 3 FULL_NAME ; do
        # Skip anything that the file command does not describe as an image.
        file "${FULL_NAME}" | grep -i image || continue
        # Split the name into everything before the last dot and the extension.
        FILE=${FULL_NAME%.*}
        SUFFIX=${FULL_NAME##*.}
        convert -scale 128x128 "${FULL_NAME}" "${FILE}_scaled.${SUFFIX}" &&
        echo "Created thumbnail for ${FULL_NAME} in ${FILE}_scaled.${SUFFIX}" ||
        echo "Failed to create thumbnail for ${FULL_NAME} in ${FILE}_scaled.${SUFFIX}!"
    done 3< <(find "${TARGET}" -type f)

I apologize for any seemingly attempted hood-winking, I had to add just a tiny bit of error checking and argument parsing for this to be an actual script.

The TARGET=$(readlink -f ${1}) will execute readlink on the provided script argument so that it can get a full path to the input directory, that way we don't have to sanitize a name like . or ./ in the call to convert later. I realize this is confusing, but just think of it as a way to convert this:

~/thumbnailer.sh .

Into this:

~/thumbnailer.sh /home/not-a-bird/example/images/

Without actually making the user type it that way.
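To see what readlink -f does with a name like . or ./, here's a small sketch that runs it from inside a scratch directory and compares the result with pwd -P. The directory layout is made up, and I'm assuming the GNU coreutils readlink, since -f isn't available everywhere.

```shell
#!/usr/bin/env bash
START_DIR=${PWD}
SCRATCH=$(mktemp -d)
mkdir -p "${SCRATCH}/example/images"
cd "${SCRATCH}/example/images" || exit 1

# Both spellings of "here" resolve to the same absolute path.
RESOLVED_DOT=$(readlink -f .)
RESOLVED_DOT_SLASH=$(readlink -f ./)
PHYSICAL_PWD=$(pwd -P)

echo "readlink -f .  -> ${RESOLVED_DOT}"
echo "readlink -f ./ -> ${RESOLVED_DOT_SLASH}"
echo "pwd -P         -> ${PHYSICAL_PWD}"

cd "${START_DIR}" || exit 1
rm -rf "${SCRATCH}"
```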

The resolved path is stored in the variable TARGET, and later on the find will be done on ${TARGET}.

Then, to avoid processing non-images, I used the file command to see if the file in question looks like an image before calling convert on it.

        file "${FULL_NAME}" | grep -i image || continue

The file command examines the contents of the candidate image file and reports its type on standard output. The grep command checks to see if the word image appears in that file type, and if it does not, the || will then call continue, which will skip the current file and move on to the next (hopefully) actual image file.

After the convert command is called the script will do one of two things: on success it will display that it created a thumbnail along with the names, and on failure it will display that it failed along with the names.
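The skip logic can be tried without any real images. In this sketch the strings in DESCRIPTIONS are made up to resemble typical file output, so nothing here depends on what file prints on your system, and I've used grep -q so that only the exit status matters.

```shell
#!/usr/bin/env bash
# Strings shaped like typical `file` output; made up for the demo.
DESCRIPTIONS=(
    "photo.png: PNG image data, 640 x 480, 8-bit/color RGB, non-interlaced"
    "notes.txt: ASCII text"
    "pic.jpg: JPEG image data, JFIF standard 1.01"
)

KEPT=()
for DESCRIPTION in "${DESCRIPTIONS[@]}" ; do
    # grep fails when "image" is missing, and the || then skips this entry.
    grep -qi image <<< "${DESCRIPTION}" || continue
    KEPT+=("${DESCRIPTION%%:*}")   # keep just the name before the colon
done

echo "kept: ${KEPT[*]}"
```

One thing to watch for: file prints the file name first, so a text file called image.txt would also match. Running file with -b leaves the name out of the report, which should avoid that.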

Pitfalls

It is very tempting to write that while loop with a pipe instead of the mixed up redirection going on, but this can cause grief for a lot of beginners.

Another way to write the loop would be to do something like this:

find folders/ | while read ...

And this approach would work just fine in the case of the above loop because once the loop exits, none of the variables that were used need to be accessed again. But what if we had kept a counter running so we could see how many images had been converted?

The correct way would look something like this:

TARGET=$(readlink -f "${1}") # the directory to search is the only argument to the script
COUNT=0

while read FULL_NAME ; do
    file "${FULL_NAME}" | grep -i image || continue
    FILE=${FULL_NAME%.*}
    SUFFIX=${FULL_NAME##*.}
    convert -scale 128x128 "${FULL_NAME}" "${FILE}_scaled.${SUFFIX}" &&
    echo "Created thumbnail for ${FULL_NAME} in ${FILE}_scaled.${SUFFIX}" ||
    echo "Failed to create thumbnail for ${FULL_NAME} in ${FILE}_scaled.${SUFFIX}!"
    COUNT=$((COUNT+1))
done < <(find "${TARGET}" -type f)
echo "Converted ${COUNT}."

And if run on a directory with 3 images in it, it would provide the following output:

Converted 3.

The easier to read, but incorrect way:

TARGET=$(readlink -f "${1}") # the directory to search is the only argument to the script
COUNT=0

find "${TARGET}" -type f | while read FULL_NAME ; do
    file "${FULL_NAME}" | grep -i image || continue
    FILE=${FULL_NAME%.*}
    SUFFIX=${FULL_NAME##*.}
    convert -scale 128x128 "${FULL_NAME}" "${FILE}_scaled.${SUFFIX}" &&
    echo "Created thumbnail for ${FULL_NAME} in ${FILE}_scaled.${SUFFIX}" ||
    echo "Failed to create thumbnail for ${FULL_NAME} in ${FILE}_scaled.${SUFFIX}!"
    COUNT=$((COUNT+1))
done
echo "Converted ${COUNT}."

The output here would be:

Converted 0.

The reason this will always report 0, no matter how many it does process, is that the while loop is executing in a subshell. Any variables created or assigned values there will never exist in the shell where the parent script is executed. Sometimes this behavior is desired, but it certainly isn't when you want to share data between the inside of the loop and outside of it.

The reason I mention this is that it's extremely natural when you get going to just chain together a series of commands using a pipe. But in this case, there are side effects.
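Here's the same side effect boiled down to a sketch with no images involved. The printf commands stand in for find, and the two counter names are made up; the only difference between the loops is how the input gets to them.

```shell
#!/usr/bin/env bash
PIPE_COUNT=0
printf 'a\nb\nc\n' | while read -r LINE ; do
    PIPE_COUNT=$((PIPE_COUNT+1))    # updates a copy inside the subshell
done

SUBST_COUNT=0
while read -r LINE ; do
    SUBST_COUNT=$((SUBST_COUNT+1))  # updates the script's own variable
done < <(printf 'a\nb\nc\n')

echo "pipe: ${PIPE_COUNT}"
echo "process substitution: ${SUBST_COUNT}"
```

With bash's default settings the piped loop leaves its counter at 0 while the redirected loop reports 3.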

Globbing

The contrived example here would just as effectively be accomplished using a for loop and globbing file names. The most common use would actually just be to create thumbnails in the current directory using just a particular file extension, for example, PNG. So here's the dirt simple, more likely scenario that takes all the PNG files in the current directory and makes a scaled version:

    for F in *.png ; do
        convert -scale 128x128 "${F}" "${F%.*}_scaled.${F##*.}"
    done

No confusing find in any crazy pipe, this just converts the PNG files in the current directory to thumbnails.
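One thing worth knowing about the glob version: when nothing matches, bash by default leaves the pattern in place as literal text, so the loop body runs once with a name that doesn't exist. Here's a sketch of a guard for that, using a scratch directory with made-up file names; it collects the output names instead of calling convert so it can run anywhere.

```shell
#!/usr/bin/env bash
SCRATCH=$(mktemp -d)
touch "${SCRATCH}/a.png" "${SCRATCH}/b.png" "${SCRATCH}/notes.txt"

OUTPUTS=()
for F in "${SCRATCH}"/*.png ; do
    [ -e "${F}" ] || continue   # skip the literal pattern if nothing matched
    OUTPUTS+=("${F%.*}_scaled.${F##*.}")
done

JPG_COUNT=0
for F in "${SCRATCH}"/*.jpg ; do
    [ -e "${F}" ] || continue   # no JPGs here, so the unmatched pattern is skipped
    JPG_COUNT=$((JPG_COUNT+1))
done

printf '%s\n' "${OUTPUTS[@]}"
echo "jpg files: ${JPG_COUNT}"
rm -rf "${SCRATCH}"
```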

Summary

I packed a lot into what was originally going to be a really simple article, but I couldn't help it. I covered the following topics:

while loops featuring read
piping the output of a find through a while loop
using an alternate file descriptor so that while loops can be nested
basic invocation of image magick's convert
parameter expansion
globbing for the dirt simple case

Sources: my brain. That's how frequently I make use of a while loop processing the content of a find or other command. Granted, in this contrived situation, globbing probably would have been an acceptable approach.

Image source is Pixabay

#linux #tutorial #programming

not-a-bird (58) 7 years ago

After seeing a discussion in steemit.chat I wanted to go back through and add some sources to my old posts. Since this one is too old to edit, I'm leaving them here in a comment.

The man pages for Bash and all of the other commands I've used above appear to cover 100% of the stuff that I put in. If you're not on a Linux or *nix system where you can view man pages, here are the links to some online:

Bash: This does a decent job of explaining all of the built-in commands that I used (read, echo, while loops, for loops, parameter expansion). Some of the examples are pretty light, though. So I suggest googling some of the terms you might read in there to see examples on Stackoverflow.
convert: The convert man page does an okay job, but you'll probably want to check out the ImageMagick site for examples.
readlink: For discovering the path to a thing. I used it to get the absolute path to the directory that was passed to the script.
find: While technically this explains the command, most mortals will want to look for examples on Stackoverflow.

nristen (43) 7 years ago

Thank you for this post. Good content. Steem needs more posts like this... technical and informative. I do a lot of bash scripting so it is good to see content like this.

not-a-bird (58) 7 years ago

Thank you for reading!

I was hoping to find more content like this on here, but it's been slow going, mostly due to a lack of time on my part, I'm sure there's lots of this on here, somewhere. Maybe by posting some of the kinds of stuff I'd like to see I can improve my chances of stumbling over a whole community of it.
