Zipping arrays in Perl

On reducing implementation complexity with zip function


Years ago, back when I was learning Python, I came across a very handy built in function in that language, the zip function. It quickly has become one of my favourite features of Python, as it can greatly reduce the amount of work needed to manipulate a couple of arrays at once, one pair at a time:

first_list = [2, 3]
second_list = ['a', 'b', 'c']

for item in zip(first_list, second_list):
        print(item)

# prints:
# (2, 'a')
# (3, 'b')

Without it, you basically need to manipulate the keys of the arrays manually, take all array values for a given key and only then you can start your array manipulation. That's quite a lot of boilerplate, and most importantly it reduces the readability of your code.

zip in Perl

How about Perl? We have no zip built in here, but it existed for years in downloadable List::MoreUtils module. Unfortunately, it has little use in actual loops, as it creates a flat list of zipped array elements, like this: 2, 'a', 3, 'b', undef, 'c' - and yes, it also inserts undefs when there's no value, for better or for worse. If we want something we can easily loop over, the same module exports the zip_unflatten function, with the slightly shorter zip6 alias.

However, as discussed lately in the community, array manipulation functions are scattered all across CPAN, in different forms across the modules. Getting a whole module to get just a single helper is a bit of an overkill. Thankfully, List::Util, which is a core module, has added the zip function in the latest release, and has the same output as zip_unflatten function. The release date was 30th of March 2021, and sadly it did not make its way to be bundled with version 5.34.0 of perl. Chances are it will be included in the next stable release, which should be next year according to the schedule.

Here's the above Python example translated to Perl with the latest List::Util installed. And yes, it inserts the undefined values when the arrays differ in length, which can be changed by using the zip_shortest function instead:

use strict;
use warnings;

use List::Util qw(zip);
use Data::Dumper;

my @first_list = (2, 3);
my @second_list = ('a', 'b', 'c');

for my $item (zip \@first_list, \@second_list) {
        print Dumper($item);
}

# prints:
# $VAR1 = [
#           2,
#           'a'
#         ];
# $VAR1 = [
#           3,
#           'b'
#         ];
# $VAR1 = [
#           undef,
#           'c'
#         ];

How is it useful?

To answer this question, we need a fitting example. Lets look at the current Perl Weekly Challenge, task #1: String Chain.

You are given an array of strings.
Write a script to find out if the given strings can be chained to form a circle. Print 1 if found otherwise 0.

A string $S can be put before another string $T in circle if the last character of $S is same as first character of $T.

Examples:

Input: @S = ("abc", "dea", "cd")
Output: 1 as we can form circle e.g. "abc", "cd", "dea".

Input: @S = ("ade", "cbd", "fgh")
Output: 0 as we can't form circle.

This task will require heavy and inefficient calculations by looping through all the possible permutations of @S, unless you find some clever way to immediately narrow down the set of all possible permutations or (even more cleverly) solve it without any permutations, just by looking at input data. It really resembles the travelling salesman problem, where you have to find the shortest way from city to city in order to visit them all in the most efficient route - such tasks have very high computational complexity. This one however, can most likely be solved (or at least narrowed down a lot) just by comparing the count of each letter starting a word with the count of each letter ending a word. But anyway, I'm just going to permute everything with Algorithm::Permute, so that I can show a nice case of zip function being very useful :)

Our iterator code will look like this:

sub check_string_chain
{
        my @string_list = @_;

        my $iterator = Algorithm::Permute->new(\@string_list);

        while (my @case = $iterator->next) {
                ... # will implement that in just a minute!
        }

        # we found nothing
        return 0;
}

We can also immediately implement our checking logic, which is quite easy in this case:

sub comes_after
{
        my ($previous, $next) = @_;

        return substr($previous, -1) eq substr($next, 0, 1);
}

It is now time to implement the missing '...'. For every @case, I want to check if every element of that array returns true when passed to comes_after function together with the element of that array that follows it (or the first element, if it is the last one). Some people might immediately start implementing a C-like for loop, which will allow us to take index $i of our array, then index $i + 1, and pass their values to comes_after. But not me, I'm too lazy for that. Instead, I'll just do this:

# we need those functions!
use List::Util qw(all zip);

my @to_compare = @case;
push @to_compare, shift @to_compare;

return 1 if all {
        comes_after $_->@*
} zip \@case, \@to_compare;

That's quite dense, lets work that out together:

  • @to_compare is basically the same as our @case, but we use shift and push on it, which makes its first element go to the end of the array.
  • we zip the two arrays, @case comes first, and then @to_compare. This creates an array of pairs, where the first element of the pair is $i-th element, while the second element of the array is $i + 1-th element, or the first element in the very last pair.
  • with that set up, we can loop through it with a fitting reducer function. all will make sure that all pairs for this case meet our definition of being valid, coded in comes_after function.

This is all we need for that solution to work. It is inefficient as hell, but it works! Well, at least for cases where we only get a couple elements to check, as its computational complexity will quickly kick in, making more than 10 elements literally take ages to finish.

So how did zip help us really? Without it, we would have to manipulate array indexes by hand, and I believe this needs to be avoided whenever possible. By calculating the indexes yourself you risk making errors like getting out of bounds or miscalculating. It also requires more effort to set it up, and the result can not be read as easily - although that depends on the style you're used to. Once Perl upgrades its core List::Util to 1.56, we will be able to use zip without the need to download anything, which I think is another one small step in the right direction.


Comments? Suggestions? Send to bbrtj.pro@gmail.com
Published on 2021-06-03