PCL -> Clojure, Chapter 11

This article is part of a series describing a port of the samples from Practical Common Lisp (PCL) to Clojure. You will probably want to read the intro first.

This article covers Chapter 11, Collections.

Sequence basics

PCL describes a group of basic collection functions: count, find, position, remove, and substitute. Clojure supports count for a variety of list-like types:

  user=> (count (quote (1 2 3)))
  3
  user=> (count [1 2 3])
  3
  user=> (count #{1 2 3})
  3
  user=> (count "characters")
  10                          

These types, and any others than implement a basic first/rest protocol, are called sequences in Clojure. A sequence is logically a list, but may be implemented using other data structures.

In addition to generic sequence functions, some sequences have specific functions unique to their underlying data structure. Clojure defines find for maps to return the matching key/value pair:

  user=> (find {:lname "Doe", :fname "John"} :fname)                   
  [:fname "John"]

Or, you could just place the map itself in function position, and get back the matching value for a key:

  user=> ({:lname "Doe", :fname "John"} :fname)
  "John"

The Clojure core does not define find for other collection types. But the implementation is a one-liner using some. For example, to ask if a collection contains the number 2:

  user=> (some #(= % 2) [1 2 3])
  true

Clojure-contrib wraps the some idiom into a function named includes?.

The rest of the "basic" functions have similar stories: The Clojure core tends to support them directly where they are efficient (constant time) operations. Where they would take longer (e.g. linear time), the operations can be written as one-liners atop higher-order functions.

Higher-order functions

CL includes higher order versions of the basic functions described above. These higher-order versions take an additional parameter, which is a function that acts as a filter. Here are some examples.

First, a collection of days for the examples to work against:

  ; for re-split
  (use 'clojure.contrib.str-utils)
  (def days (re-split #" " "Sun Mon Tues Wed Thurs Fri Sat"))

Now I can find the weekdays that start with "S":

  user=> (filter #(.startsWith % "S") days)
  ("Sun" "Sat")

Or simply count the days that start with "S":

  user=> (count (filter #(.startsWith % "S") days)) 
  2

In an immutable world, remove is the opposite of find. I can get a collection with all "S" days removed by reversing the previous filter with complement:

  user=> (filter (complement #(.startsWith % "S")) days)
  ("Mon" "Tues" "Wed" "Thurs" "Fri")

To replace all "S" days with "Weekend!" I can use map:

  user=> (map #(if (.startsWith % "S") "Weekend!" %) days)
  ("Weekend!" "Mon" "Tues" "Wed" "Thurs" "Fri" "Weekend!")

Sorting

Sorting is easy:

  user=> (sort days)
  ("Fri" "Mon" "Sat" "Sun" "Thurs" "Tues" "Wed")

Sorting by criteria is also easy:

  user=> (sort-by #(.length %) days)
  ("Sun" "Mon" "Wed" "Fri" "Sat" "Tues" "Thurs")

Combining sequences

The concat function concatenates sequences.

  user=> (concat [1 2 3] [4 5 6])
  (1 2 3 4 5 6)

Note that the resulting sequence is lazy. So, concat can return without walking each input sequence. In other words, the (take 5 ...) below does not have to wait (forever!) for all the powers of 2 to be generated:

  user=> (take 5 (concat (quote (1/4 1/2)) powers-of-2))
  (1/4 1/2 1 2 4)

What if one of the sequences passed to concat blows up instead of returning a sequence?

  user=> (take 2 (concat '(1 2 3) (throw (Error. "Not a sequence"))))
  java.lang.Error: Not a sequence

Here concat fails because its second argument is not a sequence. As it happens, I have an even lazier option than concat. The lazy-cat function does not even look at each argument until it is forced to do so:

  user=> (take 2 (lazy-cat '(1 2 3) (throw (Error. "Not a sequence"))))
  (1 2)

Lazy sequences have many uses, but take some getting used to. One mistake to avoid is trying to inspect a lazy infinite sequence from the REPL. The REPL tries to print the entire sequence, which will take forever (literally). Hence the (take 2 ...) wrappers above.

Subsequences

It is often interesting to take subsequences from the beginning, middle, or end of a collection. Clojure supports this in a general way with take and drop. You have already seen take, which returns the first part of a collection:

  user=> (take 2 days)
  ("Sun" "Mon")                                      

For the end of a collection, I can use drop:

  user=> (drop 2 days)
  ("Tues" "Wed" "Thurs" "Fri" "Sat")

For the middle of a collection, I can use take and drop together:

  user=> (take 5 (drop 1 days))
  ("Mon" "Tues" "Wed" "Thurs" "Fri")

The take-nth function takes only every nth item of a collection. To demonstrate take-nth, I will begin by defining a lazy collection of the natural-numbers:

  (def natural-numbers (iterate inc 1))

The call to iterate produces a collection that starts with 1 and generates subsequent members by calling inc. You can verify that these are the natural numbers by taking a few of them.

  user=> (take 10 natural-numbers)
  (1 2 3 4 5 6 7 8 9 10)

Now I can write an intuitive definition for the even and odd numbers in terms of the natural numbers:

  (def odd-numbers (take-nth 2 natural-numbers))
  (def even-numbers (take-nth 2 (drop 1 natural-numbers)))

Predicates

Clojure provides a number of functions that test boolean predicates, including every?, not-any?, and not-every?, and some. Here are a few examples, using the days collection defined above.

Does every day start with "S"?

  user=>(every? #(.startsWith % "S") days)
  false

Is there some day that starts with "M"?

  user=>(some #(.startsWith % "M") days)
  true

Map and reduce

map take a function and one or more sequences. It returns a new sequence which is the result of applying the function to the item(s) in each sequence. So, to take the product of numbers from two sequences:

  user=> (map * '(1 2 3 4 5) '(10 9 8 7 6))
  (10 18 24 28 30)

If I want to control the type of collection returned, I can use into:

  user=> (into [] (map * '(1 2 3 4 5) '(10 9 8 7 6)))
  [10 18 24 28 30]

reduce walks down a collection, applying function f of two arguments to the first two arguments, then applying f to the result of the first call and the next element. This is very useful for operations that process a sequence and return a single value. For example, I can sum a sequence:

  user=> (reduce + [1 2 3 4 5])
  15

Or find the max value of a sequence:

  user=> (reduce max [1 2 3 4 5])
  5

Maps

Maps (hash tables in CL) can be iterated just like any other sequence type, bearing in mind that the function you pass in should expect a key/value pair. Given the following map of names to scores:

  user=> (def scores {:john 18, :jane 21, :jim 14})                   7
  #'user/scores

I can find all the people who scored above 15:

  user=> (filter (fn [[k,v]] (> v 15)) scores)
  ([:jane 21] [:john 18])

Notice how the destructuring bind ([[k,v]]) makes it easy to bind k and v separately, without introducing a temporary variable pair that I don't really need.

Wrapping up

Lisp excels at processing lists. Clojure offers similar capabilities, but generalized to sequences, which can be lists, vectors, maps, sets, or other list-like collections.

Clojure's support for lazy collections allows a different style for collection processing that I will continue to explore in later articles in this series.


Notes

Revision history

  • 2008/09/22: initial version
  • 2008/10/15: removed var-quoting per Douglas's comment. Thanks.
  • 2008/12/09: fixed filter erratum. Thanks Dean Ferreyra.