I didn’t use Python’s yield statement until I’d been programming for a few years. Within a few days of learning it, however, I couldn’t live without it. I find Python generators and list/dict comprehensions to be among the key ingredients for making Python code compact and readable.
I was pretty bummed to learn that Common Lisp does not yield to this kind of program structure. The clever pun notwithstanding, I was happy to learn that Common Lisp has a different paradigm: a map-like one.
Consider the following straightforward problem: I have a text file and am interested in the string of characters (called a “read”) present on every fourth line, starting with the second. For each such read I’m interested in getting each K-character-long section (called a “kmer”). Bioinformatics-savvy readers will recognize the file as a FASTQ file, and the K-character-long sections as k-mers from the reads in the FASTQ.
In Python there’s a very elegant way to abstract this extraction of data from the FASTQ file using generators (which is what yield allows us to create):
def read_fastq(fname):
    """Yield the sequence line (the second line of every four-line record)."""
    with open(fname, 'r') as fp:
        for n, ln in enumerate(fp):
            if (n - 1) % 4 == 0:
                yield ln.strip()


def kmers_from_seq(seq, kmer_size):
    # >= rather than >: a read exactly kmer_size long still holds one kmer
    assert len(seq) >= kmer_size
    for n in range(len(seq) - kmer_size + 1):
        yield seq[n:n + kmer_size]


def kmers_from_fastq(fname, kmer_size):
    for seq in read_fastq(fname):
        for kmer in kmers_from_seq(seq, kmer_size):
            yield kmer
Note how nonchalantly I nest the two generators, which allows me to do:
for kmer in kmers_from_fastq(fastq_fname, kmer_size):
    print(kmer)
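On Python 3.3 or later, the inner loop of kmers_from_fastq could also be delegated with `yield from`. Here is a minimal, self-contained sketch of that variant. The names `DEMO_FASTQ`, `read_seqs` and `kmers_from_lines` are made up for illustration, the two records are invented data, and it reads from a list of lines rather than a file so that it runs as-is.

```python
# Illustrative sketch: DEMO_FASTQ, read_seqs and kmers_from_lines are
# hypothetical names, and the two records below are made-up data.
DEMO_FASTQ = """\
@read1
ACGTAC
+
IIIIII
@read2
TTGCA
+
IIIII
"""


def read_seqs(lines):
    """Yield the sequence line (2nd of every 4) from an iterable of FASTQ lines."""
    for n, ln in enumerate(lines):
        if n % 4 == 1:
            yield ln.strip()


def kmers_from_seq(seq, kmer_size):
    """Yield every kmer_size-long window of seq, left to right."""
    for n in range(len(seq) - kmer_size + 1):
        yield seq[n:n + kmer_size]


def kmers_from_lines(lines, kmer_size):
    """Chain the two generators; `yield from` replaces the inner for loop."""
    for seq in read_seqs(lines):
        yield from kmers_from_seq(seq, kmer_size)


demo_kmers = list(kmers_from_lines(DEMO_FASTQ.splitlines(), 4))
print(demo_kmers)  # ['ACGT', 'CGTA', 'GTAC', 'TTGC', 'TGCA']
```

Nothing is read or sliced until the consumer asks for the next kmer, so the same code would work unchanged on an open file handle.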
Update: My first run-through is now appended to the end of the main article. Rainer Joswig saw my original post and showed me the Lisp way to do this, which turns out to be a nested, map-like pattern. Many thanks to Rainer! My only reservation was the medium of instruction: a series of tweets. It might work for the 45th president, but I personally prefer the more civilized medium of post comments, which creates a more coherent record.
Rainer’s example code is here. I’ve made minor changes to it below:
(defun map-to-seq-in-fastq (in fn)
  "Apply fn to the second line of every four-line FASTQ record (the seq line)."
  (flet ((read-seq-line (in)
           ;; prog2 evaluates all four reads and returns the second one
           (prog2 (read-line in nil)
               (read-line in nil)
             (read-line in nil)
             (read-line in nil))))
    (loop for seq = (read-seq-line in)
          while seq
          do (funcall fn seq))))

(defun map-to-kmer-in-seq (seq kmer-size fn)
  "Apply fn to all kmers of length kmer-size in seq."
  (let ((len (length seq)))
    (loop for pos from 0
          for end = (+ pos kmer-size)
          while (<= end len)
          do (funcall fn (subseq seq pos end)))))

(defun map-to-kmer-in-fastq (in kmer-size fn)
  (map-to-seq-in-fastq
   in
   (lambda (seq)
     (map-to-kmer-in-seq seq kmer-size fn))))
You can see what is happening: computation functions are passed, Russian-doll-like, into a nest of other functions that loop over some source of data. I get it, but it’s still not as neat-looking as Python.
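For readers more at home in Python, the same callback-passing shape can be sketched there too. This is only a loose rendering of the Lisp functions above, not a translation Rainer supplied. The names `map_to_seq_in_lines`, `map_to_kmer_in_seq` and `map_to_kmer_in_lines` are hypothetical, and the input is an invented list of lines rather than a stream.

```python
# Illustrative Python rendering of the callback-passing pattern; all three
# function names are hypothetical and only loosely mirror the Lisp ones.
def map_to_seq_in_lines(lines, fn):
    """Call fn on the sequence line (2nd of every 4) of FASTQ-style lines."""
    for n, ln in enumerate(lines):
        if n % 4 == 1:
            fn(ln.strip())


def map_to_kmer_in_seq(seq, kmer_size, fn):
    """Call fn on every kmer_size-long window of seq."""
    for pos in range(len(seq) - kmer_size + 1):
        fn(seq[pos:pos + kmer_size])


def map_to_kmer_in_lines(lines, kmer_size, fn):
    """Nest the two loops by wrapping fn in a lambda, as the Lisp version does."""
    map_to_seq_in_lines(lines, lambda seq: map_to_kmer_in_seq(seq, kmer_size, fn))


# Made-up two-record input; the consumer here is simply list.append.
lines = ["@r1", "ACGTA", "+", "IIIII", "@r2", "GGC", "+", "III"]
collected = []
map_to_kmer_in_lines(lines, 3, collected.append)
print(collected)  # ['ACG', 'CGT', 'GTA', 'GGC']
```

The trade-off shows up at the call site: the consumer has to be packaged as a function (here `collected.append`) instead of being the body of a `for` loop.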
Appendix: my first run-through using closures.
Well, I can’t do things like this in Common Lisp. Not easily and not with built-ins.
I’m pretty surprised at this. This has got to be the first idiom I know from Python that I cannot replicate in CL. The closest I could get was via a closure:
;; generator (closure) based on https://www.cs.northwestern.edu/academics/courses/325/readings/graham/generators.html
(defun fastq-reader (fname)
  (let ((in (open fname)))
    #'(lambda ()
        (prog2 ; <-- prog1 and prog2 are nifty!
            (read-line in nil)
            (read-line in nil) ; <-- this is the sequence line
          (read-line in nil)
          (read-line in nil))))) ; http://www.gigamonkeys.com/book/files-and-file-io.html using nil for return value
This can be called using the loop macro as follows:
;; loop based on http://www.gigamonkeys.com/book/files-and-file-io.html
(defparameter gen (fastq-reader "../../DATA/sm_r1.fq"))
(loop for seq = (funcall gen) ; <-- Note this
      while seq
      do (format t "~A~%" seq))
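Similarly, I can write the kmer extraction as a closure:
(defun kmer-gen (seq &key (kmer-size 30) (discard-N? nil))
  (declare (ignorable discard-N?)) ; accepted but not used yet
  (let ((n 0)
        (n-max (- (length seq) kmer-size))) ; last valid start index
    #'(lambda ()
        (prog1
            (if (<= n n-max) ; <= so the final kmer is not dropped
                (subseq seq n (+ n kmer-size))
                nil)
          (incf n)))))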
And combine the two closures as:
(defparameter gen (fastq-reader "../../DATA/sm_r1.fq"))
(loop for seq = (funcall gen)
      while seq
      do (let ((g (kmer-gen seq :kmer-size 10)))
           (loop for s = (funcall g)
                 while s
                 do (format t "~A~%" s))))
It’s not terrible, as the code still looks compact and readable, but it’s not as nicely abstracted away as the nested Python generator.
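To make the comparison concrete, here is a rough sketch of the same closure style written in Python. It is not code from my original run-through: `kmer_closure` is a hypothetical name, and the example only shows the position bookkeeping that a generator would otherwise hide.

```python
# Illustrative sketch: kmer_closure is a hypothetical name. It mimics the
# Lisp closure style in Python to show the state a generator keeps for you.
def kmer_closure(seq, kmer_size):
    """Return a zero-argument function giving one kmer per call, then None."""
    n = 0
    last_start = len(seq) - kmer_size  # final valid start index

    def next_kmer():
        nonlocal n
        if n > last_start:
            return None  # exhausted, like read-line returning nil
        kmer = seq[n:n + kmer_size]
        n += 1
        return kmer

    return next_kmer


g = kmer_closure("ACGTA", 3)
# Two-argument iter() keeps calling g until it returns the sentinel None.
closure_kmers = list(iter(g, None))
print(closure_kmers)  # ['ACG', 'CGT', 'GTA']
```

The counter `n` and the end-of-data sentinel both have to be managed by hand, which is exactly the part that `yield` takes care of in the generator version.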
(A complete non sequitur: Remember how I had issues with [square brackets] and (parentheses) for array indexing when I moved from MATLAB to Python? Well, now I have this issue with prefix and infix notation. I keep trying to write (print kmer), and print(kmer) is starting to look plain weird …)