Strings

One of a series of tutorials about Scheme in general
and the Wraith Scheme interpreter in particular.

Copyright © 2011 Jay Reynolds Freeman, all rights reserved.
Personal Web Site: http://JayReynoldsFreeman.com
EMail: Jay_Reynolds_Freeman@mac.com.

Strings work rather like a special kind of vector whose only purpose is to contain text characters. The standard vector operations do not work on strings: They are a different kind of Scheme object from vectors, so Wraith Scheme will report an error if you try to use a string as an argument to a procedure that expects a vector. There are also lots of special procedures for strings, to perform common operations involving text characters. Yet the basic idea, that a string is a container for a bunch of Scheme objects (that all happen to be characters), is just the same as for vectors, and many of the procedures for creating strings and for getting at their content work exactly like their vector counterparts.

Strings of course print differently from vectors.

(define my-string "This is a string.")    ;; ==> my-string
my-string                                 ;; ==> "This is a string."

Incidentally, Scheme has two procedures for printing things out. One is "write" and the other is "display". Both are called for side effects -- the printing -- so they both return only #t. (Recall that Wraith Scheme generally returns #t from procedures that are called only for side effects; the R5 report does not define a value to be returned by such procedures, so other Scheme implementations may do something else.) Procedures "write" and "display" may be applied to any kind of Scheme object, and in most cases they do the same thing, but they differ in how they handle strings. Look closely to see the difference.

(write my-string)      ;; ==> "This is a string."#t
(display my-string)    ;; ==> This is a string.#t

Each procedure did some printing, and then the Wraith Scheme interpreter itself printed out the #t that was the procedure's return value, but "write" printed the string with quotation marks around it, whereas "display" did not. The difference is important in two ways.

First, if you are writing a Scheme program that is supposed to print out nice-looking text for a user to read, you would probably use the "display" procedure. That way, the different strings that you print out would not each be surrounded by quotation marks. A paragraph of text would look pretty strange if every sentence in it -- or every word -- were contained within quotation marks.

On the other hand, if you were showing results in some context where the user was meant to understand the details of what was going on, you might want to use "write". That way, when your program showed a string, the user would know from the quotation marks that it was a string, and not merely a text description of some other kind of Scheme object. The user would also be able to tell clearly where the string started and where it ended.
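
Here is a small sketch of how the two procedures might be combined in one program. The procedure name "show-labeled" is made up for this illustration and is not a standard Scheme procedure. It uses "display" for the human-readable label and "write" for the value, so that a string value keeps its quotation marks.

```scheme
;; Hypothetical helper: print a label with "display" and a value with "write".
(define (show-labeled label value)
  (display label)     ;; the label is meant for people: no quotation marks
  (display ": ")
  (write value)       ;; the value is shown the way Scheme would read it back
  (newline))

(show-labeled "name" "Wraith")    ;; prints   name: "Wraith"
(show-labeled "count" 42)         ;; prints   count: 42
```

If you type these calls at the top level, the interpreter will also print whatever value the call returns, just as it did in the earlier examples.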

The Wraith Scheme interpreter uses "write" when it prints out the return values of procedures for you to look at. That is why, when you typed "my-string" to the interpreter in the preceding example, you got

my-string    ;; ==> "This is a string."

with quotation marks, and not

my-string    ;; ==> This is a string.

without them.

There is also a way to use "display" and "write" to put things into a file. We will see more about that in a later tutorial. Alternatively, if you are curious right now, you can look up these procedures in the Wraith Scheme Dictionary.

It is possible to quote strings, but there is no real point in doing so, because strings evaluate to themselves anyway. In the next example, the first string has a quote in front of it, while the second one does not, yet the Wraith Scheme interpreter does the same thing with both of them -- it just prints them out.

'"This is a quoted string."        ;; ==> "This is a quoted string."
"This is not a quoted string."     ;; ==> "This is not a quoted string."

Strings that are typed in as text literals -- characters surrounded by quotes -- are constant. The Scheme procedures that modify strings do not work on them.

One thing you might wonder about is, how do you put a double-quote inside a string? If you try to do something like this:

"This string contains a double quote " <-- there it was"

then the Wraith Scheme interpreter will get confused; it will take what you intended as an embedded double-quote to be the end of the whole string -- that is, it will take

"This string contains a double quote "

to be an entire string, and will become confused and emit error messages when it sees the rest of the stuff on the line. The way around that problem is to escape the double-quote, which is jargon for putting a "backslash" -- the character "\" -- in front of it. Scheme is one of several programming languages that allow you to use backslashes to change the meaning of a character in a string. Putting a backslash in front of a quote means "This quote is supposed to be part of the string; it is not intended as a marker for the end of the string." Thus:

"This string contains a double quote \" <-- there it was"
    ;; ==> "This string contains a double quote " <-- there it was"

Several characters have special meanings when escaped. For example:

An escaped "n" -- "\n" -- means "put a newline character in the string here".
An escaped "t" -- "\t" -- means "put a tab character in the string here".

There are others, and there is more detail to be learned: See the section on "Slashification" in the Wraith Scheme Help File.

Here is what happens when you put escaped characters in a string.

"Here -->\n<-- is an escaped \"n\"."              ;; ==> "Here -->\n<-- is an escaped \"n\"."

(write "Here -->\n<-- is an escaped \"n\".")      ;; ==> "Here -->\n<-- is an escaped \"n\"."#t
(display "Here -->\n<-- is an escaped \"n\".")    ;; ==> Here -->
                                                         <-- is an escaped "n".#t

Note that "write" printed out the string just as it was typed in, so that it is easy to see what is going on, but "display" actually followed the instructions for how the string is intended to appear -- it put in a real "newline" character, so that the string was printed on two lines, and it surrounded the "n" with a tidy pair of double-quotes.
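
One way to convince yourself that the backslash is only a typing aid, and is not stored in the string, is to count the characters. This sketch uses "string-length" and "string-ref", which are described later in this tutorial, and it sticks to the escaped double-quote and the escaped backslash.

```scheme
;; The backslash used for escaping is not itself part of the string.
(string-length "say \"hi\"")    ;; ==> 8      ;; s a y, a space, ", h i, "
(string-ref "say \"hi\"" 4)     ;; ==> #\"    ;; the character at position 4
(string-length "a\\b")          ;; ==> 3      ;; a, one backslash, b
```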

A word of caution: If you don't match quotes carefully, the Wraith Scheme interpreter will get confused. If you are typing a string that takes more than one line, a little panel at the lower right corner of the Basic Buttons Drawer might become visible, to remind you that Wraith Scheme is expecting one more double-quote real soon now.

I say "might" instead of "will", because it depends on just when and where you forget the double quote: Unhappily, the panel cannot help with Scheme expressions that take only one line -- Wraith Scheme doesn't see those at all until you have pressed the "return" key, by which time it is too late for any advice to do any good. Also, Wraith Scheme allows strings to span more than one line of text only if you put a backslash at the end of all lines but the last one. The example in the image above has one just to the right of the question mark, and you might have been intending to type a complete string that looked like this (note the backslash, and note that there is a final double-quote, in the second line of text, after "user."):

"Do you believe in users?\
I'm a user."

Incidentally, that same little panel at the bottom right of the Basic Buttons Drawer will also natter at you about missing parentheses.

You can also create strings with a procedure application. Remember that you can create lists with "list". Procedure "string" works just like "list".

(string #\C #\a #\t)    ;; ==> "Cat"

Procedure "string" is not as useful as one might hope, because of the rather awkward way of writing individual characters in Scheme.

There is also a procedure called "make-string".

(make-string 4)        ;; ==> "    "    ;; Contains four blanks.
(make-string 4 #\x)    ;; ==> "xxxx"

Procedure "make-string" takes one or two arguments. The first argument is the length of the string, which is four in the examples just given. Strings are like vectors and are different from lists: you must specify the length of a string at the time it is created, and there is no way to change it afterward. (There is a way to combine two strings to make a new, longer one, though -- we will see how shortly.) The specification can be implicit, as in the case of quoted strings or of procedure "string" -- the string "Cat", in the earlier example, obviously had a length of three -- or explicit, as in the first argument to "make-string".

If "make-string" has only one argument, then it will fill up the resulting string with blank spaces. If "make-string" has a second argument, that argument must be a character, and the whole string will be filled with that character.
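
A cautious note for code that might also run on other Scheme implementations: the R5 report leaves the content of a string unspecified when "make-string" is given only a length, so filling with blanks is Wraith Scheme's choice. A minimal sketch of the portable habit is to give the fill character explicitly.

```scheme
;; Passing the fill character explicitly does not rely on any default.
(make-string 4 #\space)                    ;; ==> "    "
(make-string 3 #\-)                        ;; ==> "---"
(string-length (make-string 4 #\space))    ;; ==> 4
(make-string 0 #\x)                        ;; ==> ""
```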

If you already have a string and want all its characters to be the same, you can use "string-fill!".

(define my-string (make-string 5))    ;; ==> my-string
my-string                             ;; ==> "     "
(string-fill! my-string #\x)          ;; ==> #t
my-string                             ;; ==> "xxxxx"

There is another useful way to create a string. You can make one out of a list, using procedure "list->string".

(list->string '(#\M #\e #\o #\w #\!))    ;; ==> "Meow!"

The name "list->string" is sometimes pronounced "list-to-string".

Procedure "string->list" performs the reverse operation.

(string->list "Tuna!")    ;; ==> (#\T #\u #\n #\a #\!)
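
As a sketch of why the round trip through a list is handy: once the characters are in a list, every list procedure is available for working on text. The example below assumes the standard list procedure "reverse"; the name "reverse-string" is made up for this illustration.

```scheme
;; Hypothetical helper: reverse a string by way of a list of its characters.
(define (reverse-string s)
  (list->string (reverse (string->list s))))

(reverse-string "Tuna!")    ;; ==> "!anuT"
(reverse-string "")         ;; ==> ""
```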

These two procedures are very useful when dealing with strings. Note in particular that although "cat food" is a constant string, and cannot be modified,

(list->string (string->list "cat food"))    ;; ==> "cat food"

is not a constant string -- list->string creates a string that is not constant -- and can be modified at will. Yet you do not have to use a clumsy construct involving "string->list" and "list->string" to get a modifiable string out of a typed literal. You can use "string-copy" instead.

(string-copy "cat food")    ;; ==> "cat food"

The strings returned by "string-copy" are all brand new modifiable ones.

You can retrieve content from a string by using "string-ref", which stands for "string reference". It works analogously to "list-ref", and -- like "list-ref" -- it is zero-based when it comes to figuring out which character in a string is which.

(define s "abc")    ;; ==> s
(string-ref s 0)    ;; ==> #\a
(string-ref s 1)    ;; ==> #\b
(string-ref s 2)    ;; ==> #\c

You can change the contents of a string by using "string-set!", which has no analogue among the standard Scheme procedures for lists. It has side effects -- it changes the string -- so its name follows the Scheme convention for such things, and ends with an exclamation point. Since "string-set!" is used solely for the sake of those side effects, it returns only #t. The first two arguments of "string-set!" are as for "string-ref", and the third is the new character to go in the specified place in the string.

(define letters (string #\a #\b #\c))    ;; ==> letters
letters                                  ;; ==> "abc"
(string-set! letters 0 #\z)              ;; ==> #t
letters                                  ;; ==> "zbc"

Note the use of "string" in the preceding example. I could not have written

(define letters "abc")    ;; ==> letters

and then gone on with the example, because strings defined by typing their literal values are constants. Scheme won't let you change them. If I had done that, there would have been an error message when "string-set!" tried to substitute #\z for #\a. Procedure "string" builds a new string from the characters that are its arguments, and that new string is not constant, so the example works.
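
Another way to make such an example work, sketched here with variable names invented for the purpose, is to start from a literal and take a modifiable copy with "string-copy". The sketch also shows that each copy is a separate string: changing one does not change the other. The values returned by "define" and "string-set!" are left out, since they vary between Scheme implementations.

```scheme
(define original (string-copy "cat food"))    ;; a modifiable copy of a literal
(define duplicate (string-copy original))     ;; a second, separate string
(string-set! duplicate 0 #\r)                 ;; changes only the duplicate
duplicate                                     ;; ==> "rat food"
original                                      ;; ==> "cat food"
```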

There is a predicate, "string?", to tell whether objects are strings or not, and a procedure, "string-length", to tell how many characters are in a string.

(string? "fdsa")                ;; ==> #t
(string? '(#\f #\d #\s #\a))    ;; ==> #f
(string? 42)                    ;; ==> #f

(string-length "asdf")          ;; ==> 4
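
Together, "string-length" and "string-ref" are enough to walk through a string one character at a time. The following is only a sketch: the procedure "count-char" and its inner helper "scan" are names invented for this example, and it assumes the standard character predicate "char=?".

```scheme
;; Hypothetical helper: count how many times character ch occurs in string s.
(define (count-char s ch)
  (define (scan index count)
    (cond ((= index (string-length s)) count)    ;; past the last character
          ((char=? (string-ref s index) ch)
           (scan (+ index 1) (+ count 1)))       ;; a match: count it
          (else (scan (+ index 1) count))))      ;; no match: move on
  (scan 0 0))

(count-char "banana" #\a)    ;; ==> 3
(count-char "banana" #\z)    ;; ==> 0
(count-char "" #\a)          ;; ==> 0
```

Because positions are zero-based, the last valid position is one less than the length, which is why the scan stops as soon as the index equals the length.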

So far so good, and I hope you see that strings and the procedures that deal with them really do resemble vectors and their procedures. There is lots more, though. There are a handful of string procedures that facilitate common operations involving text.

Procedure "substring" returns a duplicate of part of a string. "Duplicate" means that you can modify the substring returned without also modifying the string from which it was copied. This procedure takes three arguments: the first is the string, and the second and third are, respectively, the zero-based start and end locations of the substring to be copied. Note that the end position is not inclusive; that is, the substring returned will include all characters from the start position up to, but not including, the one specified by the third argument.

(define sandwich "whole wheat on rye")    ;; ==> sandwich
(substring sandwich 6 11)                 ;; ==> "wheat"
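
A few more cases with the same string may help make the exclusive end position concrete. This is just a sketch; it repeats the definition of "sandwich" so that it can be typed in on its own.

```scheme
(define sandwich "whole wheat on rye")
(substring sandwich 0 5)      ;; ==> "whole"
(substring sandwich 12 14)    ;; ==> "on"
(substring sandwich 15 18)    ;; ==> "rye"    ;; 18 is the length of the string
(substring sandwich 6 6)      ;; ==> ""       ;; start equals end: nothing copied
```

Note that the number of characters returned is always the end position minus the start position.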

Procedure "string-append" takes any number of strings as arguments, and returns a brand new string that has the arguments duplicated and connected together in the order given.

(string-append)                   ;; ==> ""
(string-append "ab" "cd" "ef")    ;; ==> "abcdef"
(string-append "cat" "food")      ;; ==> "catfood"

The last example illustrates one of the common errors in appending strings in Scheme: Some programming languages automatically insert a blank space as a delimiter in between appended strings. In Scheme, you have to do it yourself, perhaps as in one of the following expressions.

(string-append "cat" " " "food")    ;; ==> "cat food"
(string-append "cat " "food")       ;; ==> "cat food"
(string-append "cat" " food")       ;; ==> "cat food"
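
If you have a whole list of words to connect, you might wrap the delimiter handling in a procedure. Here is a minimal sketch; the name "join-with-spaces" is invented for this example, and it assumes its argument is a list of strings.

```scheme
;; Hypothetical helper: connect a list of strings with single blanks between them.
(define (join-with-spaces words)
  (cond ((null? words) "")                      ;; no words: empty string
        ((null? (cdr words)) (car words))       ;; one word: returned as is, not copied
        (else (string-append (car words)
                             " "
                             (join-with-spaces (cdr words))))))

(join-with-spaces '("cat" "food" "bowl"))    ;; ==> "cat food bowl"
(join-with-spaces '("cat"))                  ;; ==> "cat"
(join-with-spaces '())                       ;; ==> ""
```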

Scheme has ten predicates for comparing strings. They all take two arguments -- both strings. The restriction on argument quantity is different from the predicates that compare numbers: All numeric-comparison predicates can take two or more arguments. The R5 report gives Scheme implementations a choice of whether or not to let the string-comparison predicates take more than two arguments; I chose to allow only two in Wraith Scheme.

Each predicate comes in two slightly different varieties. One considers upper- and lower-case letters to be the same when doing the comparison, and the other does not. The case-independent comparisons all have names that include "ci" (for case independent), near the end of the name.

The kind of string comparison that Scheme uses is called lexicographic comparison: One string is considered to be less than another if it would appear before the other in a dictionary or an alphabetically sorted book index. If two strings have different lengths but compare as equal up to the point where one string ends, the shorter string is considered to be "less than" the longer one. The wording about "compare as equal" allows for the possibility of a case-independent comparison, in which case the strings would not have to be the same to pass the comparison test up to the point where the shorter one stopped.

The comparison of characters that Wraith Scheme uses as the basis of comparison of strings is defined by the standard ASCII enumeration of characters. One character is less than another if, and only if, the first character's ASCII number is less than the second one's. That means, among other things, that a capital letter is always less than its lower-case counterpart, and that when two letters have the same case, the one nearer the beginning of the alphabet is less than the other.

The string-comparison predicates are

string=?
string<?
string>?
string<=?
string>=?

string-ci=?
string-ci<?
string-ci>?
string-ci<=?
string-ci>=?

Here are the examples of their use that we saw in the tutorial, "Predicates and Booleans".

(string<? "aardvark" "zygomorph")       ;; ==> #t
(string<? "aardvark" "Zygomorph")       ;; ==> #f
(string-ci<? "aardvark" "Zygomorph")    ;; ==> #t
(string<? "aaa" "aaaaaaaa")             ;; ==> #t
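
The second example may look surprising until you check the character numbers. The sketch below assumes the ASCII enumeration described earlier, in which every capital letter has a smaller number than every lower-case letter; it uses the standard procedures "char->integer" and "char<?".

```scheme
(char->integer #\Z)                  ;; ==> 90
(char->integer #\a)                  ;; ==> 97
(char<? #\Z #\a)                     ;; ==> #t
(string<? "Zygomorph" "aardvark")    ;; ==> #t    ;; decided by the first characters
(string=? "Cat" "cat")               ;; ==> #f
(string-ci=? "Cat" "cat")            ;; ==> #t
```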

Remember: Strings are much like a specialized kind of vector for handling text.

-- Jay Reynolds Freeman (Jay_Reynolds_Freeman@mac.com)