Troubleshooters.Com and Code Corner Present

Litt's Lua Laboratory:
Lua Simple String Handling
(With Snippets)

Copyright (C) 2011 by Steve Litt






Introduction

I think when it comes to string handling, Perl is the Cadillac of the industry. But Lua's not too shabby on the string front either.

Concatenation and Other String Building

Concatenation in Lua is as simple as using the two-dot operator (..):
#!/usr/bin/lua 
local fname = "Barack"
local lname = "Obama"
local wholename = fname .. " " .. lname
print(wholename) --Prints "Barack Obama"
You can build a string whose characters are all the same with string.rep() as shown:
#!/usr/bin/lua 
local line = string.rep('=', 20)
print(line)
The preceding prints a line of 20 equal signs:
slitt@mydesk:~$ ./test.lua
====================
slitt@mydesk:~$

String Formatting

Another way to build strings is by string formatting, done with the string.format() function. Its first argument is a template string containing tokens, and each token is replaced, in order, by one of the remaining arguments. For instance:
#!/usr/bin/lua 
local fname="Peter"
local lname="Piper"
local avg=87.567
local str = string.format(
 "Student %s %s has a test average of %4.1f%%.",
 fname, lname, avg);
print(str)

The preceding produces the following output:
slitt@mydesk:~$ ./test.lua
Student Peter Piper has a test average of 87.6%.
slitt@mydesk:~$
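Notice the following: each %s token is replaced by the corresponding string argument, the %4.1f token prints a floating point number with 1 decimal place and a minimum width of 4, and %% prints a literal percent sign.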

Width, Zeros and Justification

Remember %4.1f meant 1 decimal place with minimum width of 4? Check out this columnar report program
The columnar report program preceding produces the following output:
slitt@mydesk:~$ ./test.lua
Lastname Firstname Amount
-------- --------- ------
Anderson , Al contributed $ 1200.00
Foster , Fred contributed $ 32.67
Jones , John contributed $ 400.00
Smith , Sam contributed $ 900.00
slitt@mydesk:~$
We gave every last name and every first name a width of 10, and by making that number negative we left justified them instead of using the default right justification. The amounts, which should always be right justified, got a positive width. It was easy to make the headers line up with the line items because we used the same widths.

You and I know that this would have been much better if we'd put it in a loop and had a table of items. But since this lesson comes before the one on tables, I just did it longhand.
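Here's a minimal longhand sketch of a report along those lines. The format strings and the names hdrfmt and rowfmt are my assumptions for illustration, not the original listing, so the exact spacing may differ from the output shown above.

```lua
-- Sketch only: hdrfmt and rowfmt are assumed format strings.
-- %-10s left justifies a string in 10 columns; %8.2f right justifies
-- a number in 8 columns with 2 decimal places.
local hdrfmt = "%-10s  %-10s%14s%8s"
local rowfmt = "%-10s, %-10s contributed $%8.2f"

-- The empty third argument just pads the header across " contributed $".
local header = string.format(hdrfmt, "Lastname", "Firstname", "", "Amount")
local rule   = string.format(hdrfmt, "--------", "---------", "", "------")

local line1 = string.format(rowfmt, "Anderson", "Al", 1200)
local line2 = string.format(rowfmt, "Foster", "Fred", 32.67)
local line3 = string.format(rowfmt, "Jones", "John", 400)
local line4 = string.format(rowfmt, "Smith", "Sam", 900)

print(header)
print(rule)
print(line1)
print(line2)
print(line3)
print(line4)
```

Because the header and the rows use the same field widths, every line comes out the same length and the columns line up.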

Sometimes you want to have numbers zero filled out to the necessary width. No problem. Instead of the token being %8.2f, make it %08.2f. That zero tells string.format() to prepend leading zeros.
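A quick sketch of the difference, using made-up values and variable names (plain, zeroed and id are just for illustration):

```lua
-- Same value, same width of 8, with and without the leading zero flag.
local plain  = string.format("%8.2f", 32.67)   -- padded with spaces
local zeroed = string.format("%08.2f", 32.67)  -- padded with zeros

-- The zero flag works with %d too.
local id = string.format("%05d", 42)

print(">" .. plain .. "<")   -- >   32.67<
print(">" .. zeroed .. "<")  -- >00032.67<
print(id)                    -- 00042
```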

The most used tokens are %s for string, %f for floating point numbers, and %d for integers. There are others, and they can be looked up in the Lua manual.

Upper and Lower Case

To convert a string to upper case, use string.upper(str).

To convert a string to lower case, use string.lower(str).
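Here's a short sketch of both functions. The sample strings are made up, and the case-insensitive comparison at the end is just one common use, not the only one.

```lua
local greeting = "Hello, World"

print(string.upper(greeting))  -- HELLO, WORLD
print(string.lower(greeting))  -- hello, world

-- The same functions can be called with method syntax on the string.
local shout = greeting:upper()
print(shout)                   -- HELLO, WORLD

-- Lowercasing both sides gives a simple case-insensitive comparison.
local same = string.lower("Lua") == string.lower("LUA")
print(same)                    -- true
```

Neither function changes the original string. Each returns a new one.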

To do things like convert the first letter of every word to upper case (Titlecase) or the first letter of every sentence to upper case, you need to use more advanced techniques.

Splitting By Character Position

You use string.sub() to split strings by character position (index). This function returns a string that is a substring of the original. Its syntax is like this:
string.sub(strng, index_start [, index_end])
Where strng is the string to be cut up, index_start is the index of the first character to be returned, and index_end is the index of the last character to be returned. If index_end is absent, everything between the character at index_start and the end of the larger string is returned.

The fact that index_start and index_end can be negative gives you even more power. If either is negative, it's relative to the end of the original string instead of the beginning, with -1 being the end of the original string. You should never use 0 for either index. This indexing example program makes it pretty clear.

The indexing example program prints the following output:
slitt@mydesk:~$ ./test.lua
123456789
123456789
4567
1234
6789
2345678
23456789
12345678
89
12
slitt@mydesk:~$
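One set of string.sub() calls that reproduces the output above is sketched below. The particular index arguments are my assumptions, because several different calls return the same substrings (for instance, string.sub(strng, 6) and string.sub(strng, -4) both return 6789 here).

```lua
local strng = "123456789"

print(strng)                      -- the whole string
print(string.sub(strng, 1))       -- from index 1 to the end
print(string.sub(strng, 4, 7))    -- indexes 4 through 7
print(string.sub(strng, 1, 4))    -- first four characters
print(string.sub(strng, 6))       -- index 6 to the end
print(string.sub(strng, 2, -2))   -- drop first and last characters
print(string.sub(strng, 2))       -- drop first character
print(string.sub(strng, 1, -2))   -- drop last character
print(string.sub(strng, -2))      -- last two characters
print(string.sub(strng, 1, 2))    -- first two characters
```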
Splitting by character position is handy whenever a line has a fixed layout. UMENU menu definition files are a case in point. The first character is a key, the second character can be anything (it's just for humans and is ignored by the program), and everything from the third character on is the value. Here's how you split such a line:
#!/usr/bin/lua 

local strng = "T_Kt to MPH converter"
local key = string.sub(strng, 1, 1)
local value = string.sub(strng, 3)

print(strng)
print(key)
print(value)
The preceding code prints the following, as expected:
slitt@mydesk:~$ ./test.lua
T_Kt to MPH converter
T
Kt to MPH converter
slitt@mydesk:~$
The string.sub() function can be combined with other string handling functions to do quite powerful things.

Finding Substrings

The preceding section explained how to find substrings by character position. Another way is to find them by searching for patterns. At this point we're getting very close to the concept of "Regular Expressions", which any Lua aficionado will tell you doesn't exist in Lua unless you load the Posix module. But of course you and I know that Lua's built-in string functions can do pretty much what other languages' regular expressions can do.

Lua's string.find(), string.match() and string.gmatch() built-in functions help you find substrings by matching:

Syntax: string.find(s, pattern [, init [, plain]])
Returns: the start and end indexes of the first match, or nil if there is no match
Does: By returning the indexes of the found match, this gives you a great way to carve up a string.

Syntax: string.match(s, pattern [, init])
Returns: the capture (or the whole match if the pattern has no captures), or nil if nothing was found
Does: A quick way of getting a small string out of a big one, or of detecting whether the pattern exists.

Syntax: string.gmatch(s, pattern)
Returns: an iterator
Does: The iterator repeatedly finds the next instance of the pattern (capture), so it can be used in a generic for loop.
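Here's a small sketch exercising all three functions on one string. The sample line and the variable names are made up for illustration.

```lua
local line = "width=80 height=24"

-- string.find returns the start and end indexes of the first match.
local first, last = string.find(line, "height")
print(first, last)            -- first is 10, last is 15

-- Use those indexes with string.sub to carve out what follows "height=".
local height = string.sub(line, last + 2)
print(height)                 -- 24

-- string.match returns the matched text itself (the first run of digits).
local width = string.match(line, "%d+")
print(width)                  -- 80

-- string.gmatch returns an iterator for use in a generic for loop.
local count = 0
for number in string.gmatch(line, "%d+") do
  count = count + 1
  print(count, number)
end
```

Both string.find() and string.match() return nil when nothing matches, so either one can be used in an if statement to test whether a pattern is present.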

One requirement that comes up often is to trim blanks from both sides of a string. Here's one way to do it in Lua:
#!/usr/bin/lua 

local strng = " Here I stand "
local rtrim = string.match(strng, ".*%S")   -- trim blanks off the right
local rltrim = string.match(rtrim, "%S.*")  -- then trim blanks off the left
print(string.format("rltrim=>%s<", rltrim))
The preceding produces the following output:
slitt@mydesk:~$ ./test.lua
rltrim=>Here I stand<
slitt@mydesk:~$
Let's examine the preceding carefully. On the right trim, we look for everything up to the last non-space (%S matches any non-whitespace character, %s matches a whitespace character). But why did it match up to the last non-space instead of the first one? It's because Lua's pattern matching is greedy by default, meaning the asterisk matches as many characters as it can. So the pattern ".*%S" takes all characters up to the last non-space. If you wanted to take them only up to the first non-space, you'd use a minus sign instead of an asterisk, so it would look like this: ".-%S". That's called non-greedy matching. This is discussed on the "Lua Regex" page.
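To see the difference side by side, here's a quick sketch. The variable names greedy and lazy are mine, and I'm assuming the test string has two blanks on each side.

```lua
local strng = "  Here I stand  "

-- Greedy: .* grabs as much as it can, so %S lands on the last non-space.
local greedy = string.match(strng, ".*%S")

-- Non-greedy: .- grabs as little as it can, so %S lands on the first non-space.
local lazy = string.match(strng, ".-%S")

print(string.format("greedy=>%s<", greedy))  -- greedy=>  Here I stand<
print(string.format("lazy=>%s<", lazy))      -- lazy=>  H<
```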

Anyway, getting back to the code, there's a more concise way to trim off both sides, as follows:
#!/usr/bin/lua 

local strng = " Here I stand "
local rltrim = string.match(string.match(strng, "%S.*"), ".*%S")
print(string.format("rltrim=>%s<", rltrim))
The inside function is evaluated first, and it trims spaces off the left. It passes its result to the outside function as a return value, and the outside function trims spaces off the right. I think that's probably quicker. The output follows:
slitt@mydesk:~$ ./test.lua
rltrim=>Here I stand<
slitt@mydesk:~$

As a final example, consider this code to take a typical config file line with a key, an equal sign, and a value, and extract the key and value from it.

That's some ugly code, folks. In the Lua "Regex" lesson you'll learn easier, neater and more concise ways to do things like this. Here's the output.

Notice that both the key and the value have been trimmed of blanks on both sides.
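For reference, here's one way the key and value could be pulled out using only the functions covered on this page. This is a sketch under my own assumptions, not the original listing: the helper names trim and parse_line and the sample config line are hypothetical.

```lua
-- Hypothetical helper: trim blanks from both sides, as shown earlier.
local function trim(s)
  local left = string.match(s, "%S.*")   -- drop leading blanks
  if left == nil then return "" end      -- string was empty or all blanks
  return string.match(left, ".*%S")      -- drop trailing blanks
end

-- Hypothetical helper: split "key = value" at the first equal sign.
local function parse_line(line)
  -- plain=true makes "=" a literal character rather than a pattern.
  local eq = string.find(line, "=", 1, true)
  if eq == nil then return nil end       -- not a key=value line
  local key = trim(string.sub(line, 1, eq - 1))
  local value = trim(string.sub(line, eq + 1))
  return key, value
end

local key, value = parse_line("  color =  dark blue  ")
print(string.format("key=>%s<", key))      -- key=>color<
print(string.format("value=>%s<", value))  -- value=>dark blue<
```

Splitting at the first equal sign means any later equal signs stay in the value, which is usually what you want for a config file.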