We have seen the print function:
x <- 1
print(x)
y <- list('Hello', TRUE, c(1,2,3))
print(y)
print is a generic function:
print
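Because print is generic, a call such as print(x) is dispatched to a method for the class of x (for example print.data.frame or print.factor) and falls back to print.default when none exists. A minimal sketch, assuming a hypothetical S3 class called temperature that is invented here only for illustration:

```r
# 'temperature' is a hypothetical S3 class created for this example
t1 <- structure(list(value = 21.5, unit = 'C'), class = 'temperature')

# A print method for that class; print() finds it through S3 dispatch
print.temperature <- function(x, ...) {
  cat('Temperature: ', x$value, ' ', x$unit, '\n', sep = '')
  invisible(x)  # print methods conventionally return their input invisibly
}

print(t1)            # dispatches to print.temperature
print(unclass(t1))   # without the class attribute, print.default is used
```

Typing t1 alone at the console autoprints it through the same dispatch.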
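print and cat
print prints only its first argument; further arguments are treated as printing options (the second one is digits), so the date below is never printed:
print('Right now it is', date())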
For this we need the cat (concatenate) function
cat('Right now it is', date(), "in West Lafayette")
cat(..., file = '', sep = ' ', fill = FALSE, labels = NULL, append = FALSE)
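...: inputs that R concatenates and prints
sep: string placed between successive inputs (default is a single space)
file: destination file (default '' means standard output)
append: if TRUE, add to an existing file instead of overwriting it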
Use paste() to store the concatenated output as a string instead of printing it
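A small sketch contrasting the two: paste() builds and returns the string, so it can be stored and reused, and its collapse argument joins the elements of one vector into a single string. The variable name msg is just for illustration.

```r
# paste() returns the concatenated string instead of printing it
msg <- paste('Right now it is', date())
is.character(msg)                              # TRUE
cat(msg, '\n')                                 # print it later, when needed

# collapse = joins the elements of a single vector
paste(1:5, collapse = ',')                     # "1,2,3,4,5"
paste0('[', paste(1:5, collapse = ','), ']')   # "[1,2,3,4,5]"
```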
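cat(1:5)                                                # 1 2 3 4 5
cat(1:5, sep = ',')                                     # 1,2,3,4,5
cat(1:5, sep = '\n')                                    # one number per line
cat('[', 1:5, ']', sep = ',')                           # [,1,2,3,4,5,]
cat('[', 1:5, ']', sep = c('', rep(',', 4), ''))        # a different separator for each gap
cat('Hello', 'World', 'New para', sep = '\n', file = 'new_file.txt')   # write to a file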
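One way to check what sep produces is to capture cat()'s output with capture.output(). For the file part, this sketch uses tempfile() instead of 'new_file.txt' so nothing is left in the working directory; that substitution and the append = TRUE step are additions for illustration.

```r
# capture.output() collects what cat() writes, so the effect of sep can be checked
capture.output(cat(1:5, sep = ','))              # "1,2,3,4,5"
capture.output(cat('[', 1:5, ']', sep = ','))    # "[,1,2,3,4,5,]"

# File output to a temporary path
path <- tempfile(fileext = '.txt')
cat('Hello', 'World', 'New para', sep = '\n', file = path)
first <- readLines(path, warn = FALSE)   # warn = FALSE: the last line may lack a final '\n'

# Overwrite (the default), then add a second line with append = TRUE
cat('Line 1\n', file = path)
cat('Line 2\n', file = path, append = TRUE)
second <- readLines(path)

print(first)    # "Hello" "World" "New para"
print(second)   # "Line 1" "Line 2"
unlink(path)    # remove the temporary file
```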
Section 8.1.22 in The R Inferno, Patrick Burns:
print outputs all the characters in the string; cat outputs what the string represents.
Compare:
print('Hello\n')
cat('Hello\n')
What if we want cat to output the two characters \n literally?
Escape the \ with another \:
cat('Hello\\n')
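The difference shows up in nchar(): '\n' is a single newline character, while '\\n' holds two characters, a backslash and an n. A quick check:

```r
nchar('Hello\n')        # 6: five letters plus ONE newline character
nchar('Hello\\n')       # 7: five letters, a backslash and an n
print('Hello\\n')       # shows the escaped form of the string
cat('Hello\\n')         # shows what the string represents: Hello\n
writeLines('Hello\\n')  # like cat(), but also ends the line
```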
Regular expression: representation of a collection of strings
Useful for searching and replacing patterns in strings
Composed of a grammar to build complicated patterns of strings
R has functions that, combined with regular expressions, allow powerful string manipulation
E.g. grep, grepl, regexpr, gregexpr, sub, gsub
cities <- c('lafayette', 'indianapolis' , 'cincinnati')
grep('in', cities)
grepl('in', cities)
Usage:
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE)
grep('in',cities,value=TRUE) #Return values instead of indices
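A short sketch relating the three calls: grepl() gives a logical vector that can subset cities directly, and ignore.case = TRUE (from the usage line above) turns off case sensitivity.

```r
cities <- c('lafayette', 'indianapolis', 'cincinnati')
grep('in', cities)                       # 2 3: indices of matching elements
grepl('in', cities)                      # FALSE TRUE TRUE
cities[grepl('in', cities)]              # same as grep('in', cities, value = TRUE)
grep('IN', cities)                       # integer(0): matching is case sensitive
grep('IN', cities, ignore.case = TRUE)   # 2 3
```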
Where in each element did the match occur?
regexpr('in', cities)
What if more than one match occurred?
gregexpr('in', cities)
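regexpr() returns the start of the first match in each element (-1 when there is none), with the match lengths stored as an attribute; gregexpr() returns a list holding every match per element. The base function regmatches() turns either result back into the matched text, as in this sketch:

```r
cities <- c('lafayette', 'indianapolis', 'cincinnati')
m <- regexpr('in', cities)
as.integer(m)                      # -1 1 2: start of the first match
as.integer(attr(m, 'match.length'))  # -1 2 2
g <- gregexpr('in', cities)
as.integer(g[[3]])                 # 2 5: every match in 'cincinnati'

# regmatches() extracts the matched text itself
regmatches(cities, m)              # "in" "in" (elements without a match are dropped)
lengths(regmatches(cities, g))     # 0 1 2: number of matches per element
```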
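What if we want to match more general patterns than a fixed string?
R supports two flavors of regular expressions: the default (POSIX extended) and Perl-compatible. We will always use the Perl flavor by setting perl = TRUE.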
'.' (period) matches any single character (except a newline, by default); it cannot match the empty string ''
vec<-c('ct','at', 'cat', 'caat', 'cart', 'dog', 'rat', 'carert', 'bet')
grep('.at', vec, perl = TRUE)
grep('..t', vec, perl = TRUE)
'+' represents one or more occurrences of the preceding item
grep( 'ca+t', vec, perl = TRUE)
grep( 'c.+t', vec, perl = TRUE)
'*' represents zero or more occurrences of the preceding item
grep('c.*t', vec, perl = TRUE)
Group terms with parentheses '(' and ')'; a + or * after a group repeats the whole group
grep('c(.r)+t', vec, perl = TRUE)
grep('c(.r)*t', vec, perl = TRUE)
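Adding value = TRUE to the calls above returns the matching strings rather than their indices, which makes each pattern easier to read. The results in the comments follow from the definitions of ., +, * and grouping:

```r
vec <- c('ct', 'at', 'cat', 'caat', 'cart', 'dog', 'rat', 'carert', 'bet')
grep('.at',     vec, perl = TRUE, value = TRUE)  # "cat" "caat" "rat"
grep('..t',     vec, perl = TRUE, value = TRUE)  # "cat" "caat" "cart" "rat" "carert" "bet"
grep('ca+t',    vec, perl = TRUE, value = TRUE)  # "cat" "caat"
grep('c.+t',    vec, perl = TRUE, value = TRUE)  # "cat" "caat" "cart" "carert"
grep('c.*t',    vec, perl = TRUE, value = TRUE)  # "ct" "cat" "caat" "cart" "carert"
grep('c(.r)+t', vec, perl = TRUE, value = TRUE)  # "cart" "carert"
grep('c(.r)*t', vec, perl = TRUE, value = TRUE)  # "ct" "cart" "carert"
```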
'.', '+' and '*' (as well as the parentheses) are all metacharacters
Other useful ones include ^ (start of the string) and $ (end of the string):
grep('e.$', vec, perl = TRUE)
| (logical OR)
grep('(c.t)|(c.rt)', vec, perl = TRUE)
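^ anchors a match at the start of the string and $ at the end; combined with | and a group they can require an exact match. A sketch reusing vec (the second vector of words is made up for illustration):

```r
vec <- c('ct', 'at', 'cat', 'caat', 'cart', 'dog', 'rat', 'carert', 'bet')
grep('^c', vec, perl = TRUE, value = TRUE)           # starts with c
grep('t$', vec, perl = TRUE, value = TRUE)           # ends with t
grep('^(cat|dog)$', vec, perl = TRUE, value = TRUE)  # exactly "cat" or "dog"
# Without anchors, | matches either word anywhere in the string
grep('cat|dog', c('concatenate', 'hotdog', 'bird'), perl = TRUE, value = TRUE)
```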
[ and ] create character classes, e.g.
[0-7ivx]: any of 0 to 7, i, v, and x
[a-z]: lowercase letters
[a-zA-Z]: any letter
[0-9]: any digit
[aeiou]: any vowel
grep('[ei]t', vec, perl = TRUE)
Inside a character class, a leading ^ means "anything except the following characters". E.g.
[^0-9]: anything except a digit
grep('[^a]t', vec, perl = TRUE)
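A sketch combining ranges and negated classes on a small made-up vector x:

```r
x <- c('a1', 'b22', 'c', 'D4', 'e-5')
grepl('[0-9]', x, perl = TRUE)       # TRUE TRUE FALSE TRUE TRUE: contains a digit
grepl('[a-z][0-9]', x, perl = TRUE)  # TRUE TRUE FALSE FALSE FALSE: lowercase letter, then digit
grepl('[^a-z0-9]', x, perl = TRUE)   # FALSE FALSE FALSE TRUE TRUE: has some other character
```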
What if we want to match metacharacters like . or + literally?
vec <- c('ct', 'cat', 'caat', 'caart', 'caaaat', 'caaraat',
'c.t')
grep('c.t', vec, perl = TRUE) #Is this what we want?
Escape them with \
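WARNING: a single \ doesn't work. Why?
cat('c\.t')   # error: '\.' is an unrecognized escape in a string
Inside a string literal, R reads \ as the start of an escape sequence such as \n. Since \. is not a valid escape, R rejects the string before any regular expression is built.
Use two \'s: the string 'c\\.t' then holds the four characters c\.t, which the regex engine reads as c, a literal period, t
cat('c\\.t')
grep('c\.t', vec, perl = TRUE)    # error, for the same reason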
grep('c\\.t', vec, perl = TRUE)
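Besides doubling the backslash, two other base-R options match a literal period: put it in a character class, where . is not special, or pass fixed = TRUE so the pattern is treated as plain text rather than a regular expression. A sketch on the same vector:

```r
vec <- c('ct', 'cat', 'caat', 'caart', 'caaaat', 'caaraat', 'c.t')
grep('c.t',   vec, perl = TRUE)    # 2 7: '.' matches any character
grep('c\\.t', vec, perl = TRUE)    # 7: the escaped '.' matches only a period
grep('c[.]t', vec, perl = TRUE)    # 7: inside a character class '.' is literal
grep('c.t',   vec, fixed = TRUE)   # 7: fixed = TRUE matches the pattern as plain text
```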
To match a literal \, the regular expression must be \\, which is written '\\\\' (four backslashes) in an R string
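my_var <- '\n'          # a newline character
grep('\\n', my_var)     # the regex \n matches a newline: returns 1
my_var <- '\\'          # a single backslash
grep('\\\\', my_var)    # the regex \\ matches one literal backslash: returns 1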
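A common case is a Windows-style path. In this sketch path is a hypothetical string; it holds two single backslashes even though each is typed as \\. gsub() performs the replacement, and with fixed = TRUE a single (escaped) backslash is enough because the pattern is not a regular expression.

```r
path <- 'C:\\Users\\me'    # hypothetical path; the string holds two single backslashes
nchar(path)                # 11
writeLines(path)           # C:\Users\me
grepl('\\\\', path, perl = TRUE)       # TRUE: the regex \\ matches one backslash
gsub('\\\\', '/', path, perl = TRUE)   # "C:/Users/me"
gsub('\\', '/', path, fixed = TRUE)    # "C:/Users/me": literal match, no regex escaping
```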
The sub function performs search and replacement:
vec <-c('ct','cat','caat','caart','caaaat','caaraaat','c.t')
sub('a+', 'A', vec, perl = TRUE)
sub replaces only the first match in each element; gsub replaces all matches
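The difference is visible on the sixth element, 'caaraaat', which contains two runs of a's:

```r
vec <- c('ct', 'cat', 'caat', 'caart', 'caaaat', 'caaraaat', 'c.t')
sub('a+',  'A', vec, perl = TRUE)   # "ct" "cAt" "cAt" "cArt" "cAt" "cAraaat" "c.t"
gsub('a+', 'A', vec, perl = TRUE)   # "ct" "cAt" "cAt" "cArt" "cAt" "cArAt"   "c.t"
```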
Use backreferences \1, \2, etc. (written '\\1', '\\2' inside an R string) to refer to the text matched by the first, second, etc. group
gsub('(a+)r(a+)', 'b\\1brc\\2c', vec, perl = TRUE)
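Backreferences make it easy to reorder the parts of a match. A sketch with made-up date strings, where \d matches a digit and {4} means exactly four repetitions; the first call repeats the example above to show that only 'caaraaat' contains a match:

```r
vec <- c('ct', 'cat', 'caat', 'caart', 'caaaat', 'caaraaat', 'c.t')
gsub('(a+)r(a+)', 'b\\1brc\\2c', vec, perl = TRUE)   # only element 6 changes: "cbaabrcaaact"

dates <- c('2024-03-15', '1999-12-31')              # illustrative year-month-day strings
sub('(\\d{4})-(\\d{2})-(\\d{2})', '\\3/\\2/\\1', dates, perl = TRUE)  # "15/03/2024" "31/12/1999"
sub('(\\w+) (\\w+)', '\\2 \\1', 'hello world', perl = TRUE)         # "world hello"
```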
In the replacement, \U and \L make the following backreferences upper or lower case, and \E ends the conversion (written '\\U', '\\L', '\\E' in an R string; these work only with perl = TRUE)
gsub('(a+)r(a+)', '\\U\\1r\\2', vec, perl = TRUE)
gsub('(a+)r(a+)', '\\U\\1r\\E\\2', vec, perl = TRUE)
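\U and \L are handy for capitalization. This sketch upper-cases the first letter of each word (\w matches a word character) and lower-cases the rest; as in the examples above, perl = TRUE is required:

```r
# Capitalize each word: \U for the first letter, \L for the rest
gsub('(\\w)(\\w*)', '\\U\\1\\L\\2', 'hello WORLD', perl = TRUE)  # "Hello World"
# Lower-case every word
gsub('(\\w+)', '\\L\\1', 'Hello WORLD', perl = TRUE)             # "hello world"
```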