grep {base}    R Documentation

Pattern Matching and Replacement

Description

grep searches for matches to pattern (its first argument) within the character vector x (second argument). regexpr does too, but instead of reporting which elements match, it returns the position and length of the first match in every element.
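A minimal sketch of the two return formats, using a made-up vector x. grep reports which elements match. regexpr reports, for every element, where the first match starts (or -1). The as.integer() call only strips the attributes that regexpr attaches, so the positions print as a plain vector.

```r
# Hypothetical sample vector, for illustration only
x <- c("apple", "banana", "cherry", "grape")

# grep: indices of the elements that contain at least one "a"
idx <- grep("a", x)                        # 1 2 4

# regexpr: one entry per element, start of the first "a" or -1 if none
pos <- regexpr("a", x)
pos_plain <- as.integer(pos)               # 1 2 -1 3
len <- as.integer(attr(pos, "match.length"))  # 1 1 -1 1
```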

sub and gsub perform replacement of matches determined by regular expression matching.

Usage

grep(pattern, x, ignore.case = FALSE, extended = TRUE, perl = FALSE,
     value = FALSE, fixed = FALSE)
sub(pattern, replacement, x, ignore.case = FALSE, extended = TRUE, perl = FALSE)
gsub(pattern, replacement, x, ignore.case = FALSE, extended = TRUE, perl = FALSE)
regexpr(pattern, text, extended = TRUE, perl = FALSE, fixed = FALSE)

Arguments

pattern character string containing a regular expression (or a literal character string for fixed = TRUE) to be matched in the given character vector.
x, text a character vector where matches are sought.
ignore.case if FALSE, the pattern matching is case sensitive; if TRUE, case is ignored during matching.
extended if TRUE, extended regular expression matching is used; if FALSE, basic regular expressions are used.
perl logical. Should Perl-compatible regexps be used if available? Has priority over extended.
value if FALSE, a vector containing the (integer) indices of the matches determined by grep is returned; if TRUE, a vector containing the matching elements themselves is returned.
fixed logical. If TRUE, pattern is a string to be matched as is. Overrides all other arguments.
replacement a replacement for the matched pattern in sub and gsub.
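A short sketch of value, ignore.case and fixed in grep, on a hypothetical vector x. It sticks to those three arguments and leaves extended and perl at their defaults.

```r
# Hypothetical sample vector, for illustration only
x <- c("Apple", "banana", "a.b", "axb")

# value = FALSE (default) returns indices; value = TRUE returns the elements
i_b <- grep("b", x)                            # 2 3 4
v_b <- grep("b", x, value = TRUE)              # "banana" "a.b" "axb"

# ignore.case = TRUE: "apple" also matches "Apple"
i_ci <- grep("apple", x, ignore.case = TRUE)   # 1

# As a regular expression, "." matches any single character
i_re <- grep("a.b", x)                         # 3 4
# With fixed = TRUE, "a.b" is matched literally, dot included
i_fx <- grep("a.b", x, fixed = TRUE)           # 3
```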

Details

The two *sub functions differ only in that sub replaces only the first occurrence of a pattern whereas gsub replaces all occurrences.
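The first-versus-all difference can be seen on a small made-up vector s. Both calls return a character vector of the same length as the input, and elements without a match come back unchanged. The last line shows a back-reference ("\\1") to a parenthesized group inside the replacement string.

```r
# Hypothetical sample vector, for illustration only
s <- c("banana", "cherry")

first <- sub("a", "_", s)    # only the first "a": "b_nana" "cherry"
every <- gsub("a", "_", s)   # every "a":          "b_n_n_" "cherry"

# "\\1" in the replacement refers to the text matched by the first group
tagged <- gsub("(an)", "<\\1>", "banana")   # "b<an><an>a"
```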

For regexpr it is an error for pattern to be NA; otherwise NA is permitted and matches only itself.

The regular expressions used are those specified by POSIX 1003.2, either extended or basic depending on the value of the extended argument, unless perl = TRUE, in which case they are those of PCRE. (The exact set of patterns supported may depend on the version of PCRE installed on the system in use.)
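A sketch contrasting the two pattern dialects on a made-up string. The POSIX bracket expression [[:digit:]] works with the default engine; \d is the Perl-style shorthand, requested here through perl = TRUE. Lookbehind, (?<=...), is a PCRE construct, so it is shown only with perl = TRUE.

```r
# Hypothetical input string, for illustration only
digits <- "a1b22c333"

# POSIX bracket expression with the default engine
posix <- gsub("[[:digit:]]+", "#", digits)           # "a#b#c#"
# Perl-style shorthand \d, evaluated by PCRE
pcre  <- gsub("\\d+", "#", digits, perl = TRUE)       # "a#b#c#"

# PCRE lookbehind: replace each "b" that is preceded by an "a"
lb    <- gsub("(?<=a)b", "B", "abcab", perl = TRUE)   # "aBcaB"
```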

Value

For grep, a vector giving either the indices of the elements of x that yielded a match or, if value is TRUE, the matched elements.
For sub and gsub, a character vector of the same length as the original.
For regexpr, an integer vector of the same length as text giving the starting position of the first match, or -1 if there is none, with attribute "match.length" giving the length of the matched text (or -1 for no match).
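The start positions and "match.length" attribute are enough to pull out the matched text. The sketch below uses substring() for that and maps non-matches (start -1) to NA; the input vector and the NA convention are illustrative choices, not part of regexpr itself.

```r
# Hypothetical sample vector, for illustration only
x <- c("foo", "bar", "book")

r <- regexpr("o+", x)
start <- as.integer(r)                # 2 -1 2
len   <- attr(r, "match.length")      # 2 -1 2

# Extract each first match; elements with no match become NA
matched <- ifelse(start > 0, substring(x, start, start + len - 1), NA)
# "oo" NA "oo"
```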

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole. (grep)

See Also

regular expression (?regex) for the details of the pattern specification.

agrep for approximate matching.

tolower, toupper and chartr for character translations. charmatch, pmatch and match for exact and partial matching of strings. apropos uses regexps and has nice examples.

Examples

grep("[a-z]", letters)

txt <- c("arm","foot","lefroo", "bafoobar")
if(any(i <- grep("foo",txt)))
   cat("'foo' appears at least once in\n\t",txt,"\n")
i # 2 and 4

## Double all 'a' or 'b's;  "\" must be escaped, i.e., 'doubled'
gsub("([ab])", "\\1_\\1_", "abc and ABC")

txt <- c("The", "licenses", "for", "most", "software", "are",
  "designed", "to", "take", "away", "your", "freedom",
  "to", "share", "and", "change", "it.",
   "", "By", "contrast,", "the", "GNU", "General", "Public", "License",
   "is", "intended", "to", "guarantee", "your", "freedom", "to",
   "share", "and", "change", "free", "software", "--",
   "to", "make", "sure", "the", "software", "is",
   "free", "for", "all", "its", "users")
( i <- grep("[gu]", txt) ) # indices
stopifnot( txt[i] == grep("[gu]", txt, value = TRUE) )
(ot <- sub("[b-e]",".", txt))
txt[ot != gsub("[b-e]", ".", txt)] # gsub does "global" substitution

txt[gsub("g", "#", txt) !=
    gsub("g", "#", txt, ignore.case = TRUE)] # the "G" words

regexpr("en", txt)

## trim trailing white space
str <- 'Now is the time      '
sub(' +$', '', str)  ## spaces only
sub('[[:space:]]+$', '', str) ## white space, POSIX-style
sub('\\s+$', '', str, perl = TRUE) ## Perl-style white space
