macOS NSRegularExpression

From Lazarus wiki



This article applies to macOS only.

See also: Multiplatform Programming Guide


Regular expressions

Regular expressions are patterns used to match specified combinations of characters in the string data being searched.

Each character in a regular expression (that is, each character in the string describing its pattern) is either a metacharacter or operator, which has a special meaning, or a regular character, which has a literal meaning.

Regular expressions can be incredibly complex. Indeed, whole books have been written about them! For a gentle introduction to regular expressions, see this O'Reilly article.

NSRegularExpression Overview

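The NSRegularExpression class has convenience methods for returning all the matches as an array (matchesInString_options_range), the total number of matches (numberOfMatchesInString_options_range), the first match (firstMatchInString_options_range) and the range of the first match (rangeOfFirstMatchInString_options_range).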

An individual match is represented by an instance of the NSTextCheckingResult class, which carries information about the overall matched range (via its range property), and the range of each individual capture group (via the rangeAtIndex method).
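The examples on this page do not use the two first-match convenience methods, so here is a minimal sketch of them. It assumes the same Free Pascal setup as the examples (CocoaAll in objectivec2 mode); the program name regex_first is only illustrative. firstMatchInString_options_range() returns an NSTextCheckingResult, or Nil when nothing matches, while rangeOfFirstMatchInString_options_range() returns just an NSRange whose location is NSNotFound when nothing matches.

```pascal
Program regex_first;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : NSString;
  myRegex : NSRegularExpression;
  match   : NSTextCheckingResult;
  firstRng: NSRange;
  error   : NSError;

Begin
  error   := Nil;
  srchStr := NSStr('I have 43 bags of 60 marbles.');

  // No options (0); @error receives the reason if the pattern is invalid
  myRegex := NSRegularExpression.regularExpressionWithPattern_options_error(NSStr('\d+'), 0, @error);
  if myRegex = Nil then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  // First match as an NSTextCheckingResult, or Nil if nothing matches
  match := myRegex.firstMatchInString_options_range(srchStr, 0, NSMakeRange(0, srchStr.length));
  if match <> Nil then
    NSLog(NSStr('First match: %@'), srchStr.substringWithRange(match.range));

  // Only the range of the first match; location is NSNotFound if nothing matches
  firstRng := myRegex.rangeOfFirstMatchInString_options_range(srchStr, 0, NSMakeRange(0, srchStr.length));
  if firstRng.location <> NSNotFound then
    NSLog(NSStr('First match range: location %lu, length %lu'), firstRng.location, firstRng.length);
End.
```

With this search string it should log 43 as the first match, at location 7 with length 2.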

NSRegularExpression conforms to the International Components for Unicode (ICU) specification for regular expressions.

Metacharacters

For a comprehensive list of characters used by the NSRegularExpression class that have a special meaning in regular expression patterns, see the ICU listing.
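Sometimes you want a metacharacter to be matched literally, for example the dollar sign and full stop in a price. A minimal sketch, assuming the same setup as the examples on this page, uses the class method escapedPatternForString() to add the necessary backslashes; the search string and program name are made up for illustration.

```pascal
Program regex_escape;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : NSString;
  escaped : NSString;
  myRegex : NSRegularExpression;
  error   : NSError;

Begin
  error   := Nil;
  srchStr := NSStr('Total: $4.50 (approx.)');

  // '$' and '.' are metacharacters; escape them so they match literally
  escaped := NSRegularExpression.escapedPatternForString(NSStr('$4.50'));
  NSLog(NSStr('Escaped pattern: %@'), escaped);

  myRegex := NSRegularExpression.regularExpressionWithPattern_options_error(escaped, 0, @error);
  if myRegex = Nil then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  NSLog(NSStr('Number of matches: %lu'),
        myRegex.numberOfMatchesInString_options_range(srchStr, 0, NSMakeRange(0, srchStr.length)));
End.
```

It should report one match. Used unescaped, the $ would act as an end-of-input anchor and the . would match any character.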

Operators

For a comprehensive list of operators used by the NSRegularExpression class, see the ICU listing.

Example 1 - match a pattern

In this fairly trivial and contrived code example, we use the \d metacharacter, which matches a decimal digit, and the + operator, which matches the preceding item (here a decimal digit) one or more times. The pattern \d+ therefore matches every number in the search string, and we output each match using NSLog(). The example uses the NSRegularExpression convenience methods for returning all of the matches in the search string as an array and the total number of matches.

Code

Program regex_ex1;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : String;
  patnStr : String;
  myRegex : NSregularExpression;
  matches : NSArray;
  match   : NSTextCheckingResult;
  error   : NSError;

Begin
  error   := Nil;
  srchStr := 'I have 43 bags of 60 marbles.';
  patnStr := '\d+';

  // Create a regular expression with given string and options.
  // Passing @error lets the method report why creation failed.
  myRegex := NSregularExpression.regularExpressionWithPattern_options_error(NSStr(patnStr), NSRegularExpressionOptions(NSRegularExpressionCaseInsensitive), @error);

  // Check creation of regular expression: on failure the method returns Nil
  if(myRegex = Nil) then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  // Save any matches in the given string in the matches array.
  // The range length is the NSString length (UTF-16 units), not Length(srchStr).
  matches := myRegex.matchesInString_options_range(NSStr(srchStr), 0, NSMakeRange(0, NSStr(srchStr).length));

  // Output
  NSLog(NSStr('Search string: %@'), NSStr(srchStr));
  NSLog(NSStr('Pattern string: %@'), NSStr(patnStr));
  NSLog(NSStr('Number of matches: %lu'), myRegex.numberOfMatchesInString_options_range(NSStr(srchStr), 0, NSMakeRange(0, NSStr(srchStr).length)));

  for match in matches do
    NSLog(NSStr('match: %@'), NSStr(srchStr).substringWithRange(match.rangeAtIndex(0)));
End.

Output

The output from running the above code example is:

2021-06-12 21:05:46.335 regex_ex1[26138:232243] Search string: I have 43 bags of 60 marbles.
2021-06-12 21:05:46.336 regex_ex1[26138:232243] Pattern string: \d+
2021-06-12 21:05:46.336 regex_ex1[26138:232243] Number of matches: 2
2021-06-12 21:05:46.336 regex_ex1[26138:232243] match: 43
2021-06-12 21:05:46.336 regex_ex1[26138:232243] match: 60

Code explanation

The call to regularExpressionWithPattern_options_error() creates an NSRegularExpression object instance (myRegex) with the specified regular expression pattern and options.

Options are specified using NSRegularExpressionOptions(). Note that by default NSRegularExpression performs case-sensitive searches, so we specified the NSRegularExpressionCaseInsensitive option for case-insensitive searches. Because we are only matching digits, this option has no effect here, and we might as well have specified 0 for no options.
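Options are bit flags, so more than one can be passed at a time by combining them with or. This minimal sketch (sample text and program name made up) combines NSRegularExpressionCaseInsensitive with NSRegularExpressionAnchorsMatchLines, which makes ^ match at the start of every line rather than only at the start of the whole string.

```pascal
Program regex_options;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : NSString;
  myRegex : NSRegularExpression;
  error   : NSError;

Begin
  error   := Nil;
  // Three lines of text separated by line feeds (#10)
  srchStr := NSStr('Apples 12' + #10 + 'pears 7' + #10 + '12 plums');

  // Ignore case, and let ^ match at the start of every line
  myRegex := NSRegularExpression.regularExpressionWithPattern_options_error(NSStr('^[a-z]+'),
               NSRegularExpressionOptions(NSRegularExpressionCaseInsensitive or NSRegularExpressionAnchorsMatchLines),
               @error);
  if myRegex = Nil then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  // Count the lines that start with a word
  NSLog(NSStr('Number of matches: %lu'),
        myRegex.numberOfMatchesInString_options_range(srchStr, 0, NSMakeRange(0, srchStr.length)));
End.
```

It should report 2 matches (Apples and pears). Without NSRegularExpressionAnchorsMatchLines, ^ only matches at the very start of the string and the count should drop to 1.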

Once we have the NSRegularExpression object, we can then use it for matching text among other operations.

After checking that the creation of the regular expression did not fail, we call the matchesInString_options_range() method to search for any matches and store them in our NSArray (matches). This method takes our search string, any matching options (there are none here, so we pass 0) and the range to search. The range is specified by giving NSMakeRange() the starting location to search in the string (0 = the beginning of the string) and the length to search, here the length of the whole search string.
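The range does not have to cover the whole string. In this minimal sketch, assuming the same search string as Example 1 (the program name is illustrative), we start searching at location 10, the b of bags, so only the second number is found. The location reported for a match is still counted from the start of the whole string, not from the start of the search range.

```pascal
Program regex_range;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : NSString;
  myRegex : NSRegularExpression;
  matches : NSArray;
  match   : NSTextCheckingResult;
  error   : NSError;

Begin
  error   := Nil;
  srchStr := NSStr('I have 43 bags of 60 marbles.');

  myRegex := NSRegularExpression.regularExpressionWithPattern_options_error(NSStr('\d+'), 0, @error);
  if myRegex = Nil then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  // Search only from location 10 to the end of the string
  matches := myRegex.matchesInString_options_range(srchStr, 0, NSMakeRange(10, srchStr.length - 10));

  // Locations are relative to the start of the whole string
  for match in matches do
    NSLog(NSStr('match: %@ at location %lu'), srchStr.substringWithRange(match.range), match.range.location);
End.
```

This should log 60 at location 18.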

Next, we output the search string and pattern string, and then call the numberOfMatchesInString_options_range() method to determine the number of matches and output it.

Finally, we iterate through the matches NSArray and output the matches individually. The call to rangeAtIndex(0) returns the range of the full match and is equivalent to simply calling range. The code for this looks a little obscure. Let me try to unpack it for you.

If you just output the content of the matches NSArray you get this:

"<NSSimpleRegularExpressionCheckingResult: 0x14d617570>{7, 2}{<NSRegularExpression: 0x14d614fd0> \\d+ 0x1}",
"<NSSimpleRegularExpressionCheckingResult: 0x14d617610>{18, 2}{<NSRegularExpression: 0x14d614fd0> \\d+ 0x1}"

Notice the {7, 2} and {18, 2} ranges which locate the first number at position 7 (counting from zero) in the search string with a length of 2 and the second number at position 18 with a length of 2. Knowing those ranges, you could use:

NSLog(NSStr('match: ''%@'''), NSStr(srchStr).substringWithRange(NSMakeRange(7, 2)));

to output the first number. The substringWithRange() method extracts from our search string the substring that matches the specified range (7, 2). Clearer than mud? I hope so.

Example 2 - match pattern groups

This is similar to Example 1 above, except that this time we match groups of characters. Our search string is the same as before, except that Bags and Marbles are now capitalised, and our pattern string has some added complexity. The pattern matches one or more decimal digits as before, but this time as a group, which is delimited by parentheses: (\d+). Next, the point . matches any single character and the ? that follows it makes that character optional (zero or one occurrence), which allows for the space after the digit(s). Finally, ([a-z]+) matches one or more characters from the set a to z as a second group.

Code

Program regex_ex2;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : String;
  patnStr : String;
  myRegex : NSregularExpression;
  matches : NSArray;
  match   : NSTextCheckingResult;
  error   : NSError;

Begin
  error   := Nil;
  srchStr := 'I have 43 Bags of 60 Marbles.';
  patnStr := '(\d+).?([a-z]+)';

  // Create a regular expression with given string and options.
  // Passing @error lets the method report why creation failed.
  myRegex := NSregularExpression.regularExpressionWithPattern_options_error(NSStr(patnStr), NSRegularExpressionOptions(NSRegularExpressionCaseInsensitive), @error);

  // Check creation of regular expression: on failure the method returns Nil
  if(myRegex = Nil) then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  // Save any matches in the given string in the matches array
  matches := myRegex.matchesInString_options_range(NSStr(srchStr), 0, NSMakeRange(0, NSStr(srchStr).length));

  // Output
  NSLog(NSStr('Search string: %@'), NSStr(srchStr));
  NSLog(NSStr('Pattern string: %@'), NSStr(patnStr));
  NSLog(NSStr('Number of matches: %lu'), myRegex.numberOfMatchesInString_options_range(NSStr(srchStr), 0, NSMakeRange(0, NSStr(srchStr).length)));

  for match in matches do
    begin
      NSLog(NSStr('match(0): ''%@'''), NSStr(srchStr).substringWithRange(match.rangeAtIndex(0)));
      NSLog(NSStr('match(1): ''%@'''), NSStr(srchStr).substringWithRange(match.rangeAtIndex(1)));
      NSLog(NSStr('match(2): ''%@'''), NSStr(srchStr).substringWithRange(match.rangeAtIndex(2)));
    end;
End.

Output

2021-06-14 17:39:51.255 program2[7376:149163] Search string: I have 43 Bags of 60 Marbles.
2021-06-14 17:39:51.255 program2[7376:149163] Pattern string: (\d+).?([a-z]+)
2021-06-14 17:39:51.255 program2[7376:149163] Number of matches: 2
2021-06-14 17:39:51.255 program2[7376:149163] match(0): '43 Bags'
2021-06-14 17:39:51.255 program2[7376:149163] match(1): '43'
2021-06-14 17:39:51.255 program2[7376:149163] match(2): 'Bags'
2021-06-14 17:39:51.255 program2[7376:149163] match(0): '60 Marbles'
2021-06-14 17:39:51.255 program2[7376:149163] match(1): '60'
2021-06-14 17:39:51.255 program2[7376:149163] match(2): 'Marbles'

Code explanation

The explanation is pretty much the same as for Example 1, except that:

1) The NSRegularExpressionCaseInsensitive option for case-insensitive searches now has an effect. We specified the set of lowercase characters a to z, but because of the option we matched the words Bags and Marbles, which start with capital letters.

2) The call to rangeAtIndex(0), or the equivalent range, returns the range of the full pattern match; rangeAtIndex(1) returns the range of the first group (the digits); and rangeAtIndex(2) returns the range of the second group (the letters).
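When a group is optional, it may not take part in a particular match. This minimal sketch (made-up search string, illustrative program name) uses the numberOfCaptureGroups property of the regular expression and the numberOfRanges property of each match to visit every group. It also checks for the NSNotFound location that rangeAtIndex() reports for a group that did not participate.

```pascal
Program regex_groups;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : NSString;
  myRegex : NSRegularExpression;
  matches : NSArray;
  match   : NSTextCheckingResult;
  grpRng  : NSRange;
  i       : NSUInteger;
  error   : NSError;

Begin
  error   := Nil;
  srchStr := NSStr('I have 43 bags of 2 dozen marbles.');

  // Group 2, "( dozen)", is optional, so it is not part of every match
  myRegex := NSRegularExpression.regularExpressionWithPattern_options_error(NSStr('(\d+)( dozen)? ([a-z]+)'), 0, @error);
  if myRegex = Nil then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  NSLog(NSStr('Capture groups in pattern: %lu'), myRegex.numberOfCaptureGroups);

  matches := myRegex.matchesInString_options_range(srchStr, 0, NSMakeRange(0, srchStr.length));
  for match in matches do
    // numberOfRanges = capture groups + 1 (index 0 is the whole match)
    for i := 0 to match.numberOfRanges - 1 do
      begin
        grpRng := match.rangeAtIndex(i);
        // A group that did not take part in this match has location NSNotFound
        if grpRng.location = NSNotFound then
          NSLog(NSStr('  group %lu: (no match)'), i)
        else
          NSLog(NSStr('  group %lu: ''%@'''), i, srchStr.substringWithRange(grpRng));
      end;
End.
```

For 43 bags, group 2 should be reported as having no match; for 2 dozen marbles, all three groups should be filled in.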

Example 3 - replace matched pattern

This is similar to Example 1 above, except that this time we replace the numbers matched by the regular expression with words. This is done in two ways: the first uses the stringByReplacingMatchesInString_options_range_withTemplate() method, which returns a new string with the replacements; the second uses the replaceMatchesInString_options_range_withTemplate() method, which makes the replacements in place in a mutable copy of the search string (srchStr2).
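When every match is to be replaced by the same text, there is no need to loop over the matches at all: a single call over the whole string replaces every match. A minimal sketch, assuming the same search string as Example 3; the templates and program name are illustrative. In a template, $0 stands for the whole match (and $1, $2 and so on for capture groups), and replaceMatchesInString_options_range_withTemplate() returns the number of replacements it made.

```pascal
Program regex_replace_all;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : NSString;
  mutStr  : NSMutableString;
  resStr  : NSString;
  myRegex : NSRegularExpression;
  numRepl : NSUInteger;
  error   : NSError;

Begin
  error   := Nil;
  srchStr := NSStr('I have 43 Bags of 60 Marbles.');
  mutStr  := NSMutableString.stringWithString(srchStr);

  myRegex := NSRegularExpression.regularExpressionWithPattern_options_error(NSStr('\d+'), 0, @error);
  if myRegex = Nil then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  // Returns a new string; $0 in the template is the whole match
  resStr := myRegex.stringByReplacingMatchesInString_options_range_withTemplate(srchStr, 0,
              NSMakeRange(0, srchStr.length), NSStr('<$0>'));
  NSLog(NSStr('New string: %@'), resStr);

  // Edits mutStr in place and returns how many matches were replaced
  numRepl := myRegex.replaceMatchesInString_options_range_withTemplate(mutStr, 0,
               NSMakeRange(0, mutStr.length), NSStr('#'));
  NSLog(NSStr('Replaced %lu matches: %@'), numRepl, mutStr);
End.
```

This should log I have <43> Bags of <60> Marbles. and then Replaced 2 matches: I have # Bags of # Marbles.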

Code

Program regex_ex3;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : NSString;
  srchStr2: NSMutableString;
  patnStr : NSString;
  templStr: NSString;
  myRegex : NSregularExpression;
  matches : NSArray;
  match   : NSTextCheckingResult;
  numMatch: NSInteger;
  error   : NSError;
  count   : ShortInt;

Begin
  error   := Nil;
  srchStr := NSStr('I have 43 Bags of 60 Marbles.');
  srchStr2:= NSMutableString.stringWithString(srchStr);
  patnStr := NSStr('\d+');

  // Create a regular expression with given string and options
  myRegex := NSregularExpression.regularExpressionWithPattern_options_error(patnStr, NSRegularExpressionOptions(NSRegularExpressionCaseInsensitive), @error);

  // Check creation of regular expression: on failure the method returns Nil
  if(myRegex = Nil) then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  // Save any matches in the given string in the matches array
  matches := myRegex.matchesInString_options_range(srchStr, 0, NSMakeRange(0, srchStr.Length));

  // Save number of matches
  numMatch := myRegex.numberOfMatchesInString_options_range(srchStr, 0, NSMakeRange(0, srchStr.Length));

  // Output
  NSLog(NSStr('Search string: %@'), srchStr);
  NSLog(NSStr('Pattern string: %@'), patnStr);
  NSLog(NSStr('Number of matches: %lu'), numMatch);

  // Alternative 1: Using stringByReplacingMatchesInString_options_range_withTemplate()
  for count := numMatch downto 1 do
    begin
      for match in matches do
        begin
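          // Compare string contents with isEqualToString; = would only compare object pointers
          if(srchStr.substringWithRange(match.range).isEqualToString(NSStr('43'))) then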
            begin
              templStr := NSStr('forty-three');
              NSLog(NSStr('Match 1: %@ - Template string: %@'), srchStr.substringWithRange(match.range), templStr);
            end;

          if(srchStr.substringWithRange(match.range).isEqualToString(NSStr('60'))) then
            begin
              templStr := NSStr('sixty');
              NSLog(NSStr('Match 2: %@ - Template string: %@'), srchStr.substringWithRange(match.range), templStr);
            end;

          // Do string replacement
          srchStr := myRegex.stringByReplacingMatchesInString_options_range_withTemplate(srchStr, 0, match.range, templStr);
        end;

        // Save any matches in the new search string in the matches array
        matches := myRegex.matchesInString_options_range(srchStr, 0, NSMakeRange(0, srchStr.Length));
     end;

  // Output string with replacements
  NSLog(NSStr('Result: %@'), srchStr);

  // Save any matches in the given string in the matches array
  matches := myRegex.matchesInString_options_range(srchStr2, 0, NSMakeRange(0, srchStr2.Length));

  // Alternative 2: Using replaceMatchesInString_options_range_withTemplate()
  for count := numMatch downto 1 do
    begin
      for match in matches do
        begin
          if(srchStr2.substringWithRange(match.range).isEqualToString(NSStr('43'))) then
            begin
              templStr := NSStr('forty-three');
              NSLog(NSStr('Match 1: %@ - Template string: %@'), srchStr2.substringWithRange(match.range), templStr);
            end;
          if(srchStr2.substringWithRange(match.range).isEqualToString(NSStr('60'))) then
            begin
              templStr := NSStr('sixty');
              NSLog(NSStr('Match 2: %@ - Template string: %@'), srchStr2.substringWithRange(match.range), templStr);
            end;

          // Do string replacement
          myRegex.replaceMatchesInString_options_range_withTemplate(srchStr2, 0, match.range, templStr);
       end;

       // Save any matches in the new search string in the matches array
       matches := myRegex.matchesInString_options_range(srchStr2, 0, NSMakeRange(0, srchStr2.Length));
    end;

  // Output string with replacements
  NSLog(NSStr('Result: %@'), srchStr2);
End.

Output

2021-06-14 22:42:55.807 program3[9115:227765] Search string: I have 43 Bags of 60 Marbles.
2021-06-14 22:42:55.808 program3[9115:227765] Pattern string: \d+
2021-06-14 22:42:55.808 program3[9115:227765] Number of matches: 2
2021-06-14 22:42:55.808 program3[9115:227765] Match 1: 43 - Template string: forty-three
2021-06-14 22:42:55.808 program3[9115:227765] Match 2: 60 - Template string: sixty
2021-06-14 22:42:55.808 program3[9115:227765] Result: I have forty-three Bags of sixty Marbles.
2021-06-14 22:42:55.808 program3[9115:227765] Match 1: 43 - Template string: forty-three
2021-06-14 22:42:55.808 program3[9115:227765] Match 2: 60 - Template string: sixty
2021-06-14 22:42:55.808 program3[9115:227765] Result: I have forty-three Bags of sixty Marbles.
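Example 3 re-runs the match after each pass because every replacement changes the length of the string and so shifts the ranges of the later matches. An alternative, sketched here under the same assumptions as Example 3 (the program name is illustrative), is to collect the matches once and then replace them from last to first. Replacing a later match never moves an earlier one, so the stored ranges stay valid. The number-to-word lookup is just the two hard-coded cases from Example 3.

```pascal
Program regex_ex3_reverse;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : NSMutableString;
  myRegex : NSRegularExpression;
  matches : NSArray;
  match   : NSTextCheckingResult;
  numStr  : NSString;
  templStr: NSString;
  error   : NSError;
  i       : NSInteger;

Begin
  error   := Nil;
  srchStr := NSMutableString.stringWithString(NSStr('I have 43 Bags of 60 Marbles.'));

  myRegex := NSRegularExpression.regularExpressionWithPattern_options_error(NSStr('\d+'), 0, @error);
  if myRegex = Nil then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  // Find all matches once, against the unmodified string
  matches := myRegex.matchesInString_options_range(srchStr, 0, NSMakeRange(0, srchStr.length));

  // Walk the matches from last to first so that each replacement
  // cannot shift the ranges of the matches still to be processed
  for i := NSInteger(matches.count) - 1 downto 0 do
    begin
      match  := NSTextCheckingResult(matches.objectAtIndex(i));
      numStr := srchStr.substringWithRange(match.range);

      // Pick the replacement word for this number
      if numStr.isEqualToString(NSStr('43')) then
        templStr := NSStr('forty-three')
      else if numStr.isEqualToString(NSStr('60')) then
        templStr := NSStr('sixty')
      else
        templStr := numStr;  // leave any other number unchanged

      myRegex.replaceMatchesInString_options_range_withTemplate(srchStr, 0, match.range, templStr);
    end;

  NSLog(NSStr('Result: %@'), srchStr);
End.
```

This should log Result: I have forty-three Bags of sixty Marbles., the same result as Example 3.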

Example 4 - rearrange matched pattern groups

This is similar to Example 2 above, except that this time we not only match groups of characters, we also rearrange them. Our search string is different this time and represents a contact list entry. We are going to rearrange the format from Firstname, Lastname to the more sensible Lastname, Firstname and then enclose the state area code of the contact number in parentheses. To do this, our pattern string has some added complexity.

The pattern matches the first word as a group of word characters, (\w+), followed by a comma and a space; it does exactly the same for the second word; and finally it matches the two digits of the area code as a group, (\d{2}). The number in curly braces, {2}, after the digit metacharacter specifies that we want to match exactly two digits, which make up the area code.

The template string rearranges the data by referring to the captured groups as $1, $2 and $3: it switches the order of the first two word groups and adds parentheses around the third group, which holds the area code.

Code

Program regex_ex4;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : NSString;
  srchStr2: NSMutableString;
  patnStr : NSString;
  tmplStr : NSString;
  myRegex : NSregularExpression;
  matches : NSArray;
  match   : NSTextCheckingResult;
  error   : NSError;

Begin
  error   := Nil;
  srchStr := NSStr('Firstname, Lastname, 02 9428 4687');
  srchStr2:= NSMutableString.stringWithString(srchStr);
  patnStr := NSStr('(\w+), (\w+), (\d{2})');
  tmplStr := NSStr('$2, $1, ($3)');

  // Create a regular expression with given string and options
  myRegex := NSregularExpression.regularExpressionWithPattern_options_error(patnStr, NSRegularExpressionOptions(NSRegularExpressionCaseInsensitive), @error);

  // Check creation of regular expression: on failure the method returns Nil
  if(myRegex = Nil) then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  // Save any matches in the given string in the matches array
  matches := myRegex.matchesInString_options_range(srchStr, 0, NSMakeRange(0, srchStr.Length));

  // Output
  NSLog(NSStr('Search string: %@'), srchStr);
  NSLog(NSStr('Pattern string: %@'), patnStr);
  NSLog(NSStr('Template string: %@'), tmplStr);
  NSLog(NSStr('Number of matches: %lu'), myRegex.numberOfMatchesInString_options_range(srchStr, 0, NSMakeRange(0, srchStr.Length)));

  for match in matches do
    begin
      NSLog(NSStr('match(1): %@'), srchStr.substringWithRange(match.rangeAtIndex(1)));
      NSLog(NSStr('match(2): %@'), srchStr.substringWithRange(match.rangeAtIndex(2)));
      NSLog(NSStr('match(3): %@'), srchStr.substringWithRange(match.rangeAtIndex(3)));
      // Do the replacements
      myRegex.replaceMatchesInString_options_range_withTemplate(srchStr2, 0, match.range, tmplStr);
    end;

   NSLog(NSStr('Result: %@'), srchStr2);
End.

Output

2021-06-15 20:37:54.880 program4[23076:159228] Search string: Firstname, Lastname, 02 9428 4687
2021-06-15 20:37:54.880 program4[23076:159228] Pattern string: (\w+), (\w+), (\d{2})
2021-06-15 20:37:54.880 program4[23090:159932] Template string: $2, $1, ($3)
2021-06-15 20:37:54.881 program4[23076:159228] Number of matches: 1
2021-06-15 20:37:54.881 program4[23076:159228] match(1): Firstname
2021-06-15 20:37:54.881 program4[23076:159228] match(2): Lastname
2021-06-15 20:37:54.881 program4[23076:159228] match(3): 02
2021-06-15 20:37:54.881 program4[23076:159228] Result: Lastname, Firstname, (02) 9428 4687
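Because stringByReplacingMatchesInString_options_range_withTemplate() replaces every match in the given range, the template from Example 4 can rearrange a whole list of contacts in one call. A minimal sketch; the second contact, its number and the program name are made up for illustration.

```pascal
Program regex_contacts;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : NSString;
  resStr  : NSString;
  myRegex : NSRegularExpression;
  error   : NSError;

Begin
  error   := Nil;
  // Two contacts, one per line
  srchStr := NSStr('Firstname, Lastname, 02 9428 4687' + #10 + 'Jane, Citizen, 03 9123 4567');

  myRegex := NSRegularExpression.regularExpressionWithPattern_options_error(NSStr('(\w+), (\w+), (\d{2})'), 0, @error);
  if myRegex = Nil then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  // One call rewrites every match; $1, $2 and $3 refer to the capture groups
  resStr := myRegex.stringByReplacingMatchesInString_options_range_withTemplate(srchStr, 0,
              NSMakeRange(0, srchStr.length), NSStr('$2, $1, ($3)'));
  NSLog(NSStr('Result: %@'), resStr);
End.
```

Each line should come out as Lastname, Firstname, then the area code in parentheses followed by the rest of the number, for example Citizen, Jane, (03) 9123 4567.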
