How to use Regular Expressions in Visual C++
Posted by Silviu Caragea
CAtlRegExp - Microsoft regular expression implementation.
License: The Intelliproject Open License (IPOL)
Skill: Intermediate
Posted: 04/10/2008
Regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters. Regular expressions are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.
Most people reach for Boost.Regex or PCRE (Perl Compatible Regular Expressions) when they need regular expressions in a C++ project.
Microsoft has its own regular expression implementation as part of ATL, and it is called CAtlRegExp.
In this article I will show you how we can use CAtlRegExp in our projects.
The following table lists the metacharacters understood by CAtlRegExp.
| Metacharacters | Meaning |
| . | Matches any single character. |
| [] | Indicates a character class. Matches any character inside the brackets (for example, [abc] matches "a", "b", and "c"). |
| ^ | If this metacharacter occurs at the start of a character class, it negates the character class. A negated character class matches any character except those inside the brackets (for example, [^abc] matches all characters except "a", "b", and "c"). If ^ is at the beginning of the regular expression, it matches the beginning of the input (for example, ^[abc] will only match input that begins with "a", "b", or "c"). |
| - | In a character class, indicates a range of characters (for example, [0-9] matches any of the digits "0" through "9"). |
| ? | Indicates that the preceding expression is optional: it matches once or not at all (for example, [0-9][0-9]? matches "2" and "12"). |
| + | Indicates that the preceding expression matches one or more times (for example, [0-9]+ matches "1", "13", "666", and so on). |
| * | Indicates that the preceding expression matches zero or more times. |
| ??,+?,*? | Non-greedy versions of ?, +, and *. These match as little as possible, unlike the greedy versions which match as much as possible. Example: given the input "<abc><def>", <.*?> matches "<abc>" while <.*> matches "<abc><def>". |
| ( ) | Grouping operator. Example: (\d+,)*\d+ matches a list of numbers separated by commas (such as "1" or "1,23,456"). |
| {} | Indicates a match group. The actual text in the input that matches the expression inside the braces can be retrieved through the CAtlREMatchContext object. |
| \ | Escape character: interpret the next character literally (for example, [0-9]+ matches one or more digits, but [0-9]\+ matches a digit followed by a plus character). Also used for abbreviations (such as \a for any alphanumeric character; see table below). If \ is followed by a number n, it matches the nth match group (starting from 0). Example: <{.*?}>.*?</\0> matches "<head>Contents</head>". Note that in C++ string literals, two backslashes must be used: "\\+", "\\a", "<{.*?}>.*?</\\0>". |
| $ | At the end of a regular expression, this character matches the end of the input. Example: [0-9]$ matches a digit at the end of the input. |
| | | Alternation operator: separates two expressions, exactly one of which matches (for example, T|the matches "The" or "the"). |
| ! | Negation operator: the expression following ! does not match the input. Example: a!b matches "a" not followed by "b". |
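Several rows of the table (greedy versus non-greedy quantifiers, grouping, the doubled backslash in C++ string literals) can be tried out without ATL. The sketch below uses std::regex with its default ECMAScript grammar as a portable stand-in; it is not CAtlRegExp, and the function names first_match and capture_groups are made up for this illustration. CAtlRegExp-specific syntax does not carry over: std::regex captures with ( ) where CAtlRegExp uses { }, and it has no ! operator.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the first substring of `input` matched by `pattern`,
// or an empty string when nothing matches.
std::string first_match(const std::string& pattern, const std::string& input) {
    const std::regex re(pattern);  // ECMAScript grammar by default
    std::smatch m;
    return std::regex_search(input, m, re) ? m.str(0) : std::string();
}

// Returns the captured groups of the first match (whole match excluded).
// std::regex captures with ( ), where CAtlRegExp uses { }.
std::vector<std::string> capture_groups(const std::string& pattern,
                                        const std::string& input) {
    const std::regex re(pattern);
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_search(input, m, re)) {
        for (std::size_t i = 1; i < m.size(); ++i) {
            groups.push_back(m.str(i));
        }
    }
    return groups;
}
```

With the input "<abc><def>", first_match("<.*?>", ...) returns "<abc>" while first_match("<.*>", ...) returns the whole input. The list pattern from the grouping row has to be written as "(\\d+,)*\\d+" in source code, and capture_groups("([0-9]?[0-9]):([0-9][0-9])", "1:57") returns "1" and "57".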
CAtlRegExp can handle abbreviations, such as \d instead of [0-9]. The abbreviations are provided by the character traits class passed in the CharTraits parameter. The predefined character traits classes provide the following abbreviations.
| Abbreviation | Matches |
| \a | Any alphanumeric character: ([a-zA-Z0-9]) |
| \b | White space (blank): ([ \\t]) |
| \c | Any alphabetic character: ([a-zA-Z]) |
| \d | Any decimal digit: ([0-9]) |
| \h | Any hexadecimal digit: ([0-9a-fA-F]) |
| \n | Newline: (\r|(\r?\n)) |
| \q | A quoted string: (\"[^\"]*\")|(\'[^\']*\') |
| \w | A simple word: ([a-zA-Z]+) |
| \z | An integer: ([0-9]+) |
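Each abbreviation is shorthand for an ordinary pattern. As a rough illustration, the hypothetical helper below (expand_abbreviations is not part of ATL) rewrites the abbreviations into std::regex ECMAScript syntax, following the definitions in the table. It is only a sketch: it does not handle abbreviations that appear inside a character class, and it treats every backslash sequence it does not recognise as an ordinary escape.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper, not part of ATL: expands CAtlRegExp abbreviations
// into equivalent ECMAScript sub-patterns taken from the table above.
std::string expand_abbreviations(const std::string& pattern) {
    static const std::map<char, std::string> table = {
        {'a', "[a-zA-Z0-9]"},
        {'b', "[ \t]"},                      // space or a literal tab
        {'c', "[a-zA-Z]"},
        {'d', "[0-9]"},
        {'h', "[0-9a-fA-F]"},
        {'n', "(?:\r|(?:\r?\n))"},           // literal CR / LF characters
        {'q', "(?:\"[^\"]*\"|'[^']*')"},
        {'w', "(?:[a-zA-Z]+)"},
        {'z', "(?:[0-9]+)"},
    };
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] == '\\' && i + 1 < pattern.size()) {
            const char next = pattern[++i];
            const auto it = table.find(next);
            if (it != table.end()) {
                out += it->second;           // known abbreviation
            } else {
                out += '\\';                 // keep any other escape as-is
                out += next;
            }
        } else {
            out += pattern[i];
        }
    }
    return out;
}
```

For example, expand_abbreviations("\\d\\d") yields "[0-9][0-9]", and the result can be passed straight to the std::regex constructor. The multi-character expansions are wrapped in non-capturing groups so that a following quantifier applies to the whole abbreviation and capture group numbering is left alone.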

To use CAtlRegExp in your projects you should include atlrx.h (#include <atlrx.h>).
In the source code attached to this article I built a basic class based on CAtlRegExp that may be useful in our future projects.
This article, along with any associated source code and files, is licensed under The Intelliproject Open License (IPOL)
Silviu Caragea
Silviu Caragea is the Founder, Administrator and Chief Editor who wrote and runs The IntelliProject. He has been programming since 2000 and is now a student at The Faculty of Economic Cybernetics, Statistics and Informatics in Bucharest. At the same time he works as a software developer at Cratima Software, a Romanian software and web design company that operates in both the local and foreign markets, providing its customers with software development services, internet and intranet solutions, web design, graphic design and IT consultancy. His programming experience includes:
- C, C++, Visual C++ (Win32 API, MFC, ADO, STL, DAO, ODBC, ATL, COM, DirectShow, DirectDraw, WTL)
- Open source libraries: CURL and Boost
- HTML, CSS
- Java (SE, ME)
- JavaScript, Ajax, Google Web Toolkit (GWT)
- PHP, MySQL
- Oracle, PL/SQL
- C# .NET
- Objective-C, iPhone SDK, Cocoa