Initial commit of quickbook conversion of docs.

[SVN r37942]
2007-06-08 09:13:34 +00:00
parent f4877f6698
commit 5f96b68080
52 changed files with 8859 additions and 0 deletions
--- a/doc/introduction.qbk
+++ b/doc/introduction.qbk
@ -0,0 +1,155 @@
+[section Introduction and Overview]
+
+Regular expressions are a form of pattern-matching that are often used in 
+text processing; many users will be familiar with the Unix utilities grep, sed  
+and awk, and the programming language Perl, each of which make extensive use 
+of regular expressions. Traditionally C++ users have been limited to the 
+POSIX C API's for manipulating regular expressions, and while Boost.Regex does 
+provide these API's, they do not represent the best way to use the library. 
+For example Boost.Regex can cope with wide character strings, or search and 
+replace operations (in a manner analogous to either sed or Perl), something 
+that traditional C libraries can not do.
+
+The class [basic_regex] is the key class in this library; it represents a 
+"machine readable" regular expression, and is very closely modeled on 
+`std::basic_string`, think of it as a string plus the actual state-machine 
+required by the regular expression algorithms. Like `std::basic_string` there 
+are two typedefs that are almost always the means by which this class is referenced:
+
+   namespace boost{
+
+   template <class charT, 
+            class traits = regex_traits<charT> >
+   class basic_regex;
+
+   typedef basic_regex<char> regex;
+   typedef basic_regex<wchar_t> wregex;
+
+   }
+
+To see how this library can be used, imagine that we are writing a credit 
+card processing application. Credit card numbers generally come as a string 
+of 16-digits, separated into groups of 4-digits, and separated by either a 
+space or a hyphen. Before storing a credit card number in a database 
+(not necessarily something your customers will appreciate!), we may want to 
+verify that the number is in the correct format. To match any digit we could 
+use the regular expression [0-9], however ranges of characters like this are 
+actually locale dependent. Instead we should use the POSIX standard 
+form \[\[:digit:\]\], or the Boost.Regex and Perl shorthand for this \\d (note 
+that many older libraries tended to be hard-coded to the C-locale, 
+consequently this was not an issue for them). That leaves us with the 
+following regular expression to validate credit card number formats:
+
+[pre (\d{4}[- ]){3}\d{4}]
+
+Here the parenthesis act to group (and mark for future reference) 
+sub-expressions, and the {4} means "repeat exactly 4 times". This is an 
+example of the extended regular expression syntax used by Perl, awk and egrep. 
+Boost.Regex also supports the older "basic" syntax used by sed and grep, 
+but this is generally less useful, unless you already have some basic regular 
+expressions that you need to reuse.
+
+Now let's take that expression and place it in some C++ code to validate the 
+format of a credit card number:
+
+   bool validate_card_format(const std::string& s)
+   {
+      static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
+      return regex_match(s, e);
+   }
+
+Note how we had to add some extra escapes to the expression: remember that 
+the escape is seen once by the C++ compiler, before it gets to be seen by 
+the regular expression engine, consequently escapes in regular expressions 
+have to be doubled up when embedding them in C/C++ code. Also note that 
+all the examples assume that your compiler supports argument-dependent-lookup
+lookup, if yours doesn't (for example VC6), then you will have to add some 
+`boost::` prefixes to some of the function calls in the examples.
+
+Those of you who are familiar with credit card processing, will have realized 
+that while the format used above is suitable for human readable card numbers, 
+it does not represent the format required by online credit card systems; these 
+require the number as a string of 16 (or possibly 15) digits, without any 
+intervening spaces. What we need is a means to convert easily between the two 
+formats, and this is where search and replace comes in. Those who are familiar 
+with the utilities sed and Perl will already be ahead here; we need two 
+strings - one a regular expression - the other a "format string" that provides 
+a description of the text to replace the match with. In Boost.Regex this 
+search and replace operation is performed with the algorithm [regex_replace], 
+for our credit card example we can write two algorithms like this to 
+provide the format conversions:
+
+   // match any format with the regular expression:
+   const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
+   const std::string machine_format("\\1\\2\\3\\4");
+   const std::string human_format("\\1-\\2-\\3-\\4");
+
+   std::string machine_readable_card_number(const std::string s)
+   {
+      return regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
+   }
+
+   std::string human_readable_card_number(const std::string s)
+   {
+      return regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
+   }
+
+Here we've used marked sub-expressions in the regular expression to split out 
+the four parts of the card number as separate fields, the format string then 
+uses the sed-like syntax to replace the matched text with the reformatted version.
+
+In the examples above, we haven't directly manipulated the results of 
+a regular expression match, however in general the result of a match contains 
+a number of sub-expression matches in addition to the overall match. When the 
+library needs to report a regular expression match it does so using an instance 
+of the class [match_results], as before there are typedefs of this class for 
+the most common cases:
+
+   namespace boost{
+
+   typedef match_results<const char*>                  cmatch;
+   typedef match_results<const wchar_t*>               wcmatch;
+   typedef match_results<std::string::const_iterator>  smatch;
+   typedef match_results<std::wstring::const_iterator> wsmatch; 
+
+   }
+
+The algorithms [regex_search] and [regex_match] make use of [match_results] 
+to report what matched; the difference between these algorithms is that 
+[regex_match] will only find matches that consume /all/ of the input text, 
+where as [regex_search] will search for a match anywhere within the text being matched.
+
+Note that these algorithms are not restricted to searching regular C-strings, 
+any bidirectional iterator type can be searched, allowing for the 
+possibility of seamlessly searching almost any kind of data.
+
+For search and replace operations, in addition to the algorithm [regex_replace] 
+that we have already seen, the [match_results] class has a `format` member that 
+takes the result of a match and a format string, and produces a new string 
+by merging the two.
+
+For iterating through all occurences of an expression within a text, 
+there are two iterator types: [regex_iterator] will enumerate over the 
+[match_results] objects found, while [regex_token_iterator] will enumerate 
+a series of strings (similar to perl style split operations).
+
+For those that dislike templates, there is a high level wrapper class 
+[RegEx] that is an encapsulation of the lower level template code - it 
+provides a simplified interface for those that don't need the full 
+power of the library, and supports only narrow characters, and the 
+"extended" regular expression syntax. This class is now deprecated as 
+it does not form part of the regular expressions C++ standard library proposal.
+
+The POSIX API functions: [regcomp], [regexec], [regfree] and [regerr], 
+are available in both narrow character and Unicode versions, and are 
+provided for those who need compatibility with these API's.
+
+Finally, note that the library now has 
+[link boost_regex.background_information.locale run-time localization support], 
+and recognizes the full POSIX regular expression syntax - including 
+advanced features like multi-character collating elements and equivalence 
+classes - as well as providing compatibility with other regular expression 
+libraries including GNU and BSD4 regex packages, PCRE and Perl 5. 
+
+[endsect]
+