Initial commit of quickbook conversion of docs.

[SVN r37942]
2007-06-08 09:13:34 +00:00
parent f4877f6698
commit 5f96b68080
52 changed files with 8859 additions and 0 deletions
--- a/doc/syntax_perl.qbk
+++ b/doc/syntax_perl.qbk
@ -0,0 +1,508 @@
+
+[section:perl_syntax Perl Regular Expression Syntax]
+
+[h3 Synopsis]
+
+The Perl regular expression syntax is based on that used by the 
+programming language Perl.  Perl regular expressions are the 
+default behavior in Boost.Regex; you can also pass the flag `perl` explicitly to the 
+[basic_regex] constructor, for example:
+
+   // e1 is a case sensitive Perl regular expression: 
+   // since Perl is the default option there's no need to explicitly specify the syntax used here:
+   boost::regex e1(my_expression);
+   // e2 is a case insensitive Perl regular expression:
+   boost::regex e2(my_expression, boost::regex::perl|boost::regex::icase);
+
+[h3 Perl Regular Expression Syntax]
+
+In Perl regular expressions, all characters match themselves except for the 
+following special characters:
+
+[pre .\[{()\\\*+?|^$]
+
+[h4 Wildcard]
+
+The single character '.' when used outside of a character set will match 
+any single character except:
+
+* The NULL character when the [link boost_regex.ref.match_flag_type flag 
+   `match_not_dot_null`] is passed to the matching algorithms.
+* The newline character when the [link boost_regex.ref.match_flag_type 
+   flag `match_not_dot_newline`] is passed to 
+   the matching algorithms.
+   
+[h4 Anchors]
+
+A '^' character shall match the start of a line.
+
+A '$' character shall match the end of a line.
+
+[h4 Marked sub-expressions]
+
+A section beginning `(` and ending `)` acts as a marked sub-expression.  
+Whatever matched the sub-expression is split out in a separate field by 
+the matching algorithms.  Marked sub-expressions can also be repeated, or 
+referred to by a back-reference.
+
+[h4 Non-marking grouping]
+
+A marked sub-expression is useful to lexically group part of a regular 
+expression, but has the side-effect of splitting out an extra field in 
+the result.  As an alternative you can lexically group part of a 
+regular expression without generating a marked sub-expression by using 
+`(?:` and `)`; for example `(?:ab)+` will repeat `ab` without splitting 
+out any separate sub-expressions.
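+
+The difference shows up in the number of fields reported for a match.  The 
+following is a minimal sketch rather than Boost.Regex code: it assumes the 
+standard library's `std::regex` (ECMAScript grammar), which also accepts `(?:`, 
+and the helper names `marked_groups` and `result_fields` are hypothetical:
+
```cpp
#include <cstddef>
#include <regex>
#include <string>

// NOTE: std::regex is used here as an assumed stand-in for boost::regex.

// Number of marked sub-expressions that `pattern` creates.
unsigned marked_groups(const std::string& pattern)
{
    return std::regex(pattern).mark_count();
}

// Number of fields reported when `pattern` matches the whole of `text`
// (field 0 is always the complete match); 0 when there is no match.
std::size_t result_fields(const std::string& pattern, const std::string& text)
{
    std::smatch what;
    return std::regex_match(text, what, std::regex(pattern)) ? what.size() : 0;
}
```
+
+Under these assumptions `marked_groups("(ab)+")` is 1 while 
+`marked_groups("(?:ab)+")` is 0, so matching "abab" reports two fields in the 
+first case and only the whole-match field in the second.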
+
+[h4 Repeats]
+
+Any atom (a single character, a marked sub-expression, or a character class) 
+can be repeated with the `*`, `+`, `?`, and `{}` operators.
+
+The `*` operator will match the preceding atom zero or more times, 
+for example the expression `a*b` will match any of the following:
+
+   b
+   ab
+   aaaaaaaab
+
+The `+` operator will match the preceding atom one or more times, for 
+example the expression `a+b` will match any of the following:
+
+   ab
+   aaaaaaaab
+
+But will not match:
+
+   b
+
+The `?` operator will match the preceding atom zero or one times, for 
+example the expression `ca?b` will match any of the following:
+
+   cb
+   cab
+
+But will not match:
+
+   caab
+
+An atom can also be repeated with a bounded repeat:
+
+`a{n}`  Matches 'a' repeated exactly n times.
+
+`a{n,}`  Matches 'a' repeated n or more times.
+
+`a{n,m}`  Matches 'a' repeated between n and m times inclusive.
+
+For example:
+
+[pre ^a{2,3}$]
+
+Will match either of:
+
+   aa
+   aaa
+
+But neither of:
+
+   a
+   aaaa
+
+It is an error to use a repeat operator if the preceding construct cannot 
+be repeated, for example:
+
+   a(*)
+
+Will raise an error, as there is nothing for the `*` operator to be applied to.
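+
+The repeat examples above can be checked with a few lines of code.  This is a 
+sketch under stated assumptions, not Boost.Regex code: it uses the standard 
+library's `std::regex` (ECMAScript grammar), which handles these repeat 
+operators in the same way, and the helper names `matches_whole` and 
+`is_valid_pattern` are hypothetical:
+
```cpp
#include <regex>
#include <string>

// NOTE: std::regex is used here as an assumed stand-in for boost::regex.

// True if `pattern` matches the whole of `text`.
bool matches_whole(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

// True if `pattern` compiles.  An invalid expression, such as a repeat
// operator with nothing to repeat, is reported by an exception thrown
// from the regex constructor.
bool is_valid_pattern(const std::string& pattern)
{
    try {
        std::regex compiled(pattern);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```
+
+With these helpers, `matches_whole("^a{2,3}$", "aaa")` is true, 
+`matches_whole("^a{2,3}$", "aaaa")` is false, and `is_valid_pattern("a(*)")` 
+is false.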
+
+[h4 Non greedy repeats]
+
+The normal repeat operators are "greedy", that is to say they will consume as 
+much input as possible.  There are non-greedy versions available that will 
+consume as little input as possible while still producing a match.
+
+`*?` Matches the previous atom zero or more times, while consuming as little 
+   input as possible.
+
+`+?` Matches the previous atom one or more times, while consuming as 
+   little input as possible.
+
+`??` Matches the previous atom zero or one times, while consuming 
+   as little input as possible.
+
+`{n,}?` Matches the previous atom n or more times, while consuming as 
+   little input as possible.
+
+`{n,m}?` Matches the previous atom between n and m times, while 
+   consuming as little input as possible.
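+
+The effect is easiest to see by searching the same text with both forms.  The 
+following is a minimal sketch that assumes the standard library's `std::regex` 
+(ECMAScript grammar) as a stand-in for Boost.Regex; the helper name 
+`first_match` is hypothetical:
+
```cpp
#include <regex>
#include <string>

// NOTE: std::regex is used here as an assumed stand-in for boost::regex.

// Returns the text of the first match of `pattern` found in `text`,
// or an empty string when nothing matches.
std::string first_match(const std::string& pattern, const std::string& text)
{
    std::smatch what;
    if (std::regex_search(text, what, std::regex(pattern)))
        return what.str();
    return std::string();
}
```
+
+Searching the text "aXbYb", the greedy `a.*b` returns the whole of "aXbYb", 
+while the non-greedy `a.*?b` stops at the first opportunity and returns "aXb".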
+   
+[h4 Back references]
+
+An escape character followed by a digit /n/, where /n/ is in the range 1-9, 
+matches the same string that was matched by sub-expression /n/.  For example 
+the expression:
+
+[pre ^(a+)b+\\1$]
+
+Will match the string:
+
+   aaabbaaa
+
+But not the string:
+
+   aaabba
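+
+As a minimal sketch of a back-reference in use, the code below checks the 
+pattern `^(a+)b+\1$`, in which `\1` must repeat exactly what sub-expression 1 
+matched.  It assumes the standard library's `std::regex` (ECMAScript grammar) 
+as a stand-in for Boost.Regex, and the helper name `repeated_prefix` is 
+hypothetical:
+
```cpp
#include <regex>
#include <string>

// NOTE: std::regex is used here as an assumed stand-in for boost::regex.

// Returns the text captured by sub-expression 1 when the whole of `text`
// matches ^(a+)b+\1$, or an empty string when it does not match.
std::string repeated_prefix(const std::string& text)
{
    static const std::regex re(R"(^(a+)b+\1$)");
    std::smatch what;
    if (std::regex_match(text, what, re))
        return what[1].str();
    return std::string();
}
```
+
+Here `repeated_prefix("aaabbaaa")` returns "aaa", while 
+`repeated_prefix("aaabba")` returns an empty string because the trailing "a" 
+does not repeat the three characters captured by sub-expression 1.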
+
+[h4 Alternation]
+
+The `|` operator will match either of its arguments, so for example: 
+`abc|def` will match either "abc" or "def". 
+
+Parentheses can be used to group alternations, for example: `ab(d|ef)` 
+will match either of "abd" or "abef".
+
+Empty alternatives are not allowed (these are almost always a mistake), but 
+if you really want an empty alternative use `(?:)` as a placeholder, for example:
+
+`|abc` is not a valid expression, but
+
+`(?:)|abc` is, and is equivalent.  The expression:
+
+`(?:abc)??` has exactly the same effect.
+
+[h4 Character sets]
+
+A character set is a bracket-expression starting with `[` and ending with `]`; 
+it defines a set of characters, and matches any single character that is a 
+member of that set.
+
+A bracket expression may contain any combination of the following:
+
+[h5 Single characters]
+
+For example `[abc]`, will match any of the characters 'a', 'b', or 'c'.
+
+[h5 Character ranges]
+
+For example `[a-c]` will match any single character in the range 'a' to 'c'.  
+By default, for Perl regular expressions, a character x is within the 
+range y to z if the code point of the character lies between the code points of 
+the endpoints of the range (inclusive).  Alternatively, if you set the 
+[link boost_regex.ref.syntax_option_type.syntax_option_type_perl `collate` flag] 
+when constructing the regular expression, then ranges are locale sensitive.
+
+[h5 Negation]
+
+If the bracket-expression begins with the ^ character, then it matches the 
+complement of the characters it contains, for example `[^a-c]` matches 
+any character that is not in the range `a-c`.
+
+[h5 Character classes]
+
+An expression of the form `[[:name:]]` matches the named character class 
+"name", for example `[[:lower:]]` matches any lower case character.  
+See [link boost_regex.syntax.character_classes character class names].
+
+[h5 Collating Elements]
+
+An expression of the form `[[.col.]]` matches the collating element /col/.  
+A collating element is any single character, or any sequence of characters 
+that collates as a single unit.  Collating elements may also be used 
+as the end point of a range, for example: `[[.ae.]-c]` matches the 
+character sequence "ae", plus any single character in the range "ae"-c, 
+assuming that "ae" is treated as a single collating element in the current locale.
+
+As an extension, a collating element may also be specified via its 
+[link boost_regex.syntax.collating_names symbolic name], for example:
+
+   [[.NUL.]]
+
+matches a `\0` character.
+
+[h5 Equivalence classes]
+
+An expression of the form `[[=col=]]`, matches any character or collating element 
+whose primary sort key is the same as that for collating element /col/, as with 
+collating elements the name /col/ may be a 
+[link boost_regex.syntax.collating_names symbolic name].  A primary sort key is 
+one that ignores case, accentuation, or locale-specific tailorings; so for 
+example `[[=a=]]` matches any of the characters: 
+a, '''&#xC0;''', '''&#xC1;''', '''&#xC2;''', 
+'''&#xC3;''', '''&#xC4;''', '''&#xC5;''', A, '''&#xE0;''', '''&#xE1;''', 
+'''&#xE2;''', '''&#xE3;''', '''&#xE4;''' and '''&#xE5;'''.  
+Unfortunately the implementation of this is reliant on the platform's collation 
+and localisation support; this feature cannot be relied upon to work portably 
+across all platforms, or even all locales on one platform.
+
+[h5 Escaped Characters]
+
+All the escape sequences that match a single character, or a single character 
+class are permitted within a character class definition.  For example
+`[\[\]]` would match either of `[` or `]` while `[\W\d]` would match any character
+that is either a "digit", /or/ is /not/ a "word" character.
+
+[h5 Combinations]
+
+All of the above can be combined in one character set declaration, for example: 
+`[[:digit:]a-c[.NUL.]]`.
+
+[h4 Escapes]
+
+Any special character preceded by an escape shall match itself.
+
+The following escape sequences are all synonyms for single characters:
+
+[table
+[[Escape][Character]]
+[[`\a`][`\a`]]
+[[`\e`][`0x1B`]]
+[[`\f`][`\f`]]
+[[`\n`][`\n`]]
+[[`\r`][`\r`]]
+[[`\t`][`\t`]]
+[[`\v`][`\v`]]
+[[`\b`][`\b` (but only inside a character class declaration).]]
+[[`\cX`][An ASCII escape sequence - the character whose code point is X % 32]]
+[[`\xdd`][A hexadecimal escape sequence - matches the single character whose 
+      code point is 0xdd.]]
+[[`\x{dddd}`][A hexadecimal escape sequence - matches the single character whose 
+      code point is 0xdddd.]]
+[[`\0ddd`][An octal escape sequence - matches the single character whose 
+   code point is 0ddd.]]
+[[`\N{name}`][Matches the single character which has the 
+      [link boost_regex.syntax.collating_names symbolic name] /name/.  
+      For example `\N{newline}` matches the single character \\n.]]
+]      
+ 
+[h5 "Single character" character classes:]
+
+Any escaped character /x/, where /x/ is the name of a character class, shall 
+match any character that is a member of that class, and the corresponding 
+upper case escaped character /X/ shall match any character that is 
+not in that class.
+
+The following are supported by default:
+
+[table
+[[Escape sequence][Equivalent to]]
+[[`\d`][`[[:digit:]]`]]
+[[`\l`][`[[:lower:]]`]]
+[[`\s`][`[[:space:]]`]]
+[[`\u`][`[[:upper:]]`]]
+[[`\w`][`[[:word:]]`]]
+[[`\D`][`[^[:digit:]]`]]
+[[`\L`][`[^[:lower:]]`]]
+[[`\S`][`[^[:space:]]`]]
+[[`\U`][`[^[:upper:]]`]]
+[[`\W`][`[^[:word:]]`]]
+]
+
+[h5 Character Properties]
+
+The character property names in the following table are all equivalent 
+to the [link boost_regex.syntax.character_classes names used in character classes].
+
+[table
+[[Form][Description][Equivalent character set form]]
+[[`\pX`][Matches any character that has the property X.][`[[:X:]]`]]
+[[`\p{Name}`][Matches any character that has the property Name.][`[[:Name:]]`]]
+[[`\PX`][Matches any character that does not have the property X.][`[^[:X:]]`]]
+[[`\P{Name}`][Matches any character that does not have the property Name.][`[^[:Name:]]`]]
+]
+
+For example `\pd` matches any "digit" character, as does `\p{digit}`.
+
+[h5 Word Boundaries]
+
+The following escape sequences match the boundaries of words:
+
+`\<` 	Matches the start of a word.
+
+`\>` 	Matches the end of a word.
+
+`\b` 	Matches a word boundary (the start or end of a word).
+
+`\B` 	Matches only when not at a word boundary.
+
+[h5 Buffer boundaries]
+
+The following match only at buffer boundaries: a "buffer" in this 
+context is the whole of the input text that is being matched against 
+(note that ^ and $ may match embedded newlines within the text).
+
+\\\` 	Matches at the start of a buffer only.
+
+\\' 	Matches at the end of a buffer only.
+
+\\A 	Matches at the start of a buffer only (the same as \\\`).
+
+\\z 	Matches at the end of a buffer only (the same as \\').
+
+\\Z 	Matches an optional sequence of newlines at the end of a buffer: 
+equivalent to the regular expression `\n*\z`
+
+[h5 Continuation Escape]
+
+The sequence `\G` matches only at the end of the last match found, or at 
+the start of the text being matched if no previous match was found.  
+This escape is useful if you're iterating over the matches contained within a 
+text, and you want each subsequent match to start where the last one ended.
+
+[h5 Quoting escape]
+
+The escape sequence `\Q` begins a "quoted sequence": all the subsequent characters 
+are treated as literals, until either the end of the regular expression or \\E 
+is found.  For example the expression: `\Q\*+\Ea+` would match either of:
+
+    \*+a
+    \*+aaa
+
+[h5 Unicode escapes]
+
+`\C` 	Matches a single code point: in Boost regex this has exactly the 
+   same effect as a "." operator.
+
+`\X` 	Matches a combining character sequence: that is any non-combining 
+      character followed by a sequence of zero or more combining characters.
+    
+[h5 Any other escape]
+
+Any other escape sequence matches the character that is escaped, for example 
+\\@ matches a literal '@'.
+
+[h4 Perl Extended Patterns]
+
+Perl-specific extensions to the regular expression syntax all start with `(?`.
+
+[h5 Comments]
+
+`(?# ... )` is treated as a comment; its contents are ignored.
+
+[h5 Modifiers]
+
+`(?imsx-imsx ... )` alters which of the perl modifiers are in effect within 
+the pattern.  Changes take effect from the point that the block is first seen 
+and extend to any enclosing `)`.  Letters before a '-' turn that perl 
+modifier on; letters after it turn it off.
+
+`(?imsx-imsx:pattern)` applies the specified modifiers to pattern only.
+
+[h5 Non-marking groups]
+
+`(?:pattern)` lexically groups pattern, without generating an additional 
+sub-expression.
+
+[h5 Lookahead]
+
+`(?=pattern)` consumes zero characters, only if pattern matches.
+
+`(?!pattern)` consumes zero characters, only if pattern does not match.
+
+Lookahead is typically used to create the logical AND of two regular 
+expressions, for example if a password must contain a lower case letter, 
+an upper case letter, a punctuation symbol, and be at least 6 characters long, 
+then the expression:
+
+    (?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}
+
+could be used to validate the password.
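+
+A minimal sketch of such a check follows.  It is not Boost.Regex code: it 
+assumes the standard library's `std::regex` (ECMAScript grammar), which also 
+accepts the lookahead and character class syntax used in this expression, and 
+the function name `is_acceptable_password` is hypothetical:
+
```cpp
#include <regex>
#include <string>

// NOTE: std::regex is used here as an assumed stand-in for boost::regex.

// True if `candidate` contains a lower case letter, an upper case letter
// and a punctuation symbol, and is at least 6 characters long.  Each
// lookahead is tested at the start of the text and consumes nothing, so
// all three conditions apply to the same text that .{6,} then matches.
bool is_acceptable_password(const std::string& candidate)
{
    static const std::regex re(
        "(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}");
    return std::regex_match(candidate, re);
}
```
+
+For example "Passw0rd!" is accepted, while "password!" is rejected for lacking 
+an upper case letter and "Ab!" is rejected for being too short.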
+
+[h5 Lookbehind]
+
+`(?<=pattern)` consumes zero characters, only if pattern could be matched 
+against the characters preceding the current position (pattern must be 
+of fixed length).
+
+`(?<!pattern)` consumes zero characters, only if pattern could not be 
+matched against the characters preceding the current position (pattern must 
+be of fixed length).
+
+[h5 Independent sub-expressions]
+
+`(?>pattern)` matches /pattern/ independently of the surrounding patterns: 
+the expression will never backtrack into /pattern/.  Independent sub-expressions 
+are typically used to improve performance; only the best possible match 
+for pattern will be considered, and if this doesn't allow the expression as a 
+whole to match then no match is found at all.
+
+[h5 Conditional Expressions]
+
+`(?(condition)yes-pattern|no-pattern)` attempts to match /yes-pattern/ if 
+the /condition/ is true, otherwise attempts to match /no-pattern/.
+
+`(?(condition)yes-pattern)` attempts to match /yes-pattern/ if the /condition/ 
+is true, otherwise fails.
+
+/condition/ may be either a forward lookahead assertion, or the index of 
+a marked sub-expression (the condition becomes true if the sub-expression 
+has been matched).
+
+[h4 Operator precedence]
+
+The order of precedence of operators is as follows:
+
+# Collation-related bracket symbols 	`[==] [::] [..]`
+# Escaped characters 	`\`
+# Character set (bracket expression) 	`[]`
+# Grouping 	`()`
+# Single-character-ERE duplication 	`* + ? {m,n}`
+# Concatenation 	
+# Anchoring 	^$
+# Alternation 	|
+
+[h3 What gets matched]
+
+If you view the regular expression as a directed (possibly cyclic) 
+graph, then the best match found is the first match found by a 
+depth-first-search performed on that graph, while matching the input text.
+
+Alternatively:
+
+The best match found is the 
+[link boost_regex.syntax.leftmost_longest_rule leftmost match], 
+with individual elements matched as follows:
+
+[table
+[[Construct][What gets matched]]
+[[`AtomA AtomB`][Locates the best match for /AtomA/ that has a following match for /AtomB/.]]
+[[`Expression1 | Expression2`][If /Expression1/ can be matched then returns that match, 
+   otherwise attempts to match /Expression2/.]]
+[[`S{N}`][Matches /S/ repeated exactly N times.]]
+[[`S{N,M}`][Matches S repeated between N and M times, and as many times as possible.]]
+[[`S{N,M}?`][Matches S repeated between N and M times, and as few times as possible.]]
+[[`S?, S*, S+`][The same as `S{0,1}`, `S{0,UINT_MAX}`, `S{1,UINT_MAX}` respectively.]]
+[[`S??, S*?, S+?`][The same as `S{0,1}?`, `S{0,UINT_MAX}?`, `S{1,UINT_MAX}?` respectively.]]
+[[`(?>S)`][Matches the best match for /S/, and only that.]]
+[[`(?=S), (?<=S)`][Matches only the best match for /S/ (this is only 
+      visible if there are capturing parentheses within /S/).]]
+[[`(?!S), (?<!S)`][Considers only whether a match for S exists or not.]]
+[[`(?(condition)yes-pattern | no-pattern)`][If condition is true, then 
+   only yes-pattern is considered, otherwise only no-pattern is considered.]]
+]
+
+[h3 Variations]
+
+The [link boost_regex.ref.syntax_option_type.syntax_option_type_perl options `normal`, 
+`ECMAScript`, `JavaScript` and `JScript`] are all synonyms for 
+`perl`.
+
+[h3 Options]
+
+There are a [link boost_regex.ref.syntax_option_type.syntax_option_type_perl 
+variety of flags] that may be combined with the `perl` option when 
+constructing the regular expression, in particular note that the 
+`newline_alt` option alters the syntax, while the `collate`, `nosubs` and 
+`icase` options modify how the case and locale sensitivity are to be applied.
+
+[h3 Pattern Modifiers]
+
+The perl `smix` modifiers can either be applied using a `(?smix-smix)` 
+prefix to the regular expression, or with one of the 
+[link boost_regex.ref.syntax_option_type.syntax_option_type_perl regex-compile time 
+flags `no_mod_m`, `mod_x`, `mod_s`, and `no_mod_s`].
+
+[h3 References]
+
+[@http://perldoc.perl.org/perlre.html Perl 5.8].
+
+
+[endsect]
+