| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 | |
|    "http://www.w3.org/TR/html4/strict.dtd">
 | |
| <html>
 | |
| <head>
 | |
| <meta http-equiv="Content-Type" content="text/html;charset=US-ASCII">
 | |
| <title>Text Formatting</title>
 | |
| 
 | |
| <style type="text/css">
 | |
| 
 | |
| body { color: #000000; background-color: #FFFFFF; }
 | |
| del { text-decoration: line-through; color: #8B0040; }
 | |
| ins { text-decoration: underline; color: #005100; }
 | |
| 
 | |
| p.example { margin-left: 2em; }
 | |
| pre.example { margin-left: 2em; }
 | |
| div.example { margin-left: 2em; }
 | |
| 
 | |
| code.extract { background-color: #F5F6A2; }
 | |
| pre.extract { margin-left: 2em; background-color: #F5F6A2;
 | |
|   border: 1px solid #E1E28E; }
 | |
| 
 | |
| p.function { }
 | |
| .attribute { margin-left: 2em; }
 | |
| .attribute dt { float: left; font-style: italic;
 | |
|   padding-right: 1ex; }
 | |
| .attribute dd { margin-left: 0em; }
 | |
| 
 | |
| blockquote.std { color: #000000; background-color: #F1F1F1;
 | |
|   border: 1px solid #D1D1D1;
 | |
|   padding-left: 0.5em; padding-right: 0.5em; }
 | |
| blockquote.stddel { text-decoration: line-through;
 | |
|   color: #000000; background-color: #FFEBFF;
 | |
|   border: 1px solid #ECD7EC;
 | |
|   padding-left: 0.5em; padding-right: 0.5em; }
 | |
| 
 | |
| blockquote.stdins { text-decoration: underline;
 | |
|   color: #000000; background-color: #C8FFC8;
 | |
|   border: 1px solid #B3EBB3; padding: 0.5em; }
 | |
| 
 | |
| table { border: 1px solid black; border-spacing: 0px;
 | |
|   margin-left: auto; margin-right: auto; }
 | |
| th { text-align: left; vertical-align: top;
 | |
|   padding-left: 0.8em; border: none; }
 | |
| td { text-align: left; vertical-align: top;
 | |
|   padding-left: 0.8em; border: none; }
 | |
| 
 | |
| </style>
 | |
| 
 | |
| </head>
 | |
| <body>
 | |
| <h1>Text Formatting</h1>
 | |
| 
 | |
| <p>
 | |
| 2016-08-19
 | |
| </p>
 | |
| 
 | |
| <address>
 | |
| Victor Zverovich, victor.zverovich@gmail.com
 | |
| </address>
 | |
| 
 | |
| <p>
 | |
| <a href="#Introduction">Introduction</a><br>
 | |
| <a href="#Design">Design</a><br>
 | |
|     <a href="#Syntax">Format String Syntax</a><br>
 | |
|     <a href="#Extensibility">Extensibility</a><br>
 | |
|     <a href="#Safety">Safety</a><br>
 | |
|     <a href="#Locale">Locale Support</a><br>
 | |
|     <a href="#PosArguments">Positional Arguments</a><br>
 | |
|     <a href="#Footprint">Binary Footprint</a><br>
 | |
| <a href="#Wording">Proposed Wording</a><br>
 | |
| <a href="#References">References</a><br>
 | |
| </p>
 | |
| 
 | |
| <h2><a name="Introduction">Introduction</a></h2>
 | |
| 
 | |
| <p>
 | |
| This paper proposes a new text formatting facility that can be used as a
 | |
| safe and extensible alternative to the <code>printf</code> family of functions.
 | |
| It is intended to complement the existing C++ I/O streams library and reuse
 | |
| some of its infrastructure such as overloaded insertion operators for
 | |
| user-defined types.
 | |
| </p>
 | |
| 
 | |
| <p>
 | |
| Example:
 | |
| 
 | |
| <pre class="example">
 | |
| <code>std::string message = std::format("The answer is {}.", 42);</code>
 | |
| </pre>
 | |
| 
 | |
| <h2><a name="Design">Design</a></h2>
 | |
| 
 | |
| <h3><a name="Syntax">Format String Syntax</a></h3>
 | |
| 
 | |
| <p>
 | |
| Variations of the printf format string syntax are arguably the most popular
 | |
| among programming languages, and C++ itself inherits <code>printf</code>
 | |
| from C <a href="#1">[1]</a>. The advantage of the printf syntax is that many
 | |
| programmers are familiar with it. However, in its current form it has a number
 | |
| of issues:
 | |
| </p>
 | |
| 
 | |
| <ul>
 | |
| <li>Many format specifiers like <code>hh</code>, <code>h</code>, <code>l</code>,
 | |
|     <code>j</code>, etc. are used only to convey type information.
 | |
|     They are redundant in type-safe formatting and would unnecessarily
 | |
|     complicate specification and parsing.</li>
 | |
| <li>There is no standard way to extend the syntax for user-defined types.</li>
 | |
| <li>There are subtle differences between different implementations. For example,
 | |
|     POSIX positional arguments <a href="#2">[2]</a> are not supported on
 | |
|     some systems <a href="#6">[6]</a>.</li>
 | |
| <li>Using <code>'%'</code> in a custom format specifier, e.g. for
 | |
|     <code>put_time</code>-like time formatting, poses difficulties.</li>
 | |
| </ul>
 | |
| 
 | |
| <p>
 | |
| Although it is possible to address these issues, doing so would break
 | |
| compatibility and could be more confusing to users than introducing a different
 | |
| syntax.
 | |
| </p>
 | |
| 
 | |
| <p>
 | |
| Therefore we propose a new syntax based on the ones used in Python
 | |
| <a href="#3">[3]</a>, the .NET family of languages <a href="#4">[4]</a>,
 | |
| and Rust <a href="#5">[5]</a>. This syntax employs <code>'{'</code> and
 | |
| <code>'}'</code> as replacement field delimiters instead of <code>'%'</code>
 | |
| and it is described in detail in TODO:link. Here are some of the advantages:
 | |
| </p>
 | |
| 
 | |
| <ul>
 | |
| <li>Consistent and easy to parse mini-language focused on formatting rather
 | |
|     than conveying type information</li>
 | |
| <li>Extensibility and support for custom format strings for user-defined
 | |
|     types</li>
 | |
| <li>Positional arguments</li>
 | |
| <li>Support for both locale-specific and locale-independent formatting (see
 | |
|     <a href="#Locale">Locale Support</a>)</li>
 | |
| <li>Formatting improvements such as better alignment control, fill character,
 | |
|     and binary format</li>
 | |
| </ul>
 | |
| 
 | |
| <p>
 | |
| The syntax is expressive enough to enable translation, possibly automated,
 | |
| of most printf format strings. The correspondence between <code>printf</code>
 | |
| and the new syntax is given in the following table.
 | |
| </p>
 | |
| 
 | |
| <table>
 | |
| <thead>
 | |
| <tr><th>printf</th><th>new</th></tr>
 | |
| </thead>
 | |
| <tbody>
 | |
| <tr><td>-</td><td>&lt;</td></tr>
 | |
| <tr><td>+</td><td>+</td></tr>
 | |
| <tr><td><em>space</em></td><td><em>space</em></td></tr>
 | |
| <tr><td>#</td><td>#</td></tr>
 | |
| <tr><td>0</td><td>0</td></tr>
 | |
| <tr><td>hh</td><td>unused</td></tr>
 | |
| <tr><td>h</td><td>unused</td></tr>
 | |
| <tr><td>l</td><td>unused</td></tr>
 | |
| <tr><td>ll</td><td>unused</td></tr>
 | |
| <tr><td>j</td><td>unused</td></tr>
 | |
| <tr><td>z</td><td>unused</td></tr>
 | |
| <tr><td>t</td><td>unused</td></tr>
 | |
| <tr><td>L</td><td>unused</td></tr>
 | |
| <tr><td>c</td><td>c (optional)</td></tr>
 | |
| <tr><td>s</td><td>s (optional)</td></tr>
 | |
| <tr><td>d</td><td>d (optional)</td></tr>
 | |
| <tr><td>i</td><td>d (optional)</td></tr>
 | |
| <tr><td>o</td><td>o</td></tr>
 | |
| <tr><td>x</td><td>x</td></tr>
 | |
| <tr><td>X</td><td>X</td></tr>
 | |
| <tr><td>u</td><td>d (optional)</td></tr>
 | |
| <tr><td>f</td><td>f</td></tr>
 | |
| <tr><td>F</td><td>F</td></tr>
 | |
| <tr><td>e</td><td>e</td></tr>
 | |
| <tr><td>E</td><td>E</td></tr>
 | |
| <tr><td>a</td><td>a</td></tr>
 | |
| <tr><td>A</td><td>A</td></tr>
 | |
| <tr><td>g</td><td>g (optional)</td></tr>
 | |
| <tr><td>G</td><td>G</td></tr>
 | |
| <tr><td>n</td><td>unused</td></tr>
 | |
| <tr><td>p</td><td>p (optional)</td></tr>
 | |
| </tbody>
 | |
| </table>
 | |
| 
 | |
| <p>
 | |
| Width and precision are represented similarly in <code>printf</code> and the
 | |
| proposed syntax, the only difference being that a runtime value is specified by
 | |
| <code>*</code> in the former and <code>{}</code> in the latter, possibly with
 | |
| the index of the argument inside the braces.
 | |
| </p>
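<p>
As a point of comparison, the <code>printf</code> side of this correspondence
can be sketched as follows. The helper name <code>format_fixed</code> and the
buffer size are assumptions made for illustration and are not part of the
proposal. Under the proposed syntax the same request would presumably be
spelled <code>"{0:{1}.{2}f}"</code>, with the value, width, and precision
passed as arguments 0, 1, and 2.
</p>

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the proposal): formats `value` in fixed
// notation, taking width and precision at run time through printf's '*'.
std::string format_fixed(double value, int width, int precision) {
  char buffer[128];
  // Each '*' consumes one int argument that precedes the value itself.
  int n = std::snprintf(buffer, sizeof buffer, "%*.*f", width, precision, value);
  if (n < 0) return std::string();  // encoding error
  std::size_t length = static_cast<std::size_t>(n);
  if (length >= sizeof buffer) length = sizeof buffer - 1;  // output was truncated
  return std::string(buffer, length);
}
```

<p>
Note that the caller has to size the buffer and handle truncation by hand,
which is the kind of bookkeeping the proposed functions are meant to remove.
</p>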
 | |
| 
 | |
| <p>
 | |
| As can be seen from the table above, most of the specifiers remain the same,
 | |
| which simplifies migration from <code>printf</code>. A notable difference is
 | |
| in the alignment specification. The proposed syntax allows left, center,
 | |
| and right alignment represented by <code>'&lt;'</code>, <code>'^'</code>,
 | |
| and <code>'&gt;'</code> respectively, which is more expressive than the
 | |
| corresponding <code>printf</code> syntax. The latter only supports left and
 | |
| right (the default) alignment.
 | |
| </p>
 | |
| 
 | |
| <p>
 | |
| The following example uses center alignment and <code>'*'</code> as a fill
 | |
| character:
 | |
| </p>
 | |
| 
 | |
| <pre class="example">
 | |
| <code>std::format("{:*^30}", "centered");</code>
 | |
| </pre>
 | |
| 
 | |
| <p>
 | |
| resulting in <code>"***********centered***********"</code>.
 | |
| The same formatting cannot be easily achieved with <code>printf</code>.
 | |
| </p>
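<p>
To illustrate why: <code>printf</code> has no centering flag, so the padding
on each side has to be computed by hand. The following is a minimal sketch of
that manual work. The helper name <code>center</code> is hypothetical, and
placing any odd leftover fill character on the right is an assumption of this
sketch rather than something the text above specifies.
</p>

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: centers `text` in a field of `width` characters,
// padding with `fill`. Any odd leftover character goes on the right.
std::string center(const std::string& text, std::size_t width, char fill) {
  if (text.size() >= width) return text;  // nothing to pad
  std::size_t padding = width - text.size();
  std::size_t left = padding / 2;
  std::size_t right = padding - left;
  return std::string(left, fill) + text + std::string(right, fill);
}
```

<p>
Calling <code>center("centered", 30, '*')</code> pads the eight-character word
with eleven asterisks on each side.
</p>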
 | |
| 
 | |
| <h3><a name="Extensibility">Extensibility</a></h3>
 | |
| 
 | |
| <p>
 | |
| Both the format string syntax and the API are designed with extensibility in
 | |
| mind. The mini-language can be extended for user-defined types and users can
 | |
| provide functions that do parsing and formatting for such types.
 | |
| </p>
 | |
| 
 | |
| <p>The general syntax of a replacement field in a format string is
 | |
| 
 | |
| <pre>
 | |
| <code>replacement-field ::=  '{' [arg-id] [':' format-spec] '}'</code>
 | |
| </pre>
 | |
| 
 | |
| <p>
 | |
| where <code>format-spec</code> is predefined for built-in types, but can be
 | |
| customized for user-defined types. For example, the syntax can be extended
 | |
| for <code>put_time</code>-like date and time formatting
 | |
| </p>
 | |
| 
 | |
| <pre class="example">
 | |
| <code>std::time_t t = std::time(nullptr);
 | |
| std::string date = std::format("The date is {0:%Y-%m-%d}.", *std::localtime(&amp;t));</code>
 | |
| </pre>
 | |
| 
 | |
| <p>by providing an overload of <code>std::format_arg</code> for
 | |
| <code>std::tm</code>:</p>
 | |
| 
 | |
| TODO: example
 | |
| 
 | |
| <h3><a name="Safety">Safety</a></h3>
 | |
| 
 | |
| <p>
 | |
| Formatting functions rely on variadic templates instead of the mechanism
 | |
| provided by <code>&lt;cstdarg&gt;</code>. The type information is captured
 | |
| automatically and passed to formatters, guaranteeing type safety and making
 | |
| many of the <code>printf</code> specifiers redundant (see <a href="#Syntax">
 | |
| Format String Syntax</a>). Buffer management is also automatic to prevent
 | |
| buffer overflow errors common to <code>printf</code>.
 | |
| </p>
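<p>
The way a variadic template captures argument types can be sketched with a
small helper that streams every argument into a growing string. This is an
illustration of the mechanism only, not the proposed implementation. The names
<code>concat</code> and <code>append_all</code> are hypothetical, and stream
insertion stands in for the proposal's formatters.
</p>

```cpp
#include <sstream>
#include <string>

// Base case: no arguments left to append.
inline void append_all(std::ostringstream&) {}

// Recursive case: the compiler deduces T for each argument, so overload
// resolution picks the matching inserter without any type specifier.
template <class T, class... Rest>
void append_all(std::ostringstream& out, const T& first, const Rest&... rest) {
  out << first;
  append_all(out, rest...);
}

// Hypothetical helper: concatenates the textual form of all arguments.
// The string stream grows as needed, so no fixed-size buffer is involved.
template <class... Args>
std::string concat(const Args&... args) {
  std::ostringstream out;
  append_all(out, args...);
  return out.str();
}
```

<p>
For instance, <code>concat("The answer is ", 42, '.')</code> needs no
<code>%s</code>, <code>%d</code>, or <code>%c</code> to tell the three
arguments apart.
</p>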
 | |
| 
 | |
| <h3><a name="Locale">Locale Support</a></h3>
 | |
| 
 | |
| <p>
 | |
| As pointed out in
 | |
| <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0067r1.html">
 | |
| P0067R1: Elementary string conversions</a>, there are a number of use
 | |
| cases that do not require internationalization support, but do require high
 | |
| throughput when produced by a server. These include various text-based
 | |
| interchange formats such as JSON or XML. The need for locale-independent
 | |
| functions for conversions between integers and strings and between
 | |
| floating-point numbers and strings has also been highlighted in
 | |
| <a href="http://open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4412.html">
 | |
| N4412: Shortcomings of iostreams</a>. Therefore a user should be able to
 | |
| easily control whether to use locales or not during formatting.
 | |
| </p>
 | |
| 
 | |
| <p>
 | |
| We follow Python's approach <a href="#3">[3]</a> and designate a separate format
 | |
| specifier <code>'n'</code> for locale-aware numeric formatting. It applies to
 | |
| all integral and floating-point types. All other specifiers produce output
 | |
| unaffected by locale settings. This can also have a positive performance effect
 | |
| because locale-independent formatting can be implemented more efficiently.
 | |
| </p>
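<p>
The distinction between the two kinds of output can be illustrated with the
existing I/O streams facilities. The sketch below does not use the proposed
API. The function names and the <code>grouping_punct</code> facet are
assumptions made for the example, with the facet standing in for whatever
locale a user might have installed.
</p>

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet standing in for a user's locale: groups digits in
// threes and separates the groups with commas.
struct grouping_punct : std::numpunct<char> {
  char do_thousands_sep() const override { return ','; }
  std::string do_grouping() const override { return "\3"; }
};

// Formats `value` with a stream imbued with the given locale.
std::string to_string_with(const std::locale& loc, long value) {
  std::ostringstream out;
  out.imbue(loc);
  out << value;
  return out.str();
}

// Locale-independent output: always uses the classic "C" locale.
std::string locale_independent(long value) {
  return to_string_with(std::locale::classic(), value);
}

// Locale-aware output: the locale takes ownership of the facet.
std::string locale_aware(long value) {
  return to_string_with(std::locale(std::locale::classic(), new grouping_punct),
                        value);
}
```

<p>
In the proposed design the second behaviour would be requested explicitly with
the <code>'n'</code> specifier, while every other specifier would behave like
the first function.
</p>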
 | |
| 
 | |
| <h3><a name="PosArguments">Positional Arguments</a></h3>
 | |
| 
 | |
| <p>
 | |
| An important feature for localization is the ability to rearrange formatting
 | |
| arguments because the word order may vary in different languages
 | |
| <a href="#7">[7]</a>. For example:
 | |
| </p>
 | |
| 
 | |
| <pre class="example">
 | |
| <code>printf("String `%s' has %d characters\n", string, length(string));</code>
 | |
| </pre>
 | |
| 
 | |
| <p>A possible German translation of the format string might be:</p>
 | |
| 
 | |
| <pre class="example">
 | |
| <code>"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"</code>
 | |
| </pre>
 | |
| 
 | |
| <p>
 | |
| using POSIX positional arguments <a href="#2">[2]</a>. Unfortunately these
 | |
| positional specifiers are not portable <a href="#6">[6]</a>. The C++ I/O
 | |
| streams don't support positional arguments by design because formatting
 | |
| arguments are interleaved with the portions of the literal string:
 | |
| </p>
 | |
| 
 | |
| <pre class="example">
 | |
| <code>std::cout &lt;&lt; "String `" &lt;&lt; string &lt;&lt; "' has " &lt;&lt; length(string) &lt;&lt; " characters\n";</code>
 | |
| </pre>
 | |
| 
 | |
| <p>
 | |
| The current proposal allows both positional and automatically numbered
 | |
| arguments, for example:
 | |
| </p>
 | |
| 
 | |
| <pre class="example">
 | |
| <code>std::format("String `{}' has {} characters\n", string, length(string));</code>
 | |
| </pre>
 | |
| 
 | |
| <p>with the German translation of the format string:</p>
 | |
| 
 | |
| <pre class="example">
 | |
| <code>"{1} Zeichen lang ist die Zeichenkette `{0}'\n"</code>
 | |
| </pre>
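<p>
How a single argument list can serve both format strings is sketched below.
The function <code>substitute</code> is a hypothetical, deliberately reduced
helper: it accepts arguments that are already strings, and it handles only
<code>{}</code> and <code>{N}</code> fields, without format specifications or
brace escaping.
</p>

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: expands "{}" and "{N}" fields using pre-formatted
// string arguments. Format specs and brace escaping are not handled.
std::string substitute(const std::string& fmt,
                       const std::vector<std::string>& args) {
  std::string out;
  std::size_t next_auto = 0;  // index used by the next "{}" field
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    if (fmt[i] != '{') {
      out += fmt[i];  // literal text is copied through unchanged
      continue;
    }
    std::size_t close = fmt.find('}', i);
    if (close == std::string::npos)
      throw std::invalid_argument("unmatched '{' in format string");
    std::string id = fmt.substr(i + 1, close - i - 1);
    std::size_t index = 0;
    if (id.empty()) {
      index = next_auto++;  // automatic numbering
    } else {
      for (char c : id) {  // explicit positional index
        if (c < '0' || c > '9')
          throw std::invalid_argument("argument id must be an integer");
        index = index * 10 + static_cast<std::size_t>(c - '0');
      }
    }
    if (index >= args.size())
      throw std::out_of_range("argument index out of range");
    out += args[index];
    i = close;  // continue after the closing brace
  }
  return out;
}
```

<p>
Because the arguments are looked up by index, a translator can reorder them by
editing only the format string, and the call site stays the same.
</p>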
 | |
| 
 | |
| <h3><a name="Performance">Performance</a></h3>
 | |
| 
 | |
| <p>TODO</p>
 | |
| 
 | |
| <h3><a name="Footprint">Binary Footprint</a></h3>
 | |
| 
 | |
| <p>TODO</p>
 | |
| 
 | |
| <h2><a name="Wording">Proposed Wording</a></h2>
 | |
| 
 | |
| <p>
 | |
| The header <code>&lt;format&gt;</code> defines the function templates
 | |
| <code>format</code> that format arguments and return the results as strings.
 | |
| TODO: rephrase and mention format_args
 | |
| </p>
 | |
| 
 | |
| <h3>Header <code>&lt;format&gt;</code> synopsis</h3>
 | |
| 
 | |
| <pre>
 | |
| <code>namespace std {
 | |
|   class format_args;
 | |
| 
 | |
|   template &lt;class Char&gt;
 | |
|   basic_string&lt;Char&gt; format(const Char *fmt, format_args args);
 | |
| 
 | |
|   template &lt;class Char, class ...Args&gt;
 | |
|   basic_string&lt;Char&gt; format(const Char *fmt, const Args&amp;... args);
 | |
| }</code>
 | |
| </pre>
 | |
| 
 | |
| <h3>Format string syntax</h3>
 | |
| 
 | |
| <pre>
 | |
| <code>replacement-field ::=  '{' [arg-id] [':' format-spec] '}'
 | |
| arg-id            ::=  integer
 | |
| integer           ::=  digit+
 | |
| digit             ::=  '0'...'9'</code>
 | |
| </pre>
 | |
| 
 | |
| <!-- The notation is the same as in n4296 22.4.3.1. -->
 | |
| <pre>
 | |
| <code>format-spec ::=  [[fill] align] [sign] ['#'] ['0'] [width] ['.' precision] [type]
 | |
| fill        ::=  &lt;a character other than '{' or '}'&gt;
 | |
| align       ::=  '&lt;' | '&gt;' | '=' | '^'
 | |
| sign        ::=  '+' | '-' | ' '
 | |
| width       ::=  integer | '{' arg-id '}'
 | |
| precision   ::=  integer | '{' arg-id '}'
 | |
| type        ::=  int-type | 'a' | 'A' | 'c' | 'e' | 'E' | 'f' | 'F' | 'g' | 'G' | 'p' | 's'
 | |
| int-type    ::=  'b' | 'B' | 'd' | 'o' | 'x' | 'X'</code>
 | |
| </pre>
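<p>
The <code>format-spec</code> grammar above is simple enough to parse in a
single left-to-right pass. The following is a minimal sketch under stated
assumptions: the <code>format_spec</code> structure and the function names are
hypothetical, only literal integers are accepted for width and precision (the
<code>'{' arg-id '}'</code> form is not handled), and integer overflow is not
checked.
</p>

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type; -1 and '\0' mean "not specified".
struct format_spec {
  char fill = ' ';
  char align = '\0';
  char sign = '\0';
  bool alternate = false;  // '#'
  bool zero = false;       // '0'
  int width = -1;
  int precision = -1;
  char type = '\0';
};

inline bool is_align(char c) {
  return c == '<' || c == '>' || c == '=' || c == '^';
}
inline bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Reads the digit sequence starting at s[i] and advances i past it.
inline int parse_integer(const std::string& s, std::size_t& i) {
  int value = 0;
  while (i < s.size() && is_digit(s[i])) value = value * 10 + (s[i++] - '0');
  return value;
}

// Parses the text after ':' in a replacement field. Returns false if the
// input does not match the grammar or has trailing characters.
bool parse_format_spec(const std::string& s, format_spec& spec) {
  std::size_t i = 0;
  // [[fill] align]: a fill is present only when an align character follows.
  if (s.size() >= 2 && is_align(s[1]) && s[0] != '{' && s[0] != '}') {
    spec.fill = s[0];
    spec.align = s[1];
    i = 2;
  } else if (!s.empty() && is_align(s[0])) {
    spec.align = s[0];
    i = 1;
  }
  if (i < s.size() && (s[i] == '+' || s[i] == '-' || s[i] == ' '))
    spec.sign = s[i++];
  if (i < s.size() && s[i] == '#') { spec.alternate = true; ++i; }
  if (i < s.size() && s[i] == '0') { spec.zero = true; ++i; }
  if (i < s.size() && is_digit(s[i])) spec.width = parse_integer(s, i);
  if (i < s.size() && s[i] == '.') {
    ++i;
    if (i >= s.size() || !is_digit(s[i])) return false;  // '.' needs digits
    spec.precision = parse_integer(s, i);
  }
  const std::string types = "aAcdeEfFgGpsbBoxX";
  if (i < s.size() && types.find(s[i]) != std::string::npos)
    spec.type = s[i++];
  return i == s.size();
}
```

<p>
The only lookahead needed is for the fill character, which is recognized by
checking whether the character after it is one of the four alignment
characters.
</p>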
 | |
| 
 | |
| <h2><a name="Implementation">Implementation</a></h2>
 | |
| 
 | |
| <p>
 | |
| The ideas proposed in this paper have been implemented in the open-source fmt
 | |
| library. TODO: link and mention other implementations (Boost Format, FastFormat)
 | |
| </p>
 | |
| 
 | |
| <h2><a name="References">References</a></h2>
 | |
| 
 | |
| <p>
 | |
| <a name="1">[1]</a>
 | |
| <cite>The <code>fprintf</code> function. ISO/IEC 9899:2011. 7.21.6.1.</cite><br/>
 | |
| <a name="2">[2]</a>
 | |
| <cite><a href="http://pubs.opengroup.org/onlinepubs/009695399/functions/fprintf.html">
 | |
| fprintf, printf, snprintf, sprintf - print formatted output</a>. The Open
 | |
| Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition.</cite><br/>
 | |
| <a name="3">[3]</a>
 | |
| <cite><a href="https://docs.python.org/3/library/string.html#format-string-syntax">
 | |
| 6.1.3. Format String Syntax</a>. Python 3.5.2 documentation.</cite><br/>
 | |
| <a name="4">[4]</a>
 | |
| <cite><a href="https://msdn.microsoft.com/en-us/library/system.string.format(v=vs.110).aspx">
 | |
| String.Format Method</a>. .NET Framework Class Library.</cite><br/>
 | |
| <a name="5">[5]</a>
 | |
| <cite><a href="https://doc.rust-lang.org/std/fmt/">
 | |
| Module <code>std::fmt</code></a>. The Rust Standard Library.</cite><br/>
 | |
| <a name="6">[6]</a>
 | |
| <cite><a href="https://msdn.microsoft.com/en-us/library/56e442dc(v=vs.120).aspx">
 | |
| Format Specification Syntax: printf and wprintf Functions</a>. C++ Language and
 | |
| Standard Libraries.</cite><br/>
 | |
| <a name="7">[7]</a>
 | |
| <cite><a href="ftp://ftp.gnu.org/old-gnu/Manuals/gawk-3.1.0/html_chapter/gawk_11.html">
 | |
| 10.4.2 Rearranging printf Arguments</a>. The GNU Awk User's Guide.</cite><br/>
 | |
| </p>
 | |
| 
 | |
| </body>
 |