<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
   <head>
      <title>Boost.Regex: Algorithm regex_split (deprecated)</title>
      <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
      <link rel="stylesheet" type="text/css" href="../../../boost.css">
   </head>
   <body>
      <P>
         <TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
            <TR>
               <td valign="top" width="300">
                  <h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
               </td>
               <TD width="353">
                  <H1 align="center">Boost.Regex</H1>
                  <H2 align="center">Algorithm regex_split (deprecated)</H2>
               </TD>
               <td width="50">
                  <h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
               </td>
            </TR>
         </TABLE>
      </P>
      <HR>
      <p></p>
      <P>The algorithm regex_split has been deprecated in favor of the iterator <A href="regex_token_iterator.html">
            regex_token_iterator</A>, which has a more flexible and powerful interface
         and follows the usual standard library "pull" rather than "push"
         semantics.</P>
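      <P>As a rough migration sketch, the whitespace split that regex_split performs by
         default might be written with <CODE>boost::sregex_token_iterator</CODE> along
         the following lines. The helper name <CODE>split_fields</CODE> is hypothetical
         and not part of Boost.Regex; passing -1 as the sub-match index asks the iterator
         for the text between matches. Unlike regex_split, this sketch leaves the input
         string unmodified.</P>
      <PRE>#include &lt;list&gt;
#include &lt;string&gt;
#include &lt;boost/regex.hpp&gt;

// Hypothetical helper (not part of Boost.Regex): collects the
// whitespace-separated fields of s by pulling them from a token iterator.
std::list&lt;std::string&gt; split_fields(const std::string&amp; s)
{
   static const boost::regex ws("\\s+");
   std::list&lt;std::string&gt; fields;
   // sub-match index -1 selects the text between matches of ws:
   boost::sregex_token_iterator i(s.begin(), s.end(), ws, -1);
   boost::sregex_token_iterator end;
   for(; i != end; ++i)
      fields.push_back(i-&gt;str());
   return fields;
}</PRE>
      <P>The caller decides when to advance the iterator, which is the "pull" style
         referred to above.</P>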
      <P>Code that uses regex_split will continue to compile. The following
         documentation is taken from the previous Boost.Regex version:</P>
      <H3><A name="regex_split"></A>Algorithm regex_split</H3>
      <PRE>#include &lt;<A href="../../../boost/regex.hpp">boost/regex.hpp</A>&gt; </PRE>
      <P>Algorithm regex_split performs an operation similar to the Perl split operation,
         and comes in three overloaded forms:
      </P>
      <PRE><B>template</B> &lt;<B>class</B> OutputIterator, <B>class</B> charT, <B>class</B> Traits1, <B>class</B> Alloc1, <B>class</B> Traits2, <B>class</B> Alloc2&gt;
std::size_t regex_split(OutputIterator out,&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; std::basic_string&lt;charT, Traits1, Alloc1&gt;&amp; s,&nbsp;
&nbsp;<B>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; const</B> basic_regex&lt;charT, Traits2, Alloc2&gt;&amp; e,
&nbsp;<B>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; unsigned</B> flags,
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; std::size_t max_split);

<B>template</B> &lt;<B>class</B> OutputIterator, <B>class</B> charT, <B>class</B> Traits1, <B>class</B> Alloc1, <B>class</B> Traits2, <B>class</B> Alloc2&gt;
std::size_t regex_split(OutputIterator out,&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; std::basic_string&lt;charT, Traits1, Alloc1&gt;&amp; s,&nbsp;
&nbsp;<B>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; const</B> basic_regex&lt;charT, Traits2, Alloc2&gt;&amp; e,
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <B>unsigned</B> flags = match_default);

<B>template</B> &lt;<B>class</B> OutputIterator, <B>class</B> charT, <B>class</B> Traits1, <B>class</B> Alloc1&gt;
std::size_t regex_split(OutputIterator out,
                        std::basic_string&lt;charT, Traits1, Alloc1&gt;&amp; s);</PRE>
      <P><STRONG>Effects: </STRONG>Each version of the algorithm takes an
         output-iterator for output, and a string for input. If the expression contains
         no marked sub-expressions, then the algorithm writes one string onto the
         output-iterator for each section of input that does not match the expression.
         If the expression does contain marked sub-expressions, then each time a match
         is found, one string for each marked sub-expression will be written to the
         output-iterator. No more than <I>max_split </I>strings will be written to the
         output-iterator. Before returning, all the input processed will be deleted from
         the string <I>s</I> (if <I>max_split </I>is not reached then all of <I>s</I> will
         be deleted). Returns the number of strings written to the output-iterator. If
         the parameter <I>max_split</I> is not specified then it defaults to UINT_MAX.
         If no expression is specified, then it defaults to "\s+", and splitting occurs
         on whitespace.
      </P>
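      <P>The interaction between <I>max_split</I> and the consumed input can be
         illustrated with a minimal sketch. The helper name <CODE>take_fields</CODE>
         and its parameter names are hypothetical, and <CODE>boost::match_default</CODE>
         is passed explicitly only so that the five-argument overload is selected.</P>
      <PRE>#include &lt;cstddef&gt;
#include &lt;iterator&gt;
#include &lt;string&gt;
#include &lt;vector&gt;
#include &lt;boost/regex.hpp&gt;

// Hypothetical helper: moves at most max_fields whitespace-separated
// fields from the front of s into out, and returns how many were written.
std::size_t take_fields(std::vector&lt;std::string&gt;&amp; out,
                        std::string&amp; s,
                        std::size_t max_fields)
{
   static const boost::regex ws("\\s+");
   // the expression has no marked sub-expressions, so each section
   // of s that does not match ws becomes one output string:
   return boost::regex_split(std::back_inserter(out), s, ws,
                             boost::match_default, max_fields);
}</PRE>
      <P>With <I>s</I> holding "alpha beta gamma delta" and <I>max_fields</I> equal to
         2, this should write "alpha" and "beta", return 2, and leave "gamma delta" in <I>s</I>
         for a later call.</P>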
      <P><STRONG>Throws:</STRONG> <CODE>std::runtime_error</CODE> if the complexity of
         matching the expression against an N character string begins to exceed O(N<SUP>2</SUP>),
         or if the program runs out of stack space while matching the expression (if
         Boost.Regex is <A href="configuration.html">configured</A> in recursive mode),
         or if the matcher exhausts its permitted memory allocation (if Boost.Regex is <A href="configuration.html">
            configured</A> in non-recursive mode).</P>
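      <P>Callers that split untrusted input may want to handle this exception. The
         following is one possible sketch; the wrapper name <CODE>try_split</CODE> is
         hypothetical, and reporting failure through a <CODE>bool</CODE> is merely one
         design choice among several.</P>
      <PRE>#include &lt;iterator&gt;
#include &lt;list&gt;
#include &lt;stdexcept&gt;
#include &lt;string&gt;
#include &lt;boost/regex.hpp&gt;

// Hypothetical wrapper: returns false instead of propagating the
// std::runtime_error described above.
bool try_split(std::list&lt;std::string&gt;&amp; out, std::string&amp; s, const boost::regex&amp; e)
{
   try
   {
      boost::regex_split(std::back_inserter(out), s, e);
      return true;
   }
   catch(const std::runtime_error&amp;)
   {
      // matching was abandoned; strings already written to out remain there
      return false;
   }
}</PRE>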
      <P><A href="../example/snippets/regex_split_example_1.cpp">Example</A>: the
         following function will split the input string into a series of tokens, and
         remove each token from the string <I>s</I>:
      </P>
      <PRE><B>unsigned</B> tokenise(std::list&lt;std::string&gt;&amp; l, std::string&amp; s)
{
<B>&nbsp;&nbsp; return</B> boost::regex_split(std::back_inserter(l), s);
}</PRE>
      <P><A href="../example/snippets/regex_split_example_2.cpp">Example</A>: the
         following short program will extract all of the URLs from an HTML file, and
         print them out to <I>cout</I>:
      </P>
      <PRE><FONT color=#008000>#include &lt;list&gt;
#include &lt;fstream&gt;
#include &lt;iostream&gt;
#include &lt;boost/regex.hpp&gt;
</FONT>
boost::regex e(<FONT color=#000080>"&lt;\\s*A\\s+[^&gt;]*href\\s*=\\s*\"([^\"]*)\""</FONT>,
               boost::regbase::normal | boost::regbase::icase);

<B>void</B> load_file(std::string&amp; s, std::istream&amp; is)
{
   s.erase();
   <FONT color=#000080>//
   // attempt to grow string buffer to match file size,
   // this doesn't always work...
</FONT>   s.reserve(is.rdbuf()-&gt;in_avail());
   <B>char</B> c;
   <B>while</B>(is.get(c))
   {
      <FONT color=#000080>// use logarithmic growth strategy, in case
      // in_avail (above) returned zero:
</FONT>      <B>if</B>(s.capacity() == s.size())
         s.reserve(s.capacity() * 3);
      s.append(1, c);
   }
}


<B>int</B> main(<B>int</B> argc, <B>char</B>** argv)
{
   std::string s;
   std::list&lt;std::string&gt; l;

   <B>for</B>(<B>int</B> i = 1; i &lt; argc; ++i)
   {
      std::cout &lt;&lt; <FONT color=#000080>"Finding URLs in "</FONT> &lt;&lt; argv[i] &lt;&lt; <FONT color=#000080>":"</FONT> &lt;&lt; std::endl;
      s.erase();
      std::ifstream is(argv[i]);
      load_file(s, is);
      boost::regex_split(std::back_inserter(l), s, e);
      <B>while</B>(l.size())
      {
         s = *(l.begin());
         l.pop_front();
         std::cout &lt;&lt; s &lt;&lt; std::endl;
      }
   }
   <B>return</B> 0;
}</PRE>
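      <P>For comparison, the URL extraction above might look as follows when written
         against <CODE>boost::sregex_token_iterator</CODE>. This is a sketch only: the
         helper name <CODE>find_urls</CODE> is hypothetical, and the syntax flags are
         spelled <CODE>boost::regex::perl | boost::regex::icase</CODE> here on the
         assumption that they are equivalent to the flags used in the program above.
         Passing 1 as the sub-match index selects the first marked sub-expression,
         that is, the text of the href attribute.</P>
      <PRE>#include &lt;list&gt;
#include &lt;string&gt;
#include &lt;boost/regex.hpp&gt;

// Hypothetical helper: returns the href targets of the anchor tags in html.
std::list&lt;std::string&gt; find_urls(const std::string&amp; html)
{
   static const boost::regex e("&lt;\\s*A\\s+[^&gt;]*href\\s*=\\s*\"([^\"]*)\"",
                               boost::regex::perl | boost::regex::icase);
   std::list&lt;std::string&gt; urls;
   // sub-match index 1 yields the first marked sub-expression of each match:
   boost::sregex_token_iterator i(html.begin(), html.end(), e, 1);
   boost::sregex_token_iterator end;
   for(; i != end; ++i)
      urls.push_back(i-&gt;str());
   return urls;
}</PRE>
      <P>Because the input is taken by const reference, the file contents are not
         consumed while the URLs are collected.</P>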
      <HR>
      <p>Revised
         17 May 2003
      </p>
      <P><I>© Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-2003</I></P>
      <P align="left"><I>Permission to use, copy, modify, distribute and sell this software
            and its documentation for any purpose is hereby granted without fee, provided
            that the above copyright notice appear in all copies and that both that
            copyright notice and this permission notice appear in supporting documentation.
            Dr John Maddock makes no representations about the suitability of this software
            for any purpose. It is provided "as is" without express or implied warranty.</I></P>
   </body>
</html>