Utility - Lexicographic

The class boost::lexicographic provides an easy way to avoid complex and error-prone if-else cascades when doing lexicographic comparisons on several different criteria. The class is defined in the header boost/utility/lexicographic.hpp and depends on no other headers. The test code is in lexicographic_test.cpp.

Introduction
Examples
Synopsis
Members
Free Functions
Credits

Introduction

One often has to write comparisons that define an ordering between various kinds of data. When such a comparison looks at one relation between two data items at a time, in a specified order, and the overall result is the lexicographic comparison of all these relations, the programmer usually ends up writing a long if-else cascade. These cascades are often complex and difficult to maintain. The class boost::lexicographic helps in this scenario. Its constructor and its function call operator each take the two data items to be compared as arguments and perform the comparison. The order in which the function call operators are called determines the lexicographic order of the relations. Once a step has decided the result, the comparisons of all further steps are no longer needed and are not executed.
The logic of the class assumes an ascending order as implied by operator <. To obtain a descending order, simply swap the two arguments. In addition, both the constructor and the function call operator have a three-argument form that takes a comparison functor as the third argument.
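
The mechanism described above can be sketched with a small stand-in class. This is not the library's implementation: the names lexi_sketch, entry and name_then_high_priority_first are assumptions made up for illustration. The sketch keeps a three-state result, ignores later steps once an earlier step has decided the order, and swaps the arguments of the second step to get a descending order for that criterion.

```cpp
#include <string>

// Hypothetical stand-in for illustration only; not boost::lexicographic itself.
class lexi_sketch
{
public:
    enum result_type { minus = -1, equivalent = 0, plus = 1 };

    // First comparison step, using operator <.
    template <typename T1, typename T2>
    lexi_sketch (T1 const &a, T2 const &b) : m_result (equivalent)
    {
        step (a, b);
    }

    // Further steps; the comparison is skipped once the order is decided.
    template <typename T1, typename T2>
    lexi_sketch &operator () (T1 const &a, T2 const &b)
    {
        if (m_result == equivalent)
            step (a, b);
        return *this;
    }

    result_type result () const { return m_result; }

private:
    // Derives a three-way result from operator < alone.
    template <typename T1, typename T2>
    void step (T1 const &a, T2 const &b)
    {
        if (a < b)      m_result = minus;
        else if (b < a) m_result = plus;
        else            m_result = equivalent;
    }

    result_type m_result;
};

// Hypothetical record type used only in this sketch.
struct entry
{
    std::string name;
    int priority;
};

// Orders by name ascending, then by priority descending (arguments swapped).
bool name_then_high_priority_first (entry const &e1, entry const &e2)
{
    return lexi_sketch (e1.name, e2.name)
                       (e2.priority, e1.priority)
           .result () == lexi_sketch::minus;
}
```

With two entries of the same name, the one with the higher priority orders first. With different names, the name alone decides and the priority comparison is skipped.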

Relation to `std::lexicographical_compare`

The standard C++ algorithm std::lexicographical_compare does essentially the same thing, but in a different situation: it compares sequences of data items of the same type. boost::lexicographic, in contrast, compares individual data items that may have different types, and every comparison must be specified explicitly by using the function call operator of the class.
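
For contrast, here is a minimal sketch of the standard algorithm, which is spelled std::lexicographical_compare in the header <algorithm>. The wrapper name sequence_less is an assumption chosen for this example. Both ranges hold elements of one type, and the element-by-element comparison is implicit rather than written out step by step.

```cpp
#include <algorithm>
#include <vector>

// Returns true if the sequence a orders before the sequence b.
// operator < is applied element by element; a proper prefix orders first.
bool sequence_less (std::vector<int> const &a, std::vector<int> const &b)
{
    return std::lexicographical_compare (a.begin (), a.end (),
                                         b.begin (), b.end ());
}
```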

Relation to if-else cascades

Advantages

The order of comparisons can easily be changed.
Single comparisons can be added or removed in one line.
Comparisons can be split up to be computed partly in one function and partly in another by using boost::lexicographic as a functor.
It documents the code better and expresses the user's intention directly.
If the comparison arguments do not need computation, there is in theory no performance overhead.

Disadvantages

There is no short-circuiting. All arguments are evaluated, even if an earlier comparison step has already determined the final result. As long as the arguments are trivial, there should be no performance overhead. The only way to avoid evaluating arguments is to place every comparison step in an if-statement, like this:
```
boost::lexicographic cmp (complex_computation (a), complex_computation (b));
// evaluate the next pair only if the previous steps left the order undecided
if (cmp.result () == boost::lexicographic::equivalent)
{
    cmp (complex_computation (c), complex_computation (d));
    if (cmp.result () == boost::lexicographic::equivalent)
    {
        cmp (complex_computation (e), complex_computation (f));
    }
}
// do something with cmp
```
But this construct gives up many of the advantages of using boost::lexicographic.
Apart from the lack of short-circuiting, the performance cost of boost::lexicographic is not negligible. Tests with gcc 3.2.2 showed that the algorithmic overhead is about 40% compared with the corresponding if-else cascades. In addition, gcc failed to inline everything properly, so that the resulting performance overhead was about a factor of two.
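
The cost of the missing short-circuiting can be made visible by counting how often an expensive argument computation runs. The following is a sketch under stated assumptions: chain_sketch is a hypothetical stand-in for the chained style (restricted to int for brevity), and expensive_key, g_calls, compare_chained and compare_cascade are names made up for this example. Because the call operator is an ordinary function, its arguments are fully evaluated before its body can decide to ignore them.

```cpp
// Hypothetical stand-in for the chained comparison style; not the library class.
struct chain_sketch
{
    int result; // -1, 0 or +1

    chain_sketch (int a, int b) : result (0) { step (a, b); }

    // The comparison is skipped once the order is decided, but the
    // arguments have already been evaluated by the caller at that point.
    chain_sketch &operator () (int a, int b)
    {
        if (result == 0)
            step (a, b);
        return *this;
    }

    void step (int a, int b) { result = (a < b) ? -1 : (b < a) ? 1 : 0; }
};

int g_calls = 0; // counts evaluations of expensive_key

// Hypothetical "expensive" key computation.
int expensive_key (int v)
{
    ++g_calls;
    return v * v;
}

// Chained form: all four keys are computed, whatever the first step says.
int compare_chained (int a, int b, int c, int d)
{
    return chain_sketch (expensive_key (a), expensive_key (b))
                        (expensive_key (c), expensive_key (d)).result;
}

// Hand-written cascade: the second pair is computed only if it is needed.
int compare_cascade (int a, int b, int c, int d)
{
    int ka = expensive_key (a), kb = expensive_key (b);
    if (ka < kb) return -1;
    if (kb < ka) return 1;
    int kc = expensive_key (c), kd = expensive_key (d);
    return (kc < kd) ? -1 : (kd < kc) ? 1 : 0;
}
```

When the first pair already differs, the chained form still calls expensive_key four times, while the cascade calls it twice. Both return the same result.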

Examples

A typical use is in special sorting operators, such as the lexicographic ordering of tuples:

struct position
{
    double x, y, z;
};

bool operator < (position const &p1, position const &p2)
{
    return boost::lexicographic (p1.x, p2.x)
                                (p1.y, p2.y)
                                (p1.z, p2.z);
}

Without boost::lexicographic, the same operator could be written like this:

bool operator < (position const &p1, position const &p2)
{
    if (p1.x == p2.x)
        if (p1.y == p2.y)
            return p1.z < p2.z;
        else
            return p1.y < p2.y;
    else
        return p1.x < p2.x;
}

It is also easy to use a different functor, such as the case-insensitive comparison function object in the next example.

struct person
{
    std::string firstname, lastname;
};

bool operator < (person const &p1, person const &p2)
{
    return boost::lexicographic
        (p1.lastname, p2.lastname, cmp_case_insensitive)
        (p1.firstname, p2.firstname, cmp_case_insensitive);
}
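
The example above does not define cmp_case_insensitive. A minimal sketch follows, assuming the third argument is a "less than" predicate that takes the two items and returns bool. That shape is an assumption, as is the helper name char_less_no_case. The sketch compares the strings character by character after converting each character to lower case.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Compares two characters after converting both to lower case.
// The cast to unsigned char avoids undefined behaviour for negative char values.
bool char_less_no_case (char a, char b)
{
    return std::tolower (static_cast<unsigned char> (a))
         < std::tolower (static_cast<unsigned char> (b));
}

// Assumed shape of the functor: a "less than" predicate on two strings.
bool cmp_case_insensitive (std::string const &s1, std::string const &s2)
{
    return std::lexicographical_compare (s1.begin (), s1.end (),
                                         s2.begin (), s2.end (),
                                         char_less_no_case);
}
```

Under this predicate, "Smith" and "smith" are equivalent: neither orders before the other, so the comparison moves on to the first name.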

Synopsis

namespace boost
{

  class lexicographic
  {
    public:
      enum result_type { minus = -1, equivalent, plus };

      template <typename T1, typename T2>
      lexicographic (T1 const &a, T2 const &b);
      
      template <typename T1, typename T2, typename Cmp>
      lexicographic (T1 const &a, T2 const &b, Cmp cmp);

      template <typename T1, typename T2>
      lexicographic &operator () (T1 const &a, T2 const &b);

      template <typename T1, typename T2, typename Cmp>
      lexicographic &operator () (T1 const &a, T2 const &b, Cmp cmp);

      result_type result () const;
      operator unspecified_bool_type () const;
  };

  bool operator == (lexicographic l1, lexicographic l2);
  bool operator != (lexicographic l1, lexicographic l2);

}

Members

result_type

enum result_type { minus = -1, equivalent = 0, plus = +1 };

Defines the result type of the class. It is kept as internal state and is returned by result (). Its integer representation follows the sign convention of the value returned by std::strcmp: negative, zero, or positive.

minus - the sequence of the first arguments of constructor and function call operators is lexicographically less than the corresponding sequence of the second arguments.
equivalent - all elements of the sequences of the first and the second arguments are pairwise equivalent, that is, no comparison step found one to be less than the other.
plus - the sequence of the first arguments of constructor and function call operators is lexicographically greater than the corresponding sequence of the second arguments.

constructors

template <typename T1, typename T2>
lexicographic (T1 const &a, T2 const &b);

Constructs a new object and performs the first comparison step between a and b. It uses operator < for the comparison.

template <typename T1, typename T2, typename Cmp>
lexicographic (T1 const &a, T2 const &b, Cmp cmp);

Constructs a new object and performs the first comparison step between a and b. It uses cmp for the comparison.

function call operators

template <typename T1, typename T2>
lexicographic &operator () (T1 const &a, T2 const &b);

Performs the next comparison step on the object, between a and b. It uses operator < for the comparison.

template <typename T1, typename T2, typename Cmp>
lexicographic &operator () (T1 const &a, T2 const &b, Cmp cmp);

Performs the next comparison step on the object, between a and b. It uses cmp for the comparison.

result

result_type result () const;

Returns the result of the comparison steps performed so far.

conversions

operator unspecified_bool_type () const;

This conversion operator allows objects to be used in boolean contexts, like if (lexicographic (a, b)) {}. The actual target type is typically a pointer to a member function, avoiding many of the implicit conversion pitfalls.
It evaluates to true if result () == minus, otherwise to false.
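
The pointer-to-member-function technique mentioned above is commonly known as the safe bool idiom. The following sketch shows it in isolation. The class ordered_sketch, its helper true_tag and the function is_less are hypothetical names, and the sketch does not claim to match the library's actual definition of unspecified_bool_type.

```cpp
// Hypothetical stand-in showing only the boolean conversion.
class ordered_sketch
{
    // Private helper whose address serves as the "true" value.
    void true_tag () const {}
    typedef void (ordered_sketch::*unspecified_bool_type) () const;

public:
    enum result_type { minus = -1, equivalent = 0, plus = 1 };

    explicit ordered_sketch (result_type r) : m_result (r) {}

    // Yields a non-null member pointer only when the state is minus.
    operator unspecified_bool_type () const
    {
        return m_result == minus ? &ordered_sketch::true_tag : 0;
    }

private:
    result_type m_result;
};

// Works in boolean contexts such as if, while and the conditional operator.
bool is_less (ordered_sketch const &s)
{
    return s ? true : false;
}
```

Unlike a plain operator bool, this conversion does not let the object be used as an integer: a statement such as int n = s; does not compile, because a pointer to member function does not convert to int.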

Free Functions

comparison

bool operator == (lexicographic l1, lexicographic l2);

Returns l1.result () == l2.result (). That means it returns true if both objects are in the same state.

bool operator != (lexicographic l1, lexicographic l2);

Returns l1.result () != l2.result (). That means it returns true if the two objects are in different states.

Credits

The author of boost::lexicographic is Jan Langer (jan@langernetz.de). Ideas and suggestions from Steve Cleary, David Abrahams, Gennaro Proata, Paul Bristow, Daniel Frey, Daryle Walker and Brian McNamara were used.

October 5, 2003

© Copyright Jan Langer 2003
Use, modification, and distribution is subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at www.boost.org/LICENSE_1_0.txt)