Last Modified:
Jan 03, 2011

Parsing



This page documents the objects and functions that in some way deal with parsing or otherwise manipulating text. Everything here follows the same conventions as the rest of the library.



base64



This object allows you to encode and decode data to and from the Base64 Content-Transfer-Encoding defined in section 6.8 of RFC 2045.

Specification: dlib/base64/base64_kernel_abstract.h
File to include: dlib/base64.h
Code Examples: 1

Implementations:
base64_kernel_1:
This implementation is done using a lookup table in the obvious way.
kernel_1a
is a typedef for base64_kernel_1
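
To illustrate the lookup-table approach mentioned for base64_kernel_1, here is a minimal standard C++ sketch of the encoding direction. It is not the library's code: base64_encode_sketch is a hypothetical name, and the sketch omits the line breaks (at most 76 characters per line) that RFC 2045 requires of an encoder.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not dlib's API: encode raw bytes as Base64 text
// using a 64-entry lookup table (alphabet from RFC 2045, section 6.8).
std::string base64_encode_sketch(const std::string& input)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::string out;
    out.reserve(((input.size() + 2) / 3) * 4);

    std::size_t i = 0;
    // Full groups: every 3 input bytes become 4 output characters.
    for (; i + 2 < input.size(); i += 3)
    {
        const unsigned long n =
            (static_cast<unsigned char>(input[i])     << 16) |
            (static_cast<unsigned char>(input[i + 1]) << 8)  |
             static_cast<unsigned char>(input[i + 2]);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6)  & 63];
        out += table[n & 63];
    }

    // Tail: 1 or 2 leftover bytes are padded with '=' characters.
    const std::size_t rest = input.size() - i;
    if (rest == 1)
    {
        const unsigned long n = static_cast<unsigned char>(input[i]) << 16;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += "==";
    }
    else if (rest == 2)
    {
        const unsigned long n =
            (static_cast<unsigned char>(input[i])     << 16) |
            (static_cast<unsigned char>(input[i + 1]) << 8);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6)  & 63];
        out += '=';
    }
    return out;
}
```

Each 6-bit group indexes directly into the table, so no branching on character class is needed. Decoding would use the inverse table in the same way.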

basic_utf8_ifstream



This object represents an input file stream much like the normal std::ifstream except that it knows how to read UTF-8 data. So when you read characters out of this stream it will automatically convert them from the UTF-8 multibyte encoding into a fixed width wide character encoding.

There are also two typedefs of this object. The first is utf8_wifstream, which uses wchar_t as the wide character type to read into. The second is utf8_uifstream, which uses unichar instead of wchar_t.



Specification: dlib/unicode/unicode_abstract.h
File to include: dlib/unicode.h


cast_to_string



cast_to_string is a templated function which makes it easy to convert arbitrary objects to std::string strings. The types supported are any types that can be written to std::ostream via operator<<.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h
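
The idea of converting anything writable to a std::ostream can be sketched with a string stream. This is only an illustration in standard C++, not the library's implementation, and to_string_sketch is a hypothetical name.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in, not dlib's implementation: convert any type that
// supports operator<< on std::ostream into a std::string.
template <typename T>
std::string to_string_sketch(const T& item)
{
    std::ostringstream sout;
    sout << item;          // relies on the type's own operator<<
    return sout.str();
}
```

Any user-defined type gains support simply by providing an operator<< overload for std::ostream.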


cast_to_wstring



cast_to_wstring is a templated function which makes it easy to convert arbitrary objects to std::wstring strings. The types supported are any types that can be written to std::wostream via operator<<.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h


cmd_line_parser



This object allows you to easily parse a command line. Note that the documentation for the cmd_line_parser_option (the object returned by the parser's .option() function) is in a separate file.

Specification: dlib/cmd_line_parser/cmd_line_parser_kernel_abstract.h
File to include: dlib/cmd_line_parser.h
Code Examples: 1

Implementations:
cmd_line_parser_kernel_1:
This implementation uses the map and sequence containers to keep track of the command line options and arguments. For further details see the above link.
kernel_1a
is a typedef for cmd_line_parser_kernel_1 that uses map_kernel_1a and sequence_kernel_2a
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.

Extensions to cmd_line_parser

cmd_line_parser_check

This gives a cmd_line_parser object the ability to easily perform various kinds of validation on the command line input.

Specification: dlib/cmd_line_parser/cmd_line_parser_check_abstract.h

Implementations:
cmd_line_parser_check_1:
This implementation is done in the obvious way. See the source for details.
check_1a
is a typedef for cmd_line_parser_print_1 extended by cmd_line_parser_check_1
check_1a_c
is a typedef for check_1a that checks its preconditions.
cmd_line_parser_print

This extension gives a cmd_line_parser object the ability to print its command line options in a nice format.

Specification: dlib/cmd_line_parser/cmd_line_parser_print_abstract.h

Implementations:
cmd_line_parser_print_1:
This implementation is done by enumerating the options of the parser and printing them.
print_1a
is a typedef for cmd_line_parser_kernel_1 extended by cmd_line_parser_print_1
print_1a_c
is a typedef for print_1a that checks its preconditions.

config_reader



This object reads text configuration files.

Specification: dlib/config_reader/config_reader_kernel_abstract.h
File to include: dlib/config_reader.h
Code Examples: 1

Implementations:
config_reader_kernel_1:
This implementation is done using the map object in the obvious way.
kernel_1a
is a typedef for config_reader_kernel_1 that uses map_kernel_1b

Extensions to config_reader

config_reader_thread_safe

This object extends a normal config_reader by simply wrapping all its member functions inside mutex locks to make it safe to use in a threaded program.

Specification: dlib/config_reader/config_reader_thread_safe_abstract.h

Implementations:
config_reader_thread_safe_1:
This implementation is done in the obvious way. See the source for details.
thread_safe_1a
is a typedef for config_reader_kernel_1 extended by config_reader_thread_safe_1
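
The pattern of wrapping every member function in a mutex lock can be sketched as follows. This is a generic illustration only: locked_settings_sketch and its members are hypothetical, it stores plain key/value pairs rather than parsing a configuration file, and it uses std::mutex as a stand-in for whatever locking the library uses.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical class, not dlib's config_reader: every public member takes
// the same mutex before touching the shared map.
class locked_settings_sketch
{
public:
    void set(const std::string& key, const std::string& value)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        values_[key] = value;
    }

    bool is_defined(const std::string& key) const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return values_.count(key) != 0;
    }

    // Returns the stored value, or fallback when the key is absent.
    std::string get(const std::string& key,
                    const std::string& fallback = "") const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::map<std::string, std::string>::const_iterator it =
            values_.find(key);
        return it == values_.end() ? fallback : it->second;
    }

private:
    mutable std::mutex mutex_;   // mutable so const readers can lock it
    std::map<std::string, std::string> values_;
};
```

Locking per call keeps each operation atomic, but a sequence such as is_defined followed by get is not atomic as a whole; callers needing that would have to hold a lock across both calls.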

convert_utf8_to_utf32



This is a global function that can convert UTF-8 strings into strings of 32-bit unichar characters.

Specification: dlib/unicode/unicode_abstract.h
File to include: dlib/unicode.h
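
For readers unfamiliar with the conversion, the following standard C++ sketch shows how UTF-8 byte sequences map to 32-bit code points. It is not the library's function: utf8_to_utf32_sketch is a hypothetical name, std::u32string stands in for a string of unichar, and the sketch does not reject overlong encodings or surrogate values.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not dlib's API: decode UTF-8 bytes into code points.
std::u32string utf8_to_utf32_sketch(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;   // number of continuation bytes expected

        // The lead byte's high bits give the sequence length.
        if      (lead < 0x80)           { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```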


cpp_pretty_printer



This object represents an HTML pretty printer for C++ source code.

Specification: dlib/cpp_pretty_printer/cpp_pretty_printer_kernel_abstract.h
File to include: dlib/cpp_pretty_printer.h

Implementations:
cpp_pretty_printer_kernel_1:
This is implemented by using the cpp_tokenizer object. This is the pretty printer I use on all the source in this library. It applies a color scheme, turns include directives such as #include "file.h" into links to file.h.html, and puts HTML anchor points on function and class declarations. It also looks for comments starting with /*!A and puts an anchor before the comment, using the word following the A as the name of the anchor.
kernel_1a
is a typedef for cpp_pretty_printer_kernel_1
cpp_pretty_printer_kernel_2:
This is implemented by using the cpp_tokenizer object. It applies a black and white color scheme suitable for printing on a black and white printer. It also places the document title prominently at the top of the pretty printed source file.
kernel_2a
is a typedef for cpp_pretty_printer_kernel_2

cpp_tokenizer



This object represents a simple tokenizer for C++ source code.

Specification: dlib/cpp_tokenizer/cpp_tokenizer_kernel_abstract.h
File to include: dlib/cpp_tokenizer.h

Implementations:
cpp_tokenizer_kernel_1:
This is implemented by using the tokenizer object in the obvious way.
kernel_1a
is a typedef for cpp_tokenizer_kernel_1
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.

is_combining_char



This is a global function that can tell you if a character is a Unicode combining character or not.

Specification: dlib/unicode/unicode_abstract.h
File to include: dlib/unicode.h
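
As a rough illustration of what such a test involves, the sketch below checks a code point against the Unicode blocks dedicated to combining marks. It is an assumption-laden simplification, not the library's logic: in_combining_block_sketch is a hypothetical name, and many combining characters live outside these blocks, so a full test would need the Unicode character database.

```cpp
// Hypothetical helper, not dlib's API: true when c falls inside one of the
// Unicode blocks that consist of combining marks.
bool in_combining_block_sketch(char32_t c)
{
    return (c >= 0x0300 && c <= 0x036F) ||  // Combining Diacritical Marks
           (c >= 0x1AB0 && c <= 0x1AFF) ||  // Combining Diacritical Marks Extended
           (c >= 0x1DC0 && c <= 0x1DFF) ||  // Combining Diacritical Marks Supplement
           (c >= 0x20D0 && c <= 0x20FF) ||  // Combining Diacritical Marks for Symbols
           (c >= 0xFE20 && c <= 0xFE2F);    // Combining Half Marks
}
```

For example, U+0301 (combining acute accent) falls in the first block, while the letter e does not fall in any of them.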


left_substr



This is a function to return the part of a string to the left of a user supplied delimiter.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h
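
The behaviour of splitting at a delimiter can be sketched in standard C++ as below, together with the right-hand counterpart for comparison. The names left_of_sketch and right_of_sketch are hypothetical, and the handling of a missing delimiter (whole string on the left, empty string on the right) is an assumption of this sketch; consult the specification for the library's exact contract.

```cpp
#include <string>

// Hypothetical helper, not dlib's API: text before the first delimiter.
// Any character in delims counts as a delimiter.
std::string left_of_sketch(const std::string& s, const std::string& delims)
{
    // substr(0, npos) yields the whole string when no delimiter is present.
    return s.substr(0, s.find_first_of(delims));
}

// Hypothetical helper, not dlib's API: text after the last delimiter.
std::string right_of_sketch(const std::string& s, const std::string& delims)
{
    const std::string::size_type pos = s.find_last_of(delims);
    return pos == std::string::npos ? std::string() : s.substr(pos + 1);
}
```

With these definitions, splitting "key=value" on "=" gives "key" on the left and "value" on the right.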


lpad



This is a function to pad whitespace (or user specified characters) onto the left end of a string.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h


ltrim



This is a function to remove the whitespace (or user specified characters) from the left end of a string.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h


narrow



This is a function for converting a string of type std::string or std::wstring to a plain std::string.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h


pad



This is a function to pad whitespace (or user specified characters) onto the ends of a string.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h
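
Padding on the left, the right, or both ends can be sketched as follows. These are hypothetical standard C++ helpers, not the library's functions. The sketch assumes the caller gives a target width and a single fill character, and that when padding both ends an odd leftover character goes on the right; the library's parameters and tie-breaking may differ.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers, not dlib's API. Each returns s unchanged when it
// is already at least width characters long.
std::string lpad_sketch(const std::string& s, std::size_t width, char fill = ' ')
{
    if (s.size() >= width) return s;
    return std::string(width - s.size(), fill) + s;
}

std::string rpad_sketch(const std::string& s, std::size_t width, char fill = ' ')
{
    if (s.size() >= width) return s;
    return s + std::string(width - s.size(), fill);
}

std::string pad_sketch(const std::string& s, std::size_t width, char fill = ' ')
{
    if (s.size() >= width) return s;
    const std::size_t total = width - s.size();
    const std::size_t left = total / 2;   // any odd extra goes on the right
    return std::string(left, fill) + s + std::string(total - left, fill);
}
```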


right_substr



This is a function to return the part of a string to the right of a user supplied delimiter.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h


rpad



This is a function to pad whitespace (or user specified characters) onto the right end of a string.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h


rtrim



This is a function to remove the whitespace (or user specified characters) from the right end of a string.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h


strings_equal_ignore_case



This is a pair of functions to do a case-insensitive comparison between strings.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h
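
A case-insensitive comparison can be sketched in standard C++ by lowercasing characters as they are compared. The names below are hypothetical and this is not the library's code; the sketch handles only single-byte characters in the current C locale and shows just one whole-string comparison.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not dlib's API: lowercase copy of a narrow string.
std::string to_lower_sketch(const std::string& s)
{
    std::string out(s);
    for (std::size_t i = 0; i < out.size(); ++i)
    {
        // The cast avoids undefined behaviour for negative char values.
        out[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(out[i])));
    }
    return out;
}

// Hypothetical helper, not dlib's API: whole-string comparison ignoring case.
bool equal_ignore_case_sketch(const std::string& a, const std::string& b)
{
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}
```

Comparing character by character avoids building two temporary lowercase copies just to test equality.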


string_assign



string_assign is an object which makes it easy to convert strings to other types. The types supported are any types that can be read by the basic_istream operator>>. It also supports casting between wstring, string, and ustring objects. Since string_assign is a simple stateless object there is a global instance of it called dlib::sa.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h
Code Examples: 1


string_cast



string_cast is a templated function which makes it easy to convert strings to other types. The types supported are any types that can be read by the basic_istream operator>>. It also supports casting between wstring, string, and ustring objects.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h
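
Conversion through operator>> can be sketched with a string stream, as below. This is an illustration in standard C++ rather than the library's implementation: from_string_sketch is a hypothetical name, and both the choice of std::invalid_argument and the rejection of trailing characters are assumptions of the sketch.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in, not dlib's implementation: parse text into any
// type readable with operator>> on std::istream.
template <typename T>
T from_string_sketch(const std::string& text)
{
    std::istringstream sin(text);
    T value = T();
    char leftover;
    // Fail if nothing parses, or if non-whitespace input remains afterwards.
    if (!(sin >> value) || (sin >> leftover))
        throw std::invalid_argument("cannot convert \"" + text + "\"");
    return value;
}
```

Under these assumptions from_string_sketch<int>("42") yields 42, while "12abc" is rejected instead of being silently truncated to 12.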


tokenizer



This object represents a simple tokenizer for textual data.

Specification: dlib/tokenizer/tokenizer_kernel_abstract.h
File to include: dlib/tokenizer.h

Implementations:
tokenizer_kernel_1:
This is implemented in the obvious way.
kernel_1a
is a typedef for tokenizer_kernel_1
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.

tolower



This is a function to convert a string to all lowercase.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h


toupper



This is a function to convert a string to all uppercase.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h


trim



This is a function to remove the whitespace (or user specified characters) from the ends of a string.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h
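
Trimming from the left, the right, or both ends can be sketched with the standard find_first_not_of and find_last_not_of members. The names below are hypothetical and the default set of whitespace characters is an assumption of the sketch, not the library's definition.

```cpp
#include <string>

// Hypothetical helpers, not dlib's API. chars lists the characters to strip.
std::string ltrim_sketch(const std::string& s,
                         const std::string& chars = " \t\r\n")
{
    const std::string::size_type pos = s.find_first_not_of(chars);
    return pos == std::string::npos ? std::string() : s.substr(pos);
}

std::string rtrim_sketch(const std::string& s,
                         const std::string& chars = " \t\r\n")
{
    const std::string::size_type pos = s.find_last_not_of(chars);
    return pos == std::string::npos ? std::string() : s.substr(0, pos + 1);
}

// Trimming both ends is the composition of the two one-sided versions.
std::string trim_sketch(const std::string& s,
                        const std::string& chars = " \t\r\n")
{
    return ltrim_sketch(rtrim_sketch(s, chars), chars);
}
```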


unichar



This is a typedef for an unsigned 32-bit integer which we use to store Unicode values.

Specification: dlib/unicode/unicode_abstract.h
File to include: dlib/unicode.h


ustring



This is a typedef for a std::basic_string<unichar>. That is, it is a typedef for a string object that stores unichar Unicode characters.

Specification: dlib/unicode/unicode_abstract.h
File to include: dlib/unicode.h


wrap_string



wrap_string is a function that takes a string and breaks it into a number of lines of a given length. You can use this, for example, to make a string fit nicely into a command prompt window.

Specification: dlib/string/string_abstract.h
File to include: dlib/string.h
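
Breaking text into lines of a given length can be sketched with a greedy word wrap, shown below in standard C++. This is not the library's algorithm or signature: wrap_sketch is a hypothetical name, it collapses runs of whitespace, and a word longer than the limit is left unbroken on its own line.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper, not dlib's API: greedy wrap at word boundaries so
// that no line exceeds max_width unless a single word is longer than that.
std::string wrap_sketch(const std::string& text, std::size_t max_width)
{
    std::istringstream words(text);
    std::string word, line, out;
    while (words >> word)
    {
        if (line.empty())
        {
            line = word;                       // first word on the line
        }
        else if (line.size() + 1 + word.size() <= max_width)
        {
            line += ' ' + word;                // word still fits
        }
        else
        {
            out += line + '\n';                // flush and start a new line
            line = word;
        }
    }
    out += line;
    return out;
}
```

Wrapping "the quick brown fox" at a width of 10 gives two lines, "the quick" and "brown fox".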


xml_parser



This object represents a simple SAX style event driven XML parser. It takes its input from an input stream object and sends events to all registered document_handler and error_handler objects.

The xml_parser object also uses the interface classes document_handler and error_handler. Subclasses of these classes are passed to the xml_parser which generates events while it's parsing and sends them to the appropriate handler.

Specification: dlib/xml_parser/xml_parser_kernel_abstract.h
File to include: dlib/xml_parser.h
Code Examples: 1

Implementations:
xml_parser_kernel_1:
This implementation is done using a stack (as opposed to recursive descent) to parse XML documents. It also uses a map to implement the attribute_list interface and internally uses the sequence object to keep track of all registered document and error handlers.
kernel_1a
is a typedef for xml_parser_kernel_1 that uses map_kernel_1a, stack_kernel_1a, and sequence_kernel_2a
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.
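
The event-driven, stack-based style described above can be illustrated with a toy parser. Everything in this sketch is hypothetical: handler_sketch only loosely mirrors the document_handler idea and does not reproduce the library's signatures, and parse_sketch understands only start tags, end tags, and text (no attributes, comments, entities, or self-closing tags).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical handler interface, not dlib's document_handler.
struct handler_sketch
{
    virtual ~handler_sketch() {}
    virtual void start_element(const std::string& name) = 0;
    virtual void end_element(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Toy SAX-style parser: sends events to h and tracks nesting with an
// explicit stack instead of recursion.
void parse_sketch(const std::string& xml, handler_sketch& h)
{
    std::vector<std::string> open_tags;
    std::size_t i = 0;
    while (i < xml.size())
    {
        if (xml[i] == '<')
        {
            const std::size_t close = xml.find('>', i);
            if (close == std::string::npos)
                throw std::runtime_error("unterminated tag");
            const bool is_end = (i + 1 < xml.size() && xml[i + 1] == '/');
            const std::size_t name_begin = i + (is_end ? 2 : 1);
            const std::string name = xml.substr(name_begin, close - name_begin);
            if (is_end)
            {
                // An end tag must match the most recently opened element.
                if (open_tags.empty() || open_tags.back() != name)
                    throw std::runtime_error("mismatched end tag");
                open_tags.pop_back();
                h.end_element(name);
            }
            else
            {
                open_tags.push_back(name);
                h.start_element(name);
            }
            i = close + 1;
        }
        else
        {
            // Everything up to the next '<' is reported as character data.
            const std::size_t next = xml.find('<', i);
            const std::size_t end = (next == std::string::npos) ? xml.size() : next;
            h.characters(xml.substr(i, end - i));
            i = end;
        }
    }
    if (!open_tags.empty())
        throw std::runtime_error("unclosed element");
}

// Example handler that records the event stream as text.
struct recorder_sketch : handler_sketch
{
    std::string log;
    void start_element(const std::string& n) { log += "+" + n + ";"; }
    void end_element(const std::string& n)   { log += "-" + n + ";"; }
    void characters(const std::string& t)    { log += "\"" + t + "\";"; }
};
```

The application never sees a document tree; it only reacts to events in document order, which is what keeps this style of parser light on memory.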