C to HTML Converter (PHP Version)
by bkenwright@xbdev.net
Converting code into 'colored' HTML isn't as simple as you'd think, as you have to identify which parts of the code are comments, keywords and other delimiters.
Different languages have different criteria/syntax - so one size does not fit all.
However, a popular language style is the 'C-style' family, which is used by 'JavaScript', 'C#', 'C++', 'GLSL' etc. The keywords and syntax aren't identical, but they're very similar (e.g., single and multiline comments, function/variable declarations...)
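One way to cope with 'one size does not fit all' while reusing the same comment and string handling is to keep a separate keyword table per language and choose one at run time. Below is a minimal sketch under that assumption; the names $KEYWORD_SETS and is_keyword_for() are hypothetical and not part of c2html.php, which uses a single hard-coded C list in is_keyword(). The lists are illustrative and deliberately incomplete.

```php
<?php
// Hypothetical per-language keyword tables (illustrative, incomplete).
// The comment/string rules stay shared; only the word lists differ.
$KEYWORD_SETS = [
    'c'          => ['break', 'case', 'do', 'else', 'for', 'if',
                     'return', 'switch', 'while'],
    'javascript' => ['break', 'case', 'do', 'else', 'for', 'if',
                     'return', 'switch', 'while', 'function', 'let', 'var'],
    'glsl'       => ['break', 'do', 'else', 'for', 'if', 'return', 'while',
                     'uniform', 'in', 'out', 'discard'],
];

// True if $word is a keyword of $lang; unknown languages have no keywords.
function is_keyword_for(string $lang, string $word, array $sets): bool {
    return in_array($word, $sets[$lang] ?? [], true);
}

// is_keyword_for('javascript', 'function', $KEYWORD_SETS) returns true
// is_keyword_for('c', 'function', $KEYWORD_SETS) returns false
```

Passing the table in as a parameter (rather than hard-coding it in the function) keeps the lookup itself identical for every language.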
Writing a code parser is similar to writing a compiler: you have to convert the text into tokens, then go token by token to analyse what each token means, such as whether it is a string, a comment and so on.
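To make the 'text into tokens' step concrete, here is a minimal PHP sketch of a tokenizer for C-style code. The function name tokenize_c() and its five token kinds (comment, string, word, space, other) are assumptions for illustration; c2html.php itself does not build a token list, it classifies characters and emits HTML in a single pass.

```php
<?php
// Sketch: split C-style source into [kind, text] pairs.
// Kinds (an assumption): comment, string, word, space, other.
function tokenize_c(string $src): array {
    $tokens = [];
    $len = strlen($src);
    $i = 0;
    while ($i < $len) {
        $c = $src[$i];
        if ($c === '/' && $i + 1 < $len && $src[$i + 1] === '/') {
            // Line comment: runs up to (not including) the newline.
            $end = strpos($src, "\n", $i);
            if ($end === false) {
                $end = $len;
            }
            $tokens[] = ['comment', substr($src, $i, $end - $i)];
            $i = $end;
        } elseif ($c === '/' && $i + 1 < $len && $src[$i + 1] === '*') {
            // Block comment: runs through the closing */ (or to end of input).
            $end = strpos($src, '*/', $i + 2);
            $end = ($end === false) ? $len : $end + 2;
            $tokens[] = ['comment', substr($src, $i, $end - $i)];
            $i = $end;
        } elseif ($c === '"' || $c === "'") {
            // String/char literal: stop at the matching quote,
            // skipping any backslash-escaped character.
            $j = $i + 1;
            while ($j < $len && $src[$j] !== $c) {
                $j += ($src[$j] === '\\') ? 2 : 1;
            }
            $j = min($j + 1, $len);
            $tokens[] = ['string', substr($src, $i, $j - $i)];
            $i = $j;
        } elseif (ctype_alnum($c) || $c === '_') {
            // Identifier, keyword or number: a run of [A-Za-z0-9_].
            $j = $i;
            while ($j < $len && (ctype_alnum($src[$j]) || $src[$j] === '_')) {
                $j++;
            }
            $tokens[] = ['word', substr($src, $i, $j - $i)];
            $i = $j;
        } elseif (ctype_space($c)) {
            // Run of whitespace, kept so the layout can be reproduced.
            $j = $i;
            while ($j < $len && ctype_space($src[$j])) {
                $j++;
            }
            $tokens[] = ['space', substr($src, $i, $j - $i)];
            $i = $j;
        } else {
            // Any other single character (operators, braces, ...).
            $tokens[] = ['other', $c];
            $i++;
        }
    }
    return $tokens;
}
```

Each 'word' token can then be looked up in keyword lists (the job is_keyword(), is_decl() and is_uniq() do in the listing below) and wrapped in the matching color. Keeping tokenizing and coloring as separate steps makes each easier to test on its own.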
The following provides a complete PHP implementation for converting C files (and other C-style code) to colored vanilla HTML.
In fact, the code shown below was generated using the PHP implementation itself (c2html.php), so you can see how it looks when generating 'php' type syntax as well.
C2HTML PHP Code
<?php
/********************************************************************************/
/* */
/* File: c2html.php */
/* Auth: Ben Kenwright */
/* Email: bkenwright@xbdev.net */
/* Url: www.xbdev.net */
/* Date: 01/01/06 */
/* */
/********************************************************************************/
/*
About?
Well it's such a simple program, and I'm sure everyone has written one at one
time or another.
Convert our code, c/c++, so that it's color coded! This code takes it a bit
further and generates an html file, so you can take your c/c++ text file
and output an html file which will have all the great colour coding!
I've kept it simple: you could change the html so it uses styles and .css
files, but I like to use the font color html tag....but it's very easy to
convert to the alternative method if you prefer.
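To use?
// load file (get code)
$content = file_get_contents($filename);
// convert it to html; convert2html() echoes a complete page rather than
// returning it, so capture the output if you need it as a string
ob_start();
convert2html($content);
$html = ob_get_clean();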
Further work / Thinking:
o With a bit of modification, use it to add c/c++ color coding to a website.
*/
/********************************************************************************/
define("ZCOMMENT", "<font color=\"navy\">");
define("ZSTRING", "<font color=\"green\">");
define("ZMACRO", "<font color=\"violet\">");
define("ZKEYWORDS", "<font color=\"maroon\">");
define("ZDECLARATIONS", "<font color=\"maroon\">");
define("ZUNIQUE", "<font color=\"red\">");
//--------------------------------------------------------------------------------
function parse($content) {
$output = '';
$length = strlen($content);
$pos = 0;
while ($pos < $length) {
$c = $content[$pos];
$pos++;
// Start of a comment
if ($c == '/') {
if ($pos >= $length) {
// A lone '/' at the very end of the input: emit it and stop
$output .= "/";
break;
}
$next_char = $content[$pos];
$pos++;
if ($next_char == '/' || $next_char == '*') {
$comment_type = $next_char;
$output .= ZCOMMENT . "/";
$output .= put_char($next_char);
$prev_char = '';
while ($pos < $length) {
$current_char = $content[$pos];
$pos++;
$output .= put_char($current_char);
if ($comment_type == '/' && $current_char == "\n") {
break;
} elseif ($comment_type == '*' && $prev_char == '*' && $current_char == '/') {
break;
}
$prev_char = $current_char;
}
$output .= "</font>";
} else {
$pos--;
$output .= "/";
}
}
elseif ($c == '\'' || $c == '"') {
// Quotation
$quote = $c;
$back_slash = false;
$output .= ZSTRING;
$output .= put_char($c);
while ($pos < $length) {
$current_char = $content[$pos];
$pos++;
$output .= put_char($current_char);
if ($current_char == $quote && !$back_slash) {
break;
}
if ($current_char == '\\' && !$back_slash) {
$back_slash = true;
} else {
$back_slash = false;
}
}
$output .= "</font>";
}
elseif ($c == '#') {
// Start of a macro
$output .= ZMACRO;
$output .= put_char($c);
// Skip whitespace
while ($pos < $length) {
$current_char = $content[$pos];
if (ctype_space($current_char)) {
$pos++;
$output .= put_char($current_char);
} else {
break;
}
}
$buffer = '';
while ($pos < $length) {
$current_char = $content[$pos];
if (ctype_alpha($current_char)) {
$buffer .= $current_char;
$pos++;
} else {
break;
}
}
if (is_macro($buffer)) {
$output .= "$buffer</font>";
} else {
$output .= "</font>$buffer";
}
// The next character is left for the main loop, since it may
// start a string or comment (e.g. #endif// note)
}
else {
if (ctype_alnum($c) || $c == '_') {
$buffer = $c;
while ($pos < $length) {
$current_char = $content[$pos];
if (ctype_alnum($current_char) || $current_char == '_') {
$buffer .= $current_char;
$pos++;
} else {
break;
}
}
$output .= is_token($buffer);
// The next character is left for the main loop, since it may
// start a comment or string (e.g. L"wide" or x// note)
} else {
$output .= put_char($c);
}
}
}
return $output;
}// End Parse(..)
//--------------------------------------------------------------------------------
function put_char($c) {
// Escape HTML-special characters and expand tabs
$result = '';
switch ($c) {
case '<':
$result = "&lt;";
break;
case '>':
$result = "&gt;";
break;
case '&':
$result = "&amp;";
break;
case "\t":
$result = "    ";
break;
default:
$result = $c;
break;
}
return $result;
}// End put_char(..)
//--------------------------------------------------------------------------------
function is_token($buffer) {
$result = '';
if (is_keyword($buffer)) {
$result = ZKEYWORDS . $buffer . "</font>";
} elseif (is_decl($buffer)) {
$result = ZDECLARATIONS . $buffer . "</font>";
} elseif (is_uniq($buffer) || is_number($buffer)) {
$result = ZUNIQUE . $buffer . "</font>";
} else {
$result = $buffer;
}
return $result;
}// End is_token(..)
//--------------------------------------------------------------------------------
function is_keyword($buffer) {
$keywords = array(
"break", "case", "continue", "default", "do", "else", "for",
"goto", "if", "return", "sizeof", "switch", "while"
);
return in_array($buffer, $keywords);
}// End is_keyword(..)
//--------------------------------------------------------------------------------
function is_decl($buffer) {
$declarations = array(
"auto", "char", "const", "DIR", "double", "enum", "extern",
"FILE", "float", "fpos_t", "int", "int8_t", "int16_t",
"int32_t", "int64_t", "long", "mode_t", "pid_t", "register",
"short", "signed", "size_t", "ssize_t", "static", "struct",
"typedef", "union", "unsigned", "va_list", "void", "volatile",
"class", "public", "protected", "private"
);
return in_array($buffer, $declarations);
}// End is_decl(..)
//--------------------------------------------------------------------------------
function is_uniq($buffer) {
$unique = array(
"__DATE__", "__TIME__", "EACCES", "EAGAIN", "EBADF",
"EBUSY", "EOF", "ECHILD", "EDEADLK", "EDOM",
"EFAULT", "EINVAL", "EILSEQ", "EINTR", "EFBIG",
"EISDIR", "stdin", "EMFILE", "EMLINK", "EMSGSIZE",
"ENFILE", "ENODEV", "ENOENT", "ENOLCK", "stdout",
"ENOMEM", "ENOTDIR", "ENOSPC", "ENOSYS", "ENOTEMPTY",
"ENOTSUP", "ENOTTY", "ENOEXEC", "ENXIO", "ECANCELED",
"EPIPE", "ERANGE", "EROFS", "ESPIPE", "ESRCH",
"EXDEV", "__FILE__", "__LINE__", "NULL", "SEEK_SET",
"SEEK_CUR", "SEEK_END", "SIGABRT", "SIGALRM", "SIGCHLD",
"SIGCONT", "SIG_DFL", "SIG_ERR", "SIGHUP", "SIG_IGN",
"SIGINT", "SIGFPE", "SIGKILL", "SIGQUIT", "SIGSEGV",
"SIGSTP", "SIGTERM", "SIGTRAP", "SIGTTIN", "SIGTTOU",
"SIGUSR1", "SIGUSR2", "__STDC__", "stderr", "EINPROGRESS",
"E2BIG", "EBADMSG", "EEXIST", "EIO", "ENAMETOOLONG",
"SIGILL", "EPERM", "SIGSTOP", "ETIMEDOUT",
);
return in_array($buffer, $unique);
}// End is_uniq(..)
//--------------------------------------------------------------------------------
function is_number($buffer) {
if (!preg_match('/^[0-9a-fA-FxX]+$/', $buffer)) {
return false;
}
if (strlen($buffer) > 1 && $buffer[0] == '0' && ($buffer[1] == 'x' || $buffer[1] == 'X')) {
// Hex number
return ctype_xdigit(substr($buffer, 2));
}
return ctype_digit($buffer);
}// End is_number(..)
//--------------------------------------------------------------------------------
function is_macro($buffer) {
$macros = array(
"define", "elif", "else", "endif", "error", "if",
"ifdef", "ifndef", "include", "line", "pragma"
);
return in_array($buffer, $macros);
}// End is_macro(..)
//--------------------------------------------------------------------------------
function page_head() {
echo "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n";
echo "<html>\n";
echo "<head>\n";
echo "<title>www.xbdev.net c2html demo</title>\n";
echo "</head>\n";
echo "<body>\n\n";
}// End page_head(..)
//--------------------------------------------------------------------------------
function page_foot() {
echo "\n\n</body>\n";
echo "</html>\n";
}// End page_foot(..)
//--------------------------------------------------------------------------------
function file_start() {
echo "<pre>\n";
}// End file_start()
//--------------------------------------------------------------------------------
function file_end() {
echo "</pre>\n";
echo "<hr />\n";
}// End file_end()
/********************************************************************************/
/* */
/* Program Entry Point */
/* */
/********************************************************************************/
function convert2html($content)
{
page_head();
file_start();
$cc = parse($content);
echo( $cc );
file_end();
page_foot();
}// End convert2html(..)
//--------------------------------------------------------------------------------
?>
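put_char() escapes <, > and & one character at a time. PHP's built-in htmlspecialchars() performs the same three replacements (and, by default, double quotes as well), so a variant could escape a whole token at once. A small sketch; escape_code() is a hypothetical helper, not part of c2html.php, and the four-space tab width is an assumption.

```php
<?php
// Sketch: escape a piece of code for use inside <pre> with the built-in
// htmlspecialchars() instead of a per-character switch.
// ENT_NOQUOTES leaves quotes alone (not needed in element content);
// ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning "".
function escape_code(string $code): string {
    $escaped = htmlspecialchars($code, ENT_NOQUOTES | ENT_SUBSTITUTE, 'UTF-8');
    // Expand tabs to four spaces (assumed tab width).
    return str_replace("\t", '    ', $escaped);
}

// escape_code("a < b") returns "a &lt; b"
```

Apply it to each token's text after the token has been classified, not to the whole file before parsing; otherwise the parser would see "&lt;" where the source had "<".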
Things to Try
A few ideas for you to explore if you're interested in taking this implementation further:
• Additional styles and colors
• Options to select the 'style' (different color options)
• Use CSS 'styles' instead of hard-coded <font> elements (easier to update and maintain); <font> is deprecated in HTML 4.01 and obsolete in HTML5.
• Add tooltip or hover-over options, e.g. a 'popup' tooltip when you hover over blocks of code
• Add line numbers down the side
• Explore different fonts/layouts to make the code more readable
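As a starting point for two of the ideas above (CSS styles instead of <font> tags, and line numbers), here is a hypothetical sketch. The class names (c2h-comment, c2h-lineno, ...), the stylesheet and the helper names c2h_span() and c2h_add_line_numbers() are assumptions, not part of c2html.php.

```php
<?php
// Hypothetical stylesheet: colors mirror the ZCOMMENT/ZSTRING/... defines.
const C2H_STYLES = <<<'CSS'
.c2h-comment { color: navy; }
.c2h-string  { color: green; }
.c2h-macro   { color: violet; }
.c2h-keyword { color: maroon; }
.c2h-unique  { color: red; }
.c2h-lineno  { color: gray; user-select: none; }
CSS;

// Wrap already-escaped text in a classed <span> (replaces <font color=...>).
function c2h_span(string $class, string $escaped): string {
    return '<span class="c2h-' . $class . '">' . $escaped . '</span>';
}

// Prefix every line of already-highlighted HTML with a right-aligned number.
// Assumes no span crosses a line break; a multi-line block comment would
// have to be closed and reopened at each "\n" first.
function c2h_add_line_numbers(string $html): string {
    $lines = explode("\n", $html);
    $width = strlen((string) count($lines));
    $out = [];
    foreach ($lines as $i => $line) {
        $num = str_pad((string) ($i + 1), $width, ' ', STR_PAD_LEFT);
        $out[] = c2h_span('lineno', $num) . ' ' . $line;
    }
    return implode("\n", $out);
}
```

With classes instead of inline colors, offering different color schemes becomes a matter of swapping the stylesheet rather than regenerating the HTML.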
Resources & Links
• Based on C2HTML Version in C++ [LINK]
• Live Example of the Code [LINK]