Block comments are parsed out, but the comment-character removal isn't working
yet. I'll refactor that code out of its current home and move it into
`parse`, since I need to know what _kind_ of comment it is that I'm
stripping. Carrying that metadata around doesn't make any sense, so
I'll just convert each comment on the fly into a set of non-comment
strings.
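A minimal sketch of that on-the-fly conversion, assuming a per-language hash
of comment markers; `strip_block_comment` and the `:middle` key are
hypothetical names, not Rocco's actual internals:

```ruby
# Hypothetical sketch: as `parse` walks a block comment, strip the
# comment machinery so only plain documentation text moves forward.
def strip_block_comment(lines, chars)
  lines.map do |line|
    line.sub(/^\s*#{Regexp.escape(chars[:middle])}\s?/, '')
  end
end

strip_block_comment([' * one line', ' * another'], :middle => '*')
# => ["one line", "another"]
```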
In the same way that it makes sense to skip the shebang (`#!`) line in
scripts, it makes sense to skip the encoding definition in Python files
(described by [PEP 263][p]) and Ruby 1.9 files (whose syntax is similar
enough that it's not worth worrying about the difference).
[p]: http://www.python.org/dev/peps/pep-0263/
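Roughly, the skip test could look like the following; the regular
expressions are my own approximation of the shebang and PEP 263 forms, not
the exact patterns used:

```ruby
# Hypothetical sketch: should this leading line be skipped entirely?
# PEP 263 allows "coding:" or "coding=" within the first two lines,
# and Ruby 1.9 magic comments use a close-enough syntax.
def skippable_header?(line)
  !!(line =~ /^#!/ || line =~ /^#.*coding[:=]\s*[\w.-]+/)
end

skippable_header?('#!/usr/bin/env ruby')      # => true
skippable_header?('# -*- coding: utf-8 -*-')  # => true
skippable_header?('# ordinary comment')       # => false
```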
Adding comment characters for Bash, C, C++, CoffeeScript, Java, JavaScript,
Lua, Python, Ruby, and Scheme, paving the way for block-comment parsing
later on...
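The shape of the table is something like this; the exact keys and markers
below are illustrative, not the committed values:

```ruby
# Illustrative only: single-line comment characters keyed by language.
COMMENT_CHARS = {
  'bash'          => '#',
  'c'             => '//',
  'cpp'           => '//',
  'coffee-script' => '#',
  'java'          => '//',
  'javascript'    => '//',
  'lua'           => '--',
  'python'        => '#',
  'ruby'          => '#',
  'scheme'        => ';;'
}
```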
Closes issue #20.
`pygmentize` 1.0+ has an `-N` option that attempts to match a file (via
the extension) to a language lexer. If `pygmentize` is installed, we'll
run it with this option to get a language.
If no language is detected, `pygmentize -N` returns `text`. In that case,
we'll first look for a user-provided language to use as a fallback. If no
language was provided, highlight using `ruby` as a reasonable default.
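Put together, the detection reads roughly like this; `detect_language` is a
placeholder name, and the error handling is a guess at how a missing
`pygmentize` would be absorbed:

```ruby
# Hypothetical sketch: ask pygmentize to guess the lexer, then fall
# back to a user-supplied language, then to ruby.
def detect_language(file, user_language = nil)
  guess = %x{pygmentize -N #{file}}.strip rescue 'text'
  return guess unless guess.empty? || guess == 'text'
  user_language || 'ruby'
end
```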
Closes issue #19.
This closes issue #10, in theory, but I'm not completely happy with the
behavior. The output for both UTF-8 and ISO-8859-1 sources is arguably
correct, but I think it'd be better to auto-detect the file encoding and
explicitly convert everything to UTF-8 on input. One option is the
[`chardet` gem][gem], but I'm loath to add another dependency to
Rocco...
[gem]: http://rubygems.org/gems/chardet/versions/0.9.0
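For reference, on Ruby 1.9 the conversion step could take a shape like
this, with or without chardet supplying the guess; this is a sketch of the
idea, not the plan:

```ruby
# Sketch only: force source text to UTF-8, transcoding from an assumed
# fallback encoding when the bytes aren't already valid UTF-8.
def to_utf8(data, fallback = 'ISO-8859-1')
  utf8 = data.dup.force_encoding('UTF-8')
  return utf8 if utf8.valid_encoding?
  data.encode('UTF-8', fallback)
end
```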