Languages without single-line comments (CSS) explode when running
through `highlight`, as the `DIVIDER` mechanism doesn't deal well with
`nil` comment characters. I've reworked the mechanism such that it
uses multi-line comments when single-line comments aren't available.
That is:
```python
def function():
    """
    This is a comment
    with _lots_ of leading
    space! OMG!
    """
    pass
```
Will parse into:
```
[
  [
    [ "This is a comment",
      "with _lots_ of leading",
      "space! OMG!"
    ],
    ...
  ]
]
```
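The divider fallback from the first paragraph can be sketched in a few lines. This is a minimal sketch that assumes a hypothetical `COMMENTS` table with `:single` and `:multi` entries and a hypothetical `divider_for` helper. Rocco's real data layout may differ.

```ruby
# Hypothetical comment table: :single is the line-comment marker (nil when
# the language has none, as with CSS); :multi holds block delimiters.
COMMENTS = {
  "ruby" => { single: "#", multi: ["=begin", "=end"] },
  "css"  => { single: nil, multi: ["/*", "*/"] }
}.freeze

# Build the DIVIDER line that separates code sections before highlighting,
# falling back to a block comment when no single-line marker exists.
def divider_for(language)
  chars = COMMENTS.fetch(language)
  if chars[:single]
    "\n#{chars[:single]} DIVIDER\n"
  else
    opener, closer = chars[:multi]
    "\n#{opener} DIVIDER #{closer}\n"
  end
end
```

With this table, `divider_for("css")` yields `"\n/* DIVIDER */\n"` instead of interpolating a `nil` comment character.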
Block comments are parsed out, but comment-character removal isn't working
yet. I'll refactor that code out of its current home and move it into
`parse`, as I need to know what _kind_ of comment I'm
stripping. Carrying that metadata around doesn't make any sense, so
I'll just convert the comment on the fly into a set of non-comment
strings.
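Converting a block comment "on the fly" into plain strings could look like the following. This is a minimal sketch, not the code that landed in `parse`. The helper name `block_comment_to_strings` is hypothetical, and it assumes the opening and closing delimiters sit on their own lines, as in the Python example.

```ruby
# Hypothetical helper: turn the raw lines of a block comment into plain
# strings by dropping each line's leading whitespace and the delimiter lines.
def block_comment_to_strings(lines, open_delim, close_delim)
  stripped = lines.map { |line| line.lstrip.chomp }
  stripped.shift if stripped.first == open_delim
  stripped.pop if stripped.last == close_delim
  stripped
end
```

Fed the docstring lines from the example with `'"""'` as both delimiters, this returns the three comment strings shown in the parse result. A delimiter sharing a line with text (`"""This is...`) would need extra handling.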
In the same way that it makes sense to skip the shebang (`#!`) line in
scripts, it makes sense to skip the encoding declaration in Python files
(described by [PEP 263][p]) and Ruby 1.9 files (the syntax is similar enough
that it's not worth worrying about).
[p]: http://www.python.org/dev/peps/pep-0263/
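Skipping both preamble lines might look like this. It is a minimal sketch using the magic-comment regular expression given in PEP 263, which also matches Ruby's `# encoding: utf-8`. The names `ENCODING_LINE` and `skip_preamble` are hypothetical. The sketch only checks the line immediately after an optional shebang, which covers the common layouts but not every placement PEP 263 permits.

```ruby
# PEP 263 magic comment, e.g. "# -*- coding: utf-8 -*-" or "# encoding: utf-8".
ENCODING_LINE = /\A[ \t\f]*#.*?coding[:=][ \t]*[-_.a-zA-Z0-9]+/

# Drop a leading shebang line, then a leading encoding declaration.
def skip_preamble(lines)
  rest = lines.dup
  rest.shift if rest.first&.start_with?("#!")
  rest.shift if rest.first&.match?(ENCODING_LINE)
  rest
end
```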
Adding comment characters for Bash, C, C++, CoffeeScript, Java, JavaScript, Lua, Python, Ruby, and Scheme, paving the way for block-comment parsing later on.
Closes issue #20.
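A per-language lookup table is one natural shape for these comment characters. The sketch below is illustrative, not Rocco's actual table. The language keys (for example `"cpp"`), the `COMMENT_CHARS` constant, and the `#` fallback are assumptions. The comment markers themselves are each language's standard single-line comment.

```ruby
# Hypothetical table of single-line comment characters, keyed by language.
COMMENT_CHARS = {
  "bash"         => "#",
  "c"            => "//",
  "cpp"          => "//",
  "coffeescript" => "#",
  "java"         => "//",
  "javascript"   => "//",
  "lua"          => "--",
  "python"       => "#",
  "ruby"         => "#",
  "scheme"       => ";"
}.freeze

# Look up a language's comment characters, assuming "#" when unknown.
def comment_chars_for(language, default = "#")
  COMMENT_CHARS.fetch(language, default)
end
```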
`pygmentize` 1.0+ has an `-N` option that attempts to match a file (via
its extension) to a language lexer. If `pygmentize` is installed, we'll
run it with this option to detect the language.
If no language is detected, `pygmentize -N` returns `text`. In that case,
we'll fall back to a user-provided language, if there is one. If no
language was provided, we'll highlight using `ruby` as a reasonable default.
Closes issue #19.
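The detection order described above can be sketched as follows. This is a minimal sketch, assuming `pygmentize -N FILE` prints the lexer name on stdout. The method `detect_language` and its injectable `runner:` parameter are hypothetical. The parameter exists only so the fallback logic can be exercised without Pygments installed.

```ruby
require "open3"

# Ask `pygmentize -N FILE` for a lexer name. On "text" (or if pygmentize is
# missing), fall back to the user's language, then to "ruby".
def detect_language(filename, user_language = nil,
                    runner: ->(file) { Open3.capture2("pygmentize", "-N", file).first })
  detected = begin
    runner.call(filename).strip
  rescue Errno::ENOENT
    "text" # pygmentize is not installed; treat as "nothing detected"
  end
  return detected unless detected.empty? || detected == "text"

  user_language || "ruby"
end
```

A detected lexer always wins. The user-provided language is consulted only when detection comes back empty-handed, matching the order in the commit message.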
This closes issue #10, in theory, but I'm not completely happy with the
behavior. The output for both UTF-8 and ISO-8859-1 sources is arguably
correct, but I think it'd be better to do some autodetecting of the file
encoding, and explicitly convert everything to UTF-8 on input. One
option is the [`chardet` gem][gem], but I'm loath to add another
dependency to Rocco...
[gem]: http://rubygems.org/gems/chardet/versions/0.9.0
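Short of adding `chardet`, a dependency-free heuristic can handle the two encodings mentioned. This swapped-in technique is not `chardet` and not real autodetection. It assumes any input that isn't valid UTF-8 is ISO-8859-1. The helper name `to_utf8` is hypothetical.

```ruby
# Heuristic, not chardet: keep valid UTF-8 as-is; otherwise assume the bytes
# are ISO-8859-1 and transcode them to UTF-8.
def to_utf8(raw)
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end
```

Every byte sequence is valid ISO-8859-1, so the second branch never raises. The flip side is that it silently mislabels any other non-UTF-8 encoding, which is exactly the gap a real detector would close.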
In v0.5, the Mustache template is hardcoded as
`./lib/rocco/layout.mustache`. This makes it quite difficult to
style generated content, as one must edit the layout file inside the
gem itself to make changes.
I propose leaving that file as a sensible default, but allowing the user
to specify an absolute or relative (to the current working directory)
path to a Mustache template of her choosing. That's implemented in this
commit.
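Resolving the template path as proposed might look like this. This is a minimal sketch, not the committed option handling. `DEFAULT_TEMPLATE` (a `layout.mustache` beside the current file, standing in for the bundled one) and `template_path` are hypothetical names.

```ruby
# Hypothetical stand-in for the bundled lib/rocco/layout.mustache.
DEFAULT_TEMPLATE = File.expand_path("layout.mustache", File.dirname(__FILE__))

def template_path(user_path = nil)
  return DEFAULT_TEMPLATE if user_path.nil?

  # Absolute paths are kept; relative ones resolve against the working directory.
  File.expand_path(user_path, Dir.pwd)
end
```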
The following works in Docco, but not in Rocco:
```
Level 1 Heading
===============
Level 2 Heading
---------------
```
Happily, the fix is trivial. In Docco, the regex for comments is:
```coffeescript
# Does the line begin with a comment?
l.comment_matcher = new RegExp('^\\s*' + l.symbol + '\\s?')
```
Changing Rocco's comment pattern to:
```ruby
@comment_pattern = Regexp.new("^\\s*#{@options[:comment_chars]}\\s?")
```
Solves the problem for me.
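Here is the pattern in isolation, showing what it strips from a few comment lines. This is a sketch with hypothetical `comment_pattern`/`strip_comment` helpers. Unlike the one-liner in the commit, it wraps the comment characters in `Regexp.escape`, an addition here, so markers containing regex metacharacters are matched literally.

```ruby
# Leading whitespace, the comment characters, and at most one whitespace char.
def comment_pattern(comment_chars)
  Regexp.new("^\\s*#{Regexp.escape(comment_chars)}\\s?")
end

def strip_comment(line, comment_chars)
  line.sub(comment_pattern(comment_chars), "")
end
```

Because only one whitespace character is consumed, deeper indentation inside a comment survives, and heading underlines come through intact.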
Rocco splits the highlighted output on `<span class="c.">`, which works fine
for Ruby, where Pygments gives the comment `span` a class of `c1`, but fails
for Bash (and probably other languages), where the class is just `c`: the `.`
in the pattern requires exactly one character after the `c`. The fix is
trivial.
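One way to make the split tolerate both class names is to make that trailing character optional. This is a sketch of an assumed fix, not necessarily the committed one. `DIVIDER_SPAN` and `split_sections` are hypothetical, and the pattern assumes a `#` divider comment.

```ruby
# Assumed fix: "c" followed by at most one character, so both Bash's "c"
# and Ruby's "c1" comment spans match the divider.
DIVIDER_SPAN = %r{\n*<span class="c.?"># DIVIDER</span>\n*}

def split_sections(html)
  html.split(DIVIDER_SPAN)
end
```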