Commit Graph

13 Commits

Author SHA1 Message Date
Mike West
b11543d382 Normalizing leading space in comments
That is:

    def function():
        """
            This is a comment
            with _lots_ of leading
            space!  OMG!
        """
        pass

Will parse into:

    [
        [
            [   "This is a comment",
                "with _lots_ of leading",
                "space!  OMG!"
            ],
            ...
        ]
    ]
2010-11-22 14:42:21 +01:00
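The leading-space normalization from b11543d382 could be sketched roughly as follows. This is only an illustration of the idea, not the project's code: the name `normalize_leading_space` is hypothetical, and it assumes indentation is made of spaces and that blank lines at either end of the comment should be dropped.

```python
def normalize_leading_space(lines):
    """Strip the indentation shared by all non-blank lines (hypothetical helper)."""
    # Drop blank lines at the start and end of the comment body.
    while lines and not lines[0].strip():
        lines = lines[1:]
    while lines and not lines[-1].strip():
        lines = lines[:-1]
    # Find the smallest indent among the remaining non-blank lines.
    indents = [len(l) - len(l.lstrip(" ")) for l in lines if l.strip()]
    common = min(indents) if indents else 0
    # Remove that shared indent; interior blank lines become empty strings.
    return [l[common:] if l.strip() else "" for l in lines]


docstring_body = [
    "",
    "            This is a comment",
    "            with _lots_ of leading",
    "            space!  OMG!",
    "        ",
]
print(normalize_leading_space(docstring_body))
# ['This is a comment', 'with _lots_ of leading', 'space!  OMG!']
```

Only the indent common to every line is removed, so a line indented deeper than its neighbours keeps its relative offset.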
Mike West
d0211ecc99 Python block comments (no middle character), and CSS syntax 2010-11-22 13:38:03 +01:00
Mike West
77dff765b6 Fixing tests for block comments. 2010-11-22 08:41:54 +01:00
Mike West
f177a9d7e2 Block comment parsing: basics.
Block comments are parsed out, but the commentchar removal isn't working
yet.  I'll refactor that code out of its current home, and move it into
`parse`, as I need to know what _kind_ of comment it is that I'm
stripping.  Carrying that metadata around doesn't make any sense, so
I'll just convert the comment on the fly into a set of non-comment
strings.
2010-11-22 08:25:40 +01:00
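The plan in f177a9d7e2, converting a block comment on the fly into plain strings once its kind is known, might look something like the sketch below. The function name and its `start`, `end`, and `middle` parameters are assumptions made for illustration; `middle` is optional so that styles without a middle character (such as Python docstrings, as in d0211ecc99) are covered too.

```python
def block_comment_to_text(lines, start, end, middle=None):
    """Turn the lines of one block comment into plain text lines (hypothetical)."""
    text = []
    last = len(lines) - 1
    for i, line in enumerate(lines):
        s = line.strip()
        # The opening delimiter only appears on the first line.
        if i == 0 and s.startswith(start):
            s = s[len(start):]
        # The closing delimiter only appears on the last line.
        if i == last and s.endswith(end):
            s = s[:-len(end)]
        # Some styles prefix every following line, e.g. " * " in C.
        if middle and i > 0 and s.startswith(middle):
            s = s[len(middle):]
        text.append(s.strip())
    # Lines that held nothing but a delimiter are now empty; drop them.
    while text and not text[0]:
        text.pop(0)
    while text and not text[-1]:
        text.pop()
    return text


c_style = ["/*", " * Adds two numbers.", " */"]
print(block_comment_to_text(c_style, "/*", "*/", middle="*"))
# ['Adds two numbers.']
```

Because the delimiters are passed in, no metadata about the comment kind has to travel with the result: the caller gets back ordinary strings.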
Mike West
d067210faa Refactoring comment_char internals: prepping for block comments 2010-11-21 16:53:22 +01:00
Mike West
6aa2217706 Refactoring tests out into separate files. 2010-10-24 11:08:04 +02:00
Mike West
e506c5172a Skipping Python/Ruby 1.9 source encoding
In the same way that it makes sense to skip the shebang (#!) line in
scripts, it makes sense to skip the encoding definition in Python files
(described by [PEP 263][p]) and Ruby 1.9 files (similar enough syntax
that it's not worth worrying about).

[p]: http://www.python.org/dev/peps/pep-0263/
2010-10-21 20:10:30 +02:00
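A minimal sketch of the skipping described in e506c5172a, assuming the source has already been split into lines. The regular expression is the one PEP 263 gives for the encoding declaration; the function name `skip_header_lines` is hypothetical.

```python
import re

# Pattern given in PEP 263 for the encoding declaration.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def skip_header_lines(lines):
    """Drop a leading shebang line and a leading encoding declaration."""
    lines = list(lines)
    # Skip the shebang (#!) if it is the very first line.
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]
    # The declaration may only sit on line one or two, so after the
    # shebang check it can only be the first remaining line.
    if lines and CODING_RE.match(lines[0]):
        lines = lines[1:]
    return lines


source = [
    "#!/usr/bin/env python",
    "# -*- coding: utf-8 -*-",
    "# A real comment",
    "x = 1",
]
print(skip_header_lines(source))
# ['# A real comment', 'x = 1']
```

The same pattern also matches a Ruby 1.9 style `# encoding: utf-8` line, since `encoding` contains `coding`.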
Mike West
185da24fc3 Cleaning up indent spacing in test file. 2010-10-20 17:07:32 +02:00
Mike West
020e8050bc Autopopulate comment_chars for known languages
Adding comment characters for bash, c, c++, CoffeeScript, java, javascript, lua, python, ruby, and scheme.  Paving the way for block-comment parsing later on...

Closes issue #20.
2010-10-20 17:07:14 +02:00
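One way to picture the autopopulation in 020e8050bc is a lookup table keyed by language. The table below is an assumption for illustration: the key spellings, the `COMMENT_CHARS` name, and the `#` fallback are not taken from the project, and only the single-line comment syntax of each language is listed.

```python
# Hypothetical table of single-line comment markers per language.
COMMENT_CHARS = {
    "bash": "#",
    "c": "//",
    "cpp": "//",
    "coffee-script": "#",
    "java": "//",
    "javascript": "//",
    "lua": "--",
    "python": "#",
    "ruby": "#",
    "scheme": ";",
}


def comment_char_for(language, default="#"):
    """Return the comment marker for a known language, else a fallback."""
    return COMMENT_CHARS.get(language, default)


print(comment_char_for("lua"))      # --
print(comment_char_for("unknown"))  # #
```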
Mike West
0b392c1094 Attempt to autodetect file language
`pygmentize` 1.0+ has an `-N` option that attempts to match a file (via
the extension) to a language lexer.  If `pygmentize` is installed, we'll
run it with this option to get a language.

If no language is detected, `pygmentize -N` returns `text`.  In that case,
we'll first look for a user-provided language to use as a fallback.  If no
language was provided, highlight using `ruby` as a reasonable default.

Closes issue #19.
2010-10-20 15:11:07 +02:00
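The fallback chain from 0b392c1094 can be sketched as below. The function names are hypothetical. `pygmentize -N` is invoked as the commit describes; treating a missing `pygmentize` the same as a `text` result is an assumption, since the commit does not say what happens in that case.

```python
import shutil
import subprocess


def choose_language(detected, user_language=None, default="ruby"):
    """Apply the fallback order: detected, then user-provided, then default."""
    # `pygmentize -N` reports `text` when it cannot match a lexer.
    if detected and detected != "text":
        return detected
    return user_language or default


def detect_language(filename, user_language=None):
    """Ask pygmentize for a lexer name if it is installed (assumed behavior)."""
    detected = None
    if shutil.which("pygmentize"):
        result = subprocess.run(
            ["pygmentize", "-N", filename],
            capture_output=True, text=True, check=False,
        )
        detected = result.stdout.strip()
    return choose_language(detected, user_language)


print(choose_language("text", user_language="lua"))  # lua
print(choose_language("text"))                       # ruby
```

Keeping the fallback decision in its own function means it can be exercised without `pygmentize` on the machine.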
Mike West
1b211bcc08 Specify encoding for Pygments
This closes issue #10, in theory, but I'm not completely happy with the
behavior.  The output for both UTF-8 and ISO-8859-1 sources is arguably
correct, but I think it'd be better to do some autodetecting of the file
encoding, and explicitly convert everything to UTF-8 on input.  One
option is the [`chardet` gem][gem], but I'm loath to add another
dependency to Rocco...

[gem]: http://rubygems.org/gems/chardet/versions/0.9.0
2010-10-19 13:32:03 +02:00
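The "convert everything to UTF-8 on input" idea floated in 1b211bcc08 could be approximated without an extra dependency. The sketch below is not what the commit implements and is not `chardet`: it is a deliberately crude heuristic that assumes the input is either valid UTF-8 or ISO-8859-1, the two encodings the commit mentions. The name `to_utf8` is hypothetical.

```python
def to_utf8(raw):
    """Return `raw` bytes re-encoded as UTF-8 (assumes UTF-8 or ISO-8859-1 input)."""
    try:
        # Valid UTF-8 decodes cleanly and is passed through unchanged.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never fails.
        text = raw.decode("iso-8859-1")
    return text.encode("utf-8")


latin1_bytes = "café".encode("iso-8859-1")
print(to_utf8(latin1_bytes) == "café".encode("utf-8"))
# True
```

The catch is that any byte sequence decodes as ISO-8859-1, so input in some third encoding would be converted silently and wrongly; a real detector would still be needed for that.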
Mike West
38683a8cc2 Cleaning up tests after bugfix merges:
As a result of fixing issue #15, a few tests broke.  This commit brings
the tests up to date with the latest behavior.
2010-10-19 13:08:13 +02:00
Mike West
6cf8de0a02 Adding a basic test suite. 2010-10-17 20:46:28 +02:00