This closes issue #10, in theory, but I'm not completely happy with the behavior. The output for both UTF-8 and ISO-8859-1 sources is arguably correct, but I think it'd be better to do some autodetecting of the file encoding, and explicitly convert everything to UTF-8 on input. One option is the [`chardet` gem][gem], but I'm loath to add another dependency to Rocco...

[gem]: http://rubygems.org/gems/chardet/versions/0.9.0
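A dependency-free alternative might look like the sketch below. It is not real autodetection: it assumes Ruby 1.9+ string encodings and assumes that any input that isn't valid UTF-8 is ISO-8859-1. The helper name `to_utf8` is hypothetical and not part of Rocco.

```ruby
# Hypothetical helper, not part of Rocco: normalize raw file bytes to UTF-8.
# Assumption: input that is not valid UTF-8 is ISO-8859-1.
def to_utf8(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?
  # ISO-8859-1 assigns a character to every byte value, so reinterpret the
  # bytes as ISO-8859-1 and transcode them to UTF-8.
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

latin1_bytes = "caf\xE9".b       # "café" encoded as ISO-8859-1
utf8_bytes   = "caf\xC3\xA9".b   # "café" encoded as UTF-8

puts to_utf8(latin1_bytes) == to_utf8(utf8_bytes)
```

The obvious weakness is that other single-byte encodings would be silently misread as ISO-8859-1, which is the case a detector like `chardet` is meant to cover.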
225 lines
7.4 KiB
# **Rocco** is a Ruby port of [Docco][do], the quick-and-dirty,
# hundred-line-long, literate-programming-style documentation generator.
# Rocco reads Ruby source files and produces annotated source documentation
# in HTML format. Comments are formatted with [Markdown][md] and presented
# alongside syntax highlighted code so as to give an annotation effect.
# This page is the result of running Rocco against [its own source file][so].
# Most of this was written while waiting for [node.js][no] to build (so I
# could use Docco!). Docco's gorgeous HTML and CSS are taken verbatim.
# The main difference is that Rocco is written in Ruby instead of
# [CoffeeScript][co] and may be a bit easier to obtain and install in
# existing Ruby environments or where node doesn't run yet.
# Install Rocco with Rubygems:
#
#     gem install rocco
#
# Once installed, the `rocco` command can be used to generate documentation
# for a set of Ruby source files:
#
#     rocco lib/*.rb
#
# The HTML files are written to the current working directory.
#
# [no]: http://nodejs.org/
# [do]: http://jashkenas.github.com/docco/
# [co]: http://coffeescript.org/
# [md]: http://daringfireball.net/projects/markdown/
# [so]: http://github.com/rtomayko/rocco/blob/master/lib/rocco.rb#commit
#### Prerequisites
# We'll need a Markdown library. [RDiscount][rd], if we're lucky. Otherwise,
# issue a warning and fall back on using BlueCloth.
#
# [rd]: http://github.com/rtomayko/rdiscount
begin
  require 'rdiscount'
rescue LoadError => boom
  warn "WARNING: #{boom}. Trying bluecloth."
  require 'bluecloth'
  Markdown = BlueCloth
end
# We use [{{ mustache }}](http://defunkt.github.com/mustache/) for
# HTML templating.
require 'mustache'
# We use `Net::HTTP` to highlight code via <http://pygments.appspot.com>
require 'net/http'
# Code is run through [Pygments](http://pygments.org/) for syntax
# highlighting. If it's not installed locally, use a webservice.
include FileTest
if !ENV['PATH'].split(':').any? { |dir| executable?("#{dir}/pygmentize") }
  warn "WARNING: Pygments not found. Using webservice."
end
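# The same PATH scan appears again in `highlight`. A minimal sketch of how
# it could be factored out, assuming a hypothetical helper named
# `command_on_path?` (not part of Rocco). It uses `File::PATH_SEPARATOR`
# instead of a hard-coded ':' so that the split also works on platforms
# with a different separator.

```ruby
# Hypothetical helper, not part of Rocco: true when `name` is an executable
# regular file in one of the directories listed in `path`.
def command_on_path?(name, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

warn "WARNING: Pygments not found. Using webservice." unless command_on_path?('pygmentize')
```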
#### Public Interface
# `Rocco.new` takes a source `filename`, an optional list of source filenames
# for other documentation sources, an `options` hash, and an optional `block`.
# The `options` hash respects three members: `:language`, which specifies which
# Pygments lexer to use; `:comment_chars`, which specifies the comment
# characters of the target language; and `:template_file`, the path to a custom
# HTML template. They default to `'ruby'`, `'#'`, and `nil`, respectively.
#
# When `block` is given, it must read the contents of the file using whatever
# means necessary and return it as a string. With no `block`, the file is read
# to retrieve data.
class Rocco
  VERSION = '0.5'

  def initialize(filename, sources=[], options={}, &block)
    @file = filename
    @data =
      if block_given?
        yield
      else
        File.read(filename)
      end
    defaults = {
      :language      => 'ruby',
      :comment_chars => '#',
      :template_file => nil
    }
    @options = defaults.merge(options)
    @sources = sources
    @comment_pattern = Regexp.new("^\\s*#{@options[:comment_chars]}\s?")
    @template_file = @options[:template_file]
    @sections = highlight(split(parse(@data)))
  end

  # The filename as given to `Rocco.new`.
  attr_reader :file

  # A list of two-tuples representing each *section* of the source file. Each
  # item in the list has the form: `[docs_html, code_html]`, where both
  # elements are strings containing the documentation and source code HTML,
  # respectively.
  attr_reader :sections

  # A list of all source filenames included in the documentation set. Useful
  # for building an index of other files.
  attr_reader :sources

  # An absolute path to a file that ought to be used as a template for the
  # HTML-rendered documentation.
  attr_reader :template_file

  # Generate HTML output for the entire document.
  require 'rocco/layout'
  def to_html
    Rocco::Layout.new(self, @template_file).render
  end

  #### Internal Parsing and Highlighting

  # Parse the raw file data into a list of two-tuples. Each tuple has the
  # form `[docs, code]` where both elements are arrays containing the
  # raw lines parsed from the input file. The first line is ignored if it
  # is a shebang line.
  def parse(data)
    sections = []
    docs, code = [], []
    lines = data.split("\n")
    lines.shift if lines[0] =~ /^\#\!/
    lines.each do |line|
      case line
      when @comment_pattern
        # A comment that follows code starts a new section.
        if code.any?
          sections << [docs, code]
          docs, code = [], []
        end
        docs << line
      else
        code << line
      end
    end
    sections << [docs, code] if docs.any? || code.any?
    sections
  end
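# A worked example of the parsing step, written as a standalone restatement
# so it can run without the rest of the class. The names `parse_sections`
# and `COMMENT` are hypothetical, and the pattern assumes the default `#`
# comment character.

```ruby
# Hypothetical standalone restatement of the parsing step, for illustration.
COMMENT = /^\s*#\s?/

def parse_sections(data)
  sections = []
  docs, code = [], []
  lines = data.split("\n")
  lines.shift if lines[0] =~ /^#!/   # drop a leading shebang line
  lines.each do |line|
    if line =~ COMMENT
      # A comment that follows code closes the current section.
      if code.any?
        sections << [docs, code]
        docs, code = [], []
      end
      docs << line
    else
      code << line
    end
  end
  sections << [docs, code] if docs.any? || code.any?
  sections
end

source = "#!/usr/bin/env ruby\n# Say hello.\nputs 'hello'\n# Say goodbye.\nputs 'bye'\n"
p parse_sections(source)
# => [[["# Say hello."], ["puts 'hello'"]], [["# Say goodbye."], ["puts 'bye'"]]]
```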

  # Take the list of paired *sections* two-tuples and split into two
  # separate lists: one holding the comments with leaders removed and
  # one with the code blocks.
  def split(sections)
    docs_blocks, code_blocks = [], []
    sections.each do |docs,code|
      docs_blocks << docs.map { |line| line.sub(@comment_pattern, '') }.join("\n")
      code_blocks << code.map do |line|
        tabs = line.match(/^(\t+)/)
        tabs ? line.sub(/^\t+/, ' ' * tabs.captures[0].length) : line
      end.join("\n")
    end
    [docs_blocks, code_blocks]
  end

  # Take the result of `split` and apply Markdown formatting to comments and
  # syntax highlighting to source code.
  def highlight(blocks)
    docs_blocks, code_blocks = blocks

    # Combine all docs blocks into a single big markdown document with section
    # dividers and run through the Markdown processor. Then split it back out
    # into separate sections.
    markdown = docs_blocks.join("\n\n##### DIVIDER\n\n")
    docs_html = Markdown.new(markdown, :smart).
      to_html.
      split(/\n*<h5>DIVIDER<\/h5>\n*/m)

    # Combine all code blocks into a single big stream and run through either
    # `pygmentize(1)` or <http://pygments.appspot.com>.
    code_stream = code_blocks.join("\n\n#{@options[:comment_chars]} DIVIDER\n\n")

    if ENV['PATH'].split(':').any? { |dir| executable?("#{dir}/pygmentize") }
      code_html = highlight_pygmentize(code_stream)
    else
      code_html = highlight_webservice(code_stream)
    end

    # Do some post-processing on the pygments output to split things back
    # into sections and remove partial `<pre>` blocks.
    code_html = code_html.
      split(/\n*<span class="c.?">#{@options[:comment_chars]} DIVIDER<\/span>\n*/m).
      map { |code| code.sub(/\n?<div class="highlight"><pre>/m, '') }.
      map { |code| code.sub(/\n?<\/pre><\/div>\n/m, '') }

    # Lastly, combine the docs and code lists back into a list of two-tuples.
    docs_html.zip(code_html)
  end
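# The divider trick used above (join every block with a marker, run the
# formatter once, then split the output on the rendered marker) can be seen
# in isolation in this sketch. `toy_render` is a hypothetical stand-in for a
# Markdown processor; it only understands `#####` headings and paragraphs.

```ruby
# Hypothetical stand-in for a Markdown processor: h5 headings and paragraphs only.
def toy_render(markdown)
  markdown.split(/\n{2,}/).map do |block|
    if block =~ /\A##### (.*)\z/
      "<h5>#{$1}</h5>"
    else
      "<p>#{block}</p>"
    end
  end.join("\n\n")
end

docs_blocks = ["First section.", "Second section."]

# One formatter call for the whole document instead of one per section.
joined    = docs_blocks.join("\n\n##### DIVIDER\n\n")
docs_html = toy_render(joined).split(/\n*<h5>DIVIDER<\/h5>\n*/m)

p docs_html
# => ["<p>First section.</p>", "<p>Second section.</p>"]
```

# The approach relies on the marker never occurring in the source itself;
# a comment that happens to read `##### DIVIDER` would split a section early.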

  # We `popen` a read/write pygmentize process in the parent and
  # then fork off a child process to write the input.
  def highlight_pygmentize(code)
    code_html = nil
    open("|pygmentize -l #{@options[:language]} -O encoding=utf-8 -f html", 'r+') do |fd|
      pid = fork {
        fd.close_read
        fd.write code
        fd.close_write
        exit!
      }
      fd.close_write
      code_html = fd.read
      fd.close_read
      Process.wait(pid)
    end
    code_html
  end
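# The fork exists so that writing the input and reading the output can
# happen at the same time without the pipe filling up. A minimal sketch of
# an alternative, assuming Ruby 1.9.2+ where `Open3.capture2` is available:
# it feeds `stdin_data` to the command and collects its output without an
# explicit fork. `filter_through` is a hypothetical helper, not part of
# Rocco, and the filter here is a Ruby one-liner standing in for
# `pygmentize`, which may not be installed.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper, not part of Rocco: pipe `input` through a command
# (given as an argument list) and return what it writes to standard output.
def filter_through(input, *command)
  output, status = Open3.capture2(*command, stdin_data: input)
  raise "#{command.first} exited with #{status.exitstatus}" unless status.success?
  output
end

# Stand-in filter: upcase whatever arrives on stdin.
puts filter_through("def hi; end\n", RbConfig.ruby, '-e', 'print STDIN.read.upcase')
```

# With Pygments installed, the same helper could be called with the flags
# used above: `filter_through(code, 'pygmentize', '-l', 'ruby', '-O',
# 'encoding=utf-8', '-f', 'html')`. Passing the command as a list also
# keeps the language name away from the shell.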

  # Pygments is not one of those things that's trivial for a Ruby user to
  # install, so we'll fall back on a webservice to highlight the code if it
  # isn't available.
  def highlight_webservice(code)
    url = URI.parse('http://pygments.appspot.com/')
    Net::HTTP.post_form(url, {'lang' => @options[:language], 'code' => code}).body
  end
end

# And that's it.