RUBY-200 migrating confluence docs to /docs

2010-11-12 17:59:27 -05:00 · 2010-11-12 17:59:27 -05:00 · dfe40238c9
commit dfe40238c9
parent a56636b3b2
5 changed files with 349 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -5,8 +5,11 @@ This is the 10gen-supported Ruby driver for [MongoDB](http://www.mongodb.org).
 This documentation includes other articles of interest, include:

 1. [A tutorial](http://api.mongodb.org/ruby/current/file.TUTORIAL.html).
-2. [History](http://api.mongodb.org/ruby/current/file.HISTORY.html).
-3. [Credits](http://api.mongodb.org/ruby/current/file.CREDITS.html).
+2. [Replica Sets in Ruby](http://api.mongodb.org/ruby/current/file.REPLICA_SETS.html).
+3. [GridFS in Ruby](http://api.mongodb.org/ruby/current/file.GridFS.html).
+4. [Frequently Asked Questions](http://api.mongodb.org/ruby/current/file.FAQ.html).
+5. [History](http://api.mongodb.org/ruby/current/file.HISTORY.html).
+6. [Credits](http://api.mongodb.org/ruby/current/file.CREDITS.html).

 Here's a quick code sample. Again, see the [MongoDB Ruby Tutorial](http://api.mongodb.org/ruby/current/file.TUTORIAL.html)
 for much more:
--- a/2
+++ b/2
@@ -169,7 +169,7 @@ task :ydoc do
  require File.join(File.dirname(__FILE__), 'lib', 'mongo')
  out = File.join('ydoc', Mongo::VERSION)
  FileUtils.rm_rf('ydoc')
-  system "yardoc lib/**/*.rb lib/mongo/**/*.rb lib/bson/**/*.rb -e yard/yard_ext.rb -p yard/templates -o #{out} --title MongoRuby-#{Mongo::VERSION} --files docs/TUTORIAL.md,docs/HISTORY.md,docs/CREDITS.md,docs/1.0_UPGRADE.md"
+  system "yardoc lib/**/*.rb lib/mongo/**/*.rb lib/bson/**/*.rb -e yard/yard_ext.rb -p yard/templates -o #{out} --title MongoRuby-#{Mongo::VERSION} --files docs/TUTORIAL.md,docs/GridFS.md,docs/FAQ.md,docs/REPLICA_SETS.md,docs/HISTORY.md,docs/CREDITS.md,docs/1.0_UPGRADE.md"
 end

 namespace :gem do
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -0,0 +1,112 @@
+# Ruby MongoDB FAQ
+
+This is a list of frequently asked questions about using Ruby with MongoDB. If you have a question you'd like to have answered here, please post your question to the [mongodb-user list](http://groups.google.com/group/mongodb-user).
+
+#### Can I run (insert command name here) from the Ruby driver?
+
+Yes. You can run any of the [available database commands](http://www.mongodb.org/display/DOCS/List+of+Database+Commands) from the driver using the `DB#command` method. The only trick is to use an OrderedHash when specifying the command, since the server expects the command name to be the first key. For example, here's how you'd run an asynchronous fsync from the driver:
+
+
+    # This command is run on the admin database.
+    @db = Mongo::Connection.new.db('admin')  
+
+    # Build the command.
+    cmd = OrderedHash.new
+    cmd['fsync'] = 1
+    cmd['async'] = true
+
+    # Run it.
+    @db.command(cmd)
+
+
+It's important to keep in mind that some commands, like `fsync`, must be run on the `admin` database, while other commands can be run on any database. If you're having trouble, check the [command reference](http://www.mongodb.org/display/DOCS/List+of+Database+Commands) to make sure you're using the command correctly.
+
+#### Does the Ruby driver support an EXPLAIN command?
+
+Yes. `explain` is, technically speaking, an option sent to a query that tells MongoDB to return an explain plan rather than the query's results. You can use `explain` by constructing a query and calling explain at the end:
+
+
+    @collection = @db['users']
+    result = @collection.find({:name => "Jones"}).explain
+
+
+The resulting explain plan might look something like this:
+
+
+    {"cursor"=>"BtreeCursor name_1", 
+     "startKey"=>{"name"=>"Jones"}, 
+     "endKey"=>{"name"=>"Jones"}, 
+     "nscanned"=>1.0, 
+     "n"=>1, 
+     "millis"=>0, 
+     "oldPlan"=>{"cursor"=>"BtreeCursor name_1", 
+                   "startKey"=>{"name"=>"Jones"}, 
+                   "endKey"=>{"name"=>"Jones"}
+     },
+     "allPlans"=>[{"cursor"=>"BtreeCursor name_1", 
+                     "startKey"=>{"name"=>"Jones"}, 
+                     "endKey"=>{"name"=>"Jones"}}]
+     }
+
+
+Because this collection has an index on the "name" field, the query uses that index, only having to scan a single record. "n" is the number of records the query will return. "millis" is the time the query takes, in milliseconds. "oldPlan" indicates that the query optimizer has already seen this kind of query and has, therefore, saved an efficient query plan. "allPlans" shows all the plans considered for this query.
+
+#### I see that BSON supports a symbol type. Does this mean that I can store Ruby symbols in MongoDB?
+
+You can store Ruby symbols in MongoDB, but only as values. BSON specifies that document keys must be strings. So, for instance, you can do this:
+
+
+    @collection = @db['test']
+
+    boat_id = @collection.save({:vehicle  => :boat})
+    car_id  = @collection.save({"vehicle" => "car"})
+
+    @collection.find_one('_id' => boat_id)
+    # => {"_id" => ObjectID('4bb372a8238d3b5c8c000001'), "vehicle" => :boat}
+
+
+    @collection.find_one('_id' => car_id)
+    # => {"_id" => ObjectID('4bb372a8238d3b5c8c000002'), "vehicle" => "car"}
+
+
+Notice that the symbol values are returned as expected, but that symbol keys are treated as strings.
+
+#### Why can't I access random elements within a cursor?
+
+MongoDB cursors are designed for sequentially iterating over a result set, and all the drivers, including the Ruby driver, stick closely to this directive. Internally, a Ruby cursor fetches results in batches by running a MongoDB `getmore` operation. The results are buffered for efficient iteration on the client-side.
+
+What this means is that a cursor is nothing more than a device for returning a result set on a query that's been initiated on the server. Cursors are not containers for result sets. If we allow a cursor to be randomly accessed, then we run into issues regarding the freshness of the data. For instance, if I iterate over a cursor and then want to retrieve the cursor's first element, should a stored copy be returned, or should the cursor re-run the query? If we returned a stored copy, it may not be fresh. And if the query is re-run, then we're technically dealing with a new cursor.
+
+To avoid those issues, we're saying that anyone who needs flexible access to the results of a query should store those results in an array and then access the data as needed.
+
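The trade-off above can be illustrated without a server. The following is a minimal sketch, not driver code: `ForwardOnlyCursor` is a hypothetical stand-in for a cursor that can only be read once, front to back. Copying its results into an array is what makes random access possible, at the cost of working with a snapshot.

```ruby
# Hypothetical stand-in for a server-side cursor: results can only be
# consumed once, in order. This is NOT the driver's Cursor class.
class ForwardOnlyCursor
  include Enumerable

  def initialize(docs)
    @docs = docs.dup
  end

  # Yields each remaining document and discards it, mimicking a cursor
  # that cannot be rewound once drained.
  def each
    yield @docs.shift until @docs.empty?
  end
end

cursor = ForwardOnlyCursor.new([{ 'name' => 'a' }, { 'name' => 'b' }, { 'name' => 'c' }])

# Copy the results into an array once. The array is a snapshot that can be
# indexed freely, with the understanding that it may go stale.
results = cursor.to_a

first = results[0]
last  = results[-1]

# The stand-in cursor is now exhausted.
leftover = cursor.to_a
```

The array makes the freshness question explicit: the application, not the cursor, decides when to re-run the query.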
+#### Why can't I save an instance of TimeWithZone?
+
+MongoDB stores times in UTC as the number of milliseconds since the epoch. This means that the Ruby driver serializes Ruby Time objects only. While it would certainly be possible to serialize a TimeWithZone, this isn't preferable since the driver would still deserialize to a Time object.
+
+All that said, if necessary, it'd be easy to write a thin wrapper over the driver that would store an extra time zone attribute and handle the serialization/deserialization of TimeWithZone transparently.
+
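As a rough sketch of such a wrapper, the zone information can be kept in a second field next to the UTC time. The helper names `dump_time` and `load_time` and the field names `at` and `utc_offset` are hypothetical, and plain Ruby `Time` objects stand in for TimeWithZone so the example runs without ActiveSupport. An application using ActiveSupport would more likely store the zone name than a raw offset.

```ruby
# Hypothetical helper: split a zoned time into a UTC Time plus the offset
# (in seconds) that would be stored alongside it in the document.
def dump_time(time)
  { 'at' => time.getutc, 'utc_offset' => time.utc_offset }
end

# Hypothetical helper: recombine the stored UTC time and offset on read.
def load_time(doc)
  doc['at'].getlocal(doc['utc_offset'])
end

original = Time.new(2010, 11, 12, 17, 59, 27, '-05:00')
doc      = dump_time(original)
restored = load_time(doc)
```

Here `restored` denotes the same instant as `original` and carries the same UTC offset, while the value in `doc['at']` is an ordinary UTC `Time`.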
+#### I keep getting CURSOR_NOT_FOUND exceptions. What's happening?
+
+The most likely culprit here is that the cursor is timing out on the server. Whenever you issue a query, a cursor is created on the server. Cursors naturally time out after ten minutes, which means that if you happen to be iterating over a cursor for more than ten minutes, you risk a CURSOR_NOT_FOUND exception.
+
+There are two solutions to this problem. You can either:
+
+1. Limit your query. Use some combination of `limit` and `skip` to reduce the total number of query results. This will, obviously, bring down the time it takes to iterate.
+
+2. Turn off the cursor timeout. To do that, invoke `find` with a block, and pass `:timeout => false`:
+
+        @collection.find({}, :timeout => false) do |cursor|
+          cursor.each do |document|
+            # Process documents here
+          end
+        end
+
+#### I periodically see connection failures between the driver and MongoDB. Why can't the driver retry the operation automatically?
+
+A connection failure can indicate any number of failure scenarios. Has the server crashed? Are we experiencing a temporary network partition? Is there a bug in our ssh tunnel?
+
+Without further investigation, it's impossible to know exactly what has caused the connection failure. Furthermore, when we do see a connection failure, it's impossible to know how many operations prior to the failure succeeded. Imagine, for instance, that we're using safe mode and we send an `$inc` operation to the server. It's entirely possible that the server has received the `$inc` but failed on the call to `getLastError`. In that case, retrying the operation would result in a double-increment.
+
+Because of the indeterminacy involved, the MongoDB drivers will not retry operations on connection failure. How connection failures should be handled is entirely dependent on the application. Therefore, we leave it to the application developers to make the best decision in this case.
+
+The drivers will reconnect on the subsequent operation.
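
The double-increment scenario described above can be simulated in a few lines. This is only an illustration: `FakeServer` and `FakeConnectionFailure` are hypothetical in-memory stand-ins, not part of the driver, and the "server" simply applies the increment before losing the acknowledgement once.

```ruby
# Raised by the fake server below when the acknowledgement is lost.
class FakeConnectionFailure < StandardError; end

# Hypothetical in-memory stand-in for a server. It applies the increment
# and then, on the first call only, loses the acknowledgement.
class FakeServer
  attr_reader :counter

  def initialize
    @counter  = 0
    @drop_ack = true
  end

  def inc(amount)
    @counter += amount # the write is applied first...
    if @drop_ack
      @drop_ack = false
      # ...and the client never hears about it.
      raise FakeConnectionFailure, 'connection lost before acknowledgement'
    end
    @counter
  end
end

server = FakeServer.new

# A naive automatic retry: the client cannot tell that the first attempt
# was already applied, so it sends the increment again.
begin
  server.inc(1)
rescue FakeConnectionFailure
  server.inc(1)
end

final_count = server.counter
```

The application asked for a single increment, yet `final_count` ends up at 2. Only the application knows whether an operation is safe to repeat.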
--- a/docs/GridFS.md
+++ b/docs/GridFS.md
@@ -0,0 +1,158 @@
+# GridFS in Ruby
+
+GridFS, which stands for "Grid File Store," is a specification for storing large files in MongoDB. It works by dividing a file into manageable chunks and storing each of those chunks as a separate document. GridFS requires two collections to achieve this: one collection stores each file's metadata (e.g., name, size, etc.) and another stores the chunks themselves. If you're interested in more details, check out the [GridFS Specification](http://www.mongodb.org/display/DOCS/GridFS+Specification).
+
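To make the two-collection layout concrete, here is a minimal sketch that builds the documents in memory instead of writing them to MongoDB. The helper `build_gridfs_docs`, the id, and the tiny chunk size are illustrative assumptions. The field names (`files_id`, `n`, `data`, `length`, `chunkSize`, `md5`) follow the GridFS specification, which also defines further fields such as `uploadDate`, omitted here.

```ruby
require 'digest/md5'

# Hypothetical helper: split data into one files document and a list of
# chunk documents, using field names from the GridFS specification.
def build_gridfs_docs(data, files_id, filename, chunk_size)
  chunks = []
  offset = 0
  n = 0
  while offset < data.bytesize
    chunks << { 'files_id' => files_id,
                'n'        => n,
                'data'     => data.byteslice(offset, chunk_size) }
    offset += chunk_size
    n += 1
  end

  file_doc = {
    '_id'       => files_id,
    'filename'  => filename,
    'length'    => data.bytesize,
    'chunkSize' => chunk_size,
    'md5'       => Digest::MD5.hexdigest(data)
  }
  [file_doc, chunks]
end

data = 'x' * 10
file_doc, chunks = build_gridfs_docs(data, 'example-id', 'example.txt', 4)

# Reassembling the chunks in order of 'n' yields the original data, and the
# stored MD5 gives a way to verify it.
rebuilt = chunks.sort_by { |c| c['n'] }.map { |c| c['data'] }.join
```

With ten bytes and a chunk size of four, the sketch produces three chunks, the last holding the remaining two bytes.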
+### The Grid class
+
+The [Grid class](http://api.mongodb.org/ruby/current/Mongo/Grid.html) represents the core GridFS implementation. Grid gives you a simple file store, keyed on a unique ID. This means that duplicate filenames aren't a problem. To use the Grid class, first make sure you have a database, and then instantiate a Grid:
+
+
+    @db = Mongo::Connection.new.db('social_site')
+    @grid = Mongo::Grid.new(@db)
+
+#### Saving files
+Once you have a Grid object, you can start saving data to it. The data can be either a string or an IO-like object that responds to a #read method:
+
+
+    # Saving string data
+    id = @grid.put("here's some string / binary data")
+
+    # Saving IO data and including the optional filename
+    image = File.open("me.jpg")
+    id2   = @grid.put(image, :filename => "me.jpg")
+
+
+Grid#put returns an object id, which you can use to retrieve the file:
+
+
+    # Get the string we saved
+    file = @grid.get(id)
+
+    # Get the file we saved
+    image = @grid.get(id2)
+
+
+#### File metadata
+
+There are accessors for the various file attributes:
+
+
+    image.filename
+    # => "me.jpg"
+
+    image.content_type
+    # => "image/jpg"
+
+    image.file_length
+    # => 502357
+
+    image.upload_date
+    # => Mon Mar 01 16:18:30 UTC 2010
+
+    # Read all the image's data at once
+    image.read
+
+    # Read the first 100k bytes of the image
+    image.read(100 * 1024)
+
+
+When putting a file, you can set many of these attributes and write arbitrary metadata:
+
+
+    # Saving IO data
+    file = File.open("me.jpg")
+    id2  = @grid.put(file, 
+             :filename     => "my-avatar.jpg",
+             :content_type => "application/jpg", 
+             :_id          => 'a-unique-id-to-use-in-lieu-of-a-random-one',
+             :chunk_size   => 100 * 1024,
+             :metadata     => {'description' => "taken after a game of ultimate"})
+
+
+#### Safe mode
+
+A kind of safe mode is built into the GridFS specification. When you save a file, an MD5 hash is created on the server. If you save the file in safe mode, an MD5 will be created on the client for comparison with the server version. If the two hashes don't match, an exception will be raised.
+
+
+    image = File.open("me.jpg")
+    id2   = @grid.put(image, :filename => "my-avatar.jpg", :safe => true)
+
+
+#### Deleting files
+
+Deleting a file is as simple as providing the id:
+
+
+    @grid.delete(id2)
+
+
+### The GridFileSystem class
+
+[GridFileSystem](http://api.mongodb.org/ruby/current/Mongo/GridFileSystem.html) is a light emulation of a file system and therefore has a couple of unique properties. The first is that filenames are assumed to be unique. The second, a consequence of the first, is that files are versioned. To see what this means, let's create a GridFileSystem instance:
+
+    @db = Mongo::Connection.new.db("social_site")
+    @fs = Mongo::GridFileSystem.new(@db)
+
+#### Saving files
+
+Now suppose we want to save the file 'me.jpg'. This is easily done using a filesystem-like API:
+
+
+    image = File.open("me.jpg")
+    @fs.open("me.jpg", "w") do |f|
+      f.write image
+    end 
+
+
+We can then retrieve the file by filename:
+
+
+    image = @fs.open("me.jpg", "r") {|f| f.read }
+
+
+No problems there. But what if we need to replace the file? That too is straightforward:
+
+
+    image = File.open("me-dancing.jpg")
+    @fs.open("me.jpg", "w") do |f|
+      f.write image
+    end 
+
+
+But a couple of things need to be kept in mind. First, the original 'me.jpg' will be available until the new 'me.jpg' saves. From then on, calls to the #open method will always return the most recently saved version of a file. But, and this is the second point, old versions of the file won't be deleted. So if you're going to be rewriting files often, you could end up with a lot of old versions piling up. One solution to this is to use the :delete_old option when writing a file:
+
+
+    image = File.open("me-dancing.jpg")
+    @fs.open("me.jpg", "w", :delete_old => true) do |f|
+      f.write image
+    end 
+
+
+This will delete all but the latest version of the file.
+
+
+#### Deleting files
+
+When you delete a file by name, you delete all versions of that file:
+
+
+    @fs.delete("me.jpg")
+
+
+#### Metadata and safe mode
+
+All of the options for storing metadata and saving in safe mode are available for the GridFileSystem class:
+
+
+    image = File.open("me.jpg")
+    @fs.open('my-avatar.jpg', "w",
+               :content_type => "application/jpg", 
+               :metadata     => {'description' => "taken on 3/1/2010 after a game of ultimate"},
+               :_id          => 'a-unique-id-to-use-instead-of-the-automatically-generated-one',
+               :safe         => true) { |f| f.write image }
+
+
+### Advanced Users
+
+Astute code readers will notice that the Grid and GridFileSystem classes are merely thin wrappers around an underlying [GridIO class](http://api.mongodb.org/ruby/current/Mongo/GridIO.html). This means that it's easy to customize the GridFS implementation presented here; just use GridIO for all the low-level work, and build the API you need in an external manager class similar to Grid or GridFileSystem.
+
--- a/docs/REPLICA_SETS.md
+++ b/docs/REPLICA_SETS.md
@@ -0,0 +1,73 @@
+# Replica Sets in Ruby
+
+Here follow a few considerations for those using the MongoDB Ruby driver with [replica sets](http://www.mongodb.org/display/DOCS/Replica+Sets).
+
+### Setup
+
+First, make sure that you've configured and initialized a replica set.
+
+Connecting to a replica set from the Ruby driver is easy. If you only want to specify a single node, simply pass that node to `Connection.new`:
+
+    @connection = Connection.new('foo.local', 27017)
+
+If you want to pass in multiple seed nodes, use `Connection.multi`:
+
+    @connection = Connection.multi([['n1.mydb.net', 27017], 
+       ['n2.mydb.net', 27017], ['n3.mydb.net', 27017]])
+
+In both cases, the driver will attempt to connect to a master node and, when found, will merge any other known members of the replica set into the seed list.
+
+### Connection Failures
+
+Imagine that our master node goes offline. How will the driver respond?
+
+At first, the driver will try to send operations to what was the master node. These operations will fail, and the driver will raise a *ConnectionFailure* exception. It then becomes the client's responsibility to decide how to handle this.
+
+If the client decides to retry, it's not guaranteed that another member of the replica set will have been promoted to master right away, so it's still possible that the driver will raise another *ConnectionFailure*. However, once a member has been promoted to master, typically within a few seconds, subsequent operations will succeed.
+
+The driver will essentially cycle through all known seed addresses until a node identifies itself as master.
+
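The cycling behaviour can be pictured with a small sketch. Everything here is hypothetical: the `find_master` helper and the `is_master` lambda stand in for the network round trip in which a real driver asks each node whether it is currently master.

```ruby
# Hypothetical seed list and master check. A real driver would ask each
# node over the network whether it is currently master.
seeds = [['n1.mydb.net', 27017], ['n2.mydb.net', 27017], ['n3.mydb.net', 27017]]
current_master = ['n2.mydb.net', 27017]
is_master = lambda { |node| node == current_master }

# Try each known seed in turn and return the first one that reports itself
# as master, or nil if none does (for example, while an election is still
# in progress).
def find_master(seeds, is_master)
  seeds.find { |node| is_master.call(node) }
end

found = find_master(seeds, is_master)

# During an election no node claims to be master, so the search comes up
# empty and the caller has to try again later.
none_yet = find_master(seeds, lambda { |_node| false })
```

The `nil` case corresponds to the window in which operations keep raising *ConnectionFailure* until a new master has been elected.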
+### Recovery
+
+Driver users may wish to wrap their database calls with failure recovery code. Here's one possibility:
+
+    # Ensure retry upon failure
+    def rescue_connection_failure(max_retries=5)
+      success = false
+      retries = 0
+      while !success
+        begin
+          yield
+          success = true
+        rescue Mongo::ConnectionFailure => ex
+          retries += 1
+          # Give up and re-raise once max_retries attempts have failed.
+          raise ex if retries >= max_retries
+          sleep(1)
+        end
+      end
+    end
+
+    # Wrapping a call to #count()
+    rescue_connection_failure do
+      @db.collection('users').count()
+    end
+
+Of course, the proper way to handle connection failures will always depend on the individual application. We encourage object-mapper and application developers to publish any promising results.
+
+### Testing
+
+The Ruby driver (>= 1.0.6) includes some unit tests for verifying replica set behavior. They reside in *tests/replica_sets*. You can run them individually with the following rake tasks:
+
+    rake test:replica_set_count
+    rake test:replica_set_insert
+    rake test:pooled_replica_set_insert
+    rake test:replica_set_query
+
+Make sure you have a replica set running on localhost before trying to run these tests.
+
+### Further Reading
+
+* [Replica Sets](http://www.mongodb.org/display/DOCS/Replica+Sets)
+* [Replica Set Configuration](http://www.mongodb.org/display/DOCS/Replica+Set+Configuration)
+