In MongoDB, a document is roughly equivalent to a row in an RDBMS. When working with relational databases, rows are stored in tables, which have a strict schema that the rows follow. MongoDB stores documents in collections rather than tables; the principal difference is that no schema is enforced at the database level.
MongoEngine allows you to define schemata for documents, as this helps to reduce coding errors and allows utility methods to be defined on fields which may be present.
To define a schema for a document, create a class that inherits from Document. Fields are specified by adding field objects as class attributes to the document class:
from mongoengine import *
import datetime

class Page(Document):
    title = StringField(max_length=200, required=True)
    date_modified = DateTimeField(default=datetime.datetime.now)
One of the benefits of MongoDB is dynamic schemas for a collection. Whilst data should be planned and organised (after all, explicit is better than implicit!), there are scenarios where having dynamic / expando style documents is desirable.
DynamicDocument documents work in the same way as Document, but any data / attributes set on them will also be saved:
from mongoengine import *

class Page(DynamicDocument):
    title = StringField(max_length=200, required=True)

# Create a new page and add tags
>>> page = Page(title='Using MongoEngine')
>>> page.tags = ['mongodb', 'mongoengine']
>>> page.save()
>>> Page.objects(tags='mongoengine').count()
1
Note
There is one caveat on Dynamic Documents: field names cannot start with `_`.
By default, fields are not required. To make a field mandatory, set the required keyword argument of a field to True. Fields may also have validation constraints available (such as max_length in the example above). Fields may also take default values, which will be used if a value is not provided. A default value may optionally be a callable, which will be called to retrieve the value (such as datetime.datetime.now in the example above). The field types available are as follows:
Each field type can be customized by keyword arguments. The following keyword arguments can be set on all fields:
default: A value to use when no value is set for this field.
The definition of default parameters follows the general rules of Python, which means that some care should be taken when dealing with mutable default objects (as in ListField or DictField):
class ExampleFirst(Document):
    # Default to an empty list
    values = ListField(IntField(), default=list)

class ExampleSecond(Document):
    # Default to a set of values
    values = ListField(IntField(), default=lambda: [1,2,3])

class ExampleDangerous(Document):
    # Dangerous: the same list object is shared, so an .append call on one
    # document changes the default for all following objects as well
    values = ListField(IntField(), default=[1,2,3])
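The pitfall is ordinary Python behaviour rather than anything specific to MongoEngine. The following sketch uses plain Python only; `resolve_default` is a hypothetical helper that mimics the "call it if it is callable" rule described above and is not MongoEngine's actual code.

```python
def resolve_default(default):
    """Return the default value, calling it first if it is callable."""
    return default() if callable(default) else default


shared = [1, 2, 3]

# Non-callable default: every "document" receives the very same list object.
first = resolve_default(shared)
second = resolve_default(shared)
first.append(4)
print(second)  # [1, 2, 3, 4] - the change leaked into the second object

# Callable default: each call builds a fresh list, so nothing is shared.
third = resolve_default(lambda: [1, 2, 3])
fourth = resolve_default(lambda: [1, 2, 3])
third.append(4)
print(fourth)  # [1, 2, 3]
```

This is why `default=list` and `default=lambda: [1,2,3]` are the safe forms for list and dictionary fields.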
choices: An iterable (e.g. a list or tuple) of choices to which the value of this field should be limited.
This can be either nested tuples, each holding a value (stored in MongoDB) and a human-readable key:
SIZE = (('S', 'Small'),
        ('M', 'Medium'),
        ('L', 'Large'),
        ('XL', 'Extra Large'),
        ('XXL', 'Extra Extra Large'))

class Shirt(Document):
    size = StringField(max_length=3, choices=SIZE)
Or a flat iterable just containing values:
SIZE = ('S', 'M', 'L', 'XL', 'XXL')

class Shirt(Document):
    size = StringField(max_length=3, choices=SIZE)
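Both styles restrict the field to the same stored values. As a rough illustration in plain Python (the helpers `allowed_values` and `check_choice` are hypothetical names for this sketch, not part of MongoEngine), the check could look like this:

```python
def allowed_values(choices):
    """Extract the storable values from either style of choices."""
    values = []
    for choice in choices:
        # Nested style: (stored_value, human_readable_label)
        if isinstance(choice, (list, tuple)):
            values.append(choice[0])
        else:
            values.append(choice)
    return values


def check_choice(value, choices):
    """Raise ValueError if value is not one of the allowed choices."""
    if value not in allowed_values(choices):
        raise ValueError("%r is not a valid choice" % (value,))
    return value


NESTED = (('S', 'Small'), ('M', 'Medium'), ('L', 'Large'))
FLAT = ('S', 'M', 'L')

check_choice('M', NESTED)      # accepted
try:
    check_choice('XS', FLAT)   # rejected
except ValueError as exc:
    print(exc)
```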
MongoDB allows the storage of lists of items. To add a list of items to a Document, use the ListField field type. ListField takes another field object as its first argument, which specifies which type elements may be stored within the list:
class Page(Document):
    tags = ListField(StringField(max_length=50))
MongoDB has the ability to embed documents within other documents. Schemata may be defined for these embedded documents, just as they may be for regular documents. To create an embedded document, just define a document as usual, but inherit from EmbeddedDocument rather than Document:
class Comment(EmbeddedDocument):
    content = StringField()
To embed the document within another document, use the EmbeddedDocumentField field type, providing the embedded document class as the first argument:
class Page(Document):
    comments = ListField(EmbeddedDocumentField(Comment))

comment1 = Comment(content='Good work!')
comment2 = Comment(content='Nice article!')
page = Page(comments=[comment1, comment2])
Often, an embedded document may be used instead of a dictionary – generally this is recommended as dictionaries don’t support validation or custom field types. However, sometimes you will not know the structure of what you want to store; in this situation a DictField is appropriate:
class SurveyResponse(Document):
    date = DateTimeField()
    user = ReferenceField(User)
    answers = DictField()

survey_response = SurveyResponse(date=datetime.now(), user=request.user)
response_form = ResponseForm(request.POST)
survey_response.answers = response_form.cleaned_data()
survey_response.save()
Dictionaries can store complex data, other dictionaries, lists, and references to other objects, so they are the most flexible field type available.
References may be stored to other documents in the database using the ReferenceField. Pass in another document class as the first argument to the constructor, then simply assign document objects to the field:
class User(Document):
    name = StringField()

class Page(Document):
    content = StringField()
    author = ReferenceField(User)

john = User(name="John Smith")
john.save()

post = Page(content="Test Page")
post.author = john
post.save()
The User object is automatically turned into a reference behind the scenes, and dereferenced when the Page object is retrieved.
To add a ReferenceField that references the document being defined, use the string 'self' in place of the document class as the argument to ReferenceField's constructor. To reference a document that has not yet been defined, use the name of the undefined document as the constructor's argument:
class Employee(Document):
    name = StringField()
    boss = ReferenceField('self')
    profile_page = ReferenceField('ProfilePage')

class ProfilePage(Document):
    content = StringField()
By default, MongoDB doesn't check the integrity of your data, so deleting documents that other documents still hold references to will lead to consistency issues. MongoEngine's ReferenceField adds some functionality to safeguard against these kinds of database integrity problems, providing each reference with a delete rule specification. A delete rule is specified by supplying the reverse_delete_rule attribute on the ReferenceField definition, like this:
class Employee(Document):
    ...
    profile_page = ReferenceField('ProfilePage', reverse_delete_rule=mongoengine.NULLIFY)
The declaration in this example means that when a ProfilePage object is removed, the profile_page field of every Employee that still references it is set to null; the Employee objects themselves are kept. If a whole batch of profile pages is removed, the references held by all linked employees are nullified as well.
Its value can take any of the following constants:
Warning
A safety note on setting up these delete rules! Since the delete rules are not recorded on the database level by MongoDB itself, but instead at runtime, in-memory, by the MongoEngine module, it is of the utmost importance that the module that declares the relationship is loaded BEFORE the delete is invoked.
If, for example, the Employee object lives in the payroll app, and the ProfilePage in the people app, it is extremely important that the payroll app (which declares the rule) is loaded before any profile page is removed, because otherwise MongoEngine cannot know that this relationship exists.
In Django, be sure to put all apps that have such delete rule declarations in their models.py in the INSTALLED_APPS tuple.
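To make the warning concrete, here is a small plain-Python sketch of an in-memory rule registry. Everything in it (the `delete_rules` dictionary, `register_delete_rule`, `delete`, and the stand-in `NULLIFY` constant and classes) is hypothetical and only illustrates the idea; it is not how MongoEngine is implemented.

```python
NULLIFY = 1  # stand-in constant for this sketch only

# In-memory registry: referenced class name -> list of (owner class, field, rule)
delete_rules = {}


def register_delete_rule(target_name, owner_cls, field_name, rule):
    delete_rules.setdefault(target_name, []).append((owner_cls, field_name, rule))


class ProfilePage:
    def __init__(self, content):
        self.content = content


class Employee:
    instances = []

    def __init__(self, name, profile_page=None):
        self.name = name
        self.profile_page = profile_page
        Employee.instances.append(self)


def delete(document):
    """Apply every rule registered for the deleted document's class."""
    for owner_cls, field_name, rule in delete_rules.get(type(document).__name__, []):
        for owner in owner_cls.instances:
            if getattr(owner, field_name) is document and rule == NULLIFY:
                setattr(owner, field_name, None)


page = ProfilePage("About me")
employee = Employee("John Smith", profile_page=page)

# No rule registered yet: the delete has nothing to act on.
delete(page)
print(employee.profile_page is page)   # True: the dangling reference is kept

# Registration happens when the declaring module is loaded.
register_delete_rule("ProfilePage", Employee, "profile_page", NULLIFY)
delete(page)
print(employee.profile_page)           # None
```

Because the registry lives only in the running process, a delete issued before the declaring module has been imported silently skips the rule.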
A second kind of reference field also exists, GenericReferenceField. This allows you to reference any kind of Document, and hence doesn’t take a Document subclass as a constructor argument:
class Link(Document):
    url = StringField()

class Post(Document):
    title = StringField()

class Bookmark(Document):
    bookmark_object = GenericReferenceField()

link = Link(url='http://hmarr.com/mongoengine/')
link.save()

post = Post(title='Using MongoEngine')
post.save()

Bookmark(bookmark_object=link).save()
Bookmark(bookmark_object=post).save()
Note
Using GenericReferenceFields is slightly less efficient than the standard ReferenceFields, so if you will only be referencing one document type, prefer the standard ReferenceField.
MongoEngine allows you to specify that a field should be unique across a collection by providing unique=True to a Field's constructor. If you try to save a document that has the same value for a unique field as a document that is already in the database, an OperationError will be raised. You may also specify multi-field uniqueness constraints by using unique_with, which may be either a single field name, or a list or tuple of field names:
class User(Document):
    username = StringField(unique=True)
    first_name = StringField()
    last_name = StringField(unique_with='first_name')
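The effect of unique_with can be pictured as a uniqueness check over a tuple of field values rather than a single value. The sketch below is plain Python; `UniqueIndex` and `DuplicateKeyError` are hypothetical names used for illustration only, whereas in practice the constraint is enforced by a unique index in the database.

```python
class DuplicateKeyError(Exception):
    """Hypothetical error type for this sketch."""


class UniqueIndex:
    """Remember the value tuples already stored for a set of fields."""

    def __init__(self, *fields):
        self.fields = fields
        self.seen = set()

    def add(self, document):
        key = tuple(document[name] for name in self.fields)
        if key in self.seen:
            raise DuplicateKeyError("duplicate value for %s: %r" % (self.fields, key))
        self.seen.add(key)


# Mirrors last_name = StringField(unique_with='first_name')
full_name = UniqueIndex('last_name', 'first_name')

full_name.add({'first_name': 'John', 'last_name': 'Smith'})
full_name.add({'first_name': 'Jane', 'last_name': 'Smith'})  # same surname is fine
try:
    full_name.add({'first_name': 'John', 'last_name': 'Smith'})
except DuplicateKeyError:
    print("rejected: this first_name/last_name pair already exists")
```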
You can also skip the whole document validation process by setting validate=False when calling the save() method:
class Recipient(Document):
    name = StringField()
    email = EmailField()

recipient = Recipient(name='admin', email='root@localhost')
recipient.save()                # will raise a ValidationError, while
recipient.save(validate=False)  # this call won't
Document classes that inherit directly from Document will have their own collection in the database. The name of the collection is by default the name of the class, converted to lowercase (so in the example above, the collection would be called page). If you need to change the name of the collection (e.g. to use MongoEngine with an existing database), then create a class dictionary attribute called meta on your document, and set collection to the name of the collection that you want your document class to use:
class Page(Document):
    title = StringField(max_length=200, required=True)
    meta = {'collection': 'cmsPage'}
A Document may use a Capped Collection by specifying max_documents and max_size in the meta dictionary. max_documents is the maximum number of documents that is allowed to be stored in the collection, and max_size is the maximum size of the collection in bytes. If max_size is not specified and max_documents is, max_size defaults to 10000000 bytes (10MB). The following example shows a Log document that will be limited to 1000 entries and 2MB of disk space:
class Log(Document):
    ip_address = StringField()
    meta = {'max_documents': 1000, 'max_size': 2000000}
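As a loose analogy for the max_documents limit, a capped collection behaves like a fixed-length buffer in which the oldest entries make room for new ones. The standard-library sketch below only models the document-count limit; the byte-size limit (max_size) is not represented.

```python
from collections import deque

# A deque with maxlen discards its oldest entry once full, much like a
# collection capped by max_documents.
log = deque(maxlen=3)
for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]:
    log.append(ip)

print(list(log))  # ['10.0.0.2', '10.0.0.3', '10.0.0.4']
```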
You can specify indexes on collections to make querying faster. This is done by creating a list of index specifications called indexes in the meta dictionary, where an index specification may either be a single field name, a tuple containing multiple field names, or a dictionary containing a full index definition. A direction may be specified on fields by prefixing the field name with a + or a - sign. Note that direction only matters on multi-field indexes.
class Page(Document):
    title = StringField()
    rating = StringField()
    meta = {
        'indexes': ['title', ('title', '-rating')]
    }
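The prefix notation can be read as shorthand for (field, direction) pairs. The following plain-Python sketch shows one way such a specification could be expanded; `parse_index_spec` is a hypothetical helper written for this illustration, and it handles only the + and - prefixes and string or tuple specifications.

```python
ASCENDING, DESCENDING = 1, -1  # same values pymongo uses for index directions


def parse_index_spec(spec):
    """Turn 'title' or ('title', '-rating') into [(field, direction), ...]."""
    if isinstance(spec, str):
        spec = (spec,)
    keys = []
    for name in spec:
        direction = DESCENDING if name.startswith('-') else ASCENDING
        keys.append((name.lstrip('+-'), direction))
    return keys


print(parse_index_spec('title'))
# [('title', 1)]
print(parse_index_spec(('title', '-rating')))
# [('title', 1), ('rating', -1)]
```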
If a dictionary is passed then the following options are available:
Warning
Inheritance adds extra indices. If you don't need inheritance for a document, turn inheritance off - see Document inheritance.
Geospatial indexes will be automatically created for all GeoPointFields.
It is also possible to explicitly define geospatial indexes. This is useful if you need to define a geospatial index on a subfield of a DictField or a custom field that contains a point. To create a geospatial index you must prefix the field with the * sign.
class Place(Document):
    location = DictField()
    meta = {
        'indexes': [
            '*location.point',
        ],
    }
A default ordering can be specified for your QuerySet using the ordering attribute of meta. Ordering will be applied when the QuerySet is created, and can be overridden by subsequent calls to order_by().
from datetime import datetime

class BlogPost(Document):
    title = StringField()
    published_date = DateTimeField()

    meta = {
        'ordering': ['-published_date']
    }

blog_post_1 = BlogPost(title="Blog Post #1")
blog_post_1.published_date = datetime(2010, 1, 5, 0, 0, 0)

blog_post_2 = BlogPost(title="Blog Post #2")
blog_post_2.published_date = datetime(2010, 1, 6, 0, 0, 0)

blog_post_3 = BlogPost(title="Blog Post #3")
blog_post_3.published_date = datetime(2010, 1, 7, 0, 0, 0)

blog_post_1.save()
blog_post_2.save()
blog_post_3.save()
# get the "first" BlogPost using default ordering
# from BlogPost.meta.ordering
latest_post = BlogPost.objects.first()
assert latest_post.title == "Blog Post #3"
# override default ordering, order BlogPosts by "published_date"
first_post = BlogPost.objects.order_by("+published_date").first()
assert first_post.title == "Blog Post #1"
If your collection is sharded, then you need to specify the shard key as a tuple, using the shard_key attribute of the Document's meta dictionary. This ensures that the shard key is sent with the query when calling the save() or update() method on an existing Document instance:
class LogEntry(Document):
    machine = StringField()
    app = StringField()
    timestamp = DateTimeField()
    data = StringField()

    meta = {
        'shard_key': ('machine', 'timestamp',)
    }
To create a specialised type of a Document you have defined, you may subclass it and add any extra fields or methods you may need. As this new class is not a direct subclass of Document, it will not be stored in its own collection; it will use the same collection as its superclass. This allows for more convenient and efficient retrieval of related documents:
# Stored in a collection named 'page'
class Page(Document):
    title = StringField(max_length=200, required=True)

    meta = {'allow_inheritance': True}

# Also stored in the collection named 'page'
class DatedPage(Page):
    date = DateTimeField()
Note
From 0.7 onwards you must declare allow_inheritance in the document meta.
To enable correct retrieval of documents involved in this kind of hierarchy, two extra attributes are stored on each document in the database: _cls and _types. These are hidden from the user through the MongoEngine interface, but may not be present if you are trying to use MongoEngine with an existing database. For this reason, you may disable this inheritance mechanism, removing the dependency on _cls and _types and enabling you to work with existing databases. To disable inheritance on a document class, set allow_inheritance to False in the meta dictionary:
# Will work with data in an existing collection named 'cmsPage'
class Page(Document):
    title = StringField(max_length=200, required=True)
    meta = {
        'collection': 'cmsPage',
        'allow_inheritance': False,
    }
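For documents that keep inheritance enabled, the two bookkeeping attributes can be pictured with the plain-Python sketch below. It assumes that _cls holds the dotted class path and _types lists that path for the class and each of its document ancestors; the exact stored format is an assumption here and should be checked against your MongoEngine version. The `Document` class and both helper functions are stand-ins written for this sketch.

```python
class Document:
    """Stand-in base class for this sketch; not the MongoEngine class."""


class Page(Document):
    pass


class DatedPage(Page):
    pass


def class_path(cls):
    """Dotted path from the top-level document class down to cls."""
    chain = [c.__name__ for c in cls.__mro__ if c not in (Document, object)]
    return '.'.join(reversed(chain))


def inheritance_markers(cls):
    """Build the _cls and _types values assumed in this sketch."""
    ancestors = [c for c in reversed(cls.__mro__) if c not in (Document, object)]
    return {'_cls': class_path(cls), '_types': [class_path(c) for c in ancestors]}


print(inheritance_markers(DatedPage))
# {'_cls': 'Page.DatedPage', '_types': ['Page', 'Page.DatedPage']}
```

Under this assumption, a query for Page can match on 'Page' in _types and so return both Page and DatedPage documents from the shared collection, which is also why data written by other tools, lacking these attributes, needs inheritance turned off.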