Double-typed relations for partial data representation
case class Person(name: String, address: Address)
This representation has advantages:
- strongly typed data access,
- meta information binding is available through annotations,
- if there are many entities then the data handling becomes routine (copy-paste);
- annotations are rather inconvenient as a means of metainformation representation;
- partial data cannot be represented easily - we need either to use empty values for unavailable fields, or create additional classes with required attributes;
- it is difficult to represent the change of a property.
Double-typed relationsWe would like to create a model of object-oriented data structure. In our model every class, an instance, a property and any other aspects of the structure are represented as objects. To abstract the property declaration as an object (rather than a method or a field definition) we need to use some type. As soon as the property declaration contains the data type
T, it is necessary to include it as a generic argument to the property type. Thus a type of an object that can represent the property itself could be something like
Slot[T]. The single type is used only to constrain the data that the property can contain. However, the property is usually defined within the object class and in fact it is implicitly bound to the containing class.
Let's model the property as a relation parametrized with two types - the type
Lof the container and the type
Rof the value:
sealed trait Relation[-L,R] case class Rel[-L, R](name: String) extends Relation[L, R]
-Lmeans "contravariance", i.e. the property will also be available for descendants of
L. The property is invariant on type
Rbecause for the property getters and setters are available.)
Relclass is the way to declare simple ordinary attributes of type
L. For example:
class Box val width = Rel[Box, Int]("width") val height = Rel[Box, Int]("height")
(Thanks to the contravariance, these properties are also available for descendents of the Box). (Note also that the class itself needs no methods nor properties.)
The property can also contain any application specific metainformation - database domain, property description, serializer/deserializer, constraint, column width, data format, etc. It is also easy to bind additional layer-specific metainformation using
For the type parameter
Lwe need some real type. Any valid Scala type can be used here - a primitive type, a type alias, a trait, an abstract/final class, an object.type, etc. Contravariance of the type parameter L allows us to leverage inheritance relation of the types used. It seems to be convenient to reflect relations between domain entities directly by a hierarchy of classes and traits:
abstract class Shape trait BoundingRectangle final class Rectangle extends Shape with BoundingRectangle final class Circle extends Shape with BoundingRectangle val width = Rel[BoundingRectangle, Int]("width") val height = Rel[BoundingRectangle, Int]("height") val radius = Rel[Circle, Int]("radius")
An attribute can be considered as a path component for navigation from parent to child instance. If the child instance has it's own attributes, we can navigate further. A pair of adjacent attributes can be combined to a single relation between the grandparent and the grandchild -
case class Rel2[-L, M, R](_1: Relation[L, M], _2: Relation[M, R]) extends Relation[L, R]
DSL has an extension method `/` for convenient composition of relations (it constructs Rel2).
Relationfits very well into RDF/OWL ontology. It is the middle component of the triple:
(identifier of an instance of type
L, identifier of property
Relation[L,R], identifier of an instance of type
Let's see what can we do with identifiers of instances.
Strongly-typed identifiersWhen we have partial object description with a set of attributes, it is very important to match different set of attributes with the same instance. We need some way to model the authenticity of the instance. When we use ordinary classes all attributes are localized within the same instance and that is enough. In databases it is possible to obtain partial set of attributes and so the necessity to relate attributes to a particular object is obvious. Often a surrogate ID is used. The object obtains a special attribute that has unique value throughout all available instances.
Here we can also employ the database approach and use some attribute as an identifier. As far as our attributes are bound to the containing class, it is also necessary to bind the type of the identifier to the same entity type. This will enable compile-time safety when dealing with an instance identifier and ascribed attributes.
A simple way to declare identifier type is to use single type of entity as a generic type parameter:
However, this is not really general. Firstly, many objects do not have global identifiers. They can only be found within some scope, relative to parent. Secondly, some objects can be identified by different attributes.
What if we use some
Relation[?,?]as a way of navigating from parent to child?
Let's have a collection of children as an attribute of some
val children = Rel[Parent, Seq[Child]]("children")
and let's introduce a relation between the collection and it's elements:
case class IntId[T](id: Int) extends Relation[Seq[T], T]
Note that unlike attributes where the relation is differentiated with
name:String, the relation between the collection and it's elements can be "named" by position of the element within the collection.
These two relations combined together provide us with the necessary tool to identify a particular child element:
val child123 = children / IntId(123)
child123- is a composite identifier of an object
[Child]relative to some instance of the type
Parent. It is 123-rd element of the children's collection. (Here again the DSL composition method
`/`is used). This way of identification allows us to navigate from the given parent to the required child.
What if we need to identify children with an arbitrary attribute instead of the position within a collection? For example if we know that the "name" attribute is unique, we could refer to children by names. In database management systems unique indexes are the typical solution. Let's implement the similar approach here.
trait IndexedCollection[TId, T]
- a special type (without implementation), that simply denotes some collection of elements
Tthat can be accessed by some index value of type
TId. Let's introduce some attribute that will "contain" the index of some collection. The attribute's
Ltype should be the type of collection -
Seq[T], the attribute's
Rtype should be the
case class Index[TId, T](keyProperty: Relation[T, TId]) extends Relation[Seq[T],IndexedCollection[TId, T]]
Note that the relation identifies the index that is based on the
keyProperty. The only thing to define is an index value that can be used as a key to find elements:
case class IndexValue[TId, T](value:TId) extends Relation[IndexedCollection[TId, T], T]
val name = Rel[Child, String]("name") // declare a simple property val childByName = name.index // create an index identifier. The index is based on the property values. val childJohn = parent / children / childByName / IndexValue("John")
Hence, the type
Relation[-L,R]with a couple of descendents, allows us to navigate hierarchical data structures. To identify top level objects that do not have any parent, we introduce an auxiliary type
Globalthat will be the source of references to top level objects:
final class Global val persons = Rel[Global, Seq[Person]]("persons") val otherTopLevelObjects = Rel[Global, Seq[OtherTopLevelObject]]("otherTopLevelObjects")
The scheme of dataThe double-typed relations are the building blocks that can be used for both the data structures and the scheme construction. The scheme can be either relational or object-oriented. The object-oriented scheme is based on a few descendents of
SimpleScheme[T]- for simple types without attributes;
RecordScheme[T]- for composite types with attributes;
CollectionScheme[T]- for attributes of type Seq[T] in order to bind element scheme.
Data representationThe metainformation cannot hold any data. To use it we need some kind of storage. The storage depends on application requirements and can be any of the following:
- ordinary classes with getters/setters. The access to the data can be performed either via reflection by property name, or with lenses;
- special universal classes (descendents of
Instance[T]) that resembles maps.
SimpleInstance, RecordInstance, CollectionInstanceare used to store data that satisfies corresponding scheme. These types simplify generic data handling because the data is stored directly according to the scheme;
- linear tuple (
List[Any]), and a "list of lists". Hierarchical records (objects) can be flattened to a linear structure - the sequence of primitive types. Inner collections can be represented as a list of lists of inner types. This representation can be used to send data over network and to save data to database tables. To convert Instances to and from flat lists a pair of operations is used -
- database tables, RecordSet's;
DSL for data constructionWhile constructing records from pieces it is important to have the compiler checking type compatibility. We should only be able to use properties on those types on which they have been defined. (It is the main reason to have the left generic-type on Relation definition). That's why we need special tools for building instances. For example:
val b1 = empty[Box] .set(width, simple(10)) .set(height, simple(20))
Here an immutable type
Instance[Box]is used. The instance is recreated each time when we add a pair (property, value). When we have more data, it is more efficient to have a mutable Builder that is converted to Instance afterwards:
val boxBuilder = new Builder(boxSchema) boxBuilder.set(width, simple(10)) boxBuilder.set(height, simple(20)) val b1 = boxBuilder.toInstance
The builder also performs runtime checks:
- Denying the use of properties that are not available in the scheme;
- Final check for completeness of the instance.
ConclusionThe approach allows
- to define complex domains with explicit and convenient metainformation;
- to manipulate properties with compile-time check;
- to represent hierarchical data structures (like json or xml) with strong types at every layer;
- to represent incomplete information and to check the level of completeness (for example we may test an instance against a