A standard which enables a user to say anything… is not a collaboration solution.
I've recently heard more than once that the solution to cooperation between organizations that work on overlapping datasets is to use RDF, or more generally that the "Semantic Web" and "linked data" are the solution to every data standardization problem. Don't get me wrong. I don't hate RDF or the ideas of the Semantic Web and linked data. Having said that, they don't solve the problems of inter-institutional collaboration.
RDF essentially prescribes a Subject, Predicate, Direct Object syntax. Basically, that’s the grammar of a two-year-old. I don’t say that to minimize RDF; I say that to maximize it. RDF has done no more to standardize anything than XML has done to standardize data representation. Consider this: If I have to learn all the RDF Predicates a data repository might use (note: RDF Predicates are not really English language predicates because RDF Predicates exclude the direct object and often imply a simple ‘is-a’ verb) and figure out what exactly an author means by those RDF Predicates, then how is that any different from the need to read and understand a set of XML Schema definitions for objects I might find in an XML repository? Let me use a different adjective: how is that any better? Compare the same facts written as RDF triples and as an XML fragment:
<http://example.org/#spiderman> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/#green-goblin> .
<http://example.org/#uncle-ben> <http://www.perceive.net/schemas/relationship/guardianOf> <http://example.org/#spiderman> .
<character id="spiderman">
  <enemyOf>green-goblin</enemyOf>
  <guardian>uncle-ben</guardian>
</character>
Sure, you’ve atomized data into bite-sized chunks and can now build very simple data browsing tools based on these granular chunks of data. I can look up Spiderman and see all known predicates assigned to Spiderman. If those predicates mostly make human sense to me without forcing me to go read their definitions, then I can click on one of their direct objects and learn all the predicates related to that object, ad infinitum. Cool, right? Sure. Is it more useful for programming a valuable user experience than an XML Schema? Probably not so much.
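To show how little machinery that kind of browsing needs, here is a minimal sketch in Python. It assumes the two triples above are held in a plain list, with the long URIs shortened to made-up `ex:` and `rel:` prefixes for readability; the function names `describe` and `mentions` are mine, not part of any RDF library.

```python
# Hypothetical in-memory triple list; prefixes are shortened stand-ins for the URIs above.
TRIPLES = [
    ("ex:spiderman", "rel:enemyOf", "ex:green-goblin"),
    ("ex:uncle-ben", "rel:guardianOf", "ex:spiderman"),
]


def describe(node, triples):
    """Return every (predicate, object) pair whose subject is `node`."""
    return [(p, o) for s, p, o in triples if s == node]


def mentions(node, triples):
    """Return every (subject, predicate) pair whose object is `node`."""
    return [(s, p) for s, p, o in triples if o == node]


# "Look up Spiderman": what he points at, and what points at him.
print(describe("ex:spiderman", TRIPLES))  # [('rel:enemyOf', 'ex:green-goblin')]
print(mentions("ex:spiderman", TRIPLES))  # [('ex:uncle-ben', 'rel:guardianOf')]
```

Notice that the browser works without knowing anything about what `rel:enemyOf` means. That is exactly why it is easy to build, and also why it doesn't get you any closer to agreement.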
Don’t miss the problem. The hard part of data interoperability is political agreement. The technical problem has already been solved in many ways. Suppose you tell people, hey, here’s a standard which lets you write noun/predicate/direct object statements, and here’s a browser and search tool for that syntax. Unless you also get those people to agree on exactly which RDF Predicates to use and what exactly they mean, to adopt a unique object identification system, and to agree on exactly what each of those object identifiers specifies, you haven’t solved any real problem. Calling the result an interoperability standard anyway is worse than TEI.
I mean, I love TEI, but no one uses TEI in any standards-compliant way which allows real, useful interoperability, because TEI gives a user the freedom to build their own schema from many different modules of tags. My guess, because I’ve been there before, is that the tags are not more strictly defined because participating members of the standards team had heartfelt and different usages in mind for how they would apply those tags. The end result is that basically there is no interoperability. There is familiarity, but that’s not the same thing. The hard work EpiDoc has done to wrangle political agreement on a subset of the TEI, with strict usage definitions shared between organizations: that is work toward inter-institutional collaboration.
Back to RDF and the Semantic Web. Sure, work has been done to define very large "vocabularies" and "ontologies" (if you find an authoritative source with a clear definition of the difference between "ontology" and "vocabulary", send me an email) for specialized domains... This is parallel to the work of specializing XML into the TEI and is a step toward interoperability, but are we in the same place as TEI, with different organizations opting to use different subsets of these ontologies? Are the common elements which institutions might share used in the same way?
Do noun instances common to these organizations use the same unique key? In short, do we have a large body of data available in one of these ontologies (besides the ontologies themselves)? Do we have TWO large bodies of data from TWO different institutions available in the same ontology, both using all the terms in exactly the same way (parallel: TEI -> EpiDoc) and identifying proper nouns with exactly the same keys? Do we have any specialized (= useful) software systems developed based on this ontology which work with BOTH datasets? These are the hard parts, the time-consuming parts, of inter-institutional collaboration, and they are not strictly technical in nature.
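To make those questions concrete, here is a small Python sketch. Everything in it is invented for illustration: the two institutions, their `a.example.org` and `b.example.org` identifiers, the predicate names, and the `AGREED` mapping, which stands in for whatever the two institutions would have to negotiate.

```python
# Hypothetical data: two institutions stating the same fact, each in valid triples.
INSTITUTION_A = {
    ("http://a.example.org/#spiderman",
     "http://a.example.org/rel/enemyOf",
     "http://a.example.org/#green-goblin"),
}
INSTITUTION_B = {
    ("http://b.example.org/people/peter-parker",
     "http://b.example.org/vocab/hasAdversary",
     "http://b.example.org/people/norman-osborn"),
}


def shared_subjects(a, b):
    """Subjects that appear under the same identifier in both datasets."""
    return {s for s, _, _ in a} & {s for s, _, _ in b}


def translate(triples, mapping):
    """Rewrite every term found in `mapping`; leave other terms untouched."""
    return {tuple(mapping.get(term, term) for term in triple) for triple in triples}


# The negotiated part: which of B's keys and predicates equal which of A's.
AGREED = {
    "http://b.example.org/people/peter-parker": "http://a.example.org/#spiderman",
    "http://b.example.org/vocab/hasAdversary": "http://a.example.org/rel/enemyOf",
    "http://b.example.org/people/norman-osborn": "http://a.example.org/#green-goblin",
}

print(shared_subjects(INSTITUTION_A, INSTITUTION_B))      # set()
print(translate(INSTITUTION_B, AGREED) == INSTITUTION_A)  # True
```

Both datasets are equally "RDF", and they still have nothing in common until someone writes `AGREED`. The code that applies the mapping is one line; deciding what goes into the mapping is the inter-institutional work.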
Yeah, so what exactly am I saying? I am saying that once you adopt a unique naming scheme for objects and have multiple institutions agree on that naming scheme and what exactly those objects mean; and once you specifically define predicates which can be used with those object types (e.g., adjectives and relationships) and get more than one institution to agree to actually adopt and implement that schema internally, and finally convince them to make those resources available to other institutions, then you’ve developed a useful standard for interoperability. And then that standard can be described in RDF or XML Schema or a number of other ways. Saying that you’ve adopted RDF is like saying you’ve adopted XML or JSON. Are RDF, XML, and JSON all standards? Sure. Does simply adopting RDF, XML, or JSON mean that you are interoperable? It doesn’t even mean that you’ve begun.