Open data standards

Sam

Hi, I’m editing this wiki page: http://openfoodfoundation.org/document/open-source-food-projects-list-wiki/target-audience-functionality-web-based-openfood to detail the existing and emerging APIs used in open source food projects.

Additionally, a 'Big Ag' open data standard is emerging here: http://openag.io/principles/

There is a conversation starting about a shared API/data standard, and whether it's fruitful to engage with the 'Big Ag' version.

The conversation is currently emerging here: http://openfoodfoundation.org/project/standards-protocols/conversation/what-if-anything-should-be-here

It would be great to hear your views.


[deactivated account] Wed 24 Sep 2014 1:16AM

Hello Sam,

I think OpenFarm's current direction is fairly explicitly not Big Ag. As far as I understand us, we're aimed at small-scale farmers who don't have the means or resources of big agriculture companies (correct me if I'm wrong anyone). We're aiming to make food growing accessible and understandable to everyone, instead of just farmers, and pool common wisdom about plants.

Even if that's not yet the case, that's what I hope for, and it's the direction I would personally push the project in.

On defining a common API and data standard: we're pushing forward with building an incremental API based on OpenFarm's current needs. This is largely because we haven't found any "standard" model of defining things that is open and usable and meets our license requirements.

We came to this conclusion after numerous conversations about this kind of thing.

My gut reaction to the BigAg and OpenAg is that it smells of technological solutionism, one that classically tries to solve problems through technical solutions that might not be appropriate. The fact that this is coming from big agricultural companies only confirms that to me. Similarly, it seems to be a data standard about device communication, rather than actual plant information? I'd love to hear more opinions on this.

Thanks for linking to such a large group of folks that are dealing with the same stuff. I'm definitely interested in talking with you all.


Sam Wed 24 Sep 2014 10:59AM

Hi Simon, thanks for your thoughts, really helpful to see the links where you have already been discussing this stuff.

I realise that you are aimed at smaller growers (I am one myself and excited to see your project develop)

It seems like there are a lot of groups working in open source software for food (http://openfoodfoundation.org/document/open-source-food-projects-list-wiki/target-audience-functionality-web-based-openfood). I realise they are diverse projects, with different aims, but I'd like to see as little duplicated effort and as much compatibility between you as possible.

I just did this mind map which I'm not entirely sure is helpful: https://www.mindmup.com/#m:a1e7b5def0260401321ab002f3199dafb1 The green bits are supposed to be a life/distribution cycle of a particular carrot or potato.

I think your project sits more on the left-hand side, with less interest in tracking any particular carrot's life/distribution cycle?

On the other hand box scheme software would sit more on the right, with a keen interest in the distribution side. I reckon it would be great if they could pull in data from your project, so consumers & growers could learn more about their food.

There is also the whole seed saving / www.opensourceseedinitiative.org angle, which benefits from tracking actual germplasm/plant/seed cycles.

In terms of your project, you seem closest to http://growstuff.org/ who are currently defining their API here: http://wiki.growstuff.org/index.php/API. Their data will be CC-BY-SA. Is that license compatible with what you are hoping to do?




Rory Aronson Thu 25 Sep 2014 9:29PM

Hi Sam,

Thanks for linking us to these other projects. I was not aware of them!

You're right that we're closest to what GrowStuff is building, though I think we have a specific niche even so: Crowdsourced and localized growing instructions. This means our API might look different from theirs because we just have different data. On the other hand, we do have some shareable data: crop/species information such as common names, common pests, etc.

However, licensing is a concern (unfortunately). At this point OpenFarm's data is going to be CC-BY, and it might become a mess if we intersperse our data with another source with another license such as GrowStuff's CC-BY-SA. Then there is license chaining to consider and everything becomes annoying and too red-tape-ey to be useful.

It would be great to ensure there is minimized double work, and awesome to see all of these projects sharing data and offering specific values to people. I'm not sure how to proceed and make that happen. Maybe an open invitation discussion/meeting to see where the viable opportunities for collaboration are and to pool efforts?


Sam Thu 25 Sep 2014 10:41PM

Hi Rory

Thanks for getting back. I have started to think about it as an ecosystem of FOSS software that can support various types of users: http://openfoodfoundation.org/document/open-source-food-projects-list-wiki/open-food-software-ecosystem

I think the work you are doing is vital, and I'm keen to see your data available to people running garden or farm information systems, or even box scheme software.

It feels like there are three main categories of data here:

1) Relatively fixed stuff: Latin & English name, variety, taxonomic rankings, minimum tolerated temperature, growing degree days, pH range, rainfall range, companion planting (in)compatibility, common pests, diseases, photos.

2) Record keeping stuff, tracking the lives of specific plants and crops in specific geographical locations: Seed parentage information, variety, planting date, germination date, lat/long of plants, date plant care tasks undertaken, harvest date, yield, profit.

3) Information needed to get food distributed. There is a draft XML for this here being called #FoodChat http://openfoodfoundation.org/document/open-source-food-projects-list-wiki/boxomatic-xml-schema

I see these as somewhat hierarchical, as if you are running a food distribution system at (3) you might want to pull in the data about both how that Tomato has been grown (2), and also the taxonomic name etc. (1)
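The hierarchy described above could be sketched roughly as follows. This is only an illustration: the class names, field names, and the example tomato data are hypothetical and do not come from any existing OpenFarm, Growstuff, or #FoodChat schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Species:
    """Category 1: relatively fixed facts about a plant."""
    latin_name: str
    common_name: str
    min_temp_c: Optional[float] = None  # minimum tolerated temperature


@dataclass(frozen=True)
class CropRecord:
    """Category 2: one planting in a specific place (record keeping)."""
    species: Species
    variety: str
    planted: date
    harvested: Optional[date] = None
    lat: Optional[float] = None
    lon: Optional[float] = None


@dataclass(frozen=True)
class DistributionItem:
    """Category 3: a unit of produce moving towards the consumer."""
    crop: CropRecord
    quantity_kg: float


def describe(item: DistributionItem) -> str:
    """Walk from the distribution layer (3) down to the plant facts (1)."""
    crop = item.crop
    return (f"{item.quantity_kg} kg of {crop.species.common_name} "
            f"({crop.species.latin_name}), variety {crop.variety}, "
            f"planted {crop.planted.isoformat()}")


# Made-up example data
tomato = Species("Solanum lycopersicum", "Tomato")
record = CropRecord(tomato, "Moneymaker", date(2014, 4, 1))
box_item = DistributionItem(record, 2.5)
print(describe(box_item))
```

Each layer only holds a reference to the layer below it, so a distribution system at (3) can pull in the growing record (2) and the taxonomic name (1) without copying them.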

If I understand it correctly, your growing guides don't fit into this neatly, they seem to live between 1&2. The distillation of formal, or informal record keeping, into a coherent and geographically tagged bit of subjective reporting?

I think an on-line meeting would be a good idea, at some stage. There seems to be a lot of exciting activity in this area all of a sudden :)

I have been encouraging people to use #OpenFood as a shared Twitter tag for talking about this stuff.

I see what you are saying about licensing, maybe it's fine if you become the de-facto database for (1) type information.

If people want to grab your data and re-release it under CC-BY-SA is that permitted?




Rory Aronson Fri 26 Sep 2014 1:55AM

I think you broke it down pretty well Sam, and I would add OpenFarm in as its own category of data: subjective/creative ways of growing plants. So I have:

(1) Factual plant data that is always the same (this could be very easily shared between projects if there was a standard for it) (OpenFarm is pulling this data from http://www.ITIS.gov for now and could turn into an authority/de-facto down the road)
(1.5?) Data on where to get seeds, supplies, equipment sharing, etc
(2) Subjective data on how to grow plants (OpenFarm)
(3) Record keeping of an individual plant in the real world (GrowStuff and other ag startups now, OpenFarm one day...)
(4) Data for moving individual plants to the consumer (#FoodChat, Open Food Foundation)
(4.5?) Subjective cooking and usage (OpenFarm maybe one day...)

^ This seems comprehensive and succinct as to what everyone in the space is doing and the type of data they are dealing with. It seems to provide clarity towards what open APIs might need to be developed and it provides ideas as to how the different categories/projects might interact. Would love to see if I missed anything or how someone else might break it down!

For licensing, one could take our data and re-release it under CC-BY-SA if they wanted, but it can't go back the other way. The SA license is less permissive and (in my opinion) less open because it places more limitation on the user. However, some say that ensuring openness with the SA is more open. It's a matter of perspective.
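The one-way flow described above (CC-BY data can be re-released as CC-BY-SA, but not the other way round) can be sketched as a small lookup. This is a simplified illustration and not legal advice; the table, the license labels, and the function names are assumptions made for the example, and real reuse still has to honour attribution requirements.

```python
# For each source license, the licenses its data may be re-released under.
# Simplified and illustrative only.
CAN_RERELEASE_AS = {
    "public-domain": {"public-domain", "CC-BY", "CC-BY-SA"},
    "CC-BY": {"CC-BY", "CC-BY-SA"},
    "CC-BY-SA": {"CC-BY-SA"},
}

# Ordered from most to least permissive.
PERMISSIVE_ORDER = ("public-domain", "CC-BY", "CC-BY-SA")


def can_flow(source: str, target: str) -> bool:
    """True if data under `source` may be re-released under `target`."""
    return target in CAN_RERELEASE_AS[source]


def combined_license(licenses):
    """Most permissive license that every input license can flow into."""
    for candidate in PERMISSIVE_ORDER:
        if all(can_flow(src, candidate) for src in licenses):
            return candidate
    raise ValueError("no common license for these sources")


print(can_flow("CC-BY", "CC-BY-SA"))   # CC-BY data into a CC-BY-SA project
print(can_flow("CC-BY-SA", "CC-BY"))   # the reverse direction
print(combined_license(["CC-BY", "CC-BY-SA"]))
```

This is the "license chaining" problem in miniature: once any CC-BY-SA source is mixed in, the combined dataset has to carry the share-alike license.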


Sam Fri 26 Sep 2014 5:45PM

Hi Rory, yes this is starting to look good.

Just thinking about 1)

What about the case where new cultivars emerge with more drought or disease tolerance (perhaps as a result of Open Source Seed initiatives' breeding)? So the data is factual, and always the same, but also at any moment in time incomplete. So there needs to be a (peer review?) process of adding to that database?

I also recently came across this plant data .XML http://sourceforge.net/p/kitchengarden/gitcode/ci/master/tree/resources/species.xml Which I think is interesting in the way it maps relationships between species.

Just to also make a point on 3) one of my personal interests is open source seeds, and the emergence of new location adapted cultivars. This project will be much more interesting if records are kept (and linked) of the family history of these seeds. So if I live in a wet climate I may be really interested in the OS kale that has been bred in a wet climate for ten generations. I'm less interested in the kale that has been bred in a dry climate for ten generations. Both of these cultivars may have been bred from the same OS parent seed, but having the historical breeding/environment data is useful to me. This implies that the site where growing records are kept might also be a good place for the sale/exchange of OS seeds to take place..
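A rough sketch of what linked breeding records could look like, assuming a deliberately simplified model in which each seed lot has a single recorded parent and a free-text climate label. The `SeedLot` class and both helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SeedLot:
    name: str
    climate: str                        # e.g. "wet" or "dry"
    parent: Optional["SeedLot"] = None  # simplification: one parent per lot


def lineage(lot):
    """Return the lot followed by its ancestors, newest first."""
    chain = []
    current = lot
    while current is not None:
        chain.append(current)
        current = current.parent
    return chain


def generations_in(lot, climate):
    """Count the most recent consecutive generations grown in `climate`."""
    count = 0
    for ancestor in lineage(lot):
        if ancestor.climate != climate:
            break
        count += 1
    return count


# Made-up example: one OS parent, then ten generations in a wet climate
parent = SeedLot("OS kale (parent)", "temperate")
wet_kale = parent
for generation in range(1, 11):
    wet_kale = SeedLot(f"wet-bred kale gen {generation}", "wet", wet_kale)

print(generations_in(wet_kale, "wet"))
```

With the family history linked like this, a grower in a wet climate could filter seed offers by how many generations were bred in similar conditions.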

I have more thoughts but I'm trying to make them (semi) coherent..




Rory Aronson Fri 26 Sep 2014 8:28PM

I saw that XML file too - pretty cool work! It's almost as though there needs to be a social network of plants - who is friends with who, to what degree, and why?

I'm hoping that one day OpenFarm kind of taps into a lot of these data categories. For instance, in addition to the crops database, we could have a seeds database. We could build a garden management console like growveg.com where users virtually plant their garden, specifying which seeds they use and which guides they follow. The user gets an email once a week telling them what to do for each section of their garden based on the Guides, the seeds, and the other conditions.

Aggregate data could be really interesting as to where in the world each plant is being grown and how.


Alex Bayley Mon 29 Sep 2014 12:22AM

Hi folks,

I just thought I'd pop in and link this article which I shared with Rory a couple of weeks ago: http://wiki.growstuff.org/index.php/Importing_data This covers many of the issues related to importing data from one project to another, including the license chaining issues that Rory refers to upthread.

In my experience working on Freebase (http://freebase.com), for Google's knowledge group after they acquired Freebase, and on the early days of Wikidata (http://wikidata.org), I would suggest that the most important thing we can do is cross-reference each other's crops to make connections and interoperation between our datasets more effective. Or, if not connecting directly with each other's data sets, then at least with widely-used third party ones.

Growstuff already connects to Wikipedia which, by extension, connects us to Wikidata, Freebase, and then via Freebase to many other open data sources such as the Encyclopedia of Life.
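As a minimal sketch of cross-referencing through a shared third-party identifier, two projects that each store a Wikipedia URL per crop could be matched up as below. The record layout and the `wikipedia_url` field name are assumptions for illustration, not the actual Growstuff or OpenFarm schema.

```python
def index_by(records, key):
    """Map each record's `key` value to the record, skipping missing values."""
    return {rec[key]: rec for rec in records if rec.get(key)}


def cross_reference(ours, theirs, key="wikipedia_url"):
    """Return (our_id, their_id) pairs for records sharing the same `key`."""
    their_index = index_by(theirs, key)
    matches = []
    for rec in ours:
        other = their_index.get(rec.get(key))
        if other is not None:
            matches.append((rec["id"], other["id"]))
    return matches


# Made-up example records from two hypothetical projects
project_a = [
    {"id": "a1", "name": "Tomato",
     "wikipedia_url": "http://en.wikipedia.org/wiki/Tomato"},
    {"id": "a2", "name": "Kale",
     "wikipedia_url": "http://en.wikipedia.org/wiki/Kale"},
    {"id": "a3", "name": "Mystery squash"},  # no external link yet
]
project_b = [
    {"id": "b7", "name": "Tomatoes",
     "wikipedia_url": "http://en.wikipedia.org/wiki/Tomato"},
]

print(cross_reference(project_a, project_b))
```

The crop names differ ("Tomato" vs "Tomatoes"), but the shared identifier still lets the two records be treated as the same entity.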

I should probably write up something about the concepts behind this -- entities, unique identifiers, and semantic equivalence -- and post it somewhere. I'll drop a note here when I have done so.


Lynn Foster Mon 29 Sep 2014 1:33AM

@alexbayley Very interesting to connect to Wikipedia to get some common identification!

I, for one, would be very interested if you do write up your experience with unique identifiers, semantic equivalence, etc.


Alex Bayley Mon 29 Sep 2014 2:20AM

OK, I just spent my morning writing something up, which I hope will be of interest: http://talk.growstuff.org/t/open-food-interoperability-entities-unique-ids-and-semantic-equivalence/93


[deactivated account] Tue 30 Sep 2014 1:08AM

Thanks for that incredibly thorough write-up @alexbayley.

I think your Next Steps are sensible and do-able.

I've discussed this with Rory before, and we've decided that the best way forward for OpenFarm is to bring in crop names from ITIS for user experience purposes (at the moment, all we're taking is the crop name and binomial name), while making it clear that crop data is incomplete and encouraging users to contribute information to the crop.

One way of doing this is linking the crop information to reliable sources of information, and Wikipedia seems like a particularly good one for now. At the moment our database supports adding other data sources, and I really see no reason to limit what they link to. Then we decided that we could use a sort of gamification to make our crop database system more complete, using bounties if it starts getting a bit slow.

Making sure that this information is linked to other data sources to be considered complete seems like a minimal requirement for that.
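A minimal sketch of such a completeness rule, assuming a crop is a plain dictionary with a `sources` list of links. The field names and the rule itself are hypothetical and are not OpenFarm's actual API.

```python
# Fields assumed to come from the initial name import
REQUIRED_FIELDS = ("name", "binomial_name")


def is_complete(crop):
    """A crop counts as complete only if it has its names and at least one
    link to an external data source."""
    has_fields = all(crop.get(field) for field in REQUIRED_FIELDS)
    has_source = len(crop.get("sources", [])) > 0
    return has_fields and has_source


def needs_contributions(crops):
    """Names of crops that could be offered to contributors (or bounties)."""
    return [crop.get("name", "<unnamed>") for crop in crops
            if not is_complete(crop)]


# Made-up example records
crops = [
    {"name": "Tomato", "binomial_name": "Solanum lycopersicum",
     "sources": ["http://en.wikipedia.org/wiki/Tomato"]},
    {"name": "Kale", "binomial_name": "Brassica oleracea", "sources": []},
]

print(needs_contributions(crops))
```

The list of incomplete crops is what a gamification or bounty feature would draw from.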

We'll have to figure out how to implement this within our own API, but it's not overly complex.

I'm wondering about the namespaces that Freebase uses - did Freebase define those, or are they defined by the linked entities?


Alex Bayley Tue 30 Sep 2014 1:31AM

@simonv3 Freebase defined those namespaces.

Incidentally I've been looking at ITIS and I note that it's missing a number of edible crops (I chose a few from http://en.wikipedia.org/wiki/Bushfood which lists edible Australian natives, and of the 13 top-end fruits, only 4 were found in ITIS). It seems that their data set is incomplete -- perhaps it is US-centric?

Ah yes, I see from their website: "The ITIS mission is to create a scientifically credible database of taxonomic information, placing primary focus on taxa of interest to North America."

So it might be useful to link to as a source of reliable identities but it's not a complete source. Pity :(


[deactivated account] Tue 30 Sep 2014 1:43AM

Yeah, its focus on North America is a bit of a pity, but at the moment it's the best I could find that makes data available under a really free license. Since we're drawing only names from it, I don't think this is a major pitfall though, and prevents a data import, it just means that the hunt for better data sources continues.

Edit: Free in as far as my understanding of free goes. Could have misinterpreted it.


Alex Bayley Tue 30 Sep 2014 2:03AM

@simonv3 As I understand it, data produced by the US federal government must be released into the public domain, which is about as free as it gets :)


Rory Aronson Tue 30 Sep 2014 5:13AM

Hey @alexbayley, thanks for dropping into the conversation here! It's great to have someone with so much experience helping guide our development and be a partner in creating linked data and open standards!

I can't speak to anything technical myself (I'm not a developer) but your blog post makes a lot of sense and I'm glad you set out practical steps that all of the projects can take towards a common future.


Lynn Foster Tue 30 Sep 2014 12:36PM

@alexbayley Thanks again for the write-up! As an aside, we brought your post into another conversation on moving towards standard interfaces for Open Apps. Seems a potentially useful approach for more than just food.

Re. your next steps - We are not working on our http://foodnetworksoftware.org/ any more. If it does make sense to bring it into an ecosystem of food related apps at some point, we'd be happy to start again, and would commit to participating in your next steps if that is the general agreement forward.