Is there anything missing from the action plan?

Tom (facilitator)

What else do you think we should consider including? Why?

David Eccles Mon 26 Mar 2018 10:10AM

Free software is missing from the action plan, and I think it's a good thing to use as an established model to show how we can benefit from making things free and open. If open research is conducted in the spirit of free and open source software, I expect that open data should be as well. I've recently written something about the connection between open research and free software (which is why this is stuck in my head at the moment), finishing up by arguing that the emphasis should be on the users and participants, rather than the developers:

https://gringer.github.io/blogs/open-research

Writing a bug report for your favourite software program, seeing a fix developed within a couple of weeks (not necessarily by the lead developer), then seeing that fixed program released to the world a few months later is an extremely rewarding experience.

Jay Daley Mon 26 Mar 2018 9:13PM

I disagree because I don't think that software, of any variety, has a place in an open data action plan. To be clear, I'm a very strong open source advocate, so it's not about that. When talking about open data, it's important to maximise its potential for reuse, and that's best done by focusing solely on the nature of that data and how it is made accessible. Any discussion about software or tools automatically turns into directing that reuse, which in turn limits the reuse and so is counter-productive.

We also run into that usual issue with government plans, that the authors feel obliged to cover every related subject, which certainly makes the plan seem comprehensive and well-researched, but also loses the focus and makes the plan read like every other.

An open data plan works best if it is written from the point of view of an open data purist - no concern whatsoever for software, tools, outcomes, demand etc - just format (standards) and access.

David Eccles Mon 26 Mar 2018 10:37PM

Well, I was initially thinking of using FOSS as a thematic model of open data, rather than something that should be embedded into action plans, but your comment has made me think further on this matter....

It is common to see data released as Excel spreadsheets, and frequently as formatted sub-tables (e.g. see the non-API-accessible parts of StatsNZ data). These are the data storage formats that people work with every day, so it makes sense for data to be released in those formats. I don't think it's a good idea to demand people release everything as 0x1D/0x1E-separated UTF-8 text files. Let them release data in its natural format, and software can be created to do the necessary conversions.

It's not possible to talk about open data without talking about the way in which that data can be accessed. If a particular format is used for data release, in order for that data to be properly open, it must be freely accessible, and that means that any programs required to access that data should also be freely accessible.
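As a rough illustration of the kind of conversion software mentioned above, here is a minimal Python sketch that converts between 0x1D/0x1E-separated text and CSV using only the standard library. It assumes that 0x1D separates fields and 0x1E separates records (the post does not say which byte plays which role), and the function names and sample values are made up for the example.

```python
import csv
import io

FIELD_SEP = "\x1d"   # assumption: 0x1D separates fields within a record
RECORD_SEP = "\x1e"  # assumption: 0x1E separates records


def separated_to_csv(text: str) -> str:
    """Convert 0x1D/0x1E-separated text to CSV (hypothetical helper)."""
    records = [r for r in text.split(RECORD_SEP) if r]  # skip empty records
    rows = [r.split(FIELD_SEP) for r in records]
    out = io.StringIO()
    # The csv module quotes any field that contains a comma or a quote.
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def csv_to_separated(text: str) -> str:
    """Convert CSV back to 0x1D/0x1E-separated text (hypothetical helper)."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    return "".join(FIELD_SEP.join(row) + RECORD_SEP for row in rows)


# Invented sample data; one field contains a comma to show the quoting.
sample = "region\x1dcount\x1eRegion A, north\x1d100\x1e"
csv_text = separated_to_csv(sample)
print(csv_text)
```

The round trip works here because the control characters never appear inside the field values; real data would need a rule for that case.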

Jay Daley Mon 26 Mar 2018 10:48PM

Ah, I see the difference here. In contrast to your view, I think it is critical to insist that data is released in an open format, which definitely means at an absolute minimum CSV files (so yes that means 0x1D/0x1E-separated UTF-8 text), and normally also an open JSON-based API. If data isn't released this way then it just doesn't count as 'open' data. As a long-time user of StatsNZ Excel files I can document a number of inappropriate ways in which Excel was used that made it unnecessarily difficult to use the data.

If data is released in a proper way, which means at the very least 3-star open data (http://5stardata.info/en/), then accessibility by any software that can read CSV is guaranteed. The danger of including any discussion of the type of software is that this universality will be lost.
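To make the "any software that can read CSV" point concrete, here is a small sketch that reads a CSV release with nothing beyond Python's standard library. The column names and values are invented for illustration and do not come from any real dataset.

```python
import csv
import io

# Invented sample standing in for a downloaded CSV file.
SAMPLE = "year,series,value\n2016,A,10\n2017,A,12\n2017,B,5\n"


def total_by_series(csv_text: str) -> dict:
    """Sum the 'value' column per 'series' (hypothetical column names)."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["series"]] = totals.get(row["series"], 0) + int(row["value"])
    return totals


print(total_by_series(SAMPLE))
```

No vendor library or licence is involved at any step, which is the property being argued for.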

David Eccles Mon 26 Mar 2018 11:09PM

I would rather people release data in whatever format they have than not release it at all. If it takes time to convert, and people are not being paid for that conversion, then I don't expect that the conversion will happen.

David Hood Tue 27 Mar 2018 11:39PM

As a consumer and enabler of other consumers of open data, I tend to agree with David. Getting any data has a higher priority than the format. I would say csv (for tabular arrangements) and json (for hierarchical arrangements) may be the most common open formats at the moment, but both are limited in how they incorporate metadata compared to other, currently more experimental formats that might be better in the very long term. So any particular format is ultimately "the best that we can do at the moment". There is also some other stuff swirling around: if you want the trail of the data to be referenceable, stable URIs are best; if you want the data to be as current as possible, that means an API. There is no good single solution here that I know of for the different use cases.

From that, pragmatically, I tend to view as open any format that can be written/read without special licensing requirements. So while I prefer a downloadable csv with a static address that can be referenced, I don't mind getting stuff as Excel files (as there are tools for working with them programmatically), but I have a special hatred for pre-summarised information in Tableau dashboards with no download option.
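One way to picture the metadata limitation mentioned above: a plain csv carries only a header row and values, so descriptive metadata has to travel separately, whereas a json document can wrap the same rows together with a metadata block. The sketch below is only an illustration; the "metadata"/"rows" layout and the field names are assumptions, not an established standard.

```python
import csv
import io
import json


def table_to_json(csv_text: str, metadata: dict) -> str:
    """Wrap tabular rows and a metadata block in one JSON document.

    The {"metadata": ..., "rows": ...} layout is hypothetical.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps({"metadata": metadata, "rows": rows}, indent=2)


# Invented sample values for illustration only.
doc = table_to_json(
    "year,value\n2017,12\n",
    {"source": "example agency", "units": "thousands"},
)
print(doc)
```

Note that the csv values come through as strings; type information is another piece of metadata that plain csv does not carry.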

Paul Stone Mon 26 Mar 2018 6:46PM

Good point!
