Thu 1 Mar 2018 1:08AM

What challenges might we face in opening up government data?

Tom (facilitator)

What barriers or issues might we encounter or need to overcome to increase access to government-held data and build an open-by-design culture?


David Eccles Mon 26 Mar 2018 10:29AM

1) Privacy. I know that Stats NZ is familiar with the process and reasoning of data anonymisation; it's important to realise that every additional shred of information makes it easier to unmask people's private lives. Care needs to be taken to make sure that no individual-level data is released (even if "anonymised"), because linking is embarrassingly easy when everything is out in the open.

2) Change. People don't like change, and will find whatever thin thread they can hold on to in order to argue for the status quo. There will be big concerns, and those should preferably be tackled first. But unless there is a financial incentive to change, it's only going to be when the last tiny concern is snuffed out that the majority of people will start to change. There will be a few proponents of disruptive change who are willing to suffer through the long, drawn-out phase of concern, and it would be a good idea to support and signal-boost those people if possible.

I've experienced this second point both with free software and with nanopore sequencing. Change needs to be gradual, if possible (e.g. replacing programs on Windows systems one by one with free software). If that's not possible (as is the case for nanopore sequencing, and likely the case for most open data), it's important to listen to the concerns of the users (i.e. both those who own the data and those who could benefit from it), because those concerns are the threads that need to be cut before change will happen.
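To illustrate how easy the linking in point 1 can be, here is a minimal sketch in Python. Everything in it is hypothetical: the records are made up, and the column names (postcode, birth_year, sex, diagnosis) and the helper functions link() and smallest_group() are illustrative assumptions, not anything Stats NZ actually uses. The idea is that an "anonymised" release with names removed can still be joined to a public list on the fields the two have in common, and any record whose combination of those fields is unique gets its name back.

```python
from collections import Counter

# Hypothetical "anonymised" release: names removed, quasi-identifiers kept.
released = [
    {"postcode": "6011", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "6011", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "6012", "birth_year": 1975, "sex": "M", "diagnosis": "hypertension"},
]

# Hypothetical public list with names and the same quasi-identifier fields.
public = [
    {"name": "A. Smith", "postcode": "6011", "birth_year": 1980, "sex": "F"},
    {"name": "B. Jones", "postcode": "6012", "birth_year": 1975, "sex": "M"},
]

QUASI_IDS = ("postcode", "birth_year", "sex")


def key(record):
    """Combination of quasi-identifiers used to link the two datasets."""
    return tuple(record[q] for q in QUASI_IDS)


def link(released, public):
    """Return (name, diagnosis) pairs for released rows whose quasi-identifier
    combination is unique in the release AND matches exactly one public row."""
    release_counts = Counter(key(r) for r in released)
    public_counts = Counter(key(p) for p in public)
    names = {key(p): p["name"] for p in public}
    matches = []
    for r in released:
        k = key(r)
        if release_counts[k] == 1 and public_counts.get(k, 0) == 1:
            matches.append((names[k], r["diagnosis"]))
    return matches


def smallest_group(records):
    """Size of the smallest quasi-identifier group (the 'k' in k-anonymity)."""
    return min(Counter(key(r) for r in records).values())


print(link(released, public))    # [('B. Jones', 'hypertension')]
print(smallest_group(released))  # 1
```

Even the group of two is only partial protection: it still narrows A. Smith's record down to one of two diagnoses, and every extra column added to a release tends to split such groups further.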


Paul Stone Mon 26 Mar 2018 6:45PM

Thanks David, yes privacy continues to be a challenge to manage carefully, and seems to be getting harder, not easier!


David Hood Wed 28 Mar 2018 12:11AM

I think a barrier to an open-by-design culture lies in the life cycle of the data.

As a specific example, there is a lot of hatred out there for data made available in Excel spreadsheets, but to me a lot of it comes back to the point in the processing that such spreadsheets represent. The life cycle in most cases is something along the lines of: raw data is gathered in an untidy, poorly structured way; the data is collated into a structured, disaggregated form; and the data is summarised in the preparation of reports. A lot of the stuff available in Excel sheets is the aggregated/summarised data that makes up the tables and graphs in Word documents, just presented in Excel form. This was the traditional way of presenting data products: the people writing the reports encapsulate their understanding and expertise of the area in the way the summaries are made.

But in an open data model, where people want to take the data and use it in novel ways for new understanding, they don't want the summarised version; they want the data as disaggregated as possible so that the uses are flexible. This can be seen as negating the expertise of those collating the data.
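The asymmetry between summarised and disaggregated data can be shown with a small Python sketch. The records, the column names (region, year, age_group, count) and the summarise() helper are all hypothetical, chosen only for illustration. From the finer-grained rows you can rebuild the regional table a report might publish, but you can also produce a cut by age group that the regional table alone could never give back.

```python
from collections import defaultdict

# Hypothetical finer-grained (disaggregated) data: one row per region/year/age group.
records = [
    {"region": "Auckland", "year": 2017, "age_group": "15-24", "count": 120},
    {"region": "Auckland", "year": 2017, "age_group": "25-44", "count": 340},
    {"region": "Wellington", "year": 2017, "age_group": "15-24", "count": 60},
    {"region": "Wellington", "year": 2017, "age_group": "25-44", "count": 150},
]


def summarise(rows, by):
    """Sum the 'count' column over every combination of the columns in `by`."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[c] for c in by)] += row["count"]
    return dict(totals)


# The kind of summary a report author might publish as a spreadsheet table:
print(summarise(records, ["region"]))
# {('Auckland',): 460, ('Wellington',): 210}

# A different cut a re-user might want, not recoverable from the regional table:
print(summarise(records, ["age_group"]))
# {('15-24',): 180, ('25-44',): 490}
```

Aggregation only goes one way: many different sets of rows produce the same regional totals, so once only the summary is published, the other views are lost.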

I think the answers here lie in developing a culture of stewardship and illumination of the data, rather than one of being the sole producers of products from the data.