Open Data Programme

Tue 6 Mar 2018 11:02PM

Is public demand for data the right approach to prioritisation?

Paul Stone Public Seen by 333

Should we push for all appropriate data to be released or target what appears to be in high demand first? Is there another approach? Should there be a balance of factors influencing prioritisation of effort to release data?...

Paul Stone Mon 19 Mar 2018 1:09AM

The Open Data Charter is suggesting a "publish with purpose" approach, with a focus on releasing open data to solve specific policy problems.

Jay Daley Tue 20 Mar 2018 7:51PM

My view is that the situation is far more complicated than this and there are significant concerns with an approach that solely follows public demand. For example, if we were to follow public demand solely then we would have a great deal of house price data released as a priority and data about the homeless would be at the back of the queue.

The ideal methodology is to assess each dataset for a) impact if it were released; and b) difficulty of release, and then balance the two against each other to determine a prioritisation. The assessment of impact should include 1) public demand as an important consideration; 2) experience from other jurisdictions of the impact of release (e.g. the impact of open contracting data has been far greater than most predicted); 3) national strategies, such as our current government digital inclusion strategy or the top-level anti-corruption strategy that many countries have.

The prioritisation could then be as crude as a four box model where the first priority is "high impact/low difficulty" and the last priority is "low impact/high difficulty" or a more nuanced approach could be taken.
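
As a rough illustration of the four box model, here is a minimal Python sketch. The 1-5 scoring scale, the threshold of 3, and the dataset names and scores are all hypothetical, invented purely for illustration. Only the first and last quadrants are ordered by the description above; the ordering of the two middle quadrants is an assumption.

```python
from dataclasses import dataclass

# Assumption: impact and difficulty are each scored 1-5, and a score
# of 3 or more counts as "high". Neither the scale nor the threshold
# comes from the discussion.
THRESHOLD = 3

# (impact level, difficulty level) -> priority rank, 1 = release first.
# Ranks 1 and 4 follow the four box description; ranks 2 and 3 are an
# assumed ordering for the two middle boxes.
QUADRANT_RANK = {
    ("high", "low"): 1,
    ("high", "high"): 2,
    ("low", "low"): 3,
    ("low", "high"): 4,
}


@dataclass
class Dataset:
    name: str
    impact: int      # hypothetical 1-5 score
    difficulty: int  # hypothetical 1-5 score


def level(score):
    """Collapse a numeric score into 'high' or 'low'."""
    return "high" if score >= THRESHOLD else "low"


def quadrant(ds):
    """Return the (impact, difficulty) box a dataset falls into."""
    return (level(ds.impact), level(ds.difficulty))


def prioritise(datasets):
    """Order datasets by the rank of their quadrant (stable sort)."""
    return sorted(datasets, key=lambda ds: QUADRANT_RANK[quadrant(ds)])


# Made-up datasets and scores, for illustration only.
examples = [
    Dataset("dataset A", impact=2, difficulty=4),
    Dataset("dataset B", impact=5, difficulty=1),
    Dataset("dataset C", impact=4, difficulty=5),
    Dataset("dataset D", impact=1, difficulty=2),
]

for ds in prioritise(examples):
    print(ds.name, quadrant(ds))
```

A more nuanced approach could replace the two levels with a weighted score, but the weights would themselves need to be agreed on.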

Finally, just to note that in my experience implementing the "determination of public demand" through a one-way channel such as a survey is not very effective. This is because there are so many variables in any individual request that need to be teased out and understood. For example, it is common to get a request for a dataset that contains confidential information, and it is only through a conversation with the requestor that it is possible to understand whether they would accept redaction or aggregation or whether they really do expect the confidential data (in which case it does not need to be a priority, as the request is not reasonable). Also, I regularly come across people who ask for one dataset when they really mean another. A good example is raw data versus summary reports. People often ask for the raw data, but when it's explained that that is 50 million rows and a sum formula in Excel is not going to cut it, they modify their request to something like monthly summaries.

Paul Stone Tue 20 Mar 2018 9:49PM

Valid points, Jay. I really like the idea of the impact/difficulty quadrant to aid prioritisation decisions - although won't that lead to something everyone agrees to be really important being put at a low priority just because it's "too hard"?

I wonder if we need a strategy of two parallel streams of work - the "short game" and the "long game". The short game would use your quadrant to determine quick wins leading to impact (instant gratification).

The long game is two-fold:

1) to work towards "open by default" - or as I prefer to think about it, "open by design" - building good data management & governance, data standardisation, competence in the release and reuse of data etc. that will lead to systematic, quality open data.

2) to work on the "too hard basket" - releasing data that will have a high impact but has complex barriers that will take significant time to address and overcome.

This is more or less our current approach through this action plan, but the question is how should we balance our resources between these two approaches?...

Jay Daley Wed 21 Mar 2018 12:56AM

I find it useful to start by identifying the high impact/high difficulty datasets as soon as possible and getting the wheels in motion on those, expecting a long haul. Quite often that means educating people about those difficulties and the goal of open data so that they can begin to invent solutions.

With that underway then it's easier to concentrate on the quick wins of high impact/low difficulty and keep prodding the high/high as needed.

I agree very much about the long game as you've outlined it but I would suggest that you also start the long game pretty much straight away. I find that the following steps, pretty much in order of priority, help towards that long game:
- Top down organisational buy-in. My ideal situation is where the CEO adopts open data as a personal goal (or is given it by their board) and makes a strong statement about their personal commitment.
- A discrete, properly resourced, dedicated project team.
- A highly vocal and informed community of demand.
- A formal open data policy statement or plan, backed up by significant educational materials, all of which are targeted at the specific organisation and the issues it faces.
- A data governance board or equivalent for the organisation.

David Eccles Mon 26 Mar 2018 9:50AM

I think the most benefit would be gained by asking the media what data they want, or at least putting emphasis on opening up areas that are commonly targeted by media (or OIA requests). The media frequently controls the slices of government that people see, and there will be an enhanced interest in things that have been recently seen in newspapers, television, etc.

I know that ACC at least has (or had) staff whose job it is to look through the media to identify areas of public concern; perhaps Stats could employ a similar "data communication advocate" (if they don't already have one) to search out data of interest that would be easy to release.

David Hood Wed 28 Mar 2018 12:00AM

It seems to me that opening up data frequently requested under the OIA would be a net saving on resources and effort for organisations, but personal experience of trying to get data from some places leaves me feeling there may be a different view inside an organisation.

I think the impact/difficulty quadrants may need a bit of teasing out about who is assessing the impact and using what criteria. Presumably impact relates to usefulness, which depends on being used, so it is a retrospective measure unless a model of estimated impact is being applied.

My concern with such models is the serendipity of open data: having the data there enables perspectives that would otherwise be unable to produce evidence, leaving dogma unchallenged.

To give a current example of such serendipity, at the end of last week aggregate data on prescriptions was made available from Pharmac. I mashed it up with other web scraped and census based information, and found a dramatic difference in the rate at which antibiotics are prescribed around the country. So since the weekend I am now involved with a group looking deeper into that. Antibiotic resistance is (I would argue) impactful, however the impact arises out of a serendipitous combination of sources.

My concern is that any assessment of a data source in isolation has problems foreseeing such use. To me, public demand indicates the information is likely to be used, and the more it is used the more potential for knowledge discovery.

George Wills Tue 17 Apr 2018 1:16AM

Good discussion so far. To build on some of what Jay and the Davids have said...

Prioritising based on public demand is one important method which I think should be held on to. As David Hood said, there are some obvious economic benefits for releasing data which is frequently requested.

Also of importance in my opinion is building a culture of requesting data so that assumptions, analysis and decisions can be challenged.

My hypothesis is “that by making an additional effort to open up data that has been requested, the behaviour of requesting data is rewarded.”

I do also believe that there needs to be effort made to prioritise based on a value system such as Jay has suggested which is not connected directly to public demand. While public engagement/demand in Open Data is low we can’t expect it to be representative of our societal priorities. Perhaps this will be less of a problem as demand and requests grow.

Valuing individual datasets against each other may pose some practical challenges. However, I like the idea of valuing them thematically. Perhaps prioritising policy areas or problem domains could be easier than prioritising the datasets which fall under each theme.

In my opinion one of the best questions the data request form on data.govt.nz asks is “Opening this data would solve the problem of …” That starts to give us clues on the themes or problem areas people are interested in.
