Interview | Pronab Sen on Why 2017-18 Jobs Data Are Comparable With Previous Years

"I think the fact that the whole [NSSO] exercise began with a fundamental premise of keeping it comparable, that has been forgotten."

The fierce debate over India’s unemployment figures came to a head last week, when a jobs data report by the National Sample Survey Office (NSSO) was finally made public. This report has been a source of contention ever since two members of the National Statistical Commission (NSC) resigned, allegedly because its release was delayed by the Narendra Modi government.

In an interview with The Wire’s Kabir Agarwal and Anuj Srivas, India’s former chief statistician Pronab Sen talks about the comparability controversy, the changes in the survey’s methodology and the state of unemployment data.

Edited excerpts:

The NSSO’s periodic labour force survey (PLFS) report was finally released recently. Its findings have sparked controversy, over whether unemployment is really at a 45-year high. There have been a host of issues raised, by people within and outside the government, over whether the data is actually comparable to past NSSO surveys, because of changes in underlying methodology…

The problem is that most of the people who have raised these issues, particularly in the government, none of them are specialists. I don’t think any of them even have a nodding acquaintance with statistics.

It is only yesterday [May 31, 2019] that the chief statistician said something. Let me address what he said.

He identified two issues. One issue is that because of changes in stratification, the matrix will be different. And that is absolutely right – because in the matrix, you are talking about all these stratification levels.

What he did not mention, however, is that this does not affect the national-level and state-level estimates in any manner or form.

The second issue is that of the repeat visits [revisiting samples], which can influence responses. Now, this reflects a lack of understanding of the way the annual survey estimate is computed. In the annual one, there is no repeat. It’s only in the quarterly that repeats happen, and nobody is saying that is comparable.

How do we reconcile these opposing views then, especially those expressed by former members of the National Statistical Commission, who say the data are comparable?

When we decided to do the annual survey, the one fundamental principle was that the annual estimates must be comparable to all previous quinquennial ones. That was a fundamental, guiding principle for the design of the survey.

It’s being said that because you are revisiting a household, that changes the nature of the response. Which is true. But that is true only of the quarterly estimates, not annual estimates. Because the annual estimate is always and only based on the first visit, there is no danger of contamination.

The second point that has been raised is the nature of stratification. You have to understand what stratification does and why it is done. Stratification is the manner in which you divide the data into finer divisions.

You start with the country, the sample has to be nationally representative. Then the first stratification is state and second level may be the district. That allows you to get state-level estimates and district-level estimates.

The next level, or third level, is where the differences started. Earlier, it used to be by expenditure groups – low, middle and high expenditure groups. Because the number of people in high expenditure groups was relatively small – compared to low and middle – there was an oversampling that was done there, otherwise you could not get enough data points.

This time around however, stratification is by education status.

What are the positives and negatives of having changed the method of stratification?

The first [stratification in earlier NSSO surveys] allows you to assess unemployment and employment of what are essentially income groups – the poor, middle class and the rich.

The second will not allow you to do it by income groups with the same accuracy, but by education status.

Pronab Sen. Credit: International Growth Centre website

The basic idea is that it is necessary to distinguish people who are educated up to the school level and post-school level. So you are really trying to home in on educated unemployment. Earlier, you weren’t getting that, again because the post-school group was a relatively small group and thus the measurement would not have been robust.

Here again, at the post-school level, because you are over-sampling, you get robustness. But that comes only at that level of disaggregation. Above that, whether you are talking about the district estimate or the state estimate or at the national level, the stratification has no effect.

The whole idea was that in urban India, the education levels were going up rapidly. So you are really trying to focus in on educated unemployment.

We needed to know what would happen to educated people, particularly in urban areas. So the bulk of the stratification happens in urban areas, where education levels are high. In the lower education levels, the kind of activities people do are fairly limited – a very large proportion of them are casual workers, unskilled workers.

But amongst the educated, there will be a wide range of skills that are available, and therefore a wide range of work patterns.

Because the focus was on what was happening to employment as education levels went up, this stratification was done.

So when we speak about annual unemployment rate of 6.1% in 2017-2018, and compare it to unemployment recorded in the past, the changes in the stratification for the most recent round do not affect comparability?

No. That is the national-level estimate. When you talk about estimates at the education level, then that would not be comparable because that level of stratification was not there in earlier surveys.
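The point about stratification and national estimates can be illustrated with a minimal sketch. All numbers here are hypothetical and chosen for easy arithmetic, not taken from the PLFS, and the `Stratum` class and both functions are illustrative names, not NSSO procedures. The sketch assumes each stratum's sample is weighted by its share of the population, which is the standard way to undo oversampling.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str
    population: int            # persons in the labour force (hypothetical)
    sampled: int               # persons surveyed in this stratum
    unemployed_in_sample: int  # of those surveyed, how many are unemployed


def weighted_rate(strata):
    """National rate: each stratum's sample rate weighted by its population share."""
    total_population = sum(s.population for s in strata)
    return sum(
        (s.population / total_population) * (s.unemployed_in_sample / s.sampled)
        for s in strata
    )


def unweighted_rate(strata):
    """Naive pooled rate that ignores how the sample was allocated."""
    return sum(s.unemployed_in_sample for s in strata) / sum(s.sampled for s in strata)


# Same hypothetical population, two sample designs of 1,000 persons each.
# Stratum rates are 4% (school or below) and 12% (post-school) in both.
proportional = [
    Stratum("school or below", 9000, 900, 36),
    Stratum("post-school", 1000, 100, 12),
]
oversampled = [
    Stratum("school or below", 9000, 500, 20),
    Stratum("post-school", 1000, 500, 60),
]

print(round(weighted_rate(proportional), 3))   # 0.048
print(round(weighted_rate(oversampled), 3))    # 0.048
print(round(unweighted_rate(oversampled), 3))  # 0.08
```

Under these assumptions the weighted national rate is the same for both designs. Oversampling the small post-school group only distorts the figure if the weights are ignored; what it buys is a larger sample, and hence a more robust estimate, within that group.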

Some economists have also criticised the population estimates in the PLFS. Particularly the part where it says India’s total population in 2017-2018 was lower than what was recorded during previous unemployment surveys… 

This is correct. Of course it is. But this is true, by the way, of every NSSO sample in the past as well.

The frame that we use for selecting the sample is the Census. Now, between the time the Census is taken and the NSSO survey is done, what happens is that families split up. Since this is a household-based survey, when the households split, you have more households and fewer people per household.

Now, the number of households is fixed, at 1.4 lakh or whatever. So if the number of people per household drops, then the number of persons in your sample, households remaining constant, also drops.

This is a common feature of household surveys. That would be the reason behind the NSSO survey population estimates.
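The arithmetic Sen describes can be written out in a few lines. The household count follows his "1.4 lakh or whatever"; the two average household sizes are hypothetical values picked only to show the direction of the effect, and `persons_in_sample` is an illustrative helper.

```python
def persons_in_sample(households: int, avg_household_size: float) -> float:
    """Persons covered when a fixed number of households is surveyed."""
    return households * avg_household_size


SAMPLED_HOUSEHOLDS = 140_000  # "1.4 lakh or whatever", per the interview

# Hypothetical average sizes before and after households split up.
earlier = persons_in_sample(SAMPLED_HOUSEHOLDS, 5.0)
later = persons_in_sample(SAMPLED_HOUSEHOLDS, 4.5)

print(earlier)  # 700000.0
print(later)    # 630000.0
```

With the number of households held constant, a smaller average household size mechanically means fewer persons in the sample.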

Do you feel then that there has been no compromise with changes to the sampling and designing, and that the issue is with the government’s presentation of these figures and its defence?

In a sense, I think the fact that the whole [NSSO] exercise began with a fundamental premise of keeping it comparable, that has been forgotten.

We were going to do an annual survey instead of a quinquennial one. When you do an annual survey, the design has to change. You cannot draw a fresh sample every year, because the amount of time it takes becomes much longer.

So you need to have a panel design. We adopted a rotational panel approach, where a household is tracked for four quarters and replaced thereafter. If we had actually followed the same panel up and taken the average for the year, that would make it completely non-comparable.

But that is not what was done. The annual estimates are computed from the data collected during the first visit only. The other three visits are used only for the quarterly estimates.
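The split between annual and quarterly estimates under a rotational panel can be sketched as follows. The visit records are invented, the `Visit` type and function names are illustrative, and survey weights are left out for brevity, so this shows only which visits feed which estimate, not the actual PLFS estimation procedure.

```python
from collections import namedtuple

# One record per household visit; counts are persons in that household.
Visit = namedtuple("Visit", "household quarter visit_no labour_force unemployed")

# Hypothetical rotating panel: household A enters in quarter 1,
# B in quarter 2, C in quarter 3; each is revisited in later quarters.
visits = [
    Visit("A", 1, 1, 4, 0),
    Visit("A", 2, 2, 4, 1),
    Visit("B", 2, 1, 3, 1),
    Visit("A", 3, 3, 4, 0),
    Visit("B", 3, 2, 3, 0),
    Visit("C", 3, 1, 3, 0),
]


def unemployment_rate(records):
    """Unemployed persons as a share of the labour force (unweighted)."""
    labour_force = sum(r.labour_force for r in records)
    return sum(r.unemployed for r in records) / labour_force


def annual_rate(records):
    """Annual estimate: first visits only, so no household is counted twice."""
    return unemployment_rate([r for r in records if r.visit_no == 1])


def quarterly_rate(records, quarter):
    """Quarterly estimate: every visit made in that quarter, revisits included."""
    return unemployment_rate([r for r in records if r.quarter == quarter])


print(annual_rate(visits))                   # 0.1
print(round(quarterly_rate(visits, 2), 3))   # 0.286
```

Because `annual_rate` discards every revisit, any change in how households respond on a second or later visit cannot enter the annual figure; it can only affect the quarterly ones.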

What explains the pushback from the government then?

I suppose it comes from the fact that they cannot disown the 6.1% figure. It’s out there now in the public domain.

But the shock of seeing the number go up from 2.2% in 2011-12 to 6.1%…I suppose it has to be papered over.

The controversies over government data have come in the run-up to the merger of the NSSO and CSO into the National Statistical Office (NSO), which has prompted some concern over whether it would reduce the role of the National Statistical Commission.

I think there is a whole bunch of things at play here. But yes, I think there is cause for concern. I am certainly concerned.

The reason is that the NSO was created in 2005. It’s not a new creation or new body.

NSO had two separate, distinct divisions – CSO and NSSO. NSSO was an attached office. An attached office essentially means that it is administratively and financially a part of [the ministry]…but it is somewhat independent.

The NSSO for all technical purposes – including what survey should be done, what frequency, what should be the steering committees and working groups – all of that was decided by the National Statistical Commission (NSC). So, it gave NSSO a level of independence, which has now gone.

With the merger, the NSC’s oversight on the NSSO has diminished dramatically.

So if we are talking about independence, that has reduced.

Job seekers attend a job fair organised by the employment department of the Delhi state government in New Delhi, India, January 21, 2019. Credit: Reuters/Anushree Fadnavis

Another public debate has been over the accuracy and usefulness of household surveys and whether other sources of employment data should be given greater importance. The focal point of the Centre’s narrative around employment has been Employees’ Provident Fund Organisation (EPFO) data and jobs created by Mudra loans.

Each data source has its own relevance. But the problem with any of these measures is that they are partial. What they are picking up is a snapshot of what is happening in the sector from where the data is coming from.

It’s [EPFO data] not giving you data about whether the person was employed to begin with. It doesn’t tell you who is unemployed. Only the household survey does that. Because then you are considering all types and forms of workers.

Here you are looking at very specific workers. With EPFO data, you know how many people came into EPFO and how many dropped out of EPFO. But to say that a person who came into the EPFO database was not employed before that is wrong, because you don’t know.

Also, the people who leave the EPFO database aren’t necessarily unemployed or retired. They might have moved or started their own business. So that’s capturing only one part of it and that part is how many people are employed in the formal sector at any given point of time.

Now, from the household data too, you would get only a part of that story because the kind of questions that are asked in that survey are somewhat different. It’s in the type of employment. Quite often, the worker doesn’t know if he is formal or not. So there’s an interpretational component to household surveys.

So, for actually giving flesh to the basic household survey data, these [EPFO data] are quite useful.

However, Mudra loans, on the other hand, are not. Because you have absolutely no idea. All you know in Mudra loans is that we have given ‘X’ amount of money to ‘Y’ number of people. You have no idea what they have done with it.

They could have blown it up eating, drinking and having a good time because there is no follow-up system.

We do know NPA levels in this category, i.e. whether the people who take the Mudra loans are paying them back.

Look, if I take a consumption loan, I will pay it back. Doesn’t necessarily mean that I have used that money for productive purposes. So, unless there is a system in Mudra for tracking the use of these loans – that system simply doesn’t exist today – you can make no statement about its employment effect.

Because otherwise you are making assumptions, such as: if I take a Mudra loan, at least one person is employed. Or that he may employ one more and therefore if we multiply the number of Mudra loans by two…this is bizarre.

It’s certainly not statistics. You can’t use it for any rational purpose. Not only is it not statistics, it also assumes that Mudra loans are actually being used for some production. On what basis? Yes, if you had a system which was tracking people who had taken Mudra loans, I would have seen some sense. But, you don’t.

The general idea perhaps is that these figures are positive economic signals, and reflect that certain parts of the economy are doing well…

The point is very simple. These certainly are pointers towards positive signals. But whether or not they are actually addressing the issue of employment and unemployment, this data does not tell you.

In the US system, people talk of the payroll system. The payroll system works for the very simple reason that they have unemployment assistance. So that when a person goes off the payroll, he has to apply for it. So, you are picking up the unemployment from that.

We don’t have a comparable system here in India.

NITI Aayog chief Amitabh Kant has indicated that new types of jobs, like those created by Ola and Uber, may be harder to capture or are perhaps not being measured properly.

Not the same problem. Household survey data will always capture this. You ask the interviewee what he does for work. He will say ‘I am a driver’.

Now, of course, whether he is a driver for Uber or Ola we will not know because we don’t ask that question. Statistically, we don’t care.

The important question is ‘Is he a taxi driver or not?’. That’s all that’s important and that data is picked up in the household survey.

The PLFS report has some significant findings. One of which is the drop in the labour force participation rate. Especially with respect to women, that drop is significant.

Generally, that’s been a long-term trend. You see actually, survey after survey, the workforce participation rate has been going down. One of the reasons for that is that urban India has always had a lower workforce participation rate than rural India. And as we are going through this transition of increasing urbanisation, the overall participation rate will drop.

However, it is worrying that the drop for women in rural areas has been sharpest. That’s unexplained. It’s something the data tells you is happening, and it’s time for us to start thinking about why it’s happening.

But the drop in the labour force participation rate is a long-term trend. And this survey doesn’t seem, on the aggregate, wildly out of that trend. The number for rural women does seem significant. Urban not so much.