Can women “have it all”? Statistics from the 2015 Chinese General Social Survey show that, all else equal, married women earn less on average.

Introduction

As the old saying goes, “behind every successful man there is a woman”. The finding that marriage is associated, on average, with a wage premium for men is well documented across countries (Schoeni, 1990; Wang and Li, 2016). Wage and marriage: men seem to “have it all”. Does the same hold true for women? Researchers usually call a positive correlation between marriage and wage a “marriage premium” and a negative one a “marriage penalty”, so the question can be put this way: does marriage bring a premium or a penalty to working women’s wages?

Only a few papers consider the relationship between marriage and women’s wages. Most of them report the result as a side product, using “marriage” as a covariate while studying the effect of other factors on wages, and so lack precision and depth. The findings also vary. Some found that married women enjoy a wage premium (presumably because of access to resources through the husband and a stronger focus on work due to the responsibility to support the family) (Waldfogel, 2013; Waite, 1995), some found the effect trivial (Xiong and Yuan, 2017), and others found a negative relationship between marriage and women’s wages (Mincer, 1974; Hoon, Keiser and Dykstra). More often than not, previous work either relied on OLS alone, which imposes a strong assumption that the functional form is correct, instead of addressing confounding bias non-parametrically, or adopted a weak instrumental variable that lacks credibility and can actually increase the bias. In addition, most studies use data from the US or Europe, which are not representative of developing countries such as China, where gender inequality is much more severe.

My goal is to use matching as a pre-processing step to balance the covariates before running OLS. After the overall regression, I block on “Hukou” (household registration), level of education, and age cohort to see whether the correlation differs between subgroups. I also use quantile regression to see whether the wage difference between married and unmarried women stays the same across wage percentiles.

Data

The data come from the 2015 Chinese General Social Survey (CGSS2015). The survey covers 478 villages in 28 provinces or municipalities across the country, and 10,968 valid questionnaires were collected in total. I consider the data strongly representative because of the large sample size and wide coverage.

For this study, the data were processed as follows. Participants born before 1961 or after 1996 were dropped so that the sample covers the standard ages for marriage and labor force participation. I kept only respondents who currently have a job and earn income in exchange for labor. Marriage may also push women out of employment altogether, but that topic is beyond the scope of this study. Because the follow-up regression model is sensitive to outliers, I also removed outliers to limit bias. After dropping observations with missing values and keeping only women, 1,168 observations remained. Table 1 summarizes the variables of interest.

Table 1: Data Summary
Statistic                                              N      Mean     St. Dev.  Min    Max
Log Income per Hour                                    1,168  2.683    0.766     0.182  5.991
Marriage Status (Married=1; Widowed/Divorced/Unwed=0)  1,168  0.822    0.383     0      1
Years of Education                                     1,168  11.702   4.101     0      19
Work Experience                                        1,168  12.657   8.918     0      40
Squared Work Experience                                1,168  239.754  284.611   1      1,600
Age                                                    1,168  37.926   8.714     19     54
Chinese Communist Party Member (Yes=1; No=0)           1,168  0.098    0.298     0      1
Health Status (Healthy=1; Not-so-healthy=0)            1,168  0.943    0.231     0      1
Hukou (Urban=1; Rural=0)                               1,168  0.502    0.500     0      1
Labor Contract (Have=1; Don’t have=0)                  1,168  0.492    0.500     0      1
Managerial Position (Yes=1; No=0)                      1,168  0.215    0.411     0      1
Work at Public Institution (Yes=1; No=0)               1,168  0.272    0.445     0      1
Work at State-owned Enterprise (Yes=1; No=0)           1,168  0.278    0.448     0      1
East Region                                            1,168  0.481    0.500     0      1
Northeast Region                                       1,168  0.177    0.382     0      1
Middle Region                                          1,168  0.276    0.447     0      1

Model

Matching

I chose Mahalanobis distance matching as my matching method, where \(X_i\) and \(X_j\) are the covariate vectors of units \(i\) and \(j\), and \(\Sigma\) is the variance-covariance matrix of the covariates. \[MD(X_i,X_j)=\sqrt{(X_i-X_j)^T{\Sigma}^{-1}(X_i-X_j)} \]

By pairing treated units with similar non-treated units, matching allows outcomes to be compared between the two groups to estimate the effect of the treatment, reducing bias due to confounding in a non-parametric way. Compared to propensity score matching (PSM), Mahalanobis matching is more reliable because it matches directly on the covariates, approximating a fully blocked randomized experiment, whereas PSM matches on an estimated propensity score and approximates a less efficient completely randomized experiment, which may increase imbalance, inefficiency, model dependence, and bias (King, 2016). Furthermore, unlike matching on Euclidean distance, Mahalanobis matching is invariant to rescaling and rotation of the covariates.
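The invariance to rescaling can be checked numerically. The following is a minimal Python sketch, not the code used for this analysis: the five (education, age) pairs are made-up toy values, the helper names are hypothetical, and only two covariates are used so that the matrix inverse can be written by hand. It computes the distance between the same two units with age measured in years and then in months.

```python
import math

def covariance_matrix(rows):
    """Sample variance-covariance matrix (n - 1 denominator) of equal-length rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]

def invert_2x2(m):
    """Inverse of a 2x2 matrix via the closed-form adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mahalanobis(x, y, sigma_inv):
    """sqrt((x - y)^T Sigma^{-1} (x - y)) for two covariate vectors."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    k = len(diff)
    quad = sum(diff[a] * sigma_inv[a][b] * diff[b] for a in range(k) for b in range(k))
    return math.sqrt(quad)

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Hypothetical toy data: (years of education, age in years) for five women.
sample = [(9, 30), (12, 41), (16, 28), (6, 50), (12, 35)]
# The same women with age expressed in months instead of years.
rescaled = [(edu, age * 12) for edu, age in sample]

md_years = mahalanobis(sample[0], sample[1], invert_2x2(covariance_matrix(sample)))
md_months = mahalanobis(rescaled[0], rescaled[1], invert_2x2(covariance_matrix(rescaled)))
ed_years = euclidean(sample[0], sample[1])
ed_months = euclidean(rescaled[0], rescaled[1])

print(f"Mahalanobis: {md_years:.4f} (years) vs {md_months:.4f} (months)")
print(f"Euclidean:   {ed_years:.4f} (years) vs {ed_months:.4f} (months)")
```

The Mahalanobis distance is the same in both unit systems, while the Euclidean distance is dominated by whichever covariate happens to have the largest numeric scale.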

The results of matching are shown below. As Figure 1 shows, the imbalance of covariates between the treated (married) and control (unmarried) groups is largely reduced, although age and work experience remain less well balanced than the other covariates. The naive estimate of the ATT indicates that married women earn about 16.6% less than unmarried women (an estimate of -0.166 in log hourly income), and the result is significant at the 5% level.

## Balance Measures:
##               Type   M.0.Un   M.1.Un Diff.Un  M.0.Adj  M.1.Adj Diff.Adj
## edu        Contin.  12.7356  11.4781 -0.3075  11.7766  11.4781  -0.0730
## exp        Contin.   8.6635  13.5219  0.5576  11.6663  13.5219   0.2129
## exp2       Contin. 152.0000 258.7677  0.3732 213.2350 258.7677   0.1592
## age        Contin.  31.9904  39.2125  0.9244  36.1731  39.2125   0.3890
## party       Binary   0.1010   0.0979 -0.0030   0.1112   0.0979  -0.0133
## health      Binary   0.9327   0.9458  0.0131   0.9265   0.9458   0.0193
## hukou       Binary   0.6106   0.4781 -0.1325   0.5164   0.4781  -0.0383
## contract    Binary   0.5529   0.4792 -0.0737   0.5019   0.4792  -0.0228
## manage      Binary   0.1827   0.2219  0.0392   0.1712   0.2219   0.0507
## government  Binary   0.2356   0.2802  0.0446   0.2244   0.2802   0.0558
## statecom    Binary   0.2692   0.2802  0.0110   0.2766   0.2802   0.0036
## east        Binary   0.5481   0.4667 -0.0814   0.4952   0.4667  -0.0285
## northeast   Binary   0.2356   0.1646 -0.0710   0.1973   0.1646  -0.0327
## middle      Binary   0.2404   0.2833  0.0429   0.2515   0.2833   0.0319
## 
## Sample sizes:
##                      Control Treated
## All                      208     960
## Matched                 1034     960
## Matched (Unweighted)     181     960
## Unmatched                 27       0

## 
## Estimate...  -0.16568 
## AI SE......  0.074887 
## T-stat.....  -2.2123 
## p.val......  0.026943 
## 
## Original number of observations..............  1168 
## Original number of treated obs...............  960 
## Matched number of observations...............  960 
## Matched number of observations  (unweighted).  972
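The t-statistic and p-value in the output above follow directly from the estimate and its standard error. A small sketch, assuming the p-value is two-sided and based on the standard normal distribution (an assumption about how the matching routine computes it, which the reported numbers are consistent with):

```python
import math

# Values copied from the matching output above.
estimate = -0.16568    # ATT on log hourly income
std_error = 0.074887   # reported "AI SE"

t_stat = estimate / std_error
# Two-sided p-value under a standard normal reference: 2 * (1 - Phi(|t|)).
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
```

Small differences in the last digit relative to the printed output are expected, because the inputs here are already rounded.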

Wage model

Starting from the original Mincer (1958) wage model, I adjusted and extended the variables to fit the research purpose and the features of the data, adding marriage, location, and work-environment factors specific to China. This gives the following wage determination model, with constant \(\beta_0\) and error term \(\varepsilon_i\): \[loglabinc_i = \beta_0 + \beta_1 marriage_i + \beta_2 X_i + \beta_3 Y_i + \beta_4 Z_i + \varepsilon_i\]

loglabinc is the natural logarithm of hourly labor income; marriage is a binary indicator of marital status; \(X_i\) is a list of covariates related to an individual’s personal endowment, including years of schooling, work experience, squared work experience, age, and health level; \(Y_i\) is a list of job-related factors, including whether the respondent is a Chinese Communist Party member, whether her “Hukou” (household registration) is urban (non-agricultural), whether she has a labor contract, whether she holds a management position, whether she works in a public institution, and whether she works in a state-owned enterprise; \(Z_i\) is a list of regional factors that capture the differences between the eastern, northeastern, and central regions, with the western region as the baseline.

Findings

Comparison: Regression results Before and After Matching

As we can see from Tables 2 and 3, there is a visible difference between the regressions before and after matching.

From Table 2 we can see that before matching, although the coefficient on marriage is negative in every column, only the first column, which uses marriage as the sole regressor, shows a weakly significant result (at the 10% level). As control variables are added, the estimated penalty of marriage on women’s wages gradually shrinks, from -0.102 to -0.043.
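With a single binary regressor, OLS reduces to a comparison of group means: the constant is the mean log wage of unmarried women, and the constant plus the marriage coefficient is the mean for married women. As a quick consistency check, here is a sketch that uses only the rounded numbers reported in Table 2, column (1), and the sample sizes from the matching output:

```python
# Rounded values copied from Table 2, column (1).
constant = 2.767         # mean log hourly income of unmarried women
marriage_coef = -0.102   # married minus unmarried difference in means

# Sample sizes from the matching output (treated = married).
n_married, n_total = 960, 1168

mean_unmarried = constant
mean_married = constant + marriage_coef
share_married = n_married / n_total

# Weighting the two group means by their sample shares recovers the overall mean.
overall_mean = share_married * mean_married + (1 - share_married) * mean_unmarried

print(f"married share = {share_married:.3f}")
print(f"implied overall mean of log income = {overall_mean:.3f}")
```

Both numbers agree with the means of marriage status (0.822) and log income per hour (2.683) in Table 1.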

Table 3 shows the regression results after matching. Column one shows that the coefficient of interest is significantly negative at the 1% level, with an estimated value of -0.130, which means that considering only the marriage factor, the average wage of married women is about 13.0% lower than that of unmarried women. The second column adds control variables related to individual endowments, such as education, work experience, and health level. The coefficient on “marriage” is still significantly negative at the 1% level, representing a 15.2% marriage penalty on women’s wages. Column three adds work-related factors: political status, hukou, labor contract, management position, public institution, and state-owned company. The coefficient on “marriage” remains significantly negative at the 1% level, with a penalty of 15.9%. The fourth column adds geographical factors, and the penalty is 14.4%, significant at the 1% level.
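The percentages above read the coefficients on log income directly as percentage differences, which is an approximation. A short sketch of the exact conversion, \(100(e^{\beta}-1)\), applied to the four marriage coefficients in Table 3 (the function name is only illustrative):

```python
import math

# Marriage coefficients from Table 3, columns (1) to (4).
coefficients = [-0.130, -0.152, -0.159, -0.144]

def log_points_to_percent(beta):
    """Exact percentage difference implied by a coefficient on a log outcome."""
    return (math.exp(beta) - 1) * 100

exact = [log_points_to_percent(b) for b in coefficients]
for b, pct in zip(coefficients, exact):
    print(f"beta = {b:+.3f} -> {pct:+.2f}%")
```

The exact figures are slightly smaller in magnitude than the approximation (roughly 12% to 15% rather than 13% to 16%), which does not change the conclusions.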

The difference between the two tables suggests that before matching, confounding bias masks the association between marriage and wages, which may be why many studies fail to detect a significant correlation. In addition, the coefficient on “marriage” in the matched sample changes little as covariates are added (it stays between -0.130 and -0.159), which suggests that Mahalanobis distance matching effectively adjusted for confounding bias and brought the covariates into balance.

Table 2: Different Steps of Regression Before Matching
Dependent variable: loglabinc
                            (1)                (2)                (3)                (4)
married                     -0.102*            -0.072             -0.064             -0.043
                            (0.062)            (0.062)            (0.061)            (0.060)
years of education                             0.077***           0.056***           0.052***
                                               (0.006)            (0.007)            (0.007)
working experience                             0.043***           0.037***           0.035***
                                               (0.009)            (0.008)            (0.008)
squared working experience                     -0.001***          -0.001***          -0.001***
                                               (0.0003)           (0.0003)           (0.0003)
age                                            -0.006             -0.007*            -0.006*
                                               (0.004)            (0.003)            (0.003)
healthy                                        0.150*             0.118              0.107
                                               (0.090)            (0.087)            (0.084)
CCP member                                                        0.027              0.035
                                                                  (0.077)            (0.074)
hukou                                                             0.074*             0.095**
                                                                  (0.045)            (0.044)
contract                                                          0.163***           0.095**
                                                                  (0.044)            (0.043)
manage                                                            0.367***           0.339***
                                                                  (0.053)            (0.052)
public institution                                                -0.102**           -0.063
                                                                  (0.051)            (0.050)
state-owned company                                               -0.014             -0.019
                                                                  (0.050)            (0.049)
east                                                                                 0.339***
                                                                                     (0.043)
northeast                                                                            0.013
                                                                                     (0.053)
middle                                                                               0.074*
                                                                                     (0.045)
Constant                    2.767***           1.604***           1.780***           1.669***
                            (0.057)            (0.178)            (0.181)            (0.177)
Observations                1,168              1,168              1,168              1,168
R2                          0.003              0.225              0.275              0.315
Adjusted R2                 0.002              0.221              0.268              0.306
Residual Std. Error         0.765 (df = 1166)  0.676 (df = 1161)  0.656 (df = 1155)  0.638 (df = 1152)
Note: *p<0.1; **p<0.05; ***p<0.01
 
Table 3: Different Steps of Regression After Matching
Dependent variable: loglabinc
                            (1)                (2)                (3)                (4)
married                     -0.130***          -0.152***          -0.159***          -0.144***
                            (0.035)            (0.033)            (0.032)            (0.031)
years of education                             0.076***           0.055***           0.051***
                                               (0.004)            (0.007)            (0.006)
working experience                             0.065***           0.055***           0.049***
                                               (0.007)            (0.006)            (0.006)
squared working experience                     -0.002***          -0.001***          -0.001***
                                               (0.0002)           (0.0002)           (0.0002)
age                                            -0.004             -0.004             -0.004
                                               (0.003)            (0.003)            (0.003)
healthy                                        0.041              -0.011             0.017
                                               (0.071)            (0.069)            (0.068)
CCP member                                                        0.019              0.007
                                                                  (0.063)            (0.060)
hukou                                                             0.022              0.054
                                                                  (0.038)            (0.037)
contract                                                          0.124***           0.040
                                                                  (0.035)            (0.034)
manage                                                            0.530***           0.497***
                                                                  (0.048)            (0.049)
public institution                                                -0.182***          -0.143***
                                                                  (0.043)            (0.043)
state-owned company                                               0.055              0.037
                                                                  (0.042)            (0.041)
east                                                                                 0.369***
                                                                                     (0.036)
northeast                                                                            -0.038
                                                                                     (0.042)
middle                                                                               0.008
                                                                                     (0.037)
Constant                    2.799***           1.629***           1.851***           1.733***
                            (0.026)            (0.145)            (0.152)            (0.149)
Observations                1,944              1,944              1,944              1,944
R2                          0.007              0.213              0.293              0.344
Adjusted R2                 0.006              0.211              0.289              0.339
Residual Std. Error         0.782 (df = 1942)  0.697 (df = 1937)  0.661 (df = 1931)  0.637 (df = 1928)
Note: *p<0.1; **p<0.05; ***p<0.01

 

 

Block on “Hukou (household registration)”

From Figure 2, we can clearly see that after controlling for all other variables, the marriage penalty is noticeably smaller for non-agricultural hukou holders than for agricultural hukou holders. Looking more closely at the data, among women with a non-agricultural hukou, married women earn 6.7% less than unmarried women, significant at the 10% level; among those with an agricultural hukou, the wage difference rises to 21.4%, significant at the 1% level.

One possible explanation is that the idea of gender equality is better promoted in urban areas, which are also where the one-child policy has been more strictly enforced. It is naturally easier for women who give birth to and raise only one child to keep working at the same level of job, especially if the spouse shares the housework. Agricultural hukou holders, however, tend to have more than one child, as the original family planning policy stipulated that an agricultural “hukou” family whose first child is a girl could have another child. Raising an extra child can greatly increase the time spent on housework, and the need for maternity leave may lower job security. Thus, for those affected, income after marriage becomes lower.

   

Block on Levels of Education

Interestingly, Figure 3 shows that the marriage penalty is highly significant only for women with minimal education and women with higher education, whereas the effect almost completely disappears for those with a middle-school level of education (9-12 years). For women who went no further than primary school, married women earn a drastic 43.3% less than unmarried women; for the post-secondary education group, the wage difference is around 15.9%.

One possible explanation is that women with higher education face a high opportunity cost, so the wage difference is potentially larger if they choose to trade work for family. As for those with only minimal education, since they didn’t earn much to begin with, they may stop working full-time jobs completely and only work on the side after marriage, so the wage difference is large. In addition, a lower level of education can signal less exposure to the modern idea of gender equality, so these women may be more willing to follow the traditional housewife role in which one would “stay at home, assist the husband and raise the children”. However, those who have a decent amount of education but didn’t go on to higher education may be working in medium-pay jobs that are flexible to begin with.

   

Block on Age Cohort

After blocking on age cohort, we can see that the marriage penalty is not significant for the youngest group, which instead shows a minor wage premium of about 3.2%. For the other age cohorts, marriage and wage are still negatively correlated. The marriage penalty is around 13.8% for women between ages 25 and 34, 8.6% for women between 35 and 44, and 14.8% for women between 45 and 54.

One possible explanation for the marriage penalty in the 25-34 age group is that most women give birth between ages 25 and 35. The penalty seems slightly more severe in the 45-54 group, possibly because these women need to take care of grandchildren as well as elderly family members.

   

Quantile Regression on Levels of Wage

Marriage may affect women in different income strata differently, but an Ordinary Least Squares (OLS) regression estimates a single average effect and can hide such differences. I tested this with quantile regression at \(\tau = 0.01, 0.02, \dots, 0.99\). From Figure 5 we can see that the penalty effect of marriage is almost constant, fluctuating around -14.4% (the OLS estimate when all covariates are included).
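For readers unfamiliar with quantile regression: instead of squared errors, it minimizes an asymmetric “check” loss, so that the fitted line tracks the \(\tau\)-th conditional quantile rather than the conditional mean. The sketch below illustrates the idea in its simplest form, with a constant in place of a regression line; the wage values are made up, the function names are hypothetical, and this is not the estimation code used for Figure 5.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: weight tau on positive residuals, 1 - tau on negative ones."""
    return residual * (tau - (1 if residual < 0 else 0))

def best_constant(values, tau):
    """Constant minimizing total check loss; a minimizer always lies among the observed values."""
    return min(values, key=lambda c: sum(check_loss(v - c, tau) for v in values))

# Grid of quantile levels 0.01, 0.02, ..., 0.99.
taus = [k / 100 for k in range(1, 100)]

# Hypothetical log hourly wages for nine women.
wages = [1.2, 1.9, 2.3, 2.6, 2.7, 3.0, 3.4, 4.1, 5.2]

low, median, high = (best_constant(wages, t) for t in (0.1, 0.5, 0.9))
print(f"tau=0.1: {low}, tau=0.5: {median}, tau=0.9: {high}")
```

Minimizing the check loss at \(\tau = 0.5\) returns the sample median, and other values of \(\tau\) return the corresponding sample quantiles. Quantile regression applies the same loss to the residuals of a linear model, which yields one marriage coefficient per value of \(\tau\).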

Figure 5: Quantile Regression Result

Conclusions and steps ahead

This blog only attempts to study more clearly and accurately whether marriage has a punitive effect on women’s wages, and whether the effect differs by hukou, age, education, and wage level. It does not prove or test the underlying causes and mechanisms behind this phenomenon, nor does it assert any causal relationship between marriage and women’s wages. However, this study can serve as useful material for future research. There are many possible explanations for the negative correlation between marriage and wage: it could be that married women spend more time on housework, which crowds out effective working time; it could be that pregnancy and maternity leave interrupt work, so companies are less likely to promote married women or hire them at a higher wage; or it could even be that women with higher wages tend to value freedom and affection more.

Identifying the causal relationship between marriage and women’s wages is a complicated task, and regression on cross-sectional observational data may be insufficient. Popular as it may be, the instrumental variable method can also be troublesome, as it rests on several strong assumptions and a credible IV is hard to construct. Since most large-scale surveys do not follow the same individuals across the years, it is not possible to adopt an ideal counterfactual method for identifying causality. How to solve this dilemma is still worth exploring. Ideally, with a good follow-up survey database, we could study the subgroup of women who get married between survey rounds and test whether their wages decrease after marriage.

As for the heterogeneity across hukou (household registration status), age cohorts, levels of education, and wage levels, I only measured it and tentatively gave my own speculative explanations. For now, I do not have better data to support my hypotheses. In the future, more in-depth research should be conducted on the underlying mechanism of how and why married women earn less.