# Relationship between tobacco use, alcohol consumption and non-communicable diseases among women in India: data from the National Family Health Survey-2015-16 | BMC Public Health

### Data

The data come from the National Family Health Survey (NFHS-4), the fourth in the NFHS series, conducted in 2015-2016 [19]. It provides information on the population, health, and nutrition of India and of every state and union territory in the country. All four rounds of the NFHS were conducted under the leadership of the Ministry of Health and Family Welfare (MoHFW), Government of India, which designated the International Institute for Population Sciences (IIPS), Mumbai, as the nodal agency for carrying out the surveys. Decisions on the overall sample size required for NFHS-4 were guided by several considerations, including the need to produce indicators at the district, state/union territory, and national levels, as well as separate estimates for urban and rural areas in the 157 districts where 30 to 70% of the population lives in urban areas according to the 2011 census, with a reasonable level of precision.

The NFHS-4 sample is a stratified two-stage sample [19]. The 2011 census served as the sampling frame for the selection of primary sampling units (PSUs). PSUs were villages in rural areas and census enumeration blocks (CEBs) in urban areas. PSUs with fewer than 40 households were linked to the nearest PSU. Within each rural stratum, villages were selected from the sampling frame with probability proportional to size [19]. Within each stratum, six approximately equal sub-strata were created by crossing three sub-strata, based on the estimated number of households in each village, with two sub-strata, based on the percentage of the population belonging to the Scheduled Castes and Scheduled Tribes. Four survey questionnaires (household, woman's, man's, and biomarker) were administered in 17 local languages using computer-assisted personal interviewing (CAPI).
In the surveyed households, 723,875 eligible women aged 15-49 were identified for individual interviews [19]. Interviews were completed with 699,686 women, a response rate of 97%. In the households selected for the state module, there were 122,051 eligible men aged 15-54. Interviews were completed with 112,122 men, a response rate of 92% [19]. The effective sample size for the present study was 699,686 women aged 15-49 in India.

### Description of variables

#### Outcome variable

The outcome variable was “presence of NCDs”, recoded as no and yes. The diseases considered as NCDs were hypertension, diabetes, asthma, heart disease, and cancer. Blood pressure was measured in women aged 15-49 using an Omron blood pressure monitor to determine the prevalence of hypertension [19]. Blood pressure was measured three times for each respondent, with an interval of 5 min between readings [19]. Hypertension was defined as a mean systolic blood pressure greater than or equal to 140 mmHg and/or a mean diastolic blood pressure greater than or equal to 90 mmHg [20]. A respondent was classified as diabetic if her random blood glucose exceeded 140 mg/dl. The FreeStyle Optium H glucose meter with glucose test strips was used to measure random blood glucose in women aged 15-49 from a finger-stick blood sample [19]. Asthma, heart disease, and cancer were self-reported [21]. A respondent with any of the above conditions was considered to have an NCD.
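The outcome definition above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and averaging all three blood pressure readings is an assumption, since the text only refers to a "mean" pressure.

```python
from statistics import mean


def has_hypertension(systolic_readings, diastolic_readings):
    """Mean systolic >= 140 mmHg and/or mean diastolic >= 90 mmHg.

    Assumption: the mean is taken over all readings supplied.
    """
    return mean(systolic_readings) >= 140 or mean(diastolic_readings) >= 90


def has_diabetes(random_glucose_mg_dl):
    """Random blood glucose above 140 mg/dl."""
    return random_glucose_mg_dl > 140


def has_ncd(systolic_readings, diastolic_readings, random_glucose_mg_dl,
            asthma=False, heart_disease=False, cancer=False):
    """Yes (True) if any of the five conditions is present."""
    return bool(
        has_hypertension(systolic_readings, diastolic_readings)
        or has_diabetes(random_glucose_mg_dl)
        or asthma
        or heart_disease
        or cancer
    )


# Hypothetical respondent: normal pressure and glucose, self-reported asthma.
print(has_ncd([118, 120, 122], [78, 80, 79], 105, asthma=True))
```

Measured conditions (hypertension, diabetes) and self-reported ones (asthma, heart disease, cancer) enter the same "any of" rule, which is why a single positive indicator is enough to code the outcome as yes.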

#### Explanatory variables

The explanatory variables were selected on the basis of an in-depth literature review. They were divided into three groups: behavioral, individual, and household characteristics.

### Behavioral characteristics

1. Cigarettes, bidis, cigars, hookah, gutkha/paan masala, paan, and khaini are commonly consumed tobacco products in India. The “smoking tobacco” variable was generated from four questions: (a) Do you currently smoke cigarettes? (b) Do you currently smoke bidis? (c) Do you currently smoke cigars? (d) Do you currently smoke hookah? Each response was recoded as no or yes. Respondents who smoked any of these products were coded as yes, and otherwise as no.

2. The “smokeless tobacco use” variable was generated from four questions: (a) Do you currently chew tobacco? (b) Do you currently use gutkha/paan masala with tobacco? (c) Do you currently use paan with tobacco? (d) Do you currently use khaini? Each response was recoded as no or yes. Respondents who used any of these products were coded as yes, and otherwise as no.

3. Alcohol use was coded as no or yes, based on the question “Do you currently drink alcohol?”
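The three behavioral indicators follow the same "yes if any item is yes" rule, which can be sketched as follows. The field names are hypothetical labels for the survey questions, not actual NFHS-4 variable names.

```python
# Hypothetical field names, one per survey question listed above.
SMOKING_ITEMS = ("smokes_cigarettes", "smokes_bidis",
                 "smokes_cigars", "smokes_hookah")
SMOKELESS_ITEMS = ("chews_tobacco", "uses_gutkha_paan_masala",
                   "uses_paan_with_tobacco", "uses_khaini")


def recode_behaviors(respondent):
    """Collapse item-level yes/no answers into three yes/no indicators.

    `respondent` maps field names to booleans; a missing item is
    treated as "no" (an assumption made for this sketch).
    """
    return {
        "smokes_tobacco": any(respondent.get(k, False) for k in SMOKING_ITEMS),
        "uses_smokeless_tobacco": any(respondent.get(k, False) for k in SMOKELESS_ITEMS),
        "drinks_alcohol": bool(respondent.get("drinks_alcohol", False)),
    }


# Hypothetical respondent who uses khaini only.
print(recode_behaviors({"uses_khaini": True}))
```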

### Individual characteristics

Age was grouped into 15–24 years, 25–34 years, and 35–49 years. Educational status was categorized as no education, primary, secondary, and tertiary. Work status was coded as no and yes. The work status question was asked only in the state module and therefore could not be used in the multivariate analysis. Marital status was coded as never married, currently married, and other; the other category included women who were divorced, separated, or widowed. Media exposure was coded as unexposed and exposed. The variable was generated from questions on whether women watched television, read newspapers, or listened to the radio; a woman who answered yes to any of these questions was coded as exposed, and otherwise as unexposed. Body mass index (BMI, in kg/m²) was recoded into underweight (less than 18.5), normal (18.5 to 24.9), overweight (25 to 29.9), and obese (30 and over) [22].
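As an illustration of the recoding described above, the sketch below groups age and BMI into the stated categories. The function names are hypothetical, and treating the BMI cut-offs as half-open intervals (for example, normal is 18.5 up to but not including 25) is an assumption made so that unrounded values such as 24.95 fall into exactly one category.

```python
def age_group(age_years):
    """Group age into the three categories used in the analysis."""
    if 15 <= age_years <= 24:
        return "15-24"
    if 25 <= age_years <= 34:
        return "25-34"
    if 35 <= age_years <= 49:
        return "35-49"
    raise ValueError("age outside the 15-49 range covered by the women's sample")


def bmi_category(weight_kg, height_m):
    """Classify BMI (kg/m^2) as underweight, normal, overweight, or obese."""
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


# Hypothetical respondent: 29 years old, 50 kg, 1.60 m tall.
print(age_group(29), bmi_category(50, 1.60))
```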

### Household characteristics

The wealth status variable was generated from information collected in NFHS 2015-2016. Households were given scores based on the number and types of consumer goods they own, ranging from a television to a bicycle or car, and on dwelling features such as toilet facilities, source of drinking water, and flooring materials. These scores were derived using principal component analysis (PCA). National wealth quintiles were compiled by assigning the household score to each usual (de jure) household member, ranking each person in the household population by their score, and then dividing the distribution into five equal categories, each with 20% of the population [23]. Wealth status was coded as poorest, poorer, middle, richer, and richest.
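The ranking step of the wealth index can be sketched as below. This is a simplified illustration, not the NFHS procedure: household scores are taken as given (the PCA step is not reproduced), sampling weights are ignored, and a household is placed in the quintile that contains the midpoint of its members' ranks, which is an assumption made here so that all members of a household share one category. The function name and input layout are hypothetical.

```python
QUINTILE_LABELS = ("poorest", "poorer", "middle", "richer", "richest")


def assign_wealth_quintiles(households):
    """Assign each household to a population-based wealth quintile.

    `households` is a list of (household_id, score, n_de_jure_members).
    Every member carries the household score, persons are ranked by
    score, and the person distribution is cut into five equal parts.
    """
    total_persons = sum(members for _, _, members in households)
    ranked = sorted(households, key=lambda h: h[1])

    quintiles = {}
    persons_below = 0
    for household_id, _score, members in ranked:
        # Rank position of the household's middle member.
        midpoint = persons_below + members / 2
        index = min(int(5 * midpoint / total_persons), 4)
        quintiles[household_id] = QUINTILE_LABELS[index]
        persons_below += members
    return quintiles


# Hypothetical scores for five households of two members each.
example = [("A", -1.2, 2), ("B", -0.4, 2), ("C", 0.0, 2),
           ("D", 0.7, 2), ("E", 1.9, 2)]
print(assign_wealth_quintiles(example))
```

Because the cut points are defined on persons rather than households, larger households take up more of the distribution, so the quintiles hold equal shares of the population and not necessarily equal numbers of households.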

Religion was coded as Hindu, Muslim, Christian, and others; the others category included Buddhists, Sikhs, Jains, etc. Caste was coded as Scheduled Tribe, Scheduled Caste, Other Backward Class, and others [23]; the others category comprised those identified as having higher social status [24, 25]. Place of residence was coded as urban and rural. Regions of India were coded as North, Central, East, Northeast, West, and South [19].

### Statistical analysis

All analyses were performed using STATA 14. Descriptive statistics and bivariate analysis were carried out first. The chi-square test was used to assess the statistical significance of differences in NCD prevalence across background characteristics. In addition, multivariable logistic regression analysis [26] was used to estimate the association between NCDs and behavioral factors as well as other individual and household factors.
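For intuition, the chi-square test on a 2×2 table (for example, NCD status by smoking status) can be computed by hand as in the sketch below. It is an unweighted Pearson test written for illustration only, so it does not reproduce the design-adjusted tests that survey software such as STATA reports, and the counts in the example are made up.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table.

    `table` is [[a, b], [c, d]] of observed counts. With one degree of
    freedom, the p-value equals erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up counts: rows = smoker yes/no, columns = NCD yes/no.
statistic, p_value = chi_square_2x2([[10, 20], [30, 40]])
print(round(statistic, 3), round(p_value, 3))
```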

The binary logistic regression model is usually presented in a more compact form as follows:

$$\mathrm{Logit}\left[\mathrm{P}\left(\mathrm{Y}=1\right)\right]={\beta}_0+\beta \ast X$$

The parameter β0 estimates the log odds of NCDs for the reference group, while β is the maximum likelihood estimate of the difference in the log odds of NCDs associated with a set of predictors X, relative to the reference group. The variance inflation factor (VIF) was estimated to check for multicollinearity among the variables used in the study [27]. The svyset command in STATA 14 was used to account for the complex survey design in the analysis. This command also incorporates the sampling weights that make the estimates representative.
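To make the interpretation of β0 and β concrete, the sketch below converts hypothetical coefficients into a predicted probability and an odds ratio. The coefficient values are invented for illustration and are not estimates from this study.

```python
import math


def predicted_probability(beta0, betas, x):
    """P(Y = 1) implied by Logit[P(Y = 1)] = beta0 + sum(beta_k * x_k)."""
    logit = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-logit))


def odds_ratio(beta):
    """Odds ratio for a one-unit increase in a predictor."""
    return math.exp(beta)


# Hypothetical model: reference-group odds of 0.25, and a binary
# exposure that doubles the odds.
beta0 = math.log(0.25)
beta_exposure = math.log(2.0)

print(predicted_probability(beta0, [beta_exposure], [0]))  # reference group
print(predicted_probability(beta0, [beta_exposure], [1]))  # exposed group
print(odds_ratio(beta_exposure))
```

Exponentiating β0 gives the odds in the reference group, and exponentiating β gives the odds ratio relative to that group, which is the quantity usually reported from a logistic regression.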

Models 2, 3, and 4 show the combined effects of smoking and smokeless tobacco use, smoking and alcohol use, and smokeless tobacco use and alcohol use, respectively. An “interaction variable” is a variable constructed from an original set of variables to represent all or part of the interaction present. In exploratory statistical analyses, it is common to use products of the original variables as the basis for testing whether an interaction is present, with the possibility of substituting other, more realistic interaction variables at a later stage. When there are more than two explanatory variables, several interaction variables are constructed, with pairwise products representing pairwise interactions and higher-order products representing higher-order interactions [28,29,30].

Thus, for a response Y and two variables x1 and x2, an additive model would be:

$$\mathrm{Y}=\alpha +{\beta}_1{\mathrm{x}}_1+{\beta}_2{\mathrm{x}}_2+{\varepsilon}_0$$

In contrast, a model with an interaction term would be:

$$\mathrm{Y}=\alpha +{\beta}_1{\mathrm{x}}_1+{\beta}_2{\mathrm{x}}_2+\left({\beta}_3{\mathrm{x}}_{\mathrm{s}}\ast {\mathrm{x}}_{\mathrm{a}}\right)+{\varepsilon}_0$$

where Y is the dependent variable (NCDs), α is the intercept, x1 and x2 are individual-level independent variables, xa indicates alcohol consumption, xs indicates smoking, (β3 xs * xa) is the interaction of alcohol and smoking, and ε0 is the error term. Models are often presented without the interaction term (x1 * x2), but this confounds the main effect with the interaction effect (i.e., without specifying the interaction term, it is possible that any main effect found is in fact due to an interaction) [31].
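The role of the product term can be illustrated with two binary exposures. In the sketch below, smoking and alcohol use are treated as the only predictors (a simplification of the model above), and all coefficient values are hypothetical. The interaction coefficient is recovered as a difference of differences across the four exposure combinations.

```python
def linear_predictor(alpha, beta_s, beta_a, beta_sa, x_s, x_a):
    """alpha + beta_s*x_s + beta_a*x_a + beta_sa*(x_s * x_a)."""
    return alpha + beta_s * x_s + beta_a * x_a + beta_sa * (x_s * x_a)


def interaction_contrast(alpha, beta_s, beta_a, beta_sa):
    """Effect of smoking among drinkers minus its effect among non-drinkers."""
    y = {
        (s, a): linear_predictor(alpha, beta_s, beta_a, beta_sa, s, a)
        for s in (0, 1)
        for a in (0, 1)
    }
    return (y[1, 1] - y[0, 1]) - (y[1, 0] - y[0, 0])


# Hypothetical coefficients on the log-odds scale.
print(interaction_contrast(alpha=-2.0, beta_s=0.3, beta_a=0.2, beta_sa=0.5))
```

If the interaction coefficient is set to zero, the contrast is zero and the model reduces to the additive form, in which the effect of smoking is the same regardless of alcohol use.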

In addition, the population attributable risk (PAR) was calculated to assess the extent of NCD risk among women who were exposed to negative behavioral factors, i.e., smoking tobacco, using smokeless tobacco, and consuming alcohol [32]. The “regpar” command in STATA was used to calculate the PAR. Regpar calculates confidence intervals for population attributable risks and for scenario proportions [33]. It can be used after an estimation command whose predicted values are interpreted as conditional proportions, such as a logit, logistic, probit, or generalized linear model [33]. It estimates two scenario proportions: a baseline scenario (“Scenario 0”) and a fantasy scenario (“Scenario 1”), in which one or more exposure variables are assumed to be set to specific values (usually zero) and all other predictor variables in the model remain unchanged. It also estimates the difference between the Scenario 0 and Scenario 1 proportions. This difference is called the population attributable risk (PAR) and represents the amount of risk attributable to living in Scenario 0 rather than Scenario 1 [33].
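The two-scenario logic can be sketched in Python as follows. This is not the regpar implementation: it computes point estimates only (no confidence intervals), ignores survey weights, and uses hypothetical coefficients, variable names, and data.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def scenario_proportion(rows, intercept, coefs):
    """Mean predicted probability of the outcome over all respondents."""
    total = 0.0
    for row in rows:
        total += expit(intercept + sum(coefs[k] * row[k] for k in coefs))
    return total / len(rows)


def population_attributable_risk(rows, intercept, coefs, exposures):
    """Return (Scenario 0 proportion, Scenario 1 proportion, PAR).

    Scenario 0 uses the observed data. Scenario 1 sets every variable
    named in `exposures` to zero and leaves other predictors unchanged.
    """
    p0 = scenario_proportion(rows, intercept, coefs)
    counterfactual = [{**row, **{e: 0 for e in exposures}} for row in rows]
    p1 = scenario_proportion(counterfactual, intercept, coefs)
    return p0, p1, p0 - p1


# Hypothetical fitted model and two respondents.
rows = [{"smokes": 1, "age_35_49": 0}, {"smokes": 0, "age_35_49": 1}]
coefs = {"smokes": math.log(3.0), "age_35_49": 0.0}
print(population_attributable_risk(rows, 0.0, coefs, ["smokes"]))
```

Averaging predicted probabilities over the observed covariate distribution, rather than at a single set of covariate values, is what makes the result a population-level quantity.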