Determination and classification of pollutants in waste water

This paper deals with waste waters discharged by industrial producers over the last three years. Its main purpose is to evaluate the data monitored at the discharges of three leather plants, where eight traditional variables (COD, BOD, insoluble matters, pH, and the content of ammonia, total nitrogen, chromium and sulphides) were regularly analyzed and quantified. Mutual relations of these variables in waste waters were discovered using statistical techniques, mainly multivariate data analysis, and some general conclusions were drawn regarding the trends of pollution with respect to its source as well as the season of the year.


Introduction
Pollution of rivers and seas jeopardizes the production of oxygen in water. Waste water is a severe source of such pollution. It enters the environment as a consequence of industrial and agricultural production and municipal activities, and it has a harmful influence on humans and the environment. The contaminants present in waste water comprise various inorganic and organic matters. The most important variables that characterize water quality and indicate the extent of pollution are the concentrations of chromium (coded Cr in the further text), total nitrogen (coded Ntot), ammonium (NH4), and sulphide (S2). Together with chemical oxygen demand, COD (coded CHSK according to the Slovak designation), biological oxygen demand, BOD5 (coded BSK5 according to the Slovak designation), water acidity (pH), and the amount of insoluble matters (IM), they were regularly monitored at the output of three industrial sites in the same Slovak town during the years 2006, 2007 and (partly) 2008, which created the measurement basis for the present research.
Due to their potential environmental danger, discharges from leather plants have to be regularly monitored. The most dangerous environmental factor here is the content of chromium, which is very toxic in the oxidation state +VI; therefore most of the Cr(VI) is trapped within the technological process. The other factors mentioned above must also be regularly monitored, not only by the plant laboratory but also by an independent accredited analytical laboratory. When the limiting values are exceeded, the level of the respective factor must be properly adjusted immediately.
Characterization and classification of various kinds of water samples from the environmental and metrological aspects have been effectively performed using various chemometric and statistical methods (VONČINA et al., 2007; ŠNUDERL et al., 2007; KANNEL et al., 2007; KRAIC et al., 2008). Therefore, several multidimensional (multivariate) techniques of data analysis were used in this work.

Sampling
The waste water samples originated from three relatively similar industrial production sites. The sampling was done once a week by analysts from a nearby accredited laboratory, where the required measurements were made. The following 8 variables were monitored and analyzed: CHSK, BSK5, pH, IM, Ntot, NH4, Cr and S2 (their codes are explained above). The sampling procedures and laboratory analyses were described in detail in our previous papers (KIRÁLYOVÁ et al., 2008a; 2008b). To prevent changes in the waste water samples, the analyses started two hours after sampling, which was facilitated by the closeness of the analytical laboratory to the monitoring sites.

Analysis
COD (CHSK) was measured by the spectrophotometric dichromate method; BOD5 (BSK5) was determined after treatment of the analyzed water with manganese dioxide by iodometric thiosulphate determination of the present Mn(IV). The pH value was measured potentiometrically using a glass electrode. Insoluble matters (IM), separated on an appropriate filter, were dried at 105 °C to constant weight and determined gravimetrically. Ammonia was distilled off from the analyzed water and determined spectrophotometrically using the Nessler reagent. For the total nitrogen determination, inorganic and organic nitrogen was oxidized by persulphate and mineralized in sulphuric acid. The nitrate formed reacted with 2,6-dimethylphenol (in a sulphuric and phosphoric acid medium) and the resulting 4-nitro-2,6-dimethylphenol was determined spectrophotometrically. Chromium was selectively determined by inductively coupled plasma optical emission spectrometry. The sulphide anion was determined by iodometric back titration using a thiosulphate standard solution.

Multidimensional data analysis
Statistical calculations were performed using the following techniques: correlation analysis, principal component analysis (PCA), cluster analysis (CA) and linear discriminant analysis (LDA). Three contemporary commercial software packages were used in the calculations: STATGRAPHICS Plus 5.1, SPSS 15.0 and SAS JMP 7.0.

Correlation analysis
The output of correlation analysis is the correlation table, which contains the pair (Pearson) correlation coefficients expressing the strength of correlation between all possible pairs of variables. Table 1 shows the calculated correlation table. The entries of this table are symmetric about the diagonal. The following conclusions can be derived from the correlation table: (a) The highest correlation is between CHSK and BSK5. (b) Very high correlations exist between CHSK and Ntot, BSK5 and Ntot, as well as CHSK and IM. (c) Highly significant correlations (|r| ≥ r crit = 0.258 at p ≤ 0.01) exist between IM and Ntot, BSK5 and IM, IM and pH, as well as NH4 and S2. (d) Significant correlations (at the 95 % or higher probability level, p ≤ 0.05) exist between BSK5 and pH, as well as IM and NH4 (both dependences are inverse). (e) No significant correlation was found for any of the other pairs. All correlation coefficients in Table 1 whose absolute value is larger than or equal to r crit = 0.184 are marked in bold.
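The critical values quoted for n = 81 can be checked with a short calculation. The sketch below is only an illustration, not the procedure of the software packages used in the study: it relies on the standard relation r crit = t / sqrt(t² + n − 2), and the one-sided Student t quantiles for 79 degrees of freedom are values assumed here from standard tables (the source does not state how r crit was derived). The function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson (pair) correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_critical(t_quantile, n):
    """Critical |r| for n objects, given a Student t quantile with n - 2 d.f."""
    return t_quantile / math.sqrt(t_quantile ** 2 + (n - 2))

# ASSUMPTION: one-sided t quantiles for 79 degrees of freedom, taken from
# standard tables; they are not given in the paper.
T_QUANTILES = {0.10: 1.292, 0.05: 1.664, 0.01: 2.374}
N_OBJECTS = 81  # 27 monthly averages x 3 plants, as stated in the paper

R_CRIT = {p: round(r_critical(t, N_OBJECTS), 3) for p, t in T_QUANTILES.items()}

def is_significant(r, p=0.05):
    """True if |r| reaches the critical value at probability level p."""
    return abs(r) >= R_CRIT[p]
```

With these assumed quantiles, `R_CRIT` agrees with the critical values listed in the note to Table 1, which suggests (but does not prove) that one-sided critical values were used.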

Principal component analysis
In principal component analysis, PCA, some natural grouping of the objects (the waste water samples in this work) and the studied variables might be seen. The principal components, PCs, are calculated as linear combinations of the original variables (SHARMA, 1996; KHATTREE and NAIK, 2000). According to the computed eigenvalues, only three principal components were found important; their eigenvalues were larger than 1, which is usually taken as the criterion. Three kinds of graphical outputs are used in PCA, namely the scatterplot showing the objects, the loadings plot showing the variables, and the biplot, where all objects and variables are depicted together. The advantage of the first two outputs is the possibility to obtain not only a 2D graph but also a 3D one, where usually the first two and the first three most important PCs are used as the axes, respectively. On the other hand, the biplot, even though plotted in two dimensions, provides more complex information about the studied problem. When looking at the sample positions in Fig. 1, it can be concluded that high sulphide and Cr concentration values are characteristic of the waste coming from the second plant. On the other hand, high pH and NH4 values are characteristic mainly of the third and partly of the first plant. Since the samples belonging to plant 2 are located at low (mostly negative) values of CHSK, BSK5 and Ntot, it can be concluded that samples from plant 2 generally exhibit low values of these variables. The occurrence of negative variable values in PCA is caused by the variable standardization, in which the corresponding mean is subtracted from the original variable values and the result is divided by the corresponding standard deviation (so that a standardized value of zero corresponds to the original mean of the variable).
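The standardization step described above can be illustrated with a minimal sketch. The three input values are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the source does not specify which form was used.

```python
import statistics

def standardize(values):
    """Subtract the mean and divide by the standard deviation (z-scores)."""
    mean = statistics.fmean(values)
    # ASSUMPTION: sample standard deviation (n - 1); the paper does not say.
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical values of one variable, for illustration only.
example_values = [100.0, 200.0, 300.0]
z_scores = standardize(example_values)
# A value below the mean becomes negative, the mean itself maps to zero.
```

This is why samples with below-average CHSK, BSK5 or Ntot appear at negative coordinates, even though the measured concentrations themselves are never negative.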

Cluster analysis
Generally, the clustering process in cluster analysis may be performed either with objects or with variables (KHATTREE and NAIK, 2000). Clustering in this work concerns the eight studied variables. The result of the cluster analysis is the dendrogram depicted in Fig. 2. The calculations were based on data for 81 objects, representing 27 monthly averages measured at 3 sampling sites (discharges of three leather production plants) for each of the eight investigated variables. Ward's method of clustering and the squared Euclidean distance were used in these calculations.
In Fig. 2 three clusters of closely related variables can be seen. The first cluster is formed by CHSK and BSK5, connected further to Ntot and IM. The second cluster is formed by pH and NH4, and the third cluster contains Cr and S2. The variables forming the same cluster are the most similar; the measure of their mutual similarity is given by Distance, which is the vertical axis of the dendrogram. The results of cluster analysis are in good agreement with the outputs of correlation analysis and principal component analysis.
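How Ward's method with squared Euclidean distance groups variables can be sketched in a few lines. This is a minimal pure-Python illustration based on the Lance-Williams update formula, not the routine of the commercial packages used in the study; the function names and the four short standardized profiles are hypothetical and do not reproduce the measured data.

```python
from itertools import combinations

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_merge_order(profiles):
    """Agglomerate variables by Ward's method.

    profiles: dict mapping a variable name to its (standardized) values.
    Returns a list of (cluster_a, cluster_b, distance) in merge order.
    The distance scale follows the Lance-Williams recurrence on squared
    Euclidean distances; statistical packages may scale it differently.
    """
    sizes = {(name,): 1 for name in profiles}
    dist = {
        frozenset((a, b)): squared_euclidean(profiles[a[0]], profiles[b[0]])
        for a, b in combinations(sizes, 2)
    }
    merges = []
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        a, b = sorted(pair)
        d_ab = dist.pop(pair)
        n_a, n_b = sizes.pop(a), sizes.pop(b)
        merged = tuple(sorted(a + b))
        for k, n_k in sizes.items():            # Lance-Williams update (Ward)
            d_ka = dist.pop(frozenset((k, a)))
            d_kb = dist.pop(frozenset((k, b)))
            dist[frozenset((k, merged))] = (
                (n_a + n_k) * d_ka + (n_b + n_k) * d_kb - n_k * d_ab
            ) / (n_a + n_b + n_k)
        sizes[merged] = n_a + n_b
        merges.append((a, b, d_ab))
    return merges

# HYPOTHETICAL standardized profiles over five samples, for illustration only.
profiles = {
    "CHSK": [1.0, 0.5, -0.5, -1.0, 0.0],
    "BSK5": [0.9, 0.6, -0.4, -1.1, 0.0],
    "Cr":   [-1.0, 1.0, 0.0, 0.5, -0.5],
    "S2":   [-0.9, 1.1, 0.1, 0.4, -0.7],
}
merges = ward_merge_order(profiles)
```

In this toy input the two pairs with nearly identical profiles are joined first and the resulting clusters are joined last, which mirrors how the dendrogram is read: the lower the joining distance, the more similar the variables.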

Linear discriminant analysis
Linear discriminant analysis, LDA, is a supervised learning method in which the classification model is calculated using the data of the training set, where the category of all samples is known before the calculation starts. In the first step (training), the classification model is calculated, by which the studied objects (waste water samples in this case) are re-categorized so that the object (sample) category is either confirmed or the object is assigned to another category (SHARMA, 1996; VANDEGINSTE et al., 1998; KHATTREE and NAIK, 2000). The success of classification is calculated as the ratio of the correctly classified objects to their total number. Then, in the second step, the developed model is used for classification of objects that are not included in the training set. They belong either to the test data set, which is used for validation of the discriminant model, or their category is completely unknown, so that the main goal of LDA is prediction of the object category. In this work, the samples were separated into three categories, assigned according to the plant releasing the waste water. The classification results are summarized in Table 2. The LDA results show that the waste water from plants 1 and 3 has a similar level of pollutants; therefore the categorization of the samples from these two sources is often interchanged, e.g. 10 samples belonging to plant 3 were classified into group 1 (plant 1), as shown in the first line of the table. On the other hand, 100 % of the samples from plant 2 were classified correctly when the training set is considered, and a 96.3 % success was obtained when performing leave-one-out validation. According to the position of the samples belonging to the three plant discharges with respect to the first discriminant function, which expresses the overall pollutant concentration in the LDA diagram (not shown here), it was shown that the worst polluter is plant 1, whereas the best cleaning procedures were applied in plant 2.
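The difference between the training-set success and the leave-one-out success can be illustrated with a small sketch. To keep it self-contained, a nearest-centroid classifier is swapped in for LDA; it is a simpler stand-in, not the method used in the study, and the one-dimensional samples and all function names are hypothetical.

```python
def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def predict(train, query):
    """Nearest-centroid stand-in for LDA: pick the closest class centroid."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    centroids = {label: centroid(pts) for label, pts in by_label.items()}
    return min(
        centroids,
        key=lambda lab: sum((q - c) ** 2 for q, c in zip(query, centroids[lab])),
    )

def training_success(data):
    """Share of objects classified correctly by a model built from all objects."""
    hits = sum(predict(data, x) == y for x, y in data)
    return hits / len(data)

def leave_one_out_success(data):
    """Each object is classified by a model built from all other objects."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += predict(rest, x) == y
    return hits / len(data)

# HYPOTHETICAL one-feature samples: plants 1 and 3 lie close together,
# plant 2 is well separated.
samples = [
    ([1.0], "plant 1"), ([2.0], "plant 1"), ([3.0], "plant 1"),
    ([3.2], "plant 3"), ([4.0], "plant 3"), ([5.0], "plant 3"),
    ([10.0], "plant 2"), ([11.0], "plant 2"), ([12.0], "plant 2"),
]
train_rate = training_success(samples)
loo_rate = leave_one_out_success(samples)
```

In this toy set the borderline samples of the two neighbouring classes are classified correctly only while they help to define their own centroid, so the leave-one-out success is lower than the training success, which is the same direction of difference as reported for the real data.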
An attempt was also made to see the seasonal effect on the waste water pollution in the LDA output. Therefore all data were divided into four categories according to the seasons of the year: Spring (March, April, May), Summer (June, July, August), Autumn (September, October, November) and Winter (December, January, February). The nominal categorical variable Season was created with 4 classes (Spring, Summer, Autumn and Winter) and used as the discrimination criterion. The separation of the data into four seasons was not well demonstrated. However, the Winter and Autumn data were displayed at higher values of the first discriminant function, DF1, which reflects the extent of pollution since DF1 may be interpreted as the total concentration of the polluting substances. Much more successful was the seasonal categorization of the data into two parts: the warmer part of the year (Spring and Summer) and the colder part (Autumn and Winter). In this case the success of discrimination was 76.5 % for the training set and 66.7 % for the leave-one-out validation. In both cases the discrimination results are unambiguously significant. This also means that in cold months the total level of pollution of the waste water from all three investigated plants is higher than in the part of the year when the temperature is higher.
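The construction of the categorical variable Season and of the two-class warm/cold split follows directly from the month assignment given above and can be written down as a short sketch. The function names and the labels "warm" and "cold" are illustrative choices, not taken from the source.

```python
# Month-to-season assignment exactly as listed in the text.
SEASON_BY_MONTH = {
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
    12: "Winter", 1: "Winter", 2: "Winter",
}

def season(month):
    """Four-class Season variable for a month number 1..12."""
    return SEASON_BY_MONTH[month]

def half_year(month):
    """Two-class split: warmer part (Spring, Summer) vs colder part."""
    return "warm" if season(month) in ("Spring", "Summer") else "cold"
```

For example, `half_year(4)` gives `"warm"` and `half_year(10)` gives `"cold"`; either variable can then serve as the grouping criterion in the discriminant analysis.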

Conclusions
The characteristic waste water parameters, namely COD (CHSK), BOD (BSK5), insoluble matters, pH, ammonium, total nitrogen, chromium, and sulphides, were monitored at the discharge of three industrial plants in the same town for almost three years. The obtained analytical values were statistically evaluated, and mutual relations and trends of the individual waste water descriptors were discovered. In total, the most important source of pollution was industrial plant number 1. With respect to the seasons of the year, more polluted waste waters are expected in cold months.

Fig. 1
Fig. 1 exhibits the biplot, which simultaneously represents the samples, depicted here by numbers, and the eight originally used chemical descriptors, depicted by rays starting from the origin and ending at the point determining the variable position. The samples are categorized here according to the sampling site located at the discharge of plants 1, 2 and 3, respectively. From the position of the variables in the PC2-PC1 plane the following outcomes can be deduced: (a) Variables CHSK (COD), BSK5 (BOD5) and Ntot provide similar information about the sampled waste water. This is well understandable especially for the first two variables since they are highly correlated; the possibility of converting their values into each other has already been discovered. (b) The similarity (interdependence) of NH4 and pH demonstrates the role of ammonia in changing the pH value of waste water. (c) The opposite position of Cr with respect to pH shows that they are antagonists; a high level of chromium is reached at lower pH, and vice versa. The achieved PCA results are in accord with the outputs of cluster analysis described in the further text.

Table 1.
Pearson correlation coefficients exhibiting the strength of correlation between individual pairs of variables.
Note: 81 studied objects, i.e. 27 averaged month values were measured at the discharge of three leather plants. Critical values of the correlation coefficient (absolute values): r crit = 0.144 (p = 0.1), r crit = 0.184 (p = 0.05), and r crit = 0.258 (p = 0.01) for n = 81. Significant correlations are marked bold.

Table 2.
Evaluation of linear discriminant analysis classification of the waste water samples divided into categories given by the sampling location (plants 1, 2 and 3).
Note: In leave-one-out cross validation, each object was classified by the model derived from all objects other than that one. In total, 82.7 % of original grouped cases were correctly classified. In total, 75.3 % of cross-validated grouped cases were correctly classified.