DOI: 10.18413/2518-1092-2016-1-4-21-24

A STUDY OF THE SENSITIVITY OF SOME MEASURES OF THE QUALITY OF INFORMATION HIDING IN AUDIO DATA

Abstract

The article compares the sensitivity of several measures of the difference between an original signal and the signal obtained by adding supplementary information to it. The comparison is based on an analysis of the results of applying the spread-spectrum steganographic method. The article examines the results of comparing several difference measures, based on the analysis of speech signals divided into segments of equal length.



Modern information and telecommunication systems are being developed to support natural human forms of information exchange. Speech is the most commonly used and most convenient of these forms. Modern information systems make it possible to store voice messages and transmit them over a distance. This capability has driven the rapid development of technologies for embedding in audio recordings additional information that is not perceived by the human senses, such as a date and time stamp or a label confirming copyright. Embedding additional information in such a way that the very fact of embedding goes undetected is the subject of steganography, and this requirement is its basic principle [2].

When a speech signal serves as the object into which information is embedded (the container), the result of embedding, i.e. the stego-container (the container together with the embedded information), should not differ audibly from the original container.

Obviously, the most effective way to detect changes (to identify the degree of change) is subjective assessment. However, the growing demand for stego-algorithms and the resulting increase in the volume of processed speech data make it necessary to automate the assessment of the results of embedding additional information.

This requires objective methods that express the degree of difference between speech signals before and after embedding in numerical form.

In addition, methods that evaluate the quality of embedding must meet the following requirements:

  • the method must express sound quality as a quantitative measure;
  • the method should take into account the properties of auditory perception;
  • the method should not require experts, but it should correlate as closely as possible with subjective evaluations;
  • the method should make it possible to determine the critical level (detection threshold) at which the changes caused by steganographic encoding become audible;
  • the method should not depend on the parameters of the analyzed signal (sampling rate, bit depth, etc.) and should respond equally to changes in the time and frequency domains.

Currently, the most widely used methods for evaluating the differences between compared signals are based on the analysis of speech signal segments in the time domain. They use such estimates as the mean square error (MSE), the relative error, the signal-to-noise ratio (SNR), the correlation coefficient (cor), and a measure based on the Itakura-Saito distance (maximum-likelihood distance, ISD). Each of these estimates reveals differences between the signals, but their sensitivity differs.

In particular, the mean square error (MSE) measures the absolute difference between the energies of signal segments in the time domain [7, 12, 3]:

,                                                                    (1)

where  – the amplitude of the initial data segment,  – the amplitude of the segment of data containing additional information, N – the number of samples of the compared segments of the signals.

This measure reveals differences in the amplitude envelopes of speech signal segments. The fewer changes are made when additional information is embedded, the closer its value is to zero.

However, this estimate does not take into account the energy of the signal itself, which makes it difficult to choose a threshold. It is therefore more common to use the MSE normalized to the norm of the original signal [2]:

.                                                       (2)

This estimate responds to changes in the same way as the MSE.
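The mean square error and its normalized variant described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the normalization shown (error energy divided by the energy of the original segment) is an assumption. A ratio of norms would be the square root of this value.

```python
import numpy as np

def mse(x, y):
    """Mean square error between the original segment x and the stego segment y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - y) ** 2)

def normalized_mse(x, y):
    """Error energy divided by the energy of the original segment (assumed normalization)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - y) ** 2) / np.sum(x ** 2)

# Toy segment and a slightly perturbed copy of it.
x = np.array([1.0, -2.0, 3.0, -4.0])
y = x + np.array([0.1, -0.1, 0.1, -0.1])

print(mse(x, y), normalized_mse(x, y))
# Scaling both signals changes the MSE but not the normalized estimate.
print(mse(10 * x, 10 * y), normalized_mse(10 * x, 10 * y))
```

Because the normalized estimate does not change when both signals are scaled, a single threshold can be applied to segments of different loudness, which is the difficulty with the plain MSE noted above.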

Also, to assess the extent of the differences between the original signal and the result of embedding additional information, it is necessary to use an estimate that is sensitive to the time alignment of the compared signal segments [7, 12, 3]:

.                                                                 (3)

The higher the SNR, the fewer changes were made. If the two segments (the source and the one changed by encoding) are equal, the estimate is infinite (∞).
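A signal-to-noise ratio with this behaviour can be sketched as below. The function name is hypothetical, and the definition used (ten times the decimal logarithm of the ratio of the original segment energy to the energy of the difference) is the common textbook form, assumed here rather than taken from the article.

```python
import numpy as np

def snr_db(x, y):
    """SNR in dB: energy of the original segment over the energy of the difference."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    noise_energy = np.sum((x - y) ** 2)
    if noise_energy == 0.0:
        # Identical segments: no distortion, so the estimate is infinite.
        return float("inf")
    return 10.0 * np.log10(np.sum(x ** 2) / noise_energy)

x = np.array([1.0, -2.0, 3.0, -4.0])
y = x + np.array([0.1, -0.1, 0.1, -0.1])

print(snr_db(x, y))   # finite value for a perturbed segment
print(snr_db(x, x))   # inf for identical segments
```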

The degree of similarity of two data segments is often assessed through the mutual energy of the signals, expressed by the correlation coefficient [7, 3]:

.                                                       (4)

The closer the correlation value is to one, the greater the similarity between the data segment containing the control information and the source segment.

All the estimates above compute the extent of the differences by comparing sample values in the time domain. However, differences in the frequency domain must also be taken into account. For this purpose, we use a measure based on the Itakura-Saito distance [7, 12, 3]:
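One possible form of such a correlation coefficient is the scalar product of the two segments divided by the product of their norms. The sketch below assumes this form; the exact normalization used in the article may differ, and the function name is hypothetical.

```python
import numpy as np

def correlation(x, y):
    """Normalized scalar product of two segments (assumed form, no mean removal)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, -2.0, 3.0, -4.0])
noise = np.array([2.0, 1.0, 0.0, 0.0])   # orthogonal to x: 2*1 + 1*(-2) = 0
y = x + 0.1 * noise

print(correlation(x, x))   # identical segments give a value of one (up to rounding)
print(correlation(x, y))   # slightly below one for the perturbed segment
```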

.                                                        (5)

It is known that the energy of a signal segment can be expressed as follows [2, 11]:

,                                                       (6)

where  – the value of the energy of the frequency components of the segment signal.

Then the measure based on the Itakura-Saito distance can be represented as:

,                                                                  (7)

where   – the value of the energy of the frequency components of the initial data segment,  – the value of the energy of the frequency components of a segment of data that contains additional information.

The measure has the meaning of a distance between the spectra of the two signals and estimates the discrepancy between the energies of the changed and the source data segments. If the data segments are equal, the measure is zero.

The sensitivity of the estimates was compared using one of the most common steganographic methods that takes into account the frequency characteristics of the voice signal [12], the spread-spectrum method.

The method adds a pseudo-random sequence (PRS) to a segment of the original speech signal in accordance with the expression [4, 8]:
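A spectral distance of this kind can be sketched with the common discrete form of the Itakura-Saito distance, computed from the energies of the frequency components of the two segments. Everything below is an assumption made for illustration: the energies are taken as squared FFT magnitudes, the original spectrum is placed in the numerator, the result is averaged over frequency bins, and a small constant `eps` guards against division by zero. The article's own frequency decomposition may differ.

```python
import numpy as np

def power_spectrum(segment, eps=1e-12):
    """Energies of the frequency components of a segment (squared FFT magnitudes)."""
    spectrum = np.fft.rfft(np.asarray(segment, dtype=float))
    return np.abs(spectrum) ** 2 + eps   # eps avoids division by zero and log(0)

def itakura_saito(x, y):
    """Mean over frequency bins of r - ln(r) - 1, where r = P_x / P_y."""
    ratio = power_spectrum(x) / power_spectrum(y)
    return np.mean(ratio - np.log(ratio) - 1.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)             # stand-in for a 32 ms segment at 8 kHz
y = x + 0.1 * rng.standard_normal(256)   # the same segment with added noise

print(itakura_saito(x, x))   # 0.0 for identical segments
print(itakura_saito(x, y))   # positive when the spectra differ
```

Each term r - ln(r) - 1 is zero only when r = 1, so the distance grows most in the frequency bins where the added noise is large relative to the energy of the original segment. This is why the same noise can give different values for sounds with different spectral shapes.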

,                                                                       (8)

where  – the original segment of the data,  – interval corresponding pseudo-random sequence, αm – the weighting factor, em – a code mapping binary bits of the control information determined by the equation:

, ,                                                                      (9)

where em – bits of the control information in the binary system, , M – the amount of secretly encoded control information, em - a code mapping binary bits of the control information, , m – the sequence number of bits of control information.

The weighting coefficient αm determines the secrecy of the system. In [10] it is proposed to choose it equal to:

.                                                                                   (10)

It should be noted that using as the noise-like signal a sequence that has no mutual energy with the data increases the noise immunity of the steganographically encoded control information em, while the use of the projection coefficient αm increases the stealth of the control information.

The bits of the control information are decoded from the data by determining the sign of the scalar product of the data segment and the pseudo-random sequence:

,                                                                               (11)

where sign( ) is the operation of taking the sign.

Table 1 presents the values of the considered difference measures for all sounds of Russian speech. The analysis used segments of speech signals recorded at a sampling frequency of 8 kHz with a bit depth of 16 bits. To implement the spread-spectrum method, the speech signals were divided into segments of equal duration, T = 32 ms. It is also important to note that these measures were studied with noise superimposed on the signal in the absence of cross-correlation and with the weight:
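The embedding and decoding steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, bits are mapped to the symbols -1 and +1, a single fixed weight `alpha` is used for all segments, random ±1 sequences stand in for the pseudo-random sequence, and Gaussian noise stands in for speech.

```python
import numpy as np

def bits_to_symbols(bits):
    """Map bits {0, 1} to the symbols {-1, +1} (assumed mapping)."""
    return 2 * np.asarray(bits) - 1

def embed(segments, prs, bits, alpha):
    """Add the weighted, sign-modulated pseudo-random sequence to each segment."""
    symbols = bits_to_symbols(bits)
    return segments + alpha * symbols[:, None] * prs

def decode(stego, prs):
    """Recover each bit from the sign of the scalar product of segment and sequence."""
    projections = np.sum(stego * prs, axis=1)
    return (projections > 0).astype(int)

rng = np.random.default_rng(0)
M, N = 8, 256                                # 8 bits; 256 samples = 32 ms at 8 kHz
segments = rng.standard_normal((M, N))       # stand-in for speech segments
prs = rng.choice([-1.0, 1.0], size=(M, N))   # one pseudo-random sequence per bit
bits = rng.integers(0, 2, size=M)

stego = embed(segments, prs, bits, alpha=0.5)
print(bits)
print(decode(stego, prs))
```

Decoding works because the scalar product of the stego segment with the sequence equals the cross term between the original segment and the sequence plus the weight times the symbol times the sequence energy. The sign is recovered correctly whenever the second term dominates, which is why low cross-correlation between the data and the sequence matters.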

.                                                                                 (12)

The parameter Km was varied in the range from 0.0001 to 0.2000.

The data show that the values of all the measures except the one based on the Itakura-Saito distance depend only on the coefficient Km. The value of the measure based on the Itakura-Saito distance, in turn, depends on both the coefficient Km and the type of sound. For voiced sounds of Russian speech, adding broadband noise causes a larger increase in this measure than adding the same noise fragment to hissing sounds: at Km = 0.1, for example, the ISD is 6.3492 for the sound А but only 0.3009 for Ч and 0.6402 for Ш. Thus, the measure based on the Itakura-Saito distance takes into account the features of the energy distribution of Russian speech sounds.

As the study shows, evaluating speech quality requires measures that take into account the distribution of the speech signal over the frequency band.

This is due to the way a person perceives the speech signal, regardless of the language of communication. Methods that use a psychoacoustic model [1] and prediction methods are not always easy to use, because they have many settings [1, 2]. Thus, measures based on the Itakura-Saito distance are advisable for evaluating the quality of information hiding in speech signals.
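The effect of sweeping Km can be illustrated with a short sketch. It assumes that the weight scales the added sequence so that its norm is Km times the norm of the original segment. This is a guess at the role of Km, not the article's formula, and all names are hypothetical. Under this assumption the SNR depends only on Km and equals -20·log10(Km), whatever the segment.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)             # stand-in for one 32 ms segment at 8 kHz
p = rng.choice([-1.0, 1.0], size=256)    # pseudo-random sequence
p = p - (p @ x) / (x @ x) * x            # remove the cross-correlation with x

def snr_for(km):
    """SNR in dB when the added sequence has norm km * ||x|| (assumed weight)."""
    alpha = km * np.linalg.norm(x) / np.linalg.norm(p)
    y = x + alpha * p
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((y - x) ** 2))

for km in (0.0001, 0.0002, 0.01, 0.1, 0.2):
    print(f"Km={km:.4f}  SNR={snr_for(km):.4f} dB")
```

The values printed by this sketch coincide numerically with the SNR column of Table 1 (for example, 20 dB at Km = 0.1). This is consistent with the assumed weight but does not establish that the authors used it.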

Table 1

Evaluation of the differences between the original signal and the result of embedding by the spread-spectrum steganographic method (T = 32 ms)

Type of sound   Km       SD       NSD      SNR       cor      ISD
А               0.0001   0.0001   0.0001   80.0000   1.0000    0.0021
А               0.0002   0.0002   0.0002   73.9794   0.9999    0.0045
А               0.0100   0.0100   0.0100   40.0000   0.9950    0.4529
А               0.1000   0.1000   0.1000   20.0000   0.9524    6.3492
А               0.2000   0.2000   0.2000   13.9794   0.9091   13.3037
Ч               0.0001   0.0001   0.0001   80.0000   1.0000    0.0002
Ч               0.0002   0.0002   0.0002   73.9794   0.9999    0.0005
Ч               0.0100   0.0100   0.0100   40.0000   0.9950    0.0182
Ч               0.1000   0.1000   0.1000   20.0000   0.9524    0.3009
Ч               0.2000   0.2000   0.2000   13.9794   0.9091    0.8142
Ш               0.0001   0.0001   0.0001   80.0000   1.0000    0.0007
Ш               0.0002   0.0002   0.0002   73.9794   0.9999    0.0014
Ш               0.0100   0.0100   0.0100   40.0000   0.9950    0.0523
Ш               0.1000   0.1000   0.1000   20.0000   0.9524    0.6402
Ш               0.2000   0.2000   0.2000   13.9794   0.9091    1.5429

References

The list of references will appear later.