<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2518-1092</journal-id><journal-title-group><journal-title>Research result. Information technologies</journal-title></journal-title-group><issn pub-type="epub">2518-1092</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2518-1092-2025-10-4-0-6</article-id><article-id pub-id-type="publisher-id">4016</article-id><article-categories><subj-group subj-group-type="heading"><subject>ARTIFICIAL INTELLIGENCE AND DECISION MAKING</subject></subj-group></article-categories><title-group><article-title>QUANTIZATION METHOD FOR DETECTION NEURAL NETWORKS ON EMBEDDED SYSTEMS</article-title><trans-title-group xml:lang="en"><trans-title>QUANTIZATION METHOD FOR DETECTION NEURAL NETWORKS ON EMBEDDED SYSTEMS</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Khrupin</surname><given-names>Danila Stanislavovich</given-names></name><name xml:lang="en"><surname>Khrupin</surname><given-names>Danila Stanislavovich</given-names></name></name-alternatives><email>Khrupin24@mail.ru</email></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Shaptsev</surname><given-names>Valeriy Alekseevich</given-names></name><name xml:lang="en"><surname>Shaptsev</surname><given-names>Valeriy Alekseevich</given-names></name></name-alternatives><email>vashaptsev@ya.ru</email></contrib></contrib-group><pub-date pub-type="epub"><year>2025</year></pub-date><volume>10</volume><issue>4</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/information/2025/4/ИТ_НР_10_4_6.pdf" /><abstract xml:lang="ru"><p>Model quantization is a key method for deploying high-performance 
neural network object detectors on resource-constrained devices. However, standard quantization approaches such as PTQ, QAT, and even mixed-precision methods optimize the bit allocation from layer sensitivity alone, ignoring the semantic specificity of the task. This leads to a significant drop in accuracy when distinguishing semantically similar classes, which is critical for many practical applications. The article proposes a new approach to mixed-precision quantization that takes the semantics of the task into account. A semantic-significance metric is introduced for the network components that contribute most to discriminating hard-to-distinguish classes. Based on this metric, a heterogeneous bit-width configuration is formed that preserves high precision in the critically important parts of the model while allowing aggressive compression of the rest. A plan for experimental validation of the approach on the task of vehicle type recognition is presented. The modified neural network model is expected to offer a significantly better trade-off between accuracy and resource consumption than standard quantization techniques.</p></abstract><trans-abstract xml:lang="en"><p>Model quantization is a key method for deploying high-performance neural network object detectors on resource-constrained devices. However, standard quantization approaches such as PTQ, QAT, and even mixed-precision methods optimize the bit allocation from layer sensitivity alone, ignoring the semantic specificity of the task. This leads to a significant drop in accuracy when distinguishing semantically similar classes, which is critical for many practical applications. The article proposes a new approach to mixed-precision quantization that takes the semantics of the task into account. A semantic-significance metric is introduced for the network components that contribute most to discriminating hard-to-distinguish classes. 
Based on this metric, a heterogeneous bit-width configuration is formed that preserves high precision in the critically important parts of the model while allowing aggressive compression of the rest. A plan for experimental validation of the approach on the task of vehicle type recognition is presented. The modified neural network model is expected to offer a significantly better trade-off between accuracy and resource consumption than standard quantization techniques.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>neural network quantization</kwd><kwd>object recognition</kwd><kwd>embedded systems</kwd><kwd>deep learning</kwd><kwd>model compression</kwd><kwd>adaptive binary coefficient length</kwd><kwd>mixed precision</kwd><kwd>semantic significance metric</kwd></kwd-group><kwd-group xml:lang="en"><kwd>neural network quantization</kwd><kwd>object recognition</kwd><kwd>embedded systems</kwd><kwd>deep learning</kwd><kwd>model compression</kwd><kwd>adaptive binary coefficient length</kwd><kwd>mixed precision</kwd><kwd>semantic significance metric</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="B1"><mixed-citation>1. Ren S., He K., Girshick R., Sun J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks // arXiv. URL: https://arxiv.org/abs/1506.01497 (accessed: 11.04.2025).</mixed-citation></ref><ref id="B2"><mixed-citation>2. Redmon J., Farhadi A. YOLOv3: An Incremental Improvement // arXiv. URL: https://arxiv.org/abs/1804.02767 (accessed: 11.04.2025).</mixed-citation></ref><ref id="B3"><mixed-citation>3. Liu W., Anguelov D., Erhan D., Szegedy C., Reed S., Fu C., Berg A.C. SSD: Single Shot MultiBox Detector // arXiv. URL: https://arxiv.org/abs/1512.02325 (accessed: 11.04.2025).</mixed-citation></ref><ref id="B4"><mixed-citation>4. Tan M., Pang R., Le Q.V. EfficientDet: Scalable and Efficient Object Detection // arXiv. 
URL: https://arxiv.org/abs/1911.09070 (accessed: 11.04.2025).</mixed-citation></ref><ref id="B5"><mixed-citation>5. Chen Y., Krishna T., Emer J.S., Sze V. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks // ResearchGate. URL: https://www.researchgate.net/publication/292869497_Eyeriss_An_Energy-Efficient_Reconfigurable_Accelerator_for_Deep_Convolutional_Neural_Networks#references (accessed: 11.04.2025).</mixed-citation></ref><ref id="B6"><mixed-citation>6. Jacob B., Kligys S., Chen B., Zhu M., Tang M., Howard A., Adam H., Kalenichenko D. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference // arXiv. URL: https://arxiv.org/abs/1712.05877 (accessed: 11.04.2025).</mixed-citation></ref><ref id="B7"><mixed-citation>7. Krishnamoorthi R. Quantizing deep convolutional networks for efficient inference: A whitepaper // arXiv. URL: https://arxiv.org/abs/1806.08342 (accessed: 11.04.2025).</mixed-citation></ref><ref id="B8"><mixed-citation>8. Jouppi N.P. et al. In-Datacenter Performance Analysis of a Tensor Processing Unit // arXiv. URL: https://arxiv.org/abs/1704.04760 (accessed: 11.04.2025).</mixed-citation></ref><ref id="B9"><mixed-citation>9. Banner R., Nahshan Y., Hoffer E., Soudry D. Post-training 4-bit quantization of convolution networks for rapid-deployment // arXiv. URL: https://arxiv.org/abs/1810.05723 (accessed: 11.04.2025).</mixed-citation></ref><ref id="B10"><mixed-citation>10. Bondarenko Y., Nagel M., Blankevoort T. Understanding and Overcoming the Challenges of Efficient Transformer Quantization // arXiv. URL: https://arxiv.org/abs/2109.12948 (accessed: 27.08.2025).</mixed-citation></ref><ref id="B11"><mixed-citation>11. Cai Y., Yao Z., Dong Z., Gholami A., Mahoney M.W., Keutzer K. ZeroQ: A Novel Zero Shot Quantization Framework // arXiv. 
URL: https://arxiv.org/abs/2001.00281 (accessed: 11.04.2025).</mixed-citation></ref><ref id="B12"><mixed-citation>12. Li R., Wang Y. Fully Quantized Network for Object Detection // ResearchGate. URL: https://www.researchgate.net/publication/334729962_Fully_Quantized_Network_for_Object_Detection (accessed: 11.04.2025).</mixed-citation></ref><ref id="B13"><mixed-citation>13. Nagel M., van Baalen M., Blankevoort T., Welling M. Data-Free Quantization Through Weight Equalization and Bias Correction // arXiv. URL: https://arxiv.org/abs/1906.04721 (accessed: 11.04.2025).</mixed-citation></ref><ref id="B14"><mixed-citation>14. Migacz S. 8-bit Inference with TensorRT // GTC 2017.</mixed-citation></ref><ref id="B15"><mixed-citation>15. Zhou S., Wu Y., Ni Z., Zhou X., Wen H., Zou Y. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients // arXiv. URL: https://arxiv.org/abs/1606.06160 (accessed: 11.04.2025).</mixed-citation></ref><ref id="B16"><mixed-citation>16. Wang K., Liu Z., Lin Y., Lin J., Han S. HAQ: Hardware-Aware Automated Quantization with Mixed Precision // arXiv. URL: https://arxiv.org/abs/1811.08886 (accessed: 11.04.2025).</mixed-citation></ref><ref id="B17"><mixed-citation>17. Automatic Mixed Precision package – torch.amp – PyTorch 2.6 documentation // pytorch.org. URL: https://pytorch.org/docs/stable/amp.html (accessed: 10.04.2025).</mixed-citation></ref><ref id="B18"><mixed-citation>18. Esser S.K., McKinstry J.L., Bablani D., Appuswamy R., Modha D.S. Learned Step Size Quantization // arXiv. URL: https://arxiv.org/abs/1902.08153 (accessed: 11.04.2025).</mixed-citation></ref><ref id="B19"><mixed-citation>19. Dong Z., Yao Z., Cai Y., Arfeen D., Gholami A., Mahoney M.W., Keutzer K. 
HAWQ-V2: Hessian Aware Trace-Weighted Quantization of Neural Networks // arXiv. URL: https://arxiv.org/abs/1911.03852 (accessed: 10.04.2025).</mixed-citation></ref><ref id="B20"><mixed-citation>20. Jain S.R., Gural A., Wu M., Dick C.H. Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks // arXiv. URL: https://arxiv.org/abs/1903.08066 (accessed: 10.04.2025).</mixed-citation></ref><ref id="B21"><mixed-citation>21. Gholami A., Kim S., Dong Z., Yao Z., Mahoney M.W., Keutzer K. A Survey of Quantization Methods for Efficient Neural Network Inference // arXiv. URL: https://arxiv.org/abs/2103.13630 (accessed: 11.04.2025).</mixed-citation></ref></ref-list></back></article>