{"id":744,"date":"2021-12-01T09:44:00","date_gmt":"2021-12-01T00:44:00","guid":{"rendered":"https:\/\/rfsec.ddns.net\/db\/?p=744"},"modified":"2021-12-01T18:26:09","modified_gmt":"2021-12-01T09:26:09","slug":"%e3%82%aa%e3%83%bc%e3%83%97%e3%83%b3%e3%82%bd%e3%83%bc%e3%82%b9%e3%81%ae%e6%96%87%e5%ad%97%e8%aa%8d%e8%ad%98tesseract%e3%82%92%e8%a9%a6%e3%81%97%e3%81%a6%e3%81%bf%e3%82%8b%e3%80%82","status":"publish","type":"post","link":"https:\/\/rfsec.ddns.net\/db\/?p=744","title":{"rendered":"Trying out Tesseract, the open-source OCR engine"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Environment<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-plain\"><code>$ uname -a\nLinux ps2 5.10.63-v7l+ #1457 SMP Tue Sep 28 11:26:14 BST 2021 armv7l GNU\/Linux\n\n$ cat \/etc\/os-release\nPRETTY_NAME=&quot;Raspbian GNU\/Linux 10 (buster)&quot;\nNAME=&quot;Raspbian GNU\/Linux&quot;\nVERSION_ID=&quot;10&quot;\nVERSION=&quot;10 (buster)&quot;\nVERSION_CODENAME=buster\nID=raspbian\nID_LIKE=debian<\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Site I used as a reference for the installation<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/qiita.com\/Unagi_Create\/items\/0cdd7d0d48be5ae87361\" data-type=\"URL\" data-id=\"https:\/\/qiita.com\/Unagi_Create\/items\/0cdd7d0d48be5ae87361\" target=\"_blank\" rel=\"noreferrer noopener\">\u300cRaspberry Pi 3B+\u306b\u304a\u3051\u308b\u3001Tesseract\uff085.0.0 alpha\uff09\u306e\u30a4\u30f3\u30b9\u30c8\u30fc\u30eb\u65b9\u6cd5\u3068\u57fa\u672c\u64cd\u4f5c\u300d<\/a><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-plain\"><code>$ sudo apt-get install tesseract-ocr-script-jpan\n$ tesseract --version\ntesseract 4.0.0\n leptonica-1.76.0\n  libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 
: libopenjp2 2.3.0\n\n$  tesseract --list-langs\nList of available languages (3):\nJapanese\neng\nosd\n\n# Fetch and install the Japanese training data (accuracy-first models)\n$ git clone https:\/\/github.com\/tesseract-ocr\/tessdata_best.git\n$ sudo cp tessdata_best\/jpn* \/usr\/share\/tesseract-ocr\/4.00\/tessdata\/\n\n# Confirm that the added training data is available\n$  tesseract --list-langs\nList of available languages (5):\nJapanese\neng\njpn\njpn_vert\nosd\n<\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Next, I had Tesseract recognize TEST-JPN.png, a file drawn in a paint program.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The image also varies the font size (20\/12) and the weight (bold) to check whether those make a difference.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/rfsec.ddns.net\/db\/wp-content\/uploads\/2021\/12\/TEST-JPN.png\" alt=\"\" class=\"wp-image-745\" width=\"368\" height=\"278\" srcset=\"https:\/\/rfsec.ddns.net\/db\/wp-content\/uploads\/2021\/12\/TEST-JPN.png 453w, https:\/\/rfsec.ddns.net\/db\/wp-content\/uploads\/2021\/12\/TEST-JPN-300x226.png 300w\" sizes=\"auto, (max-width: 368px) 100vw, 368px\" \/><\/figure>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-plain\"><code> $ time tesseract TEST-JPN.png stdout -l jpn\n0123456789\n0123456789\n\n0123456789\n\n\u672c \u65e5 \u306f \u6674\u5929 ? 
\u96e8\u964d\u308a \u3067 \u3059 \u3002 \u65e5 \u672c \u8a9e \u306e \u6587\u5b57 \u8a8d\u8b58 \u30c6\u30b9 \u30c8 \u3002\n\n\nreal    1m4.283s\nuser    1m43.622s\nsys     0m3.248s<\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">In this example the recognition rate was, remarkably, 100%.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I also tried digits enclosed in ruled lines (TEST-JPN3.png).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"453\" height=\"342\" src=\"https:\/\/rfsec.ddns.net\/db\/wp-content\/uploads\/2021\/12\/TEST-JPN3.png\" alt=\"\" class=\"wp-image-756\" srcset=\"https:\/\/rfsec.ddns.net\/db\/wp-content\/uploads\/2021\/12\/TEST-JPN3.png 453w, https:\/\/rfsec.ddns.net\/db\/wp-content\/uploads\/2021\/12\/TEST-JPN3-300x226.png 300w\" sizes=\"auto, (max-width: 453px) 100vw, 453px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Once the text is laid out as a table, nothing is recognized at all!<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-bash\" data-lang=\"Bash\"><code> $ time tesseract TEST-JPN3.png stdout -l jpn\nEmpty page!!\nEmpty page!!\n\n\nreal    0m4.568s\nuser    0m2.977s\nsys     0m1.382s<\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">I found an article on recognizing tables and tried that approach too. <s>My impression was that pytesseract.image_to_string never returned after being called.<\/s>  
The call does in fact return, but with an empty string, so recognition appears to fail. The image that was recognized from the command line above is also recognized without problems by the Python code.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/chuckischarles.hatenablog.com\/entry\/2018\/11\/10\/095747\" data-type=\"URL\" data-id=\"https:\/\/chuckischarles.hatenablog.com\/entry\/2018\/11\/10\/095747\" target=\"_blank\" rel=\"noreferrer noopener\">\u300c\u300e\u8868\u306e\u6587\u5b57\u300f\u3068\u300e\u6b04\u5916\u306e\u6587\u5b57\u300f\u306e\u8a8d\u8b58(Python + Tesseract)\u300d<\/a><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code># From here on, install pytesseract from Jupyter Notebook\n!pip install pytesseract \n\nLooking in indexes: https:\/\/pypi.org\/simple, https:\/\/www.piwheels.org\/simple\nCollecting pytesseract\n  Downloading https:\/\/www.piwheels.org\/simple\/pytesseract\/pytesseract-0.3.8-py2.py3-none-any.whl (14 kB)\nRequirement already satisfied: Pillow in \/home\/mars\/.pyenv\/versions\/3.7.3\/lib\/python3.7\/site-packages (from pytesseract) (8.3.1)\nInstalling collected packages: pytesseract\nSuccessfully installed pytesseract-0.3.8  <\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Test code for recognition<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>###############################################################################\n# Library imports\n###############################################################################\nimport os                       # os 
information library\nimport pytesseract              # Python wrapper for tesseract\nfrom PIL import Image           # image processing library\nimport matplotlib.pyplot as plt # plotting library\nimport numpy as np              # numerical array library\n\nimg=Image.open(&#39;\/home\/mars\/TEST-JPN3.png&#39;)\n# Convert the image to an array\nim_list = np.array(img)\n \n# Hand the array to the plotting library\nplt.imshow(im_list)\n \n# Display\nplt.show()\nprint(&#39;Start....&#39;) \n# Extract the text\ntxt = pytesseract.image_to_string(img,lang=&quot;jpn&quot;)\n \n# Print the extracted text\nprint(&#39;Done.&#39;)\nprint(txt)\nprint()<\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Adding a step that erases the border lines<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import cv2\nfrom PIL import Image           # image processing library\nfrom matplotlib import pyplot as plt # plotting library\nimport numpy as np              # numerical array library\nimport os                       # os information library\nimport pytesseract              # Python wrapper for tesseract\n\n# Image to process\nimg = cv2.imread(&quot;\/home\/mars\/TEST-JPN4.png&quot;)\nimg2 = img.copy()\nimg3 = img.copy()\n\n# Convert to grayscale\ngray = cv2.cvtColor(img, 
cv2.COLOR_BGR2GRAY)\ngray_list = np.array(gray)\n \n# Hand the array to the plotting library\n#plt.imshow(gray_list)\n#cv2.imwrite(&quot;calendar_mod.png&quot;, gray)\n\n## Invert (negative-positive conversion)\ngray2 = cv2.bitwise_not(gray)\ngray2_list = np.array(gray2)\n#plt.imshow(gray2_list)\n#cv2.imwrite(&quot;calendar_mod2.png&quot;, gray2)\nlines = cv2.HoughLinesP(gray2, rho=1, theta=np.pi\/360, threshold=80, minLineLength=80, maxLineGap=5)\n\nfor line in lines:\n    x1, y1, x2, y2 = line[0]\n\n    # Draw the detected line in red\n    red_lines_img = cv2.line(img2, (x1,y1), (x2,y2), (0,0,255), 3)\n    red_lines_np=np.array( red_lines_img)\n    #cv2.imwrite(&quot;calendar_mod3.png&quot;, red_lines_img)\n\n    # Erase the line (draw over it in white)\n    no_lines_img = cv2.line(img3, (x1,y1), (x2,y2), (255,255,255), 3)\n    no_lines=np.array( no_lines_img)\n    plt.imshow(no_lines)\n    #plt.show()\n    #cv2.imwrite(&quot;calendar_mod4.png&quot;, no_lines_img)\n\nprint(&#39;OCR start...&#39;)\n#txt = pytesseract.image_to_string(img, lang=&quot;jpn&quot;,config=&#39;osd --psm 6&#39;)\ntxt = pytesseract.image_to_string(no_lines_img, lang=&quot;jpn&quot;,config=&#39;--psm 6&#39;)\n\nplt.show()\n#cv2.imwrite(&quot;\/home\/mars\/line_erased.png&quot;,no_lines_img)\nprint(&#39;---------OCR:results-----&#39;)\nprint(txt)\nprint(&#39;---------OCR done.---------&#39;) <\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Image with the lines erased<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"453\" height=\"342\" src=\"https:\/\/rfsec.ddns.net\/db\/wp-content\/uploads\/2021\/12\/line_erased.png\" alt=\"\" class=\"wp-image-757\" srcset=\"https:\/\/rfsec.ddns.net\/db\/wp-content\/uploads\/2021\/12\/line_erased.png 453w, 
https:\/\/rfsec.ddns.net\/db\/wp-content\/uploads\/2021\/12\/line_erased-300x226.png 300w\" sizes=\"auto, (max-width: 453px) 100vw, 453px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Here is the recognition result: everything was recognized without problems. Note, however, that this was an ideally (?) clean image. Images taken with a camera may well need further testing and additional processing.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-bash\" data-lang=\"Bash\"><code>---------OCR:results-----\n0123456789\n0123456789\n\n0123456789\n\n\u672c \u65e5 \u306f \u6674\u5929 ? \u96e8\u964d\u308a \u3067 \u3059 \u3002 \u65e5 \u672c \u8a9e \u306e \u6587\u5b57 \u8a8d\u8b58 \u30c6\u30b9 \u30c8 \u3002\n\f\n---------OCR done.-------<\/code><\/pre><\/div>\n\n\n\n<h1 class=\"wp-block-heading\"><\/h1>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Environment Site I used as a reference for the installation \u300cRaspberry Pi 3B+\u306b\u304a\u3051\u308b\u3001Tesseract\uff085.0.0 alpha\uff09\u306e\u30a4\u30f3\u30b9\u30c8\u30fc\u30eb\u65b9\u6cd5\u3068\u57fa\u672c\u64cd\u4f5c\u300d Having Tesseract recognize the TEST-JPN.png file drawn in a paint program 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-744","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"featured_image_src":null,"author_info":{"display_name":"mars","author_link":"https:\/\/rfsec.ddns.net\/db\/?author=1"},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rfsec.ddns.net\/db\/index.php?rest_route=\/wp\/v2\/posts\/744","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rfsec.ddns.net\/db\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rfsec.ddns.net\/db\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rfsec.ddns.net\/db\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rfsec.ddns.net\/db\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=744"}],"version-history":[{"count":10,"href":"https:\/\/rfsec.ddns.net\/db\/index.php?rest_route=\/wp\/v2\/posts\/744\/revisions"}],"predecessor-version":[{"id":765,"href":"https:\/\/rfsec.ddns.net\/db\/index.php?rest_route=\/wp\/v2\/posts\/744\/revisions\/765"}],"wp:attachment":[{"href":"https:\/\/rfsec.ddns.net\/db\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=744"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rfsec.ddns.net\/db\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=744"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rfsec.ddns.net\/db\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=744"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}