My Tech Life

Memo by a Japanese Software Developer in his late 50s.

Entries from 2024-03-25 (1 day)

"Windows-1254" decoding error with Japanese text during Python web scraping

UTF-8 text is misdetected as Windows-1254 at decode time, which produces garbled characters. detect_enc = chardet.detect(temp)['encoding'] In that case, you can add an additional condition to handle the case when the text is mistakenly…

"Windows-1254" decoding error with Japanese text during Python web scraping (Japanese edition)

When decoding UTF-8 text, it gets detected as Windows-1254 and the characters come out garbled. detect_enc = chardet.detect(temp)['encoding'] As a workaround, I added one more conditional branch. elif detect_enc == 'Windows-1254': detect_enc = 'utf-8' html_content …
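The excerpts above are cut off before the decoding step, so here is a minimal sketch of how the workaround might look as a whole. The function names normalize_encoding and decode_html are hypothetical, and the fallback to UTF-8 when no encoding is detected is an assumption, not something the entry states. The detected name is passed in as an argument; in the scraper it would come from chardet.detect(temp)['encoding'].

```python
def normalize_encoding(detected):
    """Return the encoding to decode with, given the detector's guess.

    `detected` is the name reported by the detector, e.g. the value of
    chardet.detect(temp)['encoding'], or None when nothing was detected.
    """
    if detected is None:
        # Assumption: fall back to UTF-8 when the detector gives no answer.
        return "utf-8"
    if detected.lower() == "windows-1254":
        # The workaround from the entry: treat a Windows-1254 guess as UTF-8.
        return "utf-8"
    return detected


def decode_html(raw, detected):
    """Decode raw bytes using the corrected encoding name."""
    encoding = normalize_encoding(detected)
    # errors="replace" keeps the scraper running if a few bytes are invalid.
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    temp = "日本語のテキスト".encode("utf-8")
    # Simulate the misdetection described in the entry.
    print(decode_html(temp, "Windows-1254"))
```

Note that this override would also change the result for pages that really are encoded in Windows-1254 (Turkish), so it fits best when the scraped sites are known to be Japanese.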