Shulou (Shulou.com), 06/03 report. Updated 2025-09-21. From: SLTechnology News&Howtos
This article shows how to work around a transcoding bug in simple_html_dom, a PHP class library for parsing HTML.
I have been using simple_html_dom to scrape articles these days. Chinese websites generally use one of three encodings: GBK, GB2312, or UTF-8. Most of the sites I scraped were GB2312 or UTF-8.
My version of simple_html_dom has a method called convert_text that looks like this:
// PaperG - Function to convert the text from one character set to another if the two sets are not the same.
function convert_text($text)
{
    global $debug_object;
    if (is_object($debug_object)) {$debug_object->debug_log_entry(1);}
    $converted_text = $text;
    $sourceCharset = "";
    $targetCharset = "";
    if ($this->dom)
    {
        $sourceCharset = strtoupper($this->dom->_charset);
        $targetCharset = strtoupper($this->dom->_target_charset);
    }
    if (is_object($debug_object)) {$debug_object->debug_log(3, "source charset: " . $sourceCharset . " target charaset: " . $targetCharset);}
    if (!empty($sourceCharset) && !empty($targetCharset) && (strcasecmp($sourceCharset, $targetCharset) != 0))
    {
        // Check if the reported encoding could have been incorrect and the text is actually already UTF-8
        if ((strcasecmp($targetCharset, 'UTF-8') == 0) && ($this->is_utf8($text)))
        {
            $converted_text = $text;
        }
        else
        {
            $converted_text = iconv($sourceCharset, $targetCharset, $text);
        }
    }
    // Lets make sure that we don't have that silly BOM issue with any of the utf-8 text we output.
    if ($targetCharset == 'UTF-8')
    {
        if (substr($converted_text, 0, 3) == "\xef\xbb\xbf")
        {
            $converted_text = substr($converted_text, 3);
        }
        if (substr($converted_text, -3) == "\xef\xbb\xbf")
        {
            $converted_text = substr($converted_text, 0, -3);
        }
    }
    return $converted_text;
}
Take a look at this line:
$converted_text = iconv($sourceCharset, $targetCharset, $text);
It can transcode the text incorrectly. For example, GB2312 text comes out as:
24-year-old Han Zhuangzhuang not only got zero penalty points at the 2014 Langqin International Equestrian World Cup Chinese League qualifying match held at the FIFA Equestrian Park on April 26. Zhao Zhiwen, the seventh Olympic rider, scored a zero penalty in 77.07 seconds.
This incorrect result is an established fact, and it shows that the transcoding function does not handle this case properly. Because I only want to use simple_html_dom to build the DOM, I am not going to spend time fixing the bug properly. I simply changed
$converted_text = iconv($sourceCharset, $targetCharset, $text);
to
$converted_text = $text;
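With the iconv call replaced, convert_text returns the text in whatever encoding the page used, so any normalization has to happen in the calling code. The following is a minimal sketch of one way to do that, not part of simple_html_dom. It assumes the mbstring extension is loaded; `to_utf8` is a hypothetical helper name, and the `CP936` fallback (mbstring's name for GBK, which covers GB2312 text) is an assumption that suits the sites described above but not every page.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): normalize scraped text to UTF-8.
// Assumption: anything that is not valid UTF-8 is GBK/GB2312, passed to mbstring as CP936.
function to_utf8(string $text, string $fallbackCharset = 'CP936'): string
{
    // Text that already validates as UTF-8 is returned untouched,
    // which avoids converting it a second time.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise treat the bytes as the fallback charset and convert them.
    return mb_convert_encoding($text, 'UTF-8', $fallbackCharset);
}

// Usage: "\xD6\xD0\xCE\xC4" is the GB2312/GBK byte sequence for two Chinese characters.
$utf8 = to_utf8("\xD6\xD0\xCE\xC4");
```

The validity check comes first because a page's declared charset can be wrong. Checking the bytes themselves is more reliable than trusting the label, although text in other encodings would still need its own fallback.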