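I have more than 20,000 pages like this.
I need to store one particular section of each page in a MySQL database. Put plainly, I'm scraping information from other websites. Those sites use gb2312, so the information I scrape from their pages has to be handled as gb2312 too,
otherwise it comes out garbled (I'm not sure if that's overstating it). My own site uses utf-8, and the MySQL database also uses utf-8.
So now I'm using PHP to read the page files I scraped,
convert the information in them to utf-8, and then write it to the database
(without conversion, the INSERT INTO fails; with conversion, it gets written but comes out garbled).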
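One plausible reading of the "INSERT fails without conversion" symptom: raw GB2312 bytes are not valid UTF-8, so a UTF-8 column may reject them (MySQL in strict mode typically reports an "Incorrect string value" error), while correctly converted text is valid UTF-8 and should insert cleanly. A minimal PHP sketch for checking this before the INSERT; `isValidUtf8` is a hypothetical helper name, and the iconv extension is assumed to be available.

```php
<?php
// Sketch: check whether a byte string is well-formed UTF-8 before inserting it.
// preg_match() with the /u modifier returns false on malformed UTF-8,
// so a return value of 1 means the whole string is valid UTF-8.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$gbkBytes  = "\xC4\xE3\xBA\xC3";                // "你好" encoded as GB2312/GBK
$utf8Bytes = iconv('GBK', 'UTF-8', $gbkBytes);  // the same text as UTF-8

var_dump(isValidUtf8($gbkBytes));   // bool(false)
var_dump(isValidUtf8($utf8Bytes));  // bool(true)
```

Running this check on each scraped fragment makes it easy to tell whether a failure comes from unconverted bytes or from somewhere later in the pipeline.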
For the conversion I'm using the code posted above. I'm also posting "gb2312.txt" so everyone can have a look:
# gb2312.txt --
#
# GB2312 to Unicode table (modified)
# from:
# http://tcl.apache.org/sources/tcl/tools/encoding/gb2312.txt
# ftp://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/GB/GB2312.TXT
#
# Copyright (c) 1998-1999 by Scriptics Corporation.
#
# See the file "license.terms" for information on usage and redistribution
# of this file, and for a DISCLAIMER OF ALL WARRANTIES.
#
# RCS: @(#) $Id: gb2312.txt,v 1.2 1999/04/16 00:47:55 stanton Exp $
#
# NOTE: this table has been modified to include the 7-bit ASCII
# characters that are allowed in GB2312 files.
#
#
# Name: GB2312-80 to Unicode table (complete, hex format)
# Unicode version: 1.1
# Table version: 0.0d2
# Table format: Format A
# Date: 6 December 1993
# Author: Glenn Adams <[email protected]>
# John H. Jenkins <[email protected]>
#
# Copyright (c) 1991-1994 Unicode, Inc. All Rights reserved.
#
# This file is provided as-is by Unicode, Inc. (The Unicode Consortium).
# No claims are made as to fitness for any particular purpose. No
# warranties of any kind are expressed or implied. The recipient
# agrees to determine applicability of information provided. If this
# file has been provided on magnetic media by Unicode, Inc., the sole
# remedy for any claim will be exchange of defective media within 90
# days of receipt.
#
# Recipient is granted the right to make copies in any form for
# internal distribution and to freely use the information supplied
# in the creation of products supporting Unicode. Unicode, Inc.
# specifically excludes the right to re-distribute this file directly
# to third parties or other organizations whether for profit or not.
#
# General notes:
#
# This table contains the data Metis and Taligent currently have on how
# GB2312-80 characters map into Unicode.
#
# Format: Three tab-separated columns
# Column #1 is the GB2312 code (in hex as 0xXXXX)
# Column #2 is the Unicode (in hex as 0xXXXX)
# Column #3 the Unicode name (follows a comment sign, '#')
# The official names for Unicode characters U+4E00
# to U+9FA5, inclusive, is "CJK UNIFIED IDEOGRAPH-XXXX",
# where XXXX is the code point. Including all these
# names in this file increases its size substantially
# and needlessly. The token "<CJK>" is used for the
# name of these characters. If necessary, it can be
# expanded algorithmically by a parser or editor.
#
# The entries are in GB2312 order
#
# The following algorithms can be used to change the hex form
# of GB2312 to other standard forms:
#
# To change hex to EUC form, add 0x8080
# To change hex to kuten form, first subtract 0x2020. Then
# the high and low bytes correspond to the ku and ten of
# the kuten form. For example, 0x2121 -> 0x0101 -> 0101;
# 0x777E -> 0x575E -> 8794
#
# Any comments or problems, contact <[email protected]>
#
#
It's too long to post in full; here is the URL:
http://www.g569.com/special/gb2312.txt
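The header of gb2312.txt describes a three-column table format and two arithmetic rules (add 0x8080 for EUC form; subtract 0x2020 and split the bytes for kuten form). A minimal PHP sketch of both rules, plus a table-driven decoder of the kind a lookup-based converter might use, could look like this. All function names are hypothetical, it assumes data lines look like `0x2121<TAB>0x3000<TAB># ...` as the header states, and it is not the conversion code the poster refers to.

```php
<?php
// Sketch based on the gb2312.txt header: parse the table and apply its
// hex -> EUC and hex -> kuten rules. All function names are hypothetical.

// Parse data lines "0xGGGG<TAB>0xUUUU<TAB># name" into [gb2312 => unicode].
function parseGb2312Table(string $text): array
{
    $map = [];
    foreach (preg_split('/\R/', $text) as $line) {
        // Comment lines start with '#', so they never match this pattern.
        if (preg_match('/^0x([0-9A-Fa-f]{2,4})\s+0x([0-9A-Fa-f]{2,4})/', $line, $m)) {
            $map[hexdec($m[1])] = hexdec($m[2]);
        }
    }
    return $map;
}

// "To change hex to EUC form, add 0x8080."
function gb2312ToEuc(int $code): int
{
    return $code + 0x8080;
}

// "Subtract 0x2020; the high and low bytes are the ku and ten."
function gb2312ToKuten(int $code): string
{
    $k = $code - 0x2020;
    return sprintf('%02d%02d', $k >> 8, $k & 0xFF);
}

// Encode a BMP code point (GB2312 maps only into the BMP) as UTF-8.
function codePointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Table-driven decode of EUC-encoded GB2312 bytes (what web pages contain).
function eucToUtf8(string $bytes, array $map): string
{
    $out = '';
    $n = strlen($bytes);
    for ($i = 0; $i < $n; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {              // 7-bit ASCII passes through unchanged
            $out .= $bytes[$i];
            continue;
        }
        if ($i + 1 >= $n) {           // truncated double-byte character: stop
            break;
        }
        $code = (($b << 8) | ord($bytes[$i + 1])) - 0x8080;  // EUC -> table hex
        $i++;
        $out .= isset($map[$code]) ? codePointToUtf8($map[$code]) : '?';
    }
    return $out;
}
```

For example, `gb2312ToKuten(0x777E)` gives "8794", matching the header's own example. Since iconv() already knows GB2312/GBK, a hand-rolled table like this is mainly useful for understanding the format.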
I left something out: after closing the output buffer, use iconv to convert the encoding of the captured buffer contents.
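The reply describes capturing output in a buffer and converting it with iconv once the buffer is closed. A minimal sketch of that idea using PHP's ob_start() and ob_get_clean(); the function name `captureAsUtf8` and the GBK default source charset are assumptions for illustration.

```php
<?php
// Sketch: capture whatever $producer echoes, close the buffer, then convert
// the captured bytes (assumed GBK/GB2312) to UTF-8 with iconv().
function captureAsUtf8(callable $producer, string $fromCharset = 'GBK'): string
{
    ob_start();                   // buffer output instead of sending it
    try {
        $producer();              // anything echoed here goes into the buffer
    } finally {
        $raw = ob_get_clean();    // fetch the buffer contents and close it
    }
    $utf8 = iconv($fromCharset, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("iconv could not convert from $fromCharset");
    }
    return $utf8;
}
```

For example, `captureAsUtf8(function () { readfile('page.html'); })` would return a saved page as UTF-8, where `page.html` is a placeholder path.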
If you really want to transcode, do $text = iconv('gbk', 'utf-8', $text); before inserting into the database.
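Following this suggestion, a sketch of converting right before the insert. It decodes as GBK, a superset of GB2312, so characters outside GB2312 that such pages sometimes contain still convert. It also sets the connection character set to utf8, because MySQL interprets incoming bytes according to the connection charset; a mismatch there is one common cause of text that is stored but looks garbled. The mysqli extension, the table `pages(content)`, and both function names are assumptions.

```php
<?php
// Sketch: convert scraped GBK/GB2312 text to UTF-8 right before inserting it.
// Assumes the mysqli extension and a hypothetical table `pages(content)`.

// Decode as GBK (a superset of GB2312) and fail loudly on invalid input.
function gbkToUtf8(string $text): string
{
    $out = iconv('GBK', 'UTF-8', $text);
    if ($out === false) {
        throw new RuntimeException('Input is not valid GBK');
    }
    return $out;
}

// Insert one converted page; the connection charset must match the data.
function insertPage(mysqli $db, string $gbkText): void
{
    $db->set_charset('utf8');
    $stmt = $db->prepare('INSERT INTO pages (content) VALUES (?)');
    if ($stmt === false) {
        throw new RuntimeException('prepare failed: ' . $db->error);
    }
    $utf8 = gbkToUtf8($gbkText);
    $stmt->bind_param('s', $utf8);
    $stmt->execute();
    $stmt->close();
}
```

Calling set_charset() once right after connecting would be enough; it sits inside the function here only to keep the sketch self-contained.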
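I also converted it with the iconv() you suggested, and it's now written to the database.
All that's left is generating the static pages.
Come collect your points!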