可以通过 ODBC Text Driver 读取 tab 分割的文本文件,
按文档说明(http://msdn.microsoft.com/en-us/library/ms974559.aspx),对于 tab 分割格式需要建立 schema.ini 来提供文件格式信息,
但是在我的测试无法正确识别:每行数据当作一个字段来读取,而不是4个字段!以下代码,只要将注册表中的默认配置 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text\Format 修改为 TabDelimited 就可以分别读取4个字段。还有需要哪个地方配置?
static void Main(string[] args)
{
/*
* 1. Issuse
* Let's say I have 4 fields delimited by tab in each line in the tab-delimited.txt file.
* I also have a schema.ini file in the same folder to indicate the format of the tab-delimited.txt file.
* But it seems the schema.ini file does not override the setting in the system registry(HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text\Format).
* The text engines can not recognize tab as the delimiter. So it still reads the four fields together into one field.
*
* 2. Solution But Tedious
* Config the default format from CSVDelimited to TabDelimited in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text\Format, you will get the expected result.
*/
string dataSourcePath = Environment.CurrentDirectory + "\\";
string dataFileName = "tab-delimited.txt";
string connString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + dataSourcePath + @";Extended Properties=""text;HDR=NO;FMT=TabDelimited""";
OleDbConnection conn = new OleDbConnection(connString);
OleDbCommand cmd = conn.CreateCommand();
cmd.CommandText = String.Format("SELECT * FROM [{0}]", dataFileName);
DataTable dt = new DataTable(Path.GetFileNameWithoutExtension(dataFileName));
OleDbDataAdapter da = new OleDbDataAdapter(cmd);
da.Fill(dt); // output as xml.
dt.WriteXml(dataFileName + ".xml");
// output into current console.
Console.WriteLine("The imported data has {0} column(s):", dt.Columns.Count);
foreach (DataColumn col in dt.Columns)
{
Console.Write(col.ColumnName);
Console.Write("\t");
}
Console.WriteLine(); foreach (DataRow row in dt.Rows)
{
foreach (DataColumn col in dt.Columns)
{
Console.Write(row[col]);
Console.Write("\t");
}
Console.WriteLine();
}
Console.WriteLine();
//
Console.Read();
}
完整测试代码以及样本文件下载:
http://www.cnblogs.com/Files/Jinglecat/ReadSeparatedValuesTest.rar
按文档说明(http://msdn.microsoft.com/en-us/library/ms974559.aspx),对于 tab 分割格式需要建立 schema.ini 来提供文件格式信息,
但是在我的测试无法正确识别:每行数据当作一个字段来读取,而不是4个字段!以下代码,只要将注册表中的默认配置 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text\Format 修改为 TabDelimited 就可以分别读取4个字段。还有需要哪个地方配置?
static void Main(string[] args)
{
/*
* 1. Issuse
* Let's say I have 4 fields delimited by tab in each line in the tab-delimited.txt file.
* I also have a schema.ini file in the same folder to indicate the format of the tab-delimited.txt file.
* But it seems the schema.ini file does not override the setting in the system registry(HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text\Format).
* The text engines can not recognize tab as the delimiter. So it still reads the four fields together into one field.
*
* 2. Solution But Tedious
* Config the default format from CSVDelimited to TabDelimited in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text\Format, you will get the expected result.
*/
string dataSourcePath = Environment.CurrentDirectory + "\\";
string dataFileName = "tab-delimited.txt";
string connString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + dataSourcePath + @";Extended Properties=""text;HDR=NO;FMT=TabDelimited""";
OleDbConnection conn = new OleDbConnection(connString);
OleDbCommand cmd = conn.CreateCommand();
cmd.CommandText = String.Format("SELECT * FROM [{0}]", dataFileName);
DataTable dt = new DataTable(Path.GetFileNameWithoutExtension(dataFileName));
OleDbDataAdapter da = new OleDbDataAdapter(cmd);
da.Fill(dt); // output as xml.
dt.WriteXml(dataFileName + ".xml");
// output into current console.
Console.WriteLine("The imported data has {0} column(s):", dt.Columns.Count);
foreach (DataColumn col in dt.Columns)
{
Console.Write(col.ColumnName);
Console.Write("\t");
}
Console.WriteLine(); foreach (DataRow row in dt.Rows)
{
foreach (DataColumn col in dt.Columns)
{
Console.Write(row[col]);
Console.Write("\t");
}
Console.WriteLine();
}
Console.WriteLine();
//
Console.Read();
}
完整测试代码以及样本文件下载:
http://www.cnblogs.com/Files/Jinglecat/ReadSeparatedValuesTest.rar
txt文件内容aa bb cc dd
112 333 3333 335
2 12 33 22schema.ini 内容
[123.txt]
ColNameHeader=True
Format=TabDelimited
MaxScanRows=0
CharacterSet=OEM
Col1=AA
Col2=BB
Col3=CC
Col4=DD
112 333 3333 335
2 12 33 22 schema.ini 内容
[123.txt]
ColNameHeader=True
Format=TabDelimited
MaxScanRows=25
CharacterSet=OEM
Col1=AA Integer
Col2=BB Integer
Col3=CC Integer
Col4=DD Integer
修改了一下你的代码,我这里可以看到结果了string dataSourcePath = @"D:\test";
//string dataFileName = "123.txt";
string connString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + dataSourcePath + @";Extended Properties=""text;HDR=NO;FMT=TabDelimited""";
OleDbConnection conn = new OleDbConnection(connString);
OleDbCommand cmd = conn.CreateCommand();
//cmd.CommandText = String.Format("SELECT * FROM text1.txt", dataFileName);
cmd.CommandText = "SELECT * FROM 123.txt";
//DataTable dt = new DataTable(Path.GetFileNameWithoutExtension(dataFileName));
DataTable dt = new DataTable();
OleDbDataAdapter da = new OleDbDataAdapter(cmd);
da.Fill(dt);
schema.ini 的内容,第一行如何放空?
想弱弱的问一下:如果TXT里面的内容是以 "/"为分割的,那么 Format 的值如何修改!在EXCEL里面改入TXT格式的外部数,提Schema.ini方件失效