Question for the experts: what is a fault-tolerant way to read a large CSV file with 1 million records?

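I have a CSV file with about 1 million records to process. The fields are separated by semicolons (;), as in this example:
id;name;class;email;tel
000001;zhao;A;[email protected];12345677
000002;qian;A;[email protected];12345678
000003;sun;C;[email protected];12345679
000004;li;B;[email protected];12345680

I need a program that reads these records and then inserts/updates them into another system, keyed by ID. With 1 million records, which way of reading is more efficient? Should I load everything into a List in one go and then run the insert/update operations, or read the records one at a time and insert/update as I go? There is one more very important requirement: if an insert/update fails, the system must record the ID of the failing record and the problem that occurred, then continue with the next record. Source code wanted (the insert/update operation itself can just be indicated in a comment).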

Solutions »

Use a memory-mapped file, for example http://www.cnblogs.com/criedshy/archive/2010/06/13/1757826.html
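For what it's worth, here is a minimal sketch of the memory-mapped approach, assuming .NET 4.0 or later (System.IO.MemoryMappedFiles) and a UTF-8 file. The class and method names (MappedCsv, CountLines) are made up for illustration, and the loop only counts lines. For a single sequential pass a plain StreamReader may well be just as fast, so treat this as something to measure rather than a guaranteed win.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

static class MappedCsv
{
    // Reads the file line by line through a memory-mapped view and returns the line count.
    public static int CountLines(string path)
    {
        long length = new FileInfo(path).Length;
        if (length == 0) return 0; // a zero-length file cannot be mapped

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        // Bound the view to the real file length; an unbounded view is
        // rounded up to a page boundary and would expose trailing zero bytes.
        using (var view = mmf.CreateViewStream(0, length, MemoryMappedFileAccess.Read))
        using (var reader = new StreamReader(view, Encoding.UTF8))
        {
            int count = 0;
            while (reader.ReadLine() != null)
            {
                count++; // real code would parse and process the line here
            }
            return count;
        }
    }
}
```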
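I think the error record should also include the offset of the bad entry within the file.
The "ID of the failing record and the problem that occurred" part is the nasty bit; go and ask in the SQL forum. The other points are all easy, so get this key point out of the way first.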
        // Requires: using System; using System.IO; using System.Text.RegularExpressions;
        static void Main(string[] args)
        {
            string path = "data.csv";
            try
            {
                // "using" disposes (and closes) the reader even if an exception is thrown
                using (StreamReader sr = new StreamReader(path))
                {
                    string line;
                    while ((line = sr.ReadLine()) != null)
                    {
                        string[] arrStr = line.Split(';');
                        if (arrStr.Length != 5)
                        {
                            throw new ApplicationException("wrong column count: " + line);
                        }
                        string id = arrStr[0];
                        // id must be exactly six digits
                        if (!Regex.IsMatch(id, "^\\d{6}$"))
                        {
                            throw new ApplicationException("invalid id: " + id);
                        }
                        // class must be a single upper-case letter
                        if (!Regex.IsMatch(arrStr[2], "^[A-Z]$"))
                        {
                            throw new ApplicationException("invalid class: " + arrStr[2]);
                        }
                        if (!Regex.IsMatch(arrStr[3], @"^(\w)+(\.\w+)*@(\w)+((\.\w+)+)$"))
                        {
                            throw new ApplicationException("invalid email: " + arrStr[3]);
                        }
                        // ......
                        Console.WriteLine(line);
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
            Console.ReadKey();
        }
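CSV is a standard ODBC data source; it is basically a rather poor file-based database... Just query it directly with ADO.NET, nothing difficult about it...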
That should work, but it is not quite what I need. How about this: can the failing id and message be written to error.csv? For example, error.csv:
error id; error message
2; invalid id
1034; invalid email
...
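A rough sketch of what is being asked for here, under some assumptions: each row is processed inside its own try/catch so that one bad row does not stop the run, and failures go to an error file. CsvImporter, Validate and Upsert are hypothetical names; Upsert is only a placeholder for the real insert/update. The validation repeats the id and class checks from the code posted earlier, and the error file also records the line number, which goes slightly beyond the two columns in the example above. File.ReadLines reads lazily, so the 1 million rows are never all held in memory at once.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

static class CsvImporter
{
    // Hypothetical placeholder: the real insert/update against the target system goes here.
    static void Upsert(string[] fields)
    {
        // insert or update the record identified by fields[0]
    }

    // Throws FormatException when a row is malformed.
    static void Validate(string[] f)
    {
        if (f.Length != 5) throw new FormatException("wrong column count");
        if (!Regex.IsMatch(f[0], @"^\d{6}$")) throw new FormatException("invalid id");
        if (!Regex.IsMatch(f[2], "^[A-Z]$")) throw new FormatException("invalid class");
    }

    // Streams csvPath, logs every failing row to errorPath, returns the number of failures.
    public static int Import(string csvPath, string errorPath)
    {
        int failed = 0;
        int lineNo = 0;
        using (var errors = new StreamWriter(errorPath, false))
        {
            errors.WriteLine("line;id;error message");
            foreach (string line in File.ReadLines(csvPath))
            {
                lineNo++;
                if (lineNo == 1 || line.Length == 0) continue; // skip header and blank lines

                string[] fields = line.Split(';');
                try
                {
                    Validate(fields);
                    Upsert(fields);
                }
                catch (Exception ex)
                {
                    // Record the problem and carry on with the next row.
                    failed++;
                    errors.WriteLine(lineNo + ";" + fields[0] + ";" + ex.Message);
                }
            }
        }
        return failed;
    }
}
```

Calling CsvImporter.Import("data.csv", "error.csv") would process the whole file and leave one line per failed row in error.csv. If the real error messages can contain semicolons, they would need quoting before being written.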
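OP, the method in reply #3 basically solves the reading part!
One question: if one of the errors you describe is found, does it need to be corrected during the pass?
If you also need to save the corrected data back to a CSV file, you need a write method as well:
        public static void WriteCSV(string filePathName, bool append, List<String[]> ls)
        {
            // "using" flushes and closes the writer even if a write fails
            using (StreamWriter fileWriter = new StreamWriter(filePathName, append, Encoding.Default))
            {
                foreach (String[] strArr in ls)
                {
                    fileWriter.WriteLine(String.Join(";", strArr));
                }
            }
        }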
So combine reply #3 with mine and you have the complete answer!
That said, my class also has a ready-made read method, for your reference:
        public static List<String[]> ReadCSV(string filePathName)
        {
            List<String[]> ls = new List<String[]>();
            using (StreamReader fileReader = new StreamReader(filePathName))
            {
                string strLine;
                while ((strLine = fileReader.ReadLine()) != null)
                {
                    if (strLine.Length > 0) // skip blank lines
                    {
                        ls.Add(strLine.Split(';')); // the file is semicolon-delimited
                        //Debug.WriteLine(strLine);
                    }
                }
            }
            return ls;
        }
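That's not right.
The first line of this CSV file holds the row names and should be skipped;
also, when reading a record fails, the program should record the failing id and the error message and then carry on with the next line. The current program just exits.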
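I ran into a similar situation before and used BULK INSERT (you can search for the exact usage).
For example, tidy up the data in the program first, write it to a file A.txt, and then have the database run BULK INSERT to load it all in one go.
In my test, 400,000 rows were done in 7 seconds.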
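As a variation on that idea (not the T-SQL BULK INSERT from a file described above), the cleaned rows could be staged in a DataTable and handed to a client-side bulk loader. The sketch below covers only the staging step and assumes the five-column layout from the question. Staging, ToDataTable and the table name "people" are made-up names. With SQL Server, the resulting table could then be passed to SqlBulkCopy.WriteToServer(DataTable); that part is left out because it needs a live database connection.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class Staging
{
    // Copies well-formed rows into a DataTable whose columns match the CSV header.
    public static DataTable ToDataTable(IEnumerable<string[]> rows)
    {
        var table = new DataTable("people"); // hypothetical table name
        foreach (string name in new[] { "id", "name", "class", "email", "tel" })
        {
            table.Columns.Add(name, typeof(string));
        }

        foreach (string[] row in rows)
        {
            // Rows with the wrong column count are silently skipped in this sketch;
            // real code would log them instead.
            if (row.Length == 5)
            {
                table.Rows.Add(row[0], row[1], row[2], row[3], row[4]);
            }
        }
        return table;
    }
}
```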
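"That's not right.
The first line of this CSV file holds the row names and should be skipped;"
The first line is obviously the column names, and there is no harm in reading it too. With List<String[]> ls,
you simply start from index 1 when you use the data! Keep the column names around; they may well come in handy later.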
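One way the kept header row could come in handy, as a small sketch: build a name-to-position map from ls[0] and look fields up by column name while iterating from index 1. HeaderDemo, ColumnIndex and Names are hypothetical helpers, and the sketch assumes every data row has at least as many columns as the header.

```csharp
using System;
using System.Collections.Generic;

static class HeaderDemo
{
    // Maps each column name in the header row to its position.
    public static Dictionary<string, int> ColumnIndex(string[] header)
    {
        var map = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < header.Length; i++)
        {
            map[header[i].Trim()] = i;
        }
        return map;
    }

    // Returns the "name" field of every data row.
    public static List<string> Names(List<string[]> ls)
    {
        var result = new List<string>();
        if (ls.Count == 0) return result;

        int nameCol = ColumnIndex(ls[0])["name"];
        for (int i = 1; i < ls.Count; i++) // index 0 is the header row
        {
            result.Add(ls[i][nameCol]);
        }
        return result;
    }
}
```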