Thats the problem - the text isn't quoted. So I'm using meta-data definitions
what sort of char patterns to expect in each field. But where the field is
and potentially includes the row delimiter (\r\n) in its data. As far as I
therefore, is to find the start of the next row. Originally I was using the
time. I can't think of anything else to do - apart from drowning the person
responsible for the data format. Won't solve the problem but I'll feel a lot
"shriop" wrote:
> I'm pretty sure this is right in line with what my parser is made to
> handle. If you find it can't handle it, then I'd like to see an example
> of the data so I can add the functionality to handle it. You can set
> the delimiter to anything you want. And end of row is commonly ended
> with a /r/n, so it handles that. the /r/n in the data is no problem as
> long as there are double quotes around the field itself, so the parser
> has something to base the logic on and know that this is data, and not
> the end of a row. I've got a free demo version that you can try it out
> with. You can also feel free to email me directly with any questions.
> This kind of data is why I generally recommend people not try to use
> regular expressions to parse the data, the expression just gets too
> hairy, leading to poor performance.
>
> Here's what I'm picturing you're data looking like, and what it can
> handle, although I don't know what specific delimiter you're using.
>
> 1,Bruce,Dunwiddie,"in here, I can have commas and
> \r\n"\r\n2,Bob,Jones,some other quote here
>
>
> Tim wrote:
> > When I said the csv data was rough, I meant it! As an example: it uses a
> > specific configurable field delimiter for all the fields in a record except
> > the last field in the record which uses a \r\n. BUT ... the last field (and
> > in fact any other field) may also contain \r\n in its data! My solution to
> > this was to use the regex definition of the format of the first field in each
> > record to find the end of the last field. Would I be able to handle this with
> > your csv parser?
> > Thanks
> > Tim
> >
> > "shriop" wrote:
> >
> > > Lots of info here, but I'll try to split it out.
> > >
> > > Regex's are a notoriously bad performance wise way to accomplish what
> > > you're trying to do.
> > >
http://www.codeproject.com/cs/database/CsvReader.asp > > >
http://www.csvreader.com/csv_benchmarks.html > > >
> > > I would suggest you either implement the code yourself using raw char
> > > array handling to find the commas, etc, etc, or purchase a commercial
> > > csv parser like the one I sell,
http://www.csvreader.com .
> > >
> > > bcp is not what I would recommend for this because it does not properly
> > > handle escape sequences, and does not necessarily have a performance
> > > benefit that I've seen over dts. I would recommend calling a dts job
> > > instead to import this data. If you don't need to do any data
> > > manipulation to the original csv file, then there's no reason to even
> > > do any csv parsing, just let dts do it for you. If you do want to try
> > > the batching of insert statements like one of the previous comments
> > > suggested, make sure you start a transaction before doing the chunks of
> > > inserts, then commit at the end of a chunk. Otherwise, you'll get a
> > > performance hit as the db tries to commit after every single insert.
> > >
> > > Scott wrote:
> > > > Tim wrote:
> > > >
> > > > > db. It all works fine, but the performance is just not good enough -
> > > > > I need to at least halve the time its taking to process the data.
> > > > > I've profiled the code and it looks like the overhead is more or less
> > > > > evenly split between the reading/parsing on the one hand and the data
> > > > > loading/bcp/Win32 api marshalling on the other hand - so I'm looking
> > > >
> > > > Are you using the Compile flag on the regex?
> > > >
> > > > I would also try to insert to the database directly from your code by
> > > > batching the statements: insert xxx xxxx xxx;insert xxx xxxx xxx;insert
> > > > xxx xxxx xxx;... I would start by batching into groups of a 100 and
> > > > then try to tweak from there.
> > > >
> > > > I've had very good performance success with these two strategies.
> > > >
> > > > Scott C.
> > >
> > >
>