You have posed a technical problem. There are two solutions:
1) Chunk the data. I discussed this idea in (gory) detail on my blog
quite some time ago:
http://blogs.msdn.com/nickmalik/archive/2004/11/01/250883.aspx
2) Don't solve the problem with web services that you write; use
packaged software instead. There are literally dozens of applications that
will handle document management for you, including the complex and
varied data management requirements you describe.
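
To make option 1 concrete, here is a minimal sketch of the chunking idea. It is
written in Python rather than VB.NET, and it is not the code from the blog post.
The function names, the 64 KB chunk size, and the choice to pass a byte offset
with every chunk are assumptions made for illustration. The call to
write_chunk stands in for one web service (or remoting) call, so neither side
ever holds more than one chunk in memory, and because each call carries its own
offset the server needs no per-client state between calls.

```python
import hashlib
import os
import tempfile
from typing import Iterator, Tuple

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune it for your network


def read_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) pairs so only one chunk is in memory at a time."""
    with open(path, "rb") as src:
        offset = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            yield offset, data
            offset += len(data)


def write_chunk(path: str, offset: int, data: bytes) -> None:
    """Receiver side: place one chunk at its offset in an existing file."""
    with open(path, "r+b") as dst:
        dst.seek(offset)
        dst.write(data)


def sha256_of(path: str) -> str:
    """Hash a file chunk by chunk, to verify the reassembled copy."""
    digest = hashlib.sha256()
    for _, data in read_chunks(path):
        digest.update(data)
    return digest.hexdigest()


def transfer(src_path: str, dst_path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst one chunk per call; return the number of calls made."""
    open(dst_path, "wb").close()  # create or truncate the destination
    calls = 0
    for offset, data in read_chunks(src_path, chunk_size):
        write_chunk(dst_path, offset, data)  # stands in for one service call
        calls += 1
    return calls
```

After the last chunk arrives, comparing sha256_of(source) with
sha256_of(destination) confirms the file was reassembled correctly. A failed
chunk can be resent on its own instead of restarting a 200MB transfer.
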
A very good backgrounder on this topic can be found at:
http://en.wikipedia.org/wiki/Content_management_system

Lists of products (incomplete and not well formatted):
http://en.wikipedia.org/wiki/Comparison_of_content_management_systems

I suggest that you look into a couple of different products:

Windows SharePoint Server:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;830320

Documentum:
http://software.emc.com/products/content_management/content_management.htm

There are some open source products in this space as well. I haven't used
any of them and cannot comment on their capabilities, but some are
well-liked, like Alfresco.
Good luck
--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik
Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--
"Jeff Mason" <je.mason@comcast.net> wrote in message
news:jundf29c3qr4jdtflha21e9ttkeapos5lt@4ax.com...
> I hope I can get some advice on a design/technology question here.
>
> The app is a medical claims case management system. It is written in
> VB.NET and it works great. It is a Winforms app primarily; there is a web
> component, but that is not relevant to this discussion.
>
> A new requirement has recently surfaced which would require the management
> of a large repository of document files. This repository contains several
> hundred thousand files of various types and sizes. The files arrive by a
> variety of means, such as scanned documents, fax images, file uploads, etc.
> Two file types are TIF image files and PDFs, both of which have files
> which can become 200MB or more in size. Most files are considerably
> smaller, though.
>
> Desktop users need to view, print, edit (e.g. Word docs) or otherwise
> access one or more files from this repository. They will also add files to
> it. For security and audit reasons, the access to these files must be
> tightly controlled. All operations are logged. For certain files, editing
> of their contents is allowed. A user must "check out" the file for editing
> and then check it back in when they are done. The modified file will be
> added to the repository as a new version. While a file is checked out, no
> other user may check out or otherwise access the file, though they may
> view prior versions.
>
> Thus, there is a need to be able to efficiently transfer files both to and
> from a user's desktop. These transfers would be mediated, presumably, by
> some kind of file server/service which would authenticate the user,
> validate the operation being performed, create any log data, and transfer
> the file to/from the appropriate directory on the server. These
> requirements seem to suggest that business objects running on the client
> will cooperate with objects on a server somewhere to record the information
> as appropriate as well as effect the transfer of the files themselves.
>
> We discarded the idea of making the files available via direct access to a
> network share, since that would violate the security requirements - we
> can't have users messing around, outside of the app, in the repository
> directory tree. Though I think that would be by far the simplest (and
> fastest?) approach, we cannot allow direct access to the files; all access
> must be monitored and controlled. Indeed the users aren't really aware that
> there are files at all - they deal with cases and the case's supporting
> documents. They don't know or care what the filenames are.
>
> We have toyed with the idea of using a Web Service for this. The idea is
> that web service methods could be called with appropriate arguments for
> authentication as well as the operation being performed. For the file
> involved, a byte array by reference could be used as an argument to the
> service call. The byte array would "be" the file, and it would then be
> written as a temporary file on the user's local machine, or in the case of
> an upload by a user to the server, written to the appropriate server
> directory.
>
> We have developed some proof of concept code and it seems quite
> straightforward.
>
> But, the problem with this approach is, I think, the large files. While
> there aren't many of them, there are enough of them to force us to deal
> with them. Using Web Services means the byte array is serialized into an
> xml stream, increasing the size by, what, 50%? That is a significant
> overhead. Also, that would mean that the web site running the service
> would require that 200MB byte array to be resident in memory while being
> serialized and transferred, and if we had more than a few users doing that
> I suspect the web server would be overwhelmed. Indeed, in some of our
> tests we have had "Insufficient Resource" errors on the server when using a
> Binary Reader to load a large file into a byte array in preparation for
> returning that array to a caller.
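
On the size estimate above: assuming the byte array goes over the wire in the
default way for XML web services, as base64 text, the growth is about 33%
rather than 50%, because base64 emits 4 characters for every 3 bytes. The
surrounding XML adds a little more. The short Python check below is only an
illustration; the function names are made up for this example.

```python
import base64


def base64_encoded_size(n_bytes: int) -> int:
    """Length of the padded base64 text for n_bytes of binary data."""
    return 4 * ((n_bytes + 2) // 3)


def measured_overhead(n_bytes: int) -> float:
    """Encode n_bytes of zeros and return the fractional size increase."""
    encoded = base64.b64encode(bytes(n_bytes))
    return len(encoded) / n_bytes - 1.0
```

By this arithmetic a 200MB file becomes roughly 267MB of text on the wire,
which is still a real cost and another reason to send it in pieces.
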
>
> Does anyone have any thoughts on how to do this? Perhaps some sort of
> custom remoting to transfer the file? If the remoting were hosted in IIS
> (like the dataportal), then wouldn't the same resource problems exist with
> the large files? I saw an article somewhere (in the MS KB?) that showed how
> to write a service which would host the remote object, but isn't there
> still a problem with transferring 200MB in one big chunk? How would
> breaking a file into smaller chunks work using single-call remoting and how
> would that file be reassembled on the user's system?
>
> Or maybe somebody has an idea for some other approach entirely?
>
> Thanks for any help or insight anyone can offer.
>
> - jeff
>
> -- Jeff