dotnet general : Remote Webpage data extraction


Deepson Thomas
12/21/2004 9:41:02 PM
Hi,
I have a question about .NET. Here is the scenario: I have an ASP.NET page
written in C#. The page contains only a textbox and a button. What I want is
that, whatever URL I enter into the textbox, my code reads that web page and
shows it within my own page. I don't want to redirect the user to that page.
Instead, when he hits submit, my C# code should read the remote web page on
the back end, extract the data, and insert it into my own page. It is fine if
I don't get the images, but I need all the text data.
Does anybody have any idea how to do this? I can't use any third-party
components.

Thanks in advance

Deepson
--
Cor Ligthert
12/22/2004 9:42:20 AM
Deepson,

The trouble with what you are doing is that HTML is a description language
that can pull text from many resources. So what you see on a page does not
have to be in that page itself. It can be a URL that uses a URL that uses
another URL.

There can also be a lot of text in JavaScript and in Macromedia plug-ins,
Java plug-ins, or other plug-ins.

In my opinion, that makes it very difficult to do what you want.

A client-side script that creates an IFrame in your page is probably a
more proper way to go.

However, this is just my thought,

Cor


Joerg Jooss
12/22/2004 12:48:56 PM
You can fetch web pages using either System.Net.WebClient or
System.Net.WebRequest, but this will only work if the page doesn't use
client-side scripting to render itself fully or partially.
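
As a rough illustration of the WebClient route, here is a minimal sketch. The
class name RemotePage and its two methods are made up for this example, and
the regex-based tag stripping is a crude assumption rather than a real HTML
parser, so malformed markup or script-generated text will not come through
correctly. WebUtility.HtmlDecode needs .NET 4.0 or later; on older frameworks
HttpUtility.HtmlDecode from System.Web should do the same job.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for this example only.
public static class RemotePage
{
    // Downloads the raw HTML of the given URL on the server side.
    public static string Fetch(string url)
    {
        using (WebClient client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // Crude text extraction (an assumption, not a full HTML parser):
    // drop script/style blocks, strip remaining tags, decode entities,
    // then collapse runs of whitespace.
    public static string ExtractText(string html)
    {
        string noScripts = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        string noTags = Regex.Replace(noScripts, @"<[^>]+>", " ");
        string decoded = WebUtility.HtmlDecode(noTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

In the button's click handler this could be used along the lines of
RemotePage.ExtractText(RemotePage.Fetch(url)), where url is the textbox
value. Since the text comes from an arbitrary remote site, it is probably
wise to HTML-encode it (for example with Server.HtmlEncode) before writing it
into your own page.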

Cheers,

--
Joerg Jooss
www.joergjooss.de
news@joergjooss.de
