
Pre-Sales - Web Scraping Options
badger60
Posted: Monday, October 19, 2015 4:20:09 AM
Rank: Newbie
Groups: Member

Joined: 10/19/2015
Posts: 2
Hi
I'd welcome your view on this.
I'm looking at replacing my existing web scraping component (iMacros embedded in my .NET app) with a browser control and doing the scraping myself. iMacros provides a useful abstraction around the MS browser control, which makes development highly productive, but I'm getting compatibility issues with different OS and IE versions.

My questions, given that I know little about 'real' web scraping via a browser component and can't invest a lot of time learning a new object model:
1) Would EO.WebBrowser be a good tool for a novice/intermediate .NET programmer to build a scraper app with?
2) Does EO.WebBrowser have any specific functionality to support scraping?
3) Are there any web scraping samples in VB.NET?
4) I'm assuming it supports the kind of functionality I need: my WinForms app dynamically acting as a user, clicking buttons and links, downloading files, entering text, printing pages, etc.

Thanks for your help!
eo_support
Posted: Monday, October 19, 2015 1:47:06 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,229
Hi,

The two core features that you will need to look into are:

1. Loader interface. The most important one to look into is WebView.LoadUrlAndWait, which loads a page and waits until loading is done. Since we have the source code of the browser engine, we hook in at a very low level to track when a request has finished loading, which is much more reliable than a simple "IsLoading" property. For example, when you load page A and page A automatically redirects to page B, we wait until B finishes loading. In the meantime, IsLoading changes from true to false (when the loader receives the redirect header from A) and then to true and false again (when the loader starts and finishes loading B).

http://www.essentialobjects.com/doc/eo.webbrowser.webview.loadurlandwait.aspx

2. JavaScript/.NET interop. You can call any JavaScript code from .NET through EvalScript, and you can also call .NET code from JavaScript. You can find more details here:

http://www.essentialobjects.com/doc/webbrowser/advanced/js.aspx
http://www.essentialobjects.com/doc/webbrowser/advanced/jsext.aspx
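The redirect case in item 1 can be illustrated with a small, hypothetical model. The TypeScript below is not EO.WebBrowser code, and it does not show how the engine tracks requests internally. LoaderStep, naiveWait and redirectAwareWait are made-up names. The model shows one thing: if you treat the first time a loading flag turns false as "done", you stop at page A. A check that also knows whether the response was a redirect ends at page B.

```typescript
// Hypothetical model of what happens while page A redirects to page B.
// This is NOT the EO.WebBrowser API; all names here are illustrative.
interface LoaderStep {
  url: string;
  isLoading: boolean;
  // Set when the response for `url` turned out to be a redirect.
  redirectTo?: string;
}

// Naive waiting: treat the first "loading flag became false" as done.
function naiveWait(steps: LoaderStep[]): string | undefined {
  for (const step of steps) {
    if (!step.isLoading) return step.url;
  }
  return undefined;
}

// Redirect-aware waiting: a load only counts as finished when it has
// stopped loading AND its response was not a redirect.
function redirectAwareWait(steps: LoaderStep[]): string | undefined {
  for (const step of steps) {
    if (!step.isLoading && step.redirectTo === undefined) return step.url;
  }
  return undefined;
}

// The sequence from the post: the flag goes true -> false when A's
// redirect header arrives, then true -> false again while B loads.
const redirectSequence: LoaderStep[] = [
  { url: "https://example.com/a", isLoading: true },
  { url: "https://example.com/a", isLoading: false, redirectTo: "https://example.com/b" },
  { url: "https://example.com/b", isLoading: true },
  { url: "https://example.com/b", isLoading: false },
];

console.log(naiveWait(redirectSequence));         // https://example.com/a
console.log(redirectAwareWait(redirectSequence)); // https://example.com/b
```

In the actual component you would not write this logic yourself. As described above, WebView.LoadUrlAndWait is what handles it.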

Using the JavaScript interface is more reliable than simulating user events. For example, other solutions might provide something like SendMouseClick(x, y) to simulate a mouse click. With our engine you can simply use JavaScript to get the button object and then call its click() method.
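Here is a rough sketch of that idea. It is an illustration only, not code from the EO documentation. The TypeScript below finds an element with a CSS selector and calls its click() method.

* The Clickable and QueryRoot interfaces are hypothetical stand-ins so the logic can run outside a browser. Inside a page, document and an HTML button or link play those roles.
* buildClickScript is also a hypothetical helper. It shows the kind of self-contained JavaScript string you might pass to EvalScript from .NET, using JSON.stringify to quote the selector safely.

```typescript
// Hypothetical minimal types so the sketch runs outside a browser.
// In a page, `document` and an HTML button or link fill these roles.
interface Clickable {
  click(): void;
}
interface QueryRoot {
  querySelector(selector: string): Clickable | null;
}

// Find an element by CSS selector and call its click() method, rather
// than simulating a mouse click at screen coordinates.
// Returns false when nothing matches, so the caller can react.
function clickElement(root: QueryRoot, selector: string): boolean {
  const el = root.querySelector(selector);
  if (el === null) return false;
  el.click();
  return true;
}

// Hypothetical helper: build a self-contained script string of the kind
// you might pass to EvalScript from .NET. JSON.stringify quotes and
// escapes the selector so quotes or backslashes cannot break the script.
function buildClickScript(selector: string): string {
  const quoted = JSON.stringify(selector);
  return "(function () { var el = document.querySelector(" + quoted + ");" +
    " if (!el) return false; el.click(); return true; })()";
}

console.log(buildClickScript("#submit"));
```

Before relying on the script's true/false result, check the EvalScript documentation to see how that return value is converted into a .NET value.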

We do not have any samples specifically for scraping, since it is just one of the many scenarios our component can be used for and it is not practical for us to produce a sample for every scenario. Under the hood, however, they all rely on one or more core features like those listed above. You can take a look and see whether the core features let you accomplish what you need. If you have any more questions about those features, please feel free to ask.

Thanks!
badger60
Posted: Tuesday, October 20, 2015 2:33:38 AM
Hi, thanks for the info. I'm sure the JavaScript EvalScript interface will be useful.

I've browsed the reference section of the help files, but I'm having trouble knowing where to start. I need to access and search web pages and interact with their HTML elements: clicking buttons and links, initiating downloads and filling in forms. Could you post some basic code snippets to get me started?

Or perhaps some of your forum members would like to try their hand?

Thanks in advance!
eo_support
Posted: Tuesday, October 20, 2015 10:39:57 AM
Hi,

You basically do three things: create the WebView, load the page, and then interact with the page. For creating the WebView, see this page:

http://www.essentialobjects.com/doc/webbrowser/start/winform.aspx

To load the page, call LoadUrl or LoadUrlAndWait. Interacting with the page (such as searching its contents or clicking buttons and links) is all done through JavaScript.
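For the form-filling and searching parts, here is a minimal sketch under the same assumptions. It uses hypothetical FieldLike, LinkLike and PageLike interfaces that stand in for the DOM, so it runs outside a browser. In a real page you would pass the equivalent JavaScript to EvalScript after loading the page with LoadUrlAndWait.

```typescript
// Hypothetical minimal types standing in for the DOM pieces used here,
// so the sketch can run outside a browser.
interface FieldLike {
  value: string;
}
interface LinkLike {
  textContent: string | null;
  href: string;
}
interface PageLike {
  querySelector(selector: string): FieldLike | null;
  querySelectorAll(selector: string): ArrayLike<LinkLike>;
}

// Fill in a text input (or textarea) chosen by CSS selector.
// Returns false when no element matches the selector.
function fillField(page: PageLike, selector: string, value: string): boolean {
  const field = page.querySelector(selector);
  if (field === null) return false;
  field.value = value;
  return true;
}

// Search the page: collect the href of every <a> whose text contains `needle`.
function findLinks(page: PageLike, needle: string): string[] {
  const hrefs: string[] = [];
  const links = page.querySelectorAll("a");
  for (let i = 0; i < links.length; i++) {
    const text = links[i].textContent ?? "";
    if (text.indexOf(needle) !== -1) hrefs.push(links[i].href);
  }
  return hrefs;
}
```

Setting value directly does not fire input or change events. If the page's own scripts listen for those events, you may also need to dispatch one after filling the field.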

Thanks!

