Hi,
I want to scrape some JSP pages.
I got the following tool suggestions:
- Scrapy
- Selenium
- etc.
I would like to hear recommendations from people in the community who use these kinds of tools.
If you are writing your own, I like the BeautifulSoup module in Python for general scraping.
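As a rough sketch of what "general scraping" with BeautifulSoup looks like, the snippet below pulls every link out of a piece of HTML. It assumes the `beautifulsoup4` package is installed, uses Python's built-in `"html.parser"` backend, and the function name `extract_links` is just an illustrative choice:

```python
from bs4 import BeautifulSoup


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (link text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors without an href attribute
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    sample = '<p><a href="/a">A</a> <a>no href</a> <a href="b.html"> B </a></p>'
    for text, href in extract_links(sample):
        print(text, href)
```

The same function works on any HTML string, whether it came from a plain HTTP request or from a headless browser's rendered page.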
Thank you @brolly33
@brolly33 does BeautifulSoup support parsing JSP pages?
As I understand it, you need to use a headless browser to let the JSP page produce an HTML page, and then you can scrape that page.
For example:
https://www.java.com/en/download/manual.jsp
BeautifulSoup parses any page, but the page content has to be there first for it to parse. JSP generates the content on the server, so straight-up scraping with BeautifulSoup might be fine. But if some page content is generated after loading (usually by JavaScript running in the browser), that's when you need a headless browser like Selenium to let the JS run so you can scrape the page after it has finished. Rule of thumb: if you can't use cURL to get the content you're targeting, you'll probably need something like Selenium to load the content before scraping it with something like BeautifulSoup. HTH
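The cURL rule of thumb can be approximated in Python with the standard library: fetch the raw HTML with a single GET (no JavaScript runs, just like cURL) and check whether the text you're after is already in it. This is a minimal sketch; the function names and the `User-Agent` value are assumptions, and a plain substring check is a crude heuristic rather than a definitive test:

```python
from urllib.request import Request, urlopen


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page the way cURL would: one HTTP GET, no JavaScript executed."""
    # Assumption: some sites reject Python's default user agent, so send a browser-like one.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def needs_headless_browser(raw_html: str, target_text: str) -> bool:
    """If the target text is missing from the server's HTML, it is probably
    added later by JavaScript, so a static parser alone will not see it."""
    return target_text not in raw_html
```

Usage would look like `needs_headless_browser(fetch_raw_html("https://www.java.com/en/download/manual.jsp"), "Windows Offline")`, where `"Windows Offline"` is a hypothetical stand-in for whatever text you actually want. If it returns `False`, parse the raw HTML directly with BeautifulSoup; if `True`, load the page in Selenium first and parse the rendered source instead.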