Web Scraping JSP Pages

Hi,

I want to scrape some JSP pages, and these tools have been suggested to me:

  • Scrapy
  • Selenium
    Etc…

I would like to hear recommendations from people in the community who are using these types of tools.

If you are writing your own scraper, I like the BeautifulSoup module in Python for general scraping.


Thank you @brolly33 :slight_smile:

@brolly33 does BeautifulSoup support parsing JSP Pages?

As I understand it, you need a headless browser to let the JSP page generate the HTML, and then you can scrape that page.

For example:
https://www.java.com/en/download/manual.jsp

BeautifulSoup can parse any HTML page, but the content has to be in the page before it can be parsed. JSP generates its content on the server, so the response already contains HTML and straight-up scraping with BeautifulSoup might be fine. If some of the content is generated after the page loads, usually by JavaScript running in the browser, that's when you need a browser automation tool like Selenium (which can drive a headless browser) to let the JS run so that you can scrape the page once it has finished. Rule of thumb: if you can't get the content you're targeting with cURL, you'll probably need something like Selenium to load the content before scraping it with something like BeautifulSoup. HTH
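If it helps, here is a rough sketch of that rule of thumb in Python. It only checks whether the text you are after already appears in the raw server response, which is roughly what you would learn from cURL. The function names (`visible_text`, `is_server_rendered`, `fetch_raw_html`) and the User-Agent header are made up for illustration, and it uses only the standard library so it runs without installing anything; in a real scraper you would likely hand the HTML to BeautifulSoup instead.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the text a reader would see, with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def is_server_rendered(html, expected_text):
    """True if expected_text is already in the HTML sent by the server.

    If this returns False for text you can see in a browser, the content is
    probably filled in by JavaScript and a tool like Selenium may be needed.
    """
    return expected_text in visible_text(html)


def fetch_raw_html(url, timeout=10):
    """Fetch a page without running any JavaScript (similar to cURL)."""
    # The User-Agent value is an assumption; some sites reject the default.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

To try it, call `fetch_raw_html()` on the page you care about and pass the result to `is_server_rendered()` along with a phrase you expect to find on the page. If it comes back `True`, plain fetching plus a parser should be enough for that piece of content.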
