I was looking for a way to get regular updates from a job site about a particular category even though the site doesn’t offer any sort of feed.
Then I stumbled upon a site called Feedyes.com.
I used it to get an RSS feed ready for the job site. It’s pretty elementary with the help of Feedyes, really; you don’t even need to register to create an RSS feed for a given site.
The only problem was that I didn’t get the RSS feed in XML format; I had to go to the website to view it. The feed also couldn’t really be customized in any way.
There’s another site named Page2rss.com which does pretty much the same thing. Mind you, none of the above sites is perfect, yet they do a reasonable job of it.
Here’s what I came up with as an RSS feed version of this page. It lets you use ‘search patterns’ written as regular expressions, together with ‘output templates’. It’s a handy site even with the limitations of its unpaid package, such as polling intervals and a maximum number of feeds. Do give it a try.
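To give a rough idea of how a ‘search pattern’ plus ‘output template’ setup can work, here is a minimal sketch in Python. It is not how any of the sites above are implemented; the sample HTML, the example.com base URL, and the names ITEM_PATTERN, ITEM_TEMPLATE and build_feed are all made up for illustration.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical HTML fragment standing in for a job-listing page.
SAMPLE_HTML = """
<ul>
  <li><a href="/jobs/101">Python Developer</a> <span>Dhaka</span></li>
  <li><a href="/jobs/102">QA Engineer</a> <span>Remote</span></li>
</ul>
"""

# Search pattern: one named group per piece of data we want to keep.
ITEM_PATTERN = re.compile(
    r'<li><a href="(?P<link>[^"]+)">(?P<title>[^<]+)</a>'
    r'\s*<span>(?P<place>[^<]+)</span></li>'
)

# Output template: where each captured group lands in a feed item.
ITEM_TEMPLATE = (
    "<item><title>{title} ({place})</title>"
    "<link>{base}{link}</link></item>"
)


def build_feed(html, base, channel_title):
    """Apply the search pattern to the page and fill the template per match."""
    items = []
    for match in ITEM_PATTERN.finditer(html):
        # Escape captured text so it is safe to embed in XML.
        fields = {key: escape(value) for key, value in match.groupdict().items()}
        items.append(ITEM_TEMPLATE.format(base=escape(base), **fields))
    return (
        '<?xml version="1.0"?>'
        '<rss version="2.0"><channel>'
        f"<title>{escape(channel_title)}</title>"
        f"<link>{escape(base)}</link>"
        "<description>Scraped listing</description>"
        + "".join(items)
        + "</channel></rss>"
    )


feed = build_feed(SAMPLE_HTML, "http://example.com", "Example jobs")
print(feed)
```

The pattern is tied to the exact markup of the page, so a small change in the site's HTML will silently produce an empty feed. That is the usual trade-off with regex-based scraping.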
What it does provide, though, is an RSS feed for searching blogs. Try this.
There’s another gem I found which actually lets you run an XPath query against a web page to scrape it into an RSS feed. XPath makes searching an HTML document pretty straightforward.
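For a feel of what XPath-based scraping looks like, here is a small sketch using only Python's standard library. It assumes the page is well-formed XHTML, which real pages often are not, and the sample page, the example.com base URL, and the scrape_to_rss helper are hypothetical. Note that xml.etree.ElementTree supports only a subset of XPath; a third-party library such as lxml offers fuller support.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML standing in for a fetched page.
SAMPLE_PAGE = """
<html><body>
  <div class="job"><a href="/jobs/101">Python Developer</a></div>
  <div class="ad"><a href="/promo">Sponsored link</a></div>
  <div class="job"><a href="/jobs/102">QA Engineer</a></div>
</body></html>
"""


def scrape_to_rss(page, xpath, base):
    """Select links with an XPath query and emit them as RSS 2.0 items."""
    root = ET.fromstring(page)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Scraped feed"
    ET.SubElement(channel, "link").text = base
    ET.SubElement(channel, "description").text = "Items matched by " + xpath
    for anchor in root.findall(xpath):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = anchor.text
        ET.SubElement(item, "link").text = base + anchor.get("href", "")
    return ET.tostring(rss, encoding="unicode")


# Keep only links inside <div class="job">, skipping the ad block.
rss_xml = scrape_to_rss(SAMPLE_PAGE, ".//div[@class='job']/a", "http://example.com")
print(rss_xml)
```

Compared with a regular expression, the query describes where the data sits in the document tree, so it tends to survive cosmetic changes to the markup a little better.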
Well, this has been a very long ride through scraping your way into another site, but what if you want to stop others from doing the same? :) Enough of RSS Scraping, Scavenging, Stealing, and Content Theft, no? Talk about having a dose of one’s own medicine, right?
To wrap things up, do remember that there are words like Copyright and Intellectual property / Intellectual Property Protection in the dictionary :). So use these tools in a positive way and enjoy the Scrapventure!
Update on 9th April, 2009: It was unfair on my part to leave out tools like Yahoo! Pipes and Feedity.com. While Yahoo! Pipes is a less than straightforward means of achieving our objective, it has powerful features, such as visual query development, which the rest lack. But I think what makes Yahoo! Pipes unique is that you can chain together an arbitrary number of existing queries (pipes) and mash them up into one that carries all your filters and queries. It also accepts user input. More on Yahoo! Pipes in a subsequent post, perhaps, where I would guide you through the process. Feedity.com, on the other hand, is a very straightforward means of achieving what we want. It’s quite efficient and intelligent at parsing too. Give it a try.
Update on 16th April, 2009: The Microsoft Popfly mashup creator is another candidate for an honorable mention.