Changing from feedparser.py to urllib & simplejson

With all the recent Twitter outages, my back-end system that retrieves information from Twitter was going haywire. Things kept going wrong, tweets were not retrieved, the works.

I initially coded this back end using feedparser, thinking the code could be used later on to get RSS feeds from other sites. That was a mistake: I assumed the information delivered in Atom format and in JSON format would be similar, and it is not. I am *not* saying that feedparser is bad; it’s just not the right tool for this particular job!

The Atom format that Twitter returns (as of this writing, of course; this might very well be fixed later on) is really a hodgepodge of information that is prodded and shaped into the Atom format. Lots of info is repeated, because the Atom format was really made for longer articles that need a title, an intro, a body, etc.

All this means a much larger response size. It is certainly not enormous, but in the long run it adds up in data traffic.

Not all the info you get in JSON comes through correctly in Atom, either. iso_language_code, which indicates the language the author primarily uses, was (and is) always set to en-US in the Atom format.

So, with all those outages, and after checking and finding that most JSON queries still returned correct results, I removed the feedparser lines and am now using urllib and simplejson to retrieve and parse the Twitter data. It took me about three late evenings in a row to work through (I have a full-time day job, so I only have time for this on the train and in the evenings after 9 pm), but it’s running (almost) smoothly now.
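
For the curious, here is a rough sketch of what the urllib-plus-JSON approach looks like. It is not my actual code. It is written for Python 3, where the standard json module plays the role of simplejson. The endpoint URL is a placeholder, and the "results", "from_user" and "text" field names are assumptions about the response layout. Only iso_language_code is a field mentioned above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: placeholder endpoint, not a confirmed Twitter URL.
SEARCH_URL = "http://search.example.com/search.json"


def build_search_url(query, base_url=SEARCH_URL):
    """Build the search URL, URL-encoding the query string."""
    return base_url + "?" + urlencode({"q": query})


def parse_results(payload):
    """Parse a JSON response body into a list of small dicts.

    Assumption: the tweets sit in a top-level "results" list and each
    entry may carry "from_user", "text" and "iso_language_code".
    Missing fields come back as None instead of raising.
    """
    data = json.loads(payload)
    return [
        {
            "user": item.get("from_user"),
            "text": item.get("text"),
            "lang": item.get("iso_language_code"),
        }
        for item in data.get("results", [])
    ]


def fetch_tweets(query, timeout=10):
    """Fetch and parse one search; network errors propagate to the caller."""
    with urlopen(build_search_url(query), timeout=timeout) as response:
        return parse_results(response.read().decode("utf-8"))
```

Keeping the parsing separate from the fetching makes it possible to test parse_results on a saved response, which is handy when the service itself is down.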

Still need to weed out a bug in my code, though – the last search does not seem to be processed… grrr.
