How to Save All of Your Posts and Their Comments on Google Plus

So, you want to scrape (save) everything from Google Plus, huh? True, if you don’t have much content, you won’t need to do this. In that case, you can just press Ctrl-A to select it all, Ctrl-C to copy it and Ctrl-V to paste it into Microsoft Word. However, even then you’ll lose all of the images in the copy.

So, what do you need to do it? Firefox, iMacros and a reasonably fast computer (8 gigs of RAM and an i5 or similar, or better). Add Adobe Acrobat if you want the result as a PDF, and Corel PDF Fusion as well if you want it converted into DOC format. Yeah, if you just want the text for data mining or re-use, you can copy and paste it into a text editor and parse it with whatever technique you like. However, this guide is aimed more at the person who has to close their Google Plus account before applying for a new job or school, someone who wants it all saved for record-keeping, sentimental value or scrapbooking, or hell, even someone who wants to copy someone’s entire Google Plus stream for a court case, like a divorce.

First, open Firefox. Install iMacros from the Mozilla add-ons repository, or go to the iOpus page, www.iopus.com/imacros, and follow the instructions there to install it. Once it’s installed and Firefox has been restarted, follow the iOpus instructions to create a new macro, open the editor and paste in the first block of code below: everything after the first <–> and before the second <–>. Open your Google Plus page (or someone else’s), then run it. It’ll open every “page” by clicking the “more” button. Set the macro to loop 999 times. It generally won’t take that many, but it might if you have 4,000 paper-pages’ worth of content to grab. 800 “paper-pages” will take roughly 10 to 15 minutes to load completely if you have a reasonably fast PC and Internet connection (I have a 30 megabit cable modem connection), longer if not.

<–>

 

SET !ERRORIGNORE YES

TAG POS=1 TYPE=SPAN ATTR=CLASS:np

WAIT SECONDS=1

 

<–>

 

So, the code above will open all of the pages. The code below opens up all the posts that have more than one or two comments. Again, use everything after the first <–> and before the second <–>.

 

<–>

 

SET !ERRORIGNORE YES

TAG POS=1 TYPE=SPAN ATTR=CLASS:a-o<SP>Ip<SP>Yz

WAIT SECONDS=1

 

<–>

 

So, the code above opens up all of the comment threads. The code below is the last touch: it expands each long comment that’s hidden behind an “Expand this comment” link.

 

<–>

 

SET !ERRORIGNORE YES

TAG POS=1 TYPE=SPAN ATTR=TABINDEX:0&&CLASS:a-o<SP>hn<SP>eg&&ROLE:button&&TXT:Expand<SP>this<SP>comment<SP>»

WAIT SECONDS=1

 

<–>

Now, the code above is flawed because of a limitation of iMacros. Essentially, it has no easy way to run batch jobs that need dynamic content in the macro itself without resorting to another programming language, which is a problem with EVERY other platform out there too. Support for Regular Expressions in the macro code would fix it, but that’s not what we’re talking about here. Back on topic: because of that limitation, the macro has to be longer, with one TAG line per comment.

 

You’ll need a list of sequential numbers, as many as you think you might need for the number of comments you’ll have to expand. For 800 pages’ worth of content, that was around 400 in my case.

 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

 

That’s what I mean by a list of sequential numbers. Use the generator at TextMechanic: textmechanic.com/Generate-List-of-Numbers.html. It’s easy and intuitive to use.

Then, use a RegEx text editor (my favorite is EditPad Pro) to add the parts of the code above around the numbers: a RegEx replace of ^ inserts text at the start of every line, and a replace of $ inserts text at the end of every line. In iMacros, POS=<number here> tells the macro which instance of what it’s searching for to target. So, POS=1 is the first, POS=2 is the second, POS=3 is the third and so on. It’s a hacky solution, but it’s the only way to go about it with iMacros, which is a severe problem for any large batch job because the editor seems to hate large files. Hell, even after editing the macro in EditPad and saving it with the .IIM extension, iMacros asked whether I was sure I wanted to load such a “big” file, at all of 2,000 lines of macro…
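For anyone who would rather script the two RegEx replaces than do them in an editor, here is a rough Python sketch of the same idea. It is only an illustration: the function name wrap_numbers is made up, and the attribute string is simply copied from the macro above, so it will need updating if Google Plus changes its class names.

```python
import re

# Attribute string copied from the "Expand this comment" macro above.
ATTR = ("TABINDEX:0&&CLASS:a-o<SP>hn<SP>eg&&ROLE:button"
        "&&TXT:Expand<SP>this<SP>comment<SP>»")


def wrap_numbers(number_list):
    """Turn a list of numbers (one per line) into TAG/WAIT macro lines."""
    # Normalise the pasted list: one number per line, no blank lines.
    text = "\n".join(number_list.split())
    suffix = f" TYPE=SPAN ATTR={ATTR}\nWAIT SECONDS=1"
    # With re.MULTILINE, ^ matches at the start of every line ...
    text = re.sub(r"^", lambda m: "TAG POS=", text, flags=re.MULTILINE)
    # ... and $ matches at the end of every line.
    text = re.sub(r"$", lambda m: suffix, text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    # Build the number list in code instead of pasting it from a generator.
    numbers = "\n".join(str(n) for n in range(1, 401))
    print("SET !ERRORIGNORE YES")
    print(wrap_numbers(numbers))
```

Run as a script, it prints a macro for POS=1 through 400; redirect the output to a file with the .IIM extension. Whether iMacros expects a particular text encoding for the » character is not covered here, so check the saved file loads before relying on it.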

 

The finished macro will look something like this, with POS numbered from 1 up to however many you need (200? 1,000, possibly):

 

<–>

 

SET !ERRORIGNORE YES

TAG POS=45 TYPE=SPAN ATTR=TABINDEX:0&&CLASS:a-o<SP>hn<SP>eg&&ROLE:button&&TXT:Expand<SP>this<SP>comment<SP>»
WAIT SECONDS=1
TAG POS=46 TYPE=SPAN ATTR=TABINDEX:0&&CLASS:a-o<SP>hn<SP>eg&&ROLE:button&&TXT:Expand<SP>this<SP>comment<SP>»
WAIT SECONDS=1
TAG POS=47 TYPE=SPAN ATTR=TABINDEX:0&&CLASS:a-o<SP>hn<SP>eg&&ROLE:button&&TXT:Expand<SP>this<SP>comment<SP>»
WAIT SECONDS=1
TAG POS=48 TYPE=SPAN ATTR=TABINDEX:0&&CLASS:a-o<SP>hn<SP>eg&&ROLE:button&&TXT:Expand<SP>this<SP>comment<SP>»

(and so on…)

 

<–>
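The slice above (POS=45 through 48) suggests one possible way to keep each file small: generate the macro in several pieces, each covering its own range of POS numbers. The Python sketch below does that. Treat it as a sketch under assumptions: the function names, the file names and the idea that smaller files avoid the iMacros “big file” prompt are all hypothetical and untested here.

```python
from pathlib import Path

# Attribute string copied from the macro above.
ATTR = ("TABINDEX:0&&CLASS:a-o<SP>hn<SP>eg&&ROLE:button"
        "&&TXT:Expand<SP>this<SP>comment<SP>»")


def macro_for_range(first, last, wait_seconds=1):
    """Return macro text that targets POS=first through POS=last."""
    lines = ["SET !ERRORIGNORE YES"]
    for pos in range(first, last + 1):
        lines.append(f"TAG POS={pos} TYPE=SPAN ATTR={ATTR}")
        lines.append(f"WAIT SECONDS={wait_seconds}")
    return "\n".join(lines) + "\n"


def split_ranges(total, per_file):
    """Return (first, last) pairs that together cover 1..total."""
    return [(start, min(start + per_file - 1, total))
            for start in range(1, total + 1, per_file)]


def write_macros(folder, total, per_file):
    """Write one .iim file per range and return the list of paths."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for first, last in split_ranges(total, per_file):
        # Hypothetical naming scheme, e.g. expand_0001_0100.iim
        path = folder / f"expand_{first:04d}_{last:04d}.iim"
        path.write_text(macro_for_range(first, last), encoding="utf-8")
        paths.append(path)
    return paths
```

For example, write_macros("macros", 400, 100) would write four files of 100 comments each into a folder called macros, to be run one after another on the same open page.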

 

So, after all that’s done, “print” the huge-ass webpage that you used iMacros to open up and expand all of the JavaScript-driven elements on. Use the Adobe Acrobat printing tool. If you want, you can then open that PDF in Corel PDF Fusion and save it as a DOC file. Or, if you just want the text and not the images or people’s avatar icons next to their posts, you can copy and paste it all into MS Word or a text editor.

A note: if you have a lot of pages, Microsoft OneNote won’t hack it, although I did have limited success with it, copying all of the images and text together and keeping the formatting for the most part as well. From OneNote I was able to save as a DOCX file or export as a PDF. However, for accounts with even 1 post every few days and a fair number (even 15) of commenters on many posts, it would be a poor solution, especially if you want to save EVERYTHING with the formatting you see when you look at your Google Plus page and post history.

And while yes, I could’ve had iMacros save the text entries into a CSV file to use as a spreadsheet, or into an organized text file, I wanted to save the entire set of web content pages so that they had the same impact as when originally posted and interacted with, effectively backing up some of the Google Plus experience as well. OCR can pull some of the text back out later, or, if you want to save the text separately, Ctrl-A and Ctrl-C it into a Word document or text file.

Keep in mind that Word documents are bigger than the PDFs will be. PDFs will be 65-ish megabytes for even 800 pages of content (that’s what it was after output from Acrobat’s print-to-file software; it depends on the compression and resolution of the capture), whereas Word DOCs created by Corel PDF Fusion from around 800 pages are 180-ish megabytes.

The End.

 

Also, here are my thoughts on the best of web scraping and automation platforms:

www.sc3ne.com/useful-software/web-automation-and-scraping-tools-my-top-7/

Web Automation and Scraping Tools (My Top 7)

VelocityScape: www.velocityscape.com/
Selenium: seleniumhq.org/
Awesomium (yeah, that’s its name, really): awesomium.com/
Zenno Poster: zennolab.com/en/products/zennoposter/
iOpus iMacros: iopus.com/imacros
And of course, Python: www.python.org/
For things like scraping URLs or e-mails easily, Scrapebox: www.scrapebox.com
And of course there are tons of custom tools and scripts, the majority in PHP, JavaScript and Python…
