To get a list of URLs from TIS, here's what to do (the list for the 1996 LX450 is posted earlier, so you only need this if you want another year/model, like the 100 series):
You will need a good text editor and knowledge of regular expressions will help a lot.
1. For every type of publication (there are 15 types, e.g. Service Bulletins, Repair Manual, or Wiring Diagrams - selectable from the left frame on the TIS website), save the content of the frame that contains the list of all documents for that type. Pretty much: select one type at a time, click search, click find, and save the content of the frame as HTML (I stress: the HTML content of the frame, not the whole parent HTML document).
2. Extract all lines with an 'option' tag from the saved file. Just delete the header and footer of the file. You may need some adjustments to get one full 'option' tag per line. The 'option' tags contain links with .pl extensions.
3. Extract the URLs from the option tags using your text editor's 'replace' command. One URL per line.
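Steps 2 and 3 can also be done with a few lines of Python instead of editor replaces. The markup below is a guess at the shape of the saved frame (the real TIS option tags and paths will differ), just to show the regex idea:

```python
import re

# Hypothetical saved frame content -- the real TIS markup and paths differ,
# but the 'option' tags with .pl links are what we are after.
saved_frame = """
<select name="docs">
<option value="/docs/rm/RM0001.pl">Repair Manual - Engine</option>
<option value="/docs/rm/RM0002.pl">Repair Manual - Brakes</option>
</select>
"""

# Pull out every .pl link from an option tag: one URL per line,
# exactly what step 3 produces with a text editor's regex replace.
urls = re.findall(r'<option\s+value="([^"]+\.pl)"', saved_frame)
for u in urls:
    print(u)
```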
4. Choose a random URL from step 3 and try to download it. Use wget; even a browser with JavaScript turned off may work. You most likely need to be logged in to TIS, or to use the current TIS session ID if you use a downloader. Look at the content of the downloaded HTML and pick out the translated link from a JavaScript procedure inside the file.
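If you script the download, you can reuse your browser's TIS session by sending its cookie along with the request. A minimal sketch with the standard library (the cookie name, value, and URL path are placeholders - copy the real ones from your logged-in browser session):

```python
import urllib.request

# Placeholder session cookie -- copy the real name/value from your browser
# (developer tools) after logging in to TIS.
session_cookie = "TISSESSIONID=abc123"

req = urllib.request.Request(
    "https://techinfo.toyota.com/docs/rm/RM0001.pl",  # sample step-3 URL (path is hypothetical)
    headers={"Cookie": session_cookie},
)
# urllib.request.urlopen(req) would then fetch the page; inspect the
# returned HTML for the translated .pdf link inside its JavaScript.
```

The same cookie works for wget via its --header or --load-cookies options.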
5. Compare the format of the URLs from the list you prepared in step 3 with the format of the URL from step 4. Transform the URLs in the list accordingly, using your text editor's 'replace' command. This involves three spots:
- url prefix
- file extension (from pl to pdf)
- add the extra directory, usually after the second level of the URL path (it always ends with pdf) - this one is tricky, and because of it you cannot process all publication types
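The three transformations of step 5 can be sketched as one small function. The host prefix and the inserted directory name below are assumptions - derive the real values by comparing a raw step-3 URL with the translated link found in step 4:

```python
# Placeholder values -- derive the real prefix and directory by comparing
# a raw step-3 URL with the translated link from step 4's downloaded HTML.
PDF_PREFIX = "https://techinfo.toyota.com"  # assumed host prefix
EXTRA_DIR = "pdf"                           # assumed name of the extra directory

def translate(url: str) -> str:
    parts = url.lstrip("/").split("/")
    # spot 3: insert the extra directory after the second path level
    parts.insert(2, EXTRA_DIR)
    path = "/".join(parts)
    # spot 2: swap the .pl extension for .pdf; spot 1: prepend the prefix
    return PDF_PREFIX + "/" + path.removesuffix(".pl") + ".pdf"

print(translate("/docs/rm/RM0001.pl"))
```

For some publication types the extra directory lands elsewhere in the path, which is exactly why the step-3 lists cannot all be processed the same way.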
6. Concatenate the lists for all publication types. Now you have one big list that you can feed to wget.
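The final step is just concatenation; a sketch with placeholder file names (yours will be one list per publication type from step 5):

```python
from pathlib import Path

# Hypothetical per-publication lists produced by step 5.
Path("repair_manual.txt").write_text("https://example.com/a.pdf\n")
Path("wiring_diagrams.txt").write_text("https://example.com/b.pdf\n")

# Concatenate into one master list; wget then downloads everything with:
#   wget --load-cookies cookies.txt -i all_urls.txt
lists = ["repair_manual.txt", "wiring_diagrams.txt"]
combined = "".join(Path(f).read_text() for f in lists)
Path("all_urls.txt").write_text(combined)
print(Path("all_urls.txt").read_text())
```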
The whole procedure is quick and easy, just hard to explain. Writing this up took me much longer than preparing the list of URLs.