Vision-based document segmentation is the subject of a new patent granted to Microsoft. Web pages mix so many kinds of material, including main content, links, labels and ads, that search engines find it difficult to index them and retrieve the right pages for searchers.
The patent describes breaking a web page into a number of meaningful blocks, based on how a person sees the page. These sections are identified as portions of the page with different meanings, and they may be completely unrelated to one another.
These blocks help a search engine when it looks at pages, because it can decide how to index a page based on the content found in each block. This helps searchers get more of the information they were looking for.
Under the patent, the page is first broken into visual blocks, and the visual differences between page sections are detected. These differences are used to place visual separators between the sections. The blocks and separators are then used to build a content structure for the page. Pages are not ranked directly as a whole; instead, their sections and blocks are used for ranking, which makes it easier to find the right information for searchers.
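To make the three steps (blocks, separators, content structure) a little more concrete, here is a toy sketch in Python. It is not the patent's algorithm: it assumes the page has already been rendered into rectangles, looks only at vertical gaps, and the `Block` class, the `min_gap` threshold and the sample coordinates are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A rendered rectangle on the page (hypothetical pixel coordinates)."""
    top: int
    bottom: int
    text: str


def find_separators(blocks):
    """Return (gap_start, gap_end) for each vertical gap between neighbouring blocks."""
    ordered = sorted(blocks, key=lambda b: b.top)
    separators = []
    for upper, lower in zip(ordered, ordered[1:]):
        if lower.top > upper.bottom:
            separators.append((upper.bottom, lower.top))
    return separators


def build_sections(blocks, min_gap=20):
    """Group blocks into sections, starting a new section at every gap >= min_gap.

    min_gap is an assumed threshold, not a value taken from the patent.
    """
    ordered = sorted(blocks, key=lambda b: b.top)
    sections = []
    for block in ordered:
        if sections and block.top - sections[-1][-1].bottom < min_gap:
            sections[-1].append(block)  # small gap: same section
        else:
            sections.append([block])    # wide gap (or first block): new section
    return sections


# Made-up layout: a header, two article paragraphs close together, a footer.
page = [
    Block(0, 80, "Site header"),
    Block(120, 300, "Article paragraph one"),
    Block(305, 500, "Article paragraph two"),
    Block(560, 620, "Footer links"),
]

for number, section in enumerate(build_sections(page), start=1):
    print(number, [b.text for b in section])
```

In this layout the two article paragraphs end up in one section because only a small gap separates them, while the header and footer each become their own section.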
I'm skipping over a large number of details described in the patent and the papers, but one of the most important takeaways from this patent is that the indexing of content on web pages may be based on parts of pages, rather than the whole page.
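As a rough illustration of what indexing parts of pages could look like, here is a small Python sketch of an inverted index keyed by block instead of by page. The URL, the block names and the matching rule (every query word must appear in the same block) are assumptions for the example, not something spelled out in the patent.

```python
from collections import defaultdict


def build_block_index(pages):
    """Map each lowercase word to the set of (url, block_id) pairs containing it."""
    index = defaultdict(set)
    for url, blocks in pages.items():
        for block_id, text in blocks.items():
            for word in text.lower().split():
                index[word].add((url, block_id))
    return index


def search(index, query):
    """Return the blocks that contain every word of the query, in sorted order."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical page, already split into named blocks.
pages = {
    "example.com/post": {
        "header": "seo services blog",
        "article": "vision based page segmentation patent",
        "sidebar": "recommended reading on patent law",
    }
}
index = build_block_index(pages)

print(search(index, "page segmentation"))  # matches the article block only
print(search(index, "seo patent"))         # no single block has both words
```

A whole-page index would treat this page as a match for "seo patent", because both words appear somewhere on it. The block-level index does not, since the words sit in unrelated sections.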
So, VIPS (VIsion-based Page Segmentation) is going to break the page down into blocks and weigh the information in each block using visual cues from the page structure.
On this page, for example, it would produce about 13 sections. The header would be (as an example) VIPS-01; the next level below (3 sections separated as SEO Services, Subscribe, and Recommended Reading) we’ll call VIPS-02, 03, 04; the next section is the article body, and we’ll call that VIPS-05; the next section starts right next to it (the link sections), and we’ll call it VIPS-06, 07, 08, 09, 10, 11; the bottom section (recent visitors, get a feed, and friends) would be VIPS-12, 13.
Each section will be weighted separately. I need somebody smarter than I am to explain how it really works. I read Microsoft’s whitepaper on it but had to stop at the first sign of math about halfway down.
As I understand it, though, it looks like you can emphasize keywords and use H tags, etc. “Thank the maker” for CSS.
And thanks for the post.
David